
generated at 2/17/2025, 1:31:15 AM
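A bookmarklet that also captures ISBN and publisher information
A bookmarklet for importing the details of the book currently open on Amazon into Scrapbox.
To use it, either register it in your browser as a bookmarklet, or register a bookmarklet that runs the script on this page.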

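2024/1/13
The product description could not be retrieved in more and more cases.
The div structure appears to have changed, so I updated the code accordingly.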


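After receiving a comment, I revised Amazonからブックマークでページを作るとき内容紹介も取り込む (also import the product description when creating a page from Amazon with a bookmark).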
https://twitter.com/yoshinon/status/1058814885197271040
> Thank you so much!!
> This is incredibly helpful.
> Would it be difficult to also include things like the publisher and the ISBN?

First, the ISBN.

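I expected getting it from the page elements to be tedious, but then I remembered the Booklog (ブクログ) bookmarklet.
It gets the ASIN (Amazon's product identifier) with getElementById('ASIN').
However, I didn't remember the page having any such element.
I looked at the page source and searched for id="ASIN".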

There seems to be a hidden element.
It differs slightly between the print and Kindle editions.
(Screenshots: print edition / Kindle edition)

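Both editions do have an ASIN element, though: in the print edition it has an id, and in the Kindle edition it has a name attribute. Handling each case separately is enough.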
getASIN.js
var asin = document.getElementById('ASIN');//print edition: hidden input with id="ASIN"
if(asin){
	var a = 'ISBN:' + asin.value;
} else{
	var asin = document.getElementsByName('ASIN.0')[0];//Kindle edition: hidden input with name="ASIN.0"
	var a = 'ASIN:' + asin.value;
}

That part is easy.
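A slightly more defensive version of getASIN.js, as a sketch: it takes the document as a parameter (so it can be tried against a stand-in object) and returns an empty string instead of throwing when neither hidden input exists. The name getIdentifier and the empty-string fallback are my own assumptions, not part of the original bookmarklet. Labelling the print-edition value as ISBN works because, for books that have an ISBN, Amazon's ASIN is the ISBN-10.

```javascript
// Hypothetical helper: reads the hidden ASIN inputs described above.
// Print edition: <input id="ASIN">; Kindle edition: <input name="ASIN.0">.
function getIdentifier(doc) {
  const printInput = doc.getElementById("ASIN");
  if (printInput) return "ISBN:" + printInput.value;
  const kindleInput = doc.getElementsByName("ASIN.0")[0];
  if (kindleInput) return "ASIN:" + kindleInput.value;
  return ""; // assumption: leave the line empty rather than stop the script
}

// In the browser: getIdentifier(document)
// Outside the browser, a stand-in object with the same two methods works:
const stubDoc = {
  getElementById: (id) => (id === "ASIN" ? { value: "4000000000" } : null), // dummy value
  getElementsByName: () => [],
};
console.log(getIdentifier(stubDoc)); // ISBN:4000000000
```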

The problem is the publisher name. The hidden elements above contain no publisher information.

In the source, it only appears in the meta keywords tag or in the following part of the page.

And this element has no distinctive id, class, or name attribute.

(Screenshots: print edition / Kindle edition)


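That means I'll need to take the text of some element that can be found by id and extract the publisher from it with a regular expression or similar.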

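There are several class="bucket" elements, and this appears to be the first one. But its structure differs from the print edition's, so I suspect it will be a hassle.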

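Take the innerText of the enclosing element (doing this on the whole page gets messy if, say, the product description contains the word 出版社, "publisher"), and extract text of the form 出版社:hogehoge(hogehoge) from it.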

Start with the print edition.
getDetailText.js
var detail = document.getElementById('detail_bullets_id');
var detailtext = detail.innerText;

The extraction can be done with match:
 var result = detailtext.match(/出版社:.+/); 


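Taking only the part before the parenthesis gives the publisher name.
However, since the publication date may also be wanted, I'll process the whole string for now.
In that case, would it be better to capture with groups?
For example: var result = detailtext.match(/出版社:(.+)(\(.+\))/);
result[1] then holds the publisher name and result[2] the publication date.
To allow turning the publisher name into a link, keep 出版社: outside the capture group.
Date links could be written in several ways, such as [2018] or [2018/10], so I'll skip that for now.
That takes care of the print edition.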
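The capture-group idea can be tried on plain text, as a sketch. The sample string below is made up to resemble a few lines of the detail section's innerText (the publisher and date are taken from the comment in script.js); real pages may space the colon differently, and the later regular expressions on this page do include spaces around it. parsePublisher is a hypothetical name.

```javascript
// Group 1 is the publisher name, group 2 the parenthesized date.
// Without the s flag, "." does not match a newline, so the match stays on one line.
function parsePublisher(detailText) {
  const result = detailText.match(/出版社:(.+)(\(.+\))/);
  if (!result) return null; // no publisher line, e.g. some self-published books
  return { publisher: result[1].trim(), date: result[2] };
}

// Hypothetical innerText of the detail section
const sample = "言語: 日本語\n出版社:シーアンドアール研究所 (2018/7/27)\nISBN-10: 4000000000";
const parsed = parsePublisher(sample);
console.log(parsed.publisher); // シーアンドアール研究所
console.log(parsed.date);      // (2018/7/27)
```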

On Kindle edition pages the description is inside an iframe, so I branch on that.
iframeprocess.js
if (!detail) {
	var subdoc = document.getElementById("product-description-iframe").contentWindow.document;
	var detail = subdoc.getElementById("productDetailsTable");
}
→ Getting elements inside an iframe with JavaScript (JavaScriptでiframe内の要素を取得する)

It works, for now.

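There remains the question of where and how to arrange this kind of metadata.
Group it at the top
Group it at the bottom
Then there is how to build the date links.
I want to write the code so that users can customize it as easily as possible.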

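Also, some processing is clearly duplicated, and I'd like to clean that up.
Decide at some point whether the page is a Kindle edition and handle that branch in one place.
Or consolidate only the iframe processing.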
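One way to consolidate the iframe handling, sketched under the assumption that the Kindle details live in the same-origin iframe with id product-description-iframe used elsewhere on this page: resolve the iframe's document once, then look each id up in the page first and in the iframe second. makeFinder is a hypothetical name; it returns null rather than throwing when the iframe or the element is missing.

```javascript
// Resolve the iframe document once; every later lookup reuses it.
function makeFinder(doc, iframeId = "product-description-iframe") {
  const frame = doc.getElementById(iframeId);
  const frameDoc = frame && frame.contentWindow ? frame.contentWindow.document : null;
  // Look in the page itself first, then inside the iframe (if there is one).
  return function find(id) {
    return doc.getElementById(id) || (frameDoc ? frameDoc.getElementById(id) : null);
  };
}

// Usage in the browser (print edition id first, Kindle table second):
// const find = makeFinder(document);
// const detail = find("detail_bullets_id") || find("productDetailsTable");
```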

Self-published books cause problems:
Some show an edition number.
Some have no publisher name at all.
I need processing that handles these cases separately.

If there is no publisher name, leave it blank.
That's easy.

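What if the publisher name includes an edition number?
I'd like to see real examples of the possible notations, but I could only find my own books.
So for now I'll work from those.
Assume that a publisher name containing ";", as in 倉下忠憲; 1版, includes an edition number.
Check for it with regex.test(targetText).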
 倉下忠憲; 1版 → [倉下忠憲]; 1版 
 倉下忠憲 → [倉下忠憲] 
That mostly works.
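The rule above as a small function, a sketch assuming (as the text does) that a semicolon always separates the publisher name from an edition note. Only the part before the first semicolon becomes a link, and an empty name stays empty. linkPublisher is a hypothetical name.

```javascript
function linkPublisher(name) {
  const text = (name || "").trim();
  if (text === "") return ""; // no publisher name: leave it blank
  if (/;/.test(text)) {
    // "倉下忠憲; 1版": link the name, keep the edition note as plain text
    const [publisher, ...rest] = text.split(";");
    return "[" + publisher.trim() + "];" + rest.join(";");
  }
  return "[" + text + "]";
}

console.log(linkPublisher("倉下忠憲; 1版")); // [倉下忠憲]; 1版
console.log(linkPublisher("倉下忠憲"));      // [倉下忠憲]
```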

For the publication date, I also prepared three patterns: no link, the year only as a link, and the year and month as a link.
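The three date styles can be expressed with the same regular expressions that appear in script.js; this sketch only wraps them in one function. linkDate and the mode names are hypothetical.

```javascript
// date is the parenthesized string captured earlier, e.g. "(2018/7/27)".
function linkDate(date, mode) {
  switch (mode) {
    case "year": // link the year only
      return date.replace(/\((\d+)\//, "([$1]/");
    case "yearMonth": // link year and month
      return date.replace(/\((\d+\/\d+)\//, "([$1]/");
    default: // no link
      return date;
  }
}

console.log(linkDate("(2018/7/27)", "none"));      // (2018/7/27)
console.log(linkDate("(2018/7/27)", "year"));      // ([2018]/7/27)
console.log(linkDate("(2018/7/27)", "yearMonth")); // ([2018/7]/27)
```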

2019/5/2
 var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText); 
Left as is, this puts a whitespace character at the end of Kindle edition titles, so I changed it to:
 var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText.trim()); 

2021/1/4
By email, I received a suggested fix for the part that inserted an enormous number of line breaks when a book has multiple authors.
Many thanks.
after.js
var pub = []; //author processing
var c = document.getElementsByClassName('author');
for (var g = 0; g < c.length; g++) {
    var at = c[g].innerText.replace(/\r?\n/g, '').replace(/,/, ''); // ← replace added here (removes the line breaks)
    console.log(c[g].innerText);
    console.log(at);
    var pu = at.match(/\(.+\)/);
    var ct = at.replace(/\(.+\)/, '').replace(/ /g, '');
    pub.push(pu + ' [' + ct + ']');
}
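The author handling above as a pure function, as a sketch with one change of my own: String.prototype.match returns null when the text has no "(role)" part, and concatenating null produces the literal text "null", so here the role falls back to an empty string. The sample inputs are assumptions about what an author element's innerText looks like; formatAuthor is a hypothetical name.

```javascript
function formatAuthor(rawText) {
  // Remove line breaks (the fix above) and the first comma.
  const at = rawText.replace(/\r?\n/g, "").replace(/,/, "");
  const role = at.match(/\(.+\)/); // e.g. "(著)", or null
  // Drop the role and all half-width spaces from the name.
  const name = at.replace(/\(.+\)/, "").replace(/ /g, "").trim();
  return (role ? role[0] + " " : "") + "[" + name + "]";
}

console.log(formatAuthor("倉下 忠憲\n(著),")); // (著) [倉下忠憲]
console.log(formatAuthor("倉下忠憲"));          // [倉下忠憲]
```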

Old version
To register the script directly as a bookmarklet, use this one (it is long, so it may cause an error).
(Revised 2020/3/8)
script_min.js
javascript:(function(){var e=document.getElementById("productTitle");e||(e=document.getElementById("ebooksProductTitle"));if(e=window.prompt('Scrap "Amazon" to your scrapbox.',e.innerText.trim())){e="\u300e"+e+"\u300f";var c=document.getElementById("ASIN");c?c="ISBN:"+c.value:(c=document.getElementsByName("ASIN.0")[0],c="ASIN:"+c.value);var a=document.getElementById("detail_bullets_id");if(!a){var b=document.getElementById("product-description-iframe").contentWindow.document;a=b.getElementById("productDetailsTable")}(a=
a.innerText.match(/(\u51fa\u7248\u793e:.+)(\(.+\))/))?(a[1]=a[1].replace(/:/,":["),a[1]=a[1].match(/;/)?a[1].replace(/;/,"];"):a[1]+"]",a[2]=a[2].replace(/\((\d+\/\d+)\//,"([$1]/")+" "):a=["","",""];b=document.getElementById("productDescription");!b&&document.getElementById("product-description-iframe")&&(b=document.getElementById("product-description-iframe").contentWindow.document,b=b.getElementById("productDescription"));if(b){var d=b.getElementsByTagName("p")[0];d||(d=b.getElementsByClassName("productDescriptionWrapper")[0]);
b=d.innerText.replace(/\n/g,"\n>")}else b="";(d=document.getElementById("imageBlockContainer"))||(d=document.getElementById("ebooksImageBlockContainer"));d=d.getElementsByTagName("img")[0].getAttribute("src");var h=[],k=document.getElementsByClassName("author");for(g=0;g<k.length;g++){var f=k[g].innerText.replace(/,/,""),l=f.match(/\(.+\)/);f=f.replace(/\(.+\)/,"").replace(/ /g,"");h.push(l+" ["+f+"]")}c="["+d+" "+window.location.href+"]\n"+h.join(" ")+"\n"+a[1]+a[2]+c+"\n>"+b+"\n#\u672c\n";c=encodeURIComponent(c);
window.open("https://scrapbox.io/rashitaobj/"+encodeURIComponent(e.trim())+"?body="+c)}})();


script_min_old.js
javascript:(function(){var p=document.getElementById("productTitle");if (!p) var p=document.getElementById("ebooksProductTitle");var title=window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText);if (!title) return;title='『'+title+'』';var asin=document.getElementById('ASIN');if(asin){var a='ISBN:' + asin.value;}else{var asin=document.getElementsByName('ASIN.0')[0],a='ASIN:' + asin.value;}var detail=document.getElementById('detail_bullets_id');if (!detail) {var subdoc=document.getElementById("product-description-iframe").contentWindow.document;var detail=subdoc.getElementById("productDetailsTable");}var detailtext=detail.innerText;var pubdata=detailtext.match(/(出版社:.+)(\(.+\))/);if (pubdata){pubdata[1]=pubdata[1].replace(/:/,':[');pubdata[1]=(pubdata[1].match(/;/)?pubdata[1].replace(/;/,'];'):pubdata[1] + ']');pubdata[2]=pubdata[2].replace(/\((\d+\/\d+)\//, '([$1]/') + ' ';}else{var pubdata=['','',''];}var d=document.getElementById("productDescription");if (!d)  {var subdoc=document.getElementById("product-description-iframe").contentWindow.document;var d=subdoc.getElementById("productDescription");}var d1=d.getElementsByTagName("p")[0];if (!d1) var d1=d.getElementsByClassName("productDescriptionWrapper")[0];var d2=d1.innerText.replace(/\n/g,'\n>');var imagecontainer=document.getElementById("imageBlockContainer");if (!imagecontainer) var imagecontainer=document.getElementById("ebooksImageBlockContainer");var image=imagecontainer.getElementsByTagName("img")[0];var imageurl=image.getAttribute("src");var pub=[];var c=document.getElementsByClassName('author');for (g=0;g < c.length;g++){var at=c[g].innerText.replace(/,/,'');var pu=at.match(/\(.+\)/);var ct=at.replace(/\(.+\)/,'').replace(/ /g,'');pub.push(pu + ' [' + ct + ']');}var lines='['+imageurl+' '+window.location.href+']\n'  + pub.join(' ')+'\n'+pubdata[1]+pubdata[2]+a+'\n>'+d2+'\n#本\n';var 
body=encodeURIComponent(lines);window.open('https://scrapbox.io/rashitaobj/'+encodeURIComponent(title.trim())+'?body='+body)})();

Latest version (2022/6/17). Source code below (replace the URL near the end with your own project).
2023/9/30 Updated because the id of Amazon's img tag changed.
script.js
javascript:(function(){
	var p = document.getElementById("productTitle");//book title
	if (!p) p = document.getElementById("ebooksProductTitle");
	var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText.trim());
	if (!title) return;
	title = '『'+ title +'』';
	var asin = document.getElementById('ASIN');//ASIN / ISBN
	var a;
	if(asin){
		a = 'ISBN:' + asin.value;
	}else{
		asin = document.getElementsByName('ASIN.0')[0];
		a = 'ASIN:' + asin.value;
	}
	var detail = document.getElementById('detailBullets_feature_div');//publisher and publication date
	if (!detail) {
		var subdoc = document.getElementById("product-description-iframe").contentWindow.document;
		detail = subdoc.getElementById("productDetailsTable");
	}
	var detailtext = detail.innerText;
	var pubdata = detailtext.match(/(出版社  : .+) (\(.+\))/);//e.g. [1] 出版社:シーアンドアール研究所, [2] (2018/7/27)
	if (pubdata){
		pubdata[1] = pubdata[1].replace(/: /,':[');//delete these two lines if the publisher name should not be a link
		pubdata[1] = (pubdata[1].match(/;/)?pubdata[1].replace(/;/,'];'):pubdata[1].trim() + ']');
		//pubdata[2] = pubdata[2] + ' ';//no link
		//pubdata[2] = pubdata[2].replace(/\((\d+)\//, '([$1]/') + ' ';//link the year
		pubdata[2] = pubdata[2].replace(/\((\d+\/\d+)\//, '([$1]/') + ' ';//link year and month
	}else{
		pubdata = ['','',''];
	}
	//product description (bookDescription_feature_div)
	var d1 = "";//stays empty when no description is found, instead of printing "undefined"
	const isbookDesc = document.getElementById("bookDescription_feature_div");
	if(isbookDesc){
		if (isbookDesc.innerText == ""){
			const eDiv = document.getElementById("editorialReviews_feature_div");
			if (eDiv) d1 = eDiv.innerText.replace(/\n/g,"\n>");
		}else{
			d1 = isbookDesc.innerText.replace(/\n/g,"\n>").replace("続きを読む","");//drop the "続きを読む" (Read more) link text
		}
	}else{
		const probookDesc = document.getElementById("productDescription_feature_div");
		if (probookDesc){
			var decsdoc = document.getElementsByClassName("pInfoTabCExpander-content")[0];
			if (decsdoc) d1 = decsdoc.innerText.replace(/\n/g,"\n>");
		}
	}

	//cover image
	var image = document.getElementById("landingImage");
	if (!image) image = document.getElementById("ebooksImgBlkFront");
	var imageurl = image.getAttribute("src");

	//authors
	var pub = [];
	var c = document.getElementsByClassName('author');
	for (var g = 0; g < c.length; g++){
		var at = c[g].innerText.replace(/\r?\n/g, '').replace(/,/,'');
		var pu = at.match(/\(.+\)/);//role such as (著), or null
		var ct = at.replace(/\(.+\)/,'').replace(/ /g,'');
		pub.push((pu ? pu[0] : '') + ' [' + ct.trim() + ']');
	}
	//content written to the page; reorder the parts here to change what the created page contains
	var lines = '['+imageurl+' '+window.location.href+']\n' + pub.join(' ') + '\n' + pubdata[1] + pubdata[2] + a + '\n>' + d1 + '\n#書籍名\n';
	var body = encodeURIComponent(lines);
	window.open('https://scrapbox.io/rashitaobj/'+encodeURIComponent(title.trim())+'?body='+body);//replace "rashitaobj" with your own project name
})();
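If the project name is the only thing a reader needs to change, it can be pulled out of the final window.open line. A minimal sketch; buildPageUrl is a hypothetical helper, and it keeps the original's approach of passing the page text through the ?body= query parameter.

```javascript
function buildPageUrl(project, title, lines) {
  return "https://scrapbox.io/" + project + "/" +
    encodeURIComponent(title.trim()) + "?body=" + encodeURIComponent(lines);
}

// In the bookmarklet: window.open(buildPageUrl("your-project", title, lines));
console.log(buildPageUrl("your-project", "My Book ", "line 1\nline 2"));
// https://scrapbox.io/your-project/My%20Book?body=line%201%0Aline%202
```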

Minified version of the latest code (2022/1/17)
min.js
javascript:(function(){var c=document.getElementById("productTitle");c||(c=document.getElementById("ebooksProductTitle"));if(c=window.prompt('Scrap "Amazon" to your scrapbox.',c.innerText.trim())){c="\u300e"+c+"\u300f";var b=document.getElementById("ASIN");b?b="ISBN:"+b.value:(b=document.getElementsByName("ASIN.0")[0],b="ASIN:"+b.value);var a=document.getElementById("detailBullets_feature_div");a||(a=document.getElementById("product-description-iframe").contentWindow.document.getElementById("productDetailsTable"));
(a=a.innerText.match(/(\u51fa\u7248\u793e : .+)(\(.+\))/))?(a[1]=a[1].replace(/:/,":["),a[1]=a[1].match(/;/)?a[1].replace(/;/,"];"):a[1]+"]",a[2]=a[2].replace(/\((\d+\/\d+)\//,"([$1]/")+" "):a=["","",""];if(null!=document.getElementById("bookDescription_feature_div")){var d=document.getElementById("bookDescription_feature_div").firstElementChild.firstElementChild;d=d?d.innerText.replace(/\n/g,"\n>"):""}else d="";var e=document.getElementById("imgBlkFront");e||(e=document.getElementById("ebooksImgBlkFront"));
e=e.getAttribute("src");var h=[],k=document.getElementsByClassName("author");for(g=0;g<k.length;g++){var f=k[g].innerText.replace(/\r?\n/g,"").replace(/,/,""),l=f.match(/\(.+\)/);f=f.replace(/\(.+\)/,"").replace(/ /g,"");h.push(l+" ["+f+"]")}b="["+e+" "+window.location.href+"]\n"+h.join(" ")+"\n"+a[1]+a[2]+b+"\n>"+d+"\n#\u66f8\u7c4d\u540d\n";b=encodeURIComponent(b);window.open("https://scrapbox.io/rashitaobj/"+encodeURIComponent(c.trim())+"?body="+b)}})();

changelog
2020/8/26 
Changed the element used to get the publisher name to detailBullets_feature_div.

Since the code is so long, it might be better to load the JavaScript from a Scrapbox code block.
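The loader bookmarklet on this page does exactly that: it points a script element at the /api/code/ URL of this page's script.js. A sketch that builds such a loader for any project, page and file; codeBlockUrl and loaderBookmarklet are hypothetical names, and the file name is assumed to be plain ASCII that needs no encoding.

```javascript
// URL under which Scrapbox serves a code block, following the pattern
// used by the loader bookmarklet on this page.
function codeBlockUrl(project, pageTitle, fileName) {
  return "https://scrapbox.io/api/code/" + project + "/" +
    encodeURIComponent(pageTitle) + "/" + fileName;
}

// One-line bookmarklet that appends a <script> tag loading that URL.
function loaderBookmarklet(src) {
  return "javascript:(function(d,s){s=d.createElement('script');s.src='" +
    src + "';d.body.appendChild(s);})(document)";
}

const src = codeBlockUrl("rashitamemo", "ISBNや出版社の情報も取り込むブックマークレット", "script.js");
console.log(loaderBookmarklet(src));
```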

2019/3/15
Amazon's Kindle edition pages had slightly changed how the title is displayed, so I adapted the code.

Changed
 	var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerHTML); 
to
 	var title = window.prompt('Scrap "Amazon" to your scrapbox.', p.innerText); 

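2019/6/18 Confirmed that information is not being picked up from some pages.
For example, nothing can be imported from the following Amazon page (the code stops partway through):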
https://www.amazon.co.jp/再生産-〔教育・社会・文化〕-ブルデュー・ライブラリー-ピエール・ブルデュー/dp/4938661241/ref=pd_sim_14_2/355-4284581-4471750?_encoding=UTF8&pd_rd_i=4938661241&pd_rd_r=1eaf7cec-9102-11e9-b268-0bb7725c95a0&pd_rd_w=rLS14&pd_rd_wg=PyPwE&pf_rd_p=b88353e4-7ed3-4da1-bc65-341dfa3a88ce&pf_rd_r=Z5AGWYCZJ4MCETFCVBA7&psc=1&refRID=Z5AGWYCZJ4MCETFCVBA7
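Probably because only used copies are in stock, the class names and so on are different.
productTitle exists
id="ASIN" exists
id="detail_bullets_id" exists
productDescription does not exist
id="imageBlockContainer" exists
productDescription turned out to be the cause.
When productDescription was missing, the code assumed the page was an ebook page and ran the ebook processing; the case of a print edition that simply has no product description was not handled separately.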
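A sketch of the missing distinction: fall back to the Kindle iframe only when the iframe actually exists, and treat "no description anywhere" as an empty description rather than an error. It assumes the ids used in the code of that time (productDescription, product-description-iframe); getDescriptionQuote is a hypothetical name.

```javascript
function getDescriptionQuote(doc) {
  let el = doc.getElementById("productDescription");
  if (!el) {
    const frame = doc.getElementById("product-description-iframe");
    // Only an ebook page has the iframe; a print page without a description does not.
    if (frame) el = frame.contentWindow.document.getElementById("productDescription");
  }
  // Prefix each line after the first with ">"; the first ">" is added
  // when the page text is assembled, as in the bookmarklet.
  return el ? el.innerText.replace(/\n/g, "\n>") : "";
}
```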

2020/4/13
Problem: the "Look inside" (試し読み) image gets captured instead of the cover.
The img elements have ids embedded, so use those.
imgBlkFront
ebooksImgBlkFront
Searching for these two is enough.
old.js
var imagecontainer=document.getElementById("imageBlockContainer");//cover image
if (!imagecontainer) var imagecontainer = document.getElementById("ebooksImageBlockContainer");
var image = imagecontainer.getElementsByTagName("img")[0];
var imageurl = image.getAttribute("src");

new.js
var image=document.getElementById("imgBlkFront");//cover image
if (!image) var image = document.getElementById("ebooksImgBlkFront");
var imageurl = image.getAttribute("src");
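Since the cover-image id has changed more than once (the 2023/9/30 note above mentions another change), a list of candidate ids tried in order may be easier to maintain. A sketch; coverImageUrl is a hypothetical helper, and its default list only contains the two ids named here.

```javascript
function coverImageUrl(doc, ids = ["imgBlkFront", "ebooksImgBlkFront"]) {
  for (const id of ids) {
    const img = doc.getElementById(id);
    if (img) return img.getAttribute("src") || "";
  }
  return ""; // no known cover element on this page
}

// After the 2023 change, "landingImage" can simply be put first:
// coverImageUrl(document, ["landingImage", "imgBlkFront", "ebooksImgBlkFront"]);
```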
    

2020/4/19
Fixing the product description not being retrieved.
Old code
old.js
var d = document.getElementById("productDescription");//product description

if (!d)  {
	if (document.getElementById("product-description-iframe")){//if this is a Kindle edition
		var subdoc = document.getElementById("product-description-iframe").contentWindow.document;
		var d = subdoc.getElementById("productDescription");
	}
}
if (d){//if a description exists
	var d1 = d.getElementsByTagName("p")[0];
	if (!d1) var d1 = d.getElementsByClassName("productDescriptionWrapper")[0];
	var d2 = d1.innerText.replace(/\n/g,'\n>');
}else{
	var d2 = "";//when there is no description
}
New code
new.js
var decsdoc = document.getElementById("bookDesc_iframe").contentWindow.document;//product description
var d = decsdoc.getElementById("iframeContent");
if (d){//if a description exists
	var d1 = d.innerText.replace(/\n/g,'\n>');
}else{
	var d1 = "";//when there is no description
}

2020/5/9
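Marketplace items that Amazon itself does not stock are not imported correctly.
Something is probably empty. The price section looks like a likely culprit, but I'm not sure.
Where does imgBlkFront appear on such pages?
On marketplace pages, var decsdoc = document.getElementById("bookDesc_iframe").contentWindow.document; (the product-description step) throws an error, so I now branch on whether the frame exists. Getting the description when there is no frame is quite tedious, so I'm skipping that for now.
(Idea)
Get the h2 elements (several will be returned).
Go through them in order, looking for one whose innerText matches 「商品の内容」 ("product contents").
When one is found, save its contents.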
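The idea above, sketched. One assumption that the note does not state: the contents are taken from the element immediately after the matching h2, which may not match Amazon's actual markup. findSectionAfterHeading is a hypothetical name.

```javascript
function findSectionAfterHeading(doc, headingText = "商品の内容") {
  // getElementsByTagName returns a live HTMLCollection; copy it to an array.
  for (const h2 of Array.from(doc.getElementsByTagName("h2"))) {
    if (h2.innerText.trim() === headingText && h2.nextElementSibling) {
      return h2.nextElementSibling.innerText;
    }
  }
  return ""; // heading not found
}

// In the browser: findSectionAfterHeading(document)
```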

2022/1/17
The product description can no longer be imported.
Amazon's HTML appears to have changed.
Fix the following part.
old20220117.js
var isbookDesc_iframe = document.getElementById("bookDesc_iframe") != null
if (isbookDesc_iframe){
	var decsdoc = document.getElementById("bookDesc_iframe").contentWindow.document;//product description
	var d = decsdoc.getElementById("iframeContent");
	if (d){//if a description exists
		var d1 = d.innerText.replace(/\n/g,'\n>');
	}else{
		var d1 = "";//when there is no description
	}
}else{
	var d1 = "";//when there is no description
}
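Looking at the Amazon page, the product description is in a child element of the div with id="bookDescription_feature_div".
This seems to be the same for print and Kindle.
On Kindle, it is the first child of that child element.
The print edition looks the same.
Use .firstElementChild.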
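The firstElementChild route can be written with optional chaining, so that a missing level yields an empty description instead of an exception. A sketch, assuming the structure described above (the description is in the first child of the first child of #bookDescription_feature_div); getBookDescription is a hypothetical name.

```javascript
function getBookDescription(doc) {
  const inner = doc.getElementById("bookDescription_feature_div")
    ?.firstElementChild?.firstElementChild;
  // Turn line breaks into Scrapbox quote lines, as the bookmarklet does.
  return inner ? inner.innerText.replace(/\n/g, "\n>") : "";
}

// In the browser: getBookDescription(document)
```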

js
javascript:(function(d,s){ s=d.createElement('script');s.src='https://scrapbox.io/api/code/rashitamemo/ISBN%E3%82%84%E5%87%BA%E7%89%88%E7%A4%BE%E3%81%AE%E6%83%85%E5%A0%B1%E3%82%82%E5%8F%96%E3%82%8A%E8%BE%BC%E3%82%80%E3%83%96%E3%83%83%E3%82%AF%E3%83%9E%E3%83%BC%E3%82%AF%E3%83%AC%E3%83%83%E3%83%88/script.js';d.body.appendChild(s);})(document)

Related pages
/noratetsu/AmazonToScrapboxブックマークレット自分用