JavaScript bookmarklet: how do I get the HTML of the page the bookmarklet runs on?


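JavaScript bookmarklets can use the document object to get the selected text or links. But how do I get the entire HTML?

I want the entire HTML so that I can retrieve specific data from between the HTML tags.

For example, I would like to get the following text from this page: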

zwecklos

Level 3 end customers get no manual removal or exceptions - requests are pointless

This is a level 3 test. Anfragen sind absolut zwecklos

After getting the HTML through document.documentElement.innerHTML, how do I extract the specific text above from that innerHTML?

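You can use

element.tagName

to get the tag name, and

element.getAttribute('attributename')

to get any attribute of that tag. Does this get you the data you want?

document.documentElement.innerHTML

If not, can you give us an example of the data you are trying to get?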
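To make the suggestion above concrete, here is a minimal sketch that walks the DOM instead of parsing the innerHTML string, and reports each text node containing a phrase together with its parent's tagName and class attribute. The helper name findText and the search phrase are assumptions chosen for illustration; they are not part of the original answer.

```javascript
// Walk the DOM under `root` and report every text node that contains
// `phrase`, along with the tag name and class attribute of its parent.
// `findText` is a hypothetical helper name used only for this sketch.
function findText(root, phrase) {
  const hits = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === 3) {                 // 3 is Node.TEXT_NODE
      if (node.nodeValue.indexOf(phrase) !== -1) {
        const parent = node.parentNode;
        hits.push({
          tag: parent.tagName,
          cls: parent.getAttribute('class'),
          text: node.nodeValue.trim()
        });
      }
    } else if (node.childNodes) {
      // Push children in reverse so they are visited in document order.
      for (let i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push(node.childNodes[i]);
      }
    }
  }
  return hits;
}

// Inside a bookmarklet this would run against the live page.
if (typeof document !== 'undefined') {
  console.log(findText(document.body, 'zwecklos'));
}
```

Working on nodes rather than on the serialized HTML avoids writing regular expressions against markup, and the result already tells you which tag the text came from.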

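You can take a look at my bookmarklet. It has a few features:
1. It uses a POST request, so data of any size can be sent to the server.
2. It uses the user's selection from the page.
3. It uses the boilerpipe library to extract the page's text content.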

<a class="button" href="javascript:function post_to_url(path,params,method){method=method||'post';var form=document.createElement('form');form.setAttribute('method', method);form.setAttribute('action',path);form.setAttribute('accept-charset','UTF-8');for(var key in params){var hiddenField=document.createElement('input');hiddenField.setAttribute('type','hidden');hiddenField.setAttribute('name',key);hiddenField.setAttribute('value',params[key]);form.appendChild(hiddenField);}document.body.appendChild(form); form.submit();};var t=((window.getSelection&&window.getSelection())||(document.getSelection&&document.getSelection())||(document.selection&&document.selection.createRange&&document.selection.createRange().text)||location.href);if(t=='')t=location.href;post_to_url('http://g-calendar.appspot.com/analyze/analyze', {withKeywords:'true', message:t, sumSize:5, return_type:'list', title:document.title, url:location.href}, 'post')">
    Summarise
</a>
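Since the bookmarklet above is packed into a single line, a readable sketch of its form-POST helper may be easier to follow. It builds a hidden form, adds one hidden input per parameter, and submits it. The name postToUrl and the optional doc parameter are assumptions added here so the sketch can be exercised outside a browser; the original always uses the global document and also accepts a method argument.

```javascript
// Readable sketch of the form-POST helper from the bookmarklet.
// `doc` is a hypothetical injection point; in a browser it falls back
// to the global `document`.
function postToUrl(path, params, doc) {
  doc = doc || document;
  const form = doc.createElement('form');
  form.setAttribute('method', 'post');
  form.setAttribute('action', path);
  form.setAttribute('accept-charset', 'UTF-8');

  // One hidden input per parameter, so the payload travels in the
  // request body rather than in the URL.
  Object.keys(params).forEach(function (key) {
    const field = doc.createElement('input');
    field.setAttribute('type', 'hidden');
    field.setAttribute('name', key);
    field.setAttribute('value', String(params[key]));
    form.appendChild(field);
  });

  doc.body.appendChild(form);
  form.submit();   // navigates the current page to the server's response
  return form;
}
```

Submitting a form rather than issuing an XHR sidesteps cross-origin restrictions, at the cost of navigating away from the current page.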

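This is not always identical to the original source, because the browser adds quotes around attributes, adds closing tags where you omitted them, and so on. The only way to get the original file is to use XMLHttpRequest against the current page.

Even an XHR request cannot be fully trusted, because the page may have changed in the meantime for various reasons: state changes, content churn, and so on. I would walk the page's DOM and search for the nodes whose content you are trying to get.

I would use XHR to fetch the page and then traverse .responseXML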
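A minimal sketch of the XHR approach, assuming a browser that supports responseType 'document' so that HTML is parsed into responseXML. The function name fetchDocument, the callback signature, and the optional XhrCtor parameter (there only so the sketch can be run with a stand-in object) are assumptions, not part of the original answers.

```javascript
// Re-fetch a page with XHR and hand the parsed document to a callback.
// `XhrCtor` is a hypothetical injection point; in a browser it falls
// back to the global XMLHttpRequest.
function fetchDocument(url, onDone, XhrCtor) {
  const Ctor = XhrCtor || XMLHttpRequest;
  const xhr = new Ctor();
  xhr.open('GET', url, true);
  // Ask the browser to parse the response as HTML; without this,
  // responseXML is only populated for XML content types.
  xhr.responseType = 'document';
  xhr.onload = function () {
    onDone(null, xhr.responseXML, xhr.status);
  };
  xhr.onerror = function () {
    onDone(new Error('request failed'));
  };
  xhr.send();
}

// Inside a bookmarklet: fetch the current page again and inspect it.
if (typeof document !== 'undefined' && typeof XMLHttpRequest !== 'undefined') {
  fetchDocument(location.href, function (err, doc) {
    if (err) { console.error(err); return; }
    console.log(doc.getElementsByTagName('*').length + ' elements fetched');
  });
}
```

If you need the raw, unparsed source instead, leave responseType at its default and read xhr.responseText; responseText is not available once responseType is set to 'document'.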
