How do I get a plain-text version of a web page, without any HTML, using JavaScript?


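I am trying to find a way, using JavaScript or jQuery, to write a function that removes all the HTML tags from a page and gives me just the plain text of that page.

How can this be done? Any ideas?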

Use the following.

IE and WebKit:

document.body.innerText

Other browsers:

document.body.textContent

(as suggested by Amr ElGarhy)

Most JS frameworks implement a cross-browser way to do this. It is usually implemented roughly like this:

text = document.body.textContent || document.body.innerText;
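
The one-liner above can be wrapped in a small helper. This is only a sketch: the name getPlainText is hypothetical, and unlike the || version it treats an empty textContent as a valid result instead of falling through to innerText.

```javascript
// Hypothetical helper (not from any framework): returns the text of an
// element-like object, preferring textContent and falling back to innerText.
function getPlainText(el) {
  if (!el) return '';
  if (typeof el.textContent === 'string') return el.textContent;
  if (typeof el.innerText === 'string') return el.innerText;
  return '';
}

// In a browser: var text = getPlainText(document.body);
```

Checking the property type rather than its truthiness keeps an element with no text from being mistaken for a browser that lacks textContent.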

WebKit seems to keep some formatting with textContent, whereas innerText strips everything.

It depends on how much formatting you want to keep. But with jQuery you can do it like this:

jQuery(document.body).text();

I would use:

<script language="javascript" type="text/javascript" src="http://code.jquery.com/jquery-1.4.2.js"></script>
<script type="text/javascript">
    jQuery.fn.stripTags = function() { return this.replaceWith( this.html().replace(/<\/?[^>]+>/gi, '') ); };
    jQuery('head').stripTags();

    $(document).ready(function() {
        $("img").each(function() {
            jQuery(this).remove();
        });
    });
</script>



This leaves no styles applied, but it strips all the tags.
Is that what you want?

[EDIT] Now edited to also remove image tags. [/EDIT]
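
To see what the regular expression in stripTags does, and where it can go wrong, here is a rough string-only sketch. The function name stripTagsFromString is hypothetical and the inputs are made up for illustration; it reuses the same pattern as the plugin above.

```javascript
// Hypothetical string-only helper using the same regular expression
// as the stripTags plugin above.
function stripTagsFromString(html) {
  return String(html).replace(/<\/?[^>]+>/gi, '');
}

// Simple markup is handled as expected.
var simple = stripTagsFromString('<p>Hello <b>world</b></p>');
// simple === 'Hello world'

// A ">" inside an attribute value ends the match early,
// so part of the tag leaks into the output.
var tricky = stripTagsFromString('<a title="a>b">x</a>');
// tricky === 'b">x'
```

The second case is one reason a DOM-based approach (textContent, innerText, or walking the nodes) tends to be more reliable than a regular expression.
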
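The only trouble with textContent or innerText is that they can jam the text of adjacent nodes together, with no whitespace between them.
If that matters, you can recurse through the body or some other container, return the text as an array, and join the pieces with spaces or newlines.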

document.deepText= function(hoo){
    var A= [], tem, tx;
    if(hoo){
        hoo= hoo.firstChild;
        while(hoo!= null){
            if(hoo.nodeType== 3){
                tx= hoo.data || '';
                if(/\S/.test(tx)) A[A.length]= tx;
            }
            else A= A.concat(document.deepText(hoo));
            hoo= hoo.nextSibling;
        }
    }
    return A;
}
alert(document.deepText(document.body).join(' '))
// return document.deepText(document.body).join('\n')

I had to convert rich text in an HTML email to plain text. The following worked for me in IE (obj is a jQuery object):

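So, you want to return a string of the text content?

This looks useful:

I thought this worked only in Internet Explorer.

It works in my WebKit as well. In fact, Firefox seems to be the only one that has trouble.

But in Opera, it still gives me the HTML tags when I print innerText.

Use document.body.textContent in other browsers.

Your answer is complete and covers everything I wanted, thanks.

Don't try to parse HTML with regular expressions.

It might be a good idea to add nodeType 4 (CDATA) as well, in case someone wraps their text in one. (At least jQuery does so.)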
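
The CDATA suggestion could look something like the sketch below. The function name collectText is hypothetical; it follows the same recursive idea as deepText but also accepts nodes of type 4. It only relies on nodeType, data, firstChild and nextSibling, so it makes no other assumptions about the document.

```javascript
// Node type values as defined by the DOM
// (Node.TEXT_NODE and Node.CDATA_SECTION_NODE).
var TEXT_NODE = 3;
var CDATA_SECTION_NODE = 4;

// Hypothetical variant of deepText: collects the non-blank text of all
// text and CDATA nodes under `root` into an array, in document order.
function collectText(root) {
  var parts = [];
  var node = root ? root.firstChild : null;
  while (node) {
    if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
      var text = node.data || '';
      // Skip nodes that contain only whitespace.
      if (/\S/.test(text)) parts.push(text);
    } else {
      // Any other node type: descend into its children.
      parts = parts.concat(collectText(node));
    }
    node = node.nextSibling;
  }
  return parts;
}

// In a browser: var text = collectText(document.body).join(' ');
```

Joining the array with a space or a newline keeps the text of adjacent nodes from running together.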