JavaScript: wrap every word in a span without touching the HTML

javascript html

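I have a 2.2 MB HTML file generated by Acrobat, and it is pure garbage. I need to wrap every word in it in a span, but I keep finding that the rendered HTML page starts showing parts of the source code.

Here is a small example: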

<p class="s21" style="padding-top: 10pt;padding-left: 31pt;text-indent: 0pt;text-align: left;">CONTINGENCY TIMEL
        INES.. • • • • • •• • • • • • • • • • • •• • • • • • ••• • •• • • • • •• • • • • •• • •<span class="s25">
        </span><span class="s26"> </span>4-<span class="s27">1</span></p>

Is there a way to wrap every word in a span without having to painstakingly avoid the HTML tags?

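There is already a native way to walk the DOM tree, and you should use it. It lets you filter for text nodes only, which is what you are trying to do, and it excludes all elements: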

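const root = document.getElementById('root');
// SHOW_TEXT restricts the walk to text nodes, so tags and attributes are never visited.
const treeWalker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
let words = [];
while (treeWalker.nextNode()) {
  // Split on whitespace and drop the empty and whitespace-only pieces.
  words = words.concat(treeWalker.currentNode.textContent.split(/(\s+)/).filter(e => e.trim().length > 0));
}
console.log(words);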

CONTINGENCY TIMEL
INES.••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
4-1
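
The tree walker above only collects the words. To actually wrap them, one option is to replace each text node with a fragment of spans. The following is a minimal sketch, not the original answer's code: `tokenize` and `wrapWords` are hypothetical names, and the `word` class is assumed from the question's CSS.

```javascript
// Split text into alternating whitespace and word tokens, keeping the whitespace.
function tokenize(text) {
  return text.split(/(\s+)/).filter(token => token.length > 0);
}

// Hypothetical helper: wrap every word under `root` in <span class="word">.
// It works on text nodes only, so tags and attributes are never touched.
function wrapWords(root) {
  const doc = root.ownerDocument;
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);

  // Collect first: replacing the current node mid-walk would detach
  // the walker's position from the document.
  const textNodes = [];
  while (walker.nextNode()) {
    textNodes.push(walker.currentNode);
  }

  for (const node of textNodes) {
    const parentName = node.parentNode.nodeName;
    if (parentName === 'SCRIPT' || parentName === 'STYLE') continue;
    if (!/\S/.test(node.textContent)) continue; // whitespace-only node

    const fragment = doc.createDocumentFragment();
    for (const token of tokenize(node.textContent)) {
      if (/^\s+$/.test(token)) {
        // Preserve the original whitespace as plain text.
        fragment.appendChild(doc.createTextNode(token));
      } else {
        const span = doc.createElement('span');
        span.className = 'word';
        span.textContent = token;
        fragment.appendChild(span);
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}

// Only runs in a browser; the guard lets the file load elsewhere too.
if (typeof document !== 'undefined') {
  wrapWords(document.body);
}
```

Because the spans are built with `createElement` and `textContent` rather than by editing `innerHTML`, the existing markup is never re-parsed, which should also matter for a file of this size.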
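Using regular expressions on HTML markup is the wrong approach. It looks like you have ended up wrapping parts of the span tags as well.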
.word:hover {
    background-color: rgba(0,0,0,0.1);
}

const walkDOM = function (node, func) {
    func(node);
    node = node.firstChild;
    while(node) {
        walkDOM(node, func);
        node = node.nextSibling;

        if (node && node.nextSibling == undefined) {
            // console.log(node.innerHTML);
        }
    }
};



walkDOM(document.body, function(node) {

    if (node.nodeName == '#text') {

        let pnode = node.parentElement;
        pnode.innerHTML = pnode.innerHTML.replace(/(^|<\/?[^>]+>|\s+)([^\s<]+)/g, '$1<span class="word">$2</span>');

    }

});

• • ••• • •• • • • • •• • • • • •• • •class="s25"> class="s26"> 4-1
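
The leaked `class="s25">` fragments in the output above can be reproduced without a browser by running the question's regular expression on a shortened version of the sample markup. This is a small sketch for illustration; the input string is a trimmed stand-in for the real file.

```javascript
// The pattern from the question: a prefix (start, a tag, or whitespace)
// followed by a run of characters that are neither whitespace nor "<".
const pattern = /(^|<\/?[^>]+>|\s+)([^\s<]+)/g;

// Trimmed stand-in for the Acrobat markup: an opening tag followed by a newline.
const html = 'TIMEL<span class="s25">\n</span>4-1';

const result = html.replace(pattern, '$1<span class="word">$2</span>');
console.log(result);
// The opening tag is followed by whitespace, so the "tag" alternative fails there.
// The space inside the tag then matches \s+, and class="s25"> is treated as a word,
// which yields the broken fragment: <span <span class="word">class="s25"></span>
```

Since the walker in the question also rewrites the parent's `innerHTML` once for every text node it visits, the damage is applied repeatedly to the same markup.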