Convert a string containing HTML into sentences and keep the separators using JavaScript

javascript

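This is my string. It contains some HTML:

First sentence. Here is a <a href="http://google.com">Google</a> link in the second sentence! The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !? The last sentence looks like <b>this</b>??

I want to split the string into sentences (an array), keeping both the HTML and the separators. Like this: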

[0] = First sentence.
[1] = Here is a <a href="http://google.com">Google</a> link in the second sentence!
[2] = The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !?
[3] = The last sentence looks like <b>this</b>??

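Can anybody suggest a way to do this? Possibly using a regex and match?

This comes very close to what I'm after, but it doesn't really handle the HTML bits: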
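For illustration only: a minimal sketch of a regex-only split with `String.prototype.match`, assuming the delimiters are `.`, `!` and `?`. The sample strings and variable names are made up for this sketch. It copes with plain text, but with HTML the dot inside the `href` is treated as a sentence end:

```javascript
// Assumed delimiter set: ".", "!" and "?" (runs such as "!?" stay together).
const re = /[^.!?]+[.!?]+/g;

// Plain text splits cleanly; trim() drops the space left after each delimiter.
const plain = 'First sentence. Second one! Third?';
const plainSentences = plain.match(re).map(s => s.trim());
console.log(plainSentences);

// With HTML, the dot inside the href counts as a delimiter,
// so the sentence is cut in the middle of the tag.
const html = 'Here is a <a href="http://google.com">Google</a> link!';
const htmlPieces = html.match(re);
console.log(htmlPieces);
```

The second call returns two fragments instead of one sentence, which is why the markup has to be parsed before the text is split.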

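The easy part is the parsing; you can do that by wrapping an element around the string. Splitting the sentences is a bit more involved; this is my first attempt: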

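var s = 'First sentence. Here is a <a href="http://google.com">Google.</a> link in the second sentence! The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !? The last sentence looks like <b>this</b>??';

// Let the browser parse the HTML by placing it inside a wrapper element
var wrapper = document.createElement('div');
wrapper.innerHTML = s;

var sentences = [],
    buffer = [],
    re = /[^.!?]+[.!?]+/g,
    match; // declared here so it does not leak as an implicit global

[].forEach.call(wrapper.childNodes, function(node) {
  if (node.nodeType == 1) {
    buffer.push(node.outerHTML); // element node: save its HTML untouched
  } else if (node.nodeType == 3) {
    var str = node.textContent; // text node: shift complete sentences off the front
    while ((match = re.exec(str)) !== null) {
      // match[0] is the matched text; trim() drops the whitespace between sentences
      sentences.push((buffer.join('') + match[0]).trim());
      buffer = [];
      str = str.slice(re.lastIndex); // keep only the unprocessed remainder
      re.lastIndex = 0; // reset regexp, since str now starts at the remainder
    }
    buffer.push(str); // unfinished sentence, completed by a later node
  }
});

// Flush whatever is left, e.g. a trailing element followed by "??"
var rest = buffer.join('').trim();
if (rest) {
  sentences.push(rest);
}

console.log(sentences);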

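Every node that is either an element or an unfinished sentence gets added to a buffer until a full sentence is found; the buffer contents are then prefixed to that sentence, and the result is added to the result array.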
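The buffering described above can also be tried outside a browser. The following is a rough sketch, not the answer's code: `collectSentences` is a hypothetical helper, and the plain objects in `nodes` are hand-written stand-ins that only mimic the `nodeType`, `outerHTML` and `textContent` properties of DOM nodes.

```javascript
// Hypothetical helper: accepts anything shaped like a list of DOM child nodes.
function collectSentences(nodes) {
  const sentences = [];
  let buffer = '';

  for (const node of Array.from(nodes)) {
    if (node.nodeType === 1) {
      // Element: keep its HTML intact; delimiters inside it are ignored.
      buffer += node.outerHTML;
    } else if (node.nodeType === 3) {
      // Text: a fresh regex per node, so no stale lastIndex is carried over.
      const re = /[^.!?]+[.!?]+/g;
      const text = node.textContent;
      let consumed = 0;
      let match;
      while ((match = re.exec(text)) !== null) {
        // A full sentence ended: prefix the buffered pieces and store it.
        sentences.push((buffer + match[0]).trim());
        buffer = '';
        consumed = re.lastIndex;
      }
      // Text after the last delimiter belongs to the next sentence.
      buffer += text.slice(consumed);
    }
  }

  // Flush an unfinished trailing sentence, if any.
  const rest = buffer.trim();
  if (rest) sentences.push(rest);
  return sentences;
}

// Stand-in nodes (assumed shape) for a shortened version of the example string.
const nodes = [
  { nodeType: 3, textContent: 'First sentence. Here is a ' },
  { nodeType: 1, outerHTML: '<a href="http://google.com">Google.</a>' },
  { nodeType: 3, textContent: ' link in the second sentence! The last sentence looks like ' },
  { nodeType: 1, outerHTML: '<b>this</b>' },
  { nodeType: 3, textContent: '??' },
];

console.log(collectSentences(nodes));
```

In a browser, passing `wrapper.childNodes` should work the same way, since real DOM nodes expose the same three properties. Like the code above, this only looks at top-level nodes, so a sentence boundary inside a nested element is not detected.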

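I guess your HTML may be nested, i.e. contain a p that contains a span. In that case there is no other solution than parsing it.

What happens if there's a "sentence delimiter" inside the tag contents?

What is your delimiter: "." or "?" or "!", or all of the above…

@Jack Those should be ignored.

I can write you a perfectly working example in PHP.

Thank you. Unfortunately, it fails sometimes. Please have a look at this example:

@suprb I forgot to reset the RegExp object each time a sentence is found; it should be fixed now :)

Thank you so much, Jack. Works great.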