JavaScript: split a string into sentences, ignoring abbreviations when splitting (javascript, regex, string)


I'm trying to split this string into sentences, but I need to treat abbreviations (which have the fixed format x.y.) as a single word:

content = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool."
I tried this regex:

content.replace(/([.?!])\s+(?=[A-Za-z])/g, "$1|").split("|");
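But as you can see, the abbreviations cause problems. Since all abbreviations have the format x.y., it should be possible to treat them as a single word and not split the string at that point. Currently I get: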

"This is a long string with some numbers 123.456,78 or 100.000 and e.g.", 
"some abbreviations in it, which shouldn't split the sentence."
"Sometimes there are problems, i.e.", 
"in this one.", 
"here and abbr at the end x.y..",
"cool."
The result should be:

"This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence."
"Sometimes there are problems, i.e. in this one.", 
"here and abbr at the end x.y..",
"cool."

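To make the problem concrete, here is a minimal sketch that wraps the attempt from the question in a function and runs it on a shorter input. The name `naiveSplit` and the sample string are illustrative and not part of the original post.

```javascript
// Hypothetical helper wrapping the regex from the question.
function naiveSplit(text) {
  // Mark every ., ? or ! that is followed by whitespace and a letter,
  // then split on the marker. Assumes "|" never occurs in the text itself.
  return text.replace(/([.?!])\s+(?=[A-Za-z])/g, "$1|").split("|");
}

const sample = "See e.g. this one. Done.";
const naiveParts = naiveSplit(sample);
console.log(naiveParts);
// → [ 'See e.g.', 'this one.', 'Done.' ]
```

The period that ends "e.g." is followed by whitespace and a letter, so the pattern cannot tell it apart from a real sentence boundary.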
Using your example, I managed to achieve your goal with the following expression:
(? (example)

This expression looks for a period, question mark, or exclamation mark that is not preceded by a character and a period.


You then need to replace each match with a placeholder character and, finally, replace that placeholder with \n
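The expression in this answer did not survive intact, so the pattern below is an assumption reconstructed from the description, not necessarily the answerer's regex. It uses a negative lookbehind, `(?<!\.\w)`, which requires an engine with ES2018 lookbehind support. The names `boundary` and `splitWithLookbehind` are hypothetical, and the sketch inserts `\n` directly instead of going through an intermediate placeholder, which assumes the input contains no newlines.

```javascript
// Assumed pattern: a ., ? or ! that is NOT preceded by "period + word character"
// (so the final period of "e.g." is skipped), followed by whitespace.
const boundary = /(?<!\.\w)([.?!])\s+/g;

function splitWithLookbehind(text) {
  const SEP = "\n"; // assumes the input itself contains no newlines
  // Keep the punctuation, swap the whitespace for the separator, then split.
  return text.replace(boundary, "$1" + SEP).split(SEP);
}

const content = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";
const sentences = splitWithLookbehind(content);
console.log(sentences.length); // → 4
console.log(sentences[1]);     // → Sometimes there are problems, i.e. in this one.
```

The second period in "x.y.." still counts as a boundary here, because the two characters before it are "y." and not a period followed by a word character.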

The solution is to match and capture the abbreviations, and to build the replacement with a callback:

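// Group 1 captures an x.y. abbreviation; group 2 captures real sentence-ending punctuation.
var re = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
var str = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";
var result = str.replace(re, function (m, g1, g2) {
  // Put abbreviations back unchanged; append a marker after sentence-ending punctuation.
  return g1 ? g1 : g2 + "\r";
});
var arr = result.split("\r");

document.body.innerHTML = "<pre>" + JSON.stringify(arr, 0, 4) + "</pre>";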
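What do you mean by string manipulation?

This is a great answer. I added a few characters so that it also catches things like "Mr.", "Mrs.", "Dr." ... basically an uppercase letter followed by one or two lowercase letters and a period: /\b(\w\.\w\.|[A-Z][a-z]{1,2}\.)|([.?!])\s+(?=[A-Za-z])/g Thanks!

The regex treats initials as sentences. Try "Do you know Mrs. N.B. from foo e.g. bar? She is from New York. No! You are wrong.".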
/\B(\w\.\\w\.\w\.\w\.\w\.)([.?!])\s+(?=[A-Za-z])/g
works better, but is probably still not perfect.
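As a rough sketch of the title-aware variant described in the comments, the callback approach can be wrapped in a reusable function. The pattern follows the commenter's description (an x.y. abbreviation, or a capital letter followed by one or two lowercase letters and a period). The names `abbrevOrBoundary` and `splitSentences` and the sample sentence are illustrative assumptions.

```javascript
// Group 1: x.y.-style abbreviations, or a capital letter followed by one or
// two lowercase letters and a period (Mr., Mrs., Dr.).
// Group 2: sentence-ending punctuation followed by whitespace and a letter.
const abbrevOrBoundary = /\b(\w\.\w\.|[A-Z][a-z]{1,2}\.)|([.?!])\s+(?=[A-Za-z])/g;

function splitSentences(text) {
  const SEP = "\r"; // assumes the input contains no carriage returns
  const marked = text.replace(abbrevOrBoundary, (match, abbr, punct) =>
    // Abbreviations are put back unchanged; real boundaries get a marker.
    abbr ? abbr : punct + SEP
  );
  return marked.split(SEP);
}

const parts = splitSentences("Dr. Smith met Mrs. Jones, i.e. his neighbor. They talked.");
console.log(parts);
// → [ 'Dr. Smith met Mrs. Jones, i.e. his neighbor.', 'They talked.' ]
```

This heuristic has its own blind spot: a sentence that ends in a short capitalized word such as "Yes." matches the title branch too, so no split happens after it.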