JavaScript regex tokenizing with full web addresses (text mining, inverted matching)
I have a problem with inverting a JavaScript regex match.
Sample text:
Some text with apples, oranges and other fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another text www.address_nr2.pl; Last sentence with special chars !@#$%^&*()
My regex:
/(www|http:|https:)+[^\s]+[\w]|[A-Z0-9]+/gmi
I want to invert this regex. If I add ^ to the second part of the regex, (www|http:|https:)+[^\s]+[\w]|[^A-Z0-9]+, I can invert everything correctly except the web addresses. How can I invert the web addresses in this case?
In the end I will use Google Apps Script (var keywords = text.split(regex)) to push all keywords and web addresses into an array.
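A minimal sketch of the split plan above shows why the web addresses go missing: split() treats every match as a separator and throws it away, so anything matched by the URL alternative disappears from the result. It assumes the inverted regex is meant to read `...[\w]|[^A-Z0-9]+` (the `.` in the question is taken as a typo for `|`), and it makes the first group non-capturing, because split() would otherwise insert the captured text into the output array.

```javascript
// Sample text from the question.
const text =
  "Some text with apples, oranges and other fruits: " +
  "https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. " +
  "Another text www.address_nr2.pl; Last sentence with special chars !@#$%^&*()";

// Assumed reading of the inverted regex: a URL OR a run of non-alphanumerics.
// The group is non-capturing so split() does not splice captured text into the output.
const separators = /(?:www|http:|https:)+[^\s]+[\w]|[^A-Z0-9]+/gmi;

// split() discards every match; adjacent separators leave empty strings behind.
const tokens = text.split(separators).filter(t => t !== "");

console.log(tokens);
// The words survive, but both web addresses were consumed as separators:
// ["Some", "text", "with", "apples", "oranges", "and", "other", "fruits",
//  "Another", "text", "Last", "sentence", "with", "special", "chars"]
```

Matching the wanted tokens directly, instead of splitting on everything else, avoids this; that is the route the solution takes.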
EDIT: I added the regex flags.
Solution:
Thank you, Ryszard Czech. "Match rather than split" works perfectly:
var keywords = text.matchAll(/(?:www|https?:\/\/)\S*\b|[\p{L}0-9]+/gu);
keywords = Array.from(keywords, x => x[0]);
I changed A-Z to \p{L} to capture Polish letters, and added the "u" flag, because \p{L} only works in Unicode mode.
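The solution above can be wrapped in a small self-contained helper to check the Polish-letter change. `tokenize` is a hypothetical name (not from the original), and the Polish sample sentence is made up for illustration; the sketch assumes a runtime with `String.prototype.matchAll` and Unicode property escapes (ES2020 or later).

```javascript
// Hypothetical helper wrapping the solution regex:
//  - first alternative: a URL starting with www or http(s)://, trimmed back to a word boundary
//  - second alternative: a run of Unicode letters (\p{L}, needs the u flag) or ASCII digits
function tokenize(text) {
  const pattern = /(?:www|https?:\/\/)\S*\b|[\p{L}0-9]+/gu;
  return Array.from(text.matchAll(pattern), m => m[0]);
}

const sample = "Zażółć gęślą jaźń: https://example.com/a?b=1. Koniec www.test.pl;";
console.log(tokenize(sample));
// tokenize(sample) returns
// ["Zażółć", "gęślą", "jaźń", "https://example.com/a?b=1", "Koniec", "www.test.pl"]
```

With the old `[A-Z0-9]+` and the `i` flag, a word like "Zażółć" would instead be broken into "Za" and nothing else from its non-ASCII letters, which is why the switch to `\p{L}` matters here.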
Ryszard Czech's answer: match instead of splitting.

Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    www                      'www'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    http                     'http'
--------------------------------------------------------------------------------
    s?                       's' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ://                      '://'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \S*                      non-whitespace (anything except spaces, tabs,
                           line breaks and other Unicode whitespace)
                           (0 or more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
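To see what the trailing `\b` in the table contributes, here is a small check with made-up inputs: `\S*` first runs to the next whitespace, then the engine backtracks until the match ends right after a word character, which drops trailing punctuation such as `.` or `)`. A side effect worth knowing is that a trailing `/` is dropped too. The `urls` helper is a hypothetical name used only for this illustration.

```javascript
// URL part of the answer's regex (no \p{L} alternative, so no u flag is needed).
const urlPattern = /(?:www|https?:\/\/)\S*\b/g;

// Hypothetical helper: collect every URL match in a string.
// matchAll() works on a copy of the regex, so reusing urlPattern is safe.
const urls = s => Array.from(s.matchAll(urlPattern), m => m[0]);

console.log(urls("Read www.example.com/page."));       // ["www.example.com/page"]
console.log(urls("(see https://example.com/x?y=2)"));  // ["https://example.com/x?y=2"]
console.log(urls("Root: https://example.com/ end"));   // ["https://example.com"] (slash trimmed)
```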
JavaScript:
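const string = "Some text with apples, oranges and other fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another text www.address_nr2.pl; Last sentence with special chars !@#$%^&*()";
// Match URLs that start with www or http(s)://, trimmed back to the last word character.
const results = string.matchAll(/(?:www|https?:\/\/)\S*\b/g);
// → ["https://www.address_nr1.com/watch?v=dQw4w9WgXcQ", "www.address_nr2.pl"]
console.log(Array.from(results, x => x[0]));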