JavaScript regex tokenization with full web addresses (text mining, inverse match)

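I have a problem with inverse matching of a JavaScript regular expression.

Example text:

Some text with apples, oranges and other fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another piece of text www.address_nr2.pl; last sentence with special characters !@#$%^&*()

My regex:

/(www|http:|https:)+[^\s]+[\w]|[A-Z0-9]+/gmi

I want to invert the match of this regex. If I add ^ to the second part of the regex, (www|http:|https:)+[^\s]+[\w]|[^A-Z0-9]+, I can split everything correctly, except for the web addresses. How can I invert the match for web addresses in this case?

In the end I will use this in Google Apps Script (var keywords = text.split(regex)) to push all keywords and web addresses into an array.

Edit: I added the regex flags.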

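Solution: Thank you, Ryszard Czech. "Match rather than split" works perfectly.

var keywords = text.matchAll(/(?:www|https?:\/\/)\S*\b|[\p{L}0-9]+/gu); keywords = Array.from(keywords, x => x[0]);

I changed A-Z to \p{L} to capture Polish letters and added the "u" flag, which \p{L} requires.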
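A minimal sketch of the solution above, wrapped in a function so it can be tried on its own. The helper name `tokenize` and the Polish sample word are assumptions made for illustration; the regex is the one from the solution. It assumes a runtime that supports `String.prototype.matchAll` and Unicode property escapes.

```javascript
// Sketch only: `tokenize` is a hypothetical helper name, not from the original post.
function tokenize(text) {
  // The URL alternative comes first, so a web address is consumed whole
  // before the letter/digit alternative gets a chance to split it up.
  const pattern = /(?:www|https?:\/\/)\S*\b|[\p{L}0-9]+/gu;
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

// A Polish sample word plus the two addresses from the question.
const tokens = tokenize(
  "żółw: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. www.address_nr2.pl; !@#"
);
console.log(tokens);

// Without the "u" flag, \p{L} is not treated as a property escape,
// so a letter such as "ż" is not matched by the character class.
console.log(/[\p{L}]/.test("ż"), /[\p{L}]/u.test("ż"));
```

The trailing `\b` makes the greedy `\S*` back off until it ends on a word character, which is why the period after the first address and the semicolon after the second are left out of the tokens.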

Match rather than split.

Explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    www                      'www'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    http                     'http'
--------------------------------------------------------------------------------
    s?                       's' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ://                      '://'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \S*                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (0 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
JavaScript:

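const string = "Some text with apples, oranges and other fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another piece of text www.address_nr2.pl; last sentence with special characters !@$%^&*()";
// Match each web address: a 'www' or 'http(s)://' prefix, then non-whitespace up to the last word boundary.
const results = string.matchAll(/(?:www|https?:\/\/)\S*\b/g);
// matchAll returns an iterator of match objects; keep only the full match text.
console.log(Array.from(results, x => x[0]));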
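
To illustrate why matching is easier than splitting in this case, here is a small sketch that contrasts the two approaches. The shortened sample string and the variable names are assumptions made for illustration; the second regex combines the URL pattern above with the `[A-Z0-9]+` alternative from the question.

```javascript
// Shortened sample, assumed for illustration.
const sample =
  "fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another www.address_nr2.pl; end !@#";

// Splitting on runs of non-alphanumeric characters tears each URL into pieces
// ("https", "www", "address", "nr1", ...). filter(Boolean) drops empty strings.
const splitTokens = sample.split(/[^A-Z0-9]+/i).filter(Boolean);

// Matching tries the URL alternative first, so each address stays one token;
// everything else falls through to the plain letter/digit alternative.
const matchTokens = sample.match(/(?:www|https?:\/\/)\S*\b|[A-Z0-9]+/gi) || [];

console.log(splitTokens);
console.log(matchTokens);
```

With `split`, the separator pattern would have to describe everything that is not a token, which is awkward when some tokens (the addresses) contain the very characters that separate others.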