JavaScript regex tokenizing with full web addresses (text mining, inverse matching)

javascript regex

I have a problem with inverse matching in a JavaScript regular expression.

Example text:

Some text with apples, oranges and other fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another piece of text www.address_nr2.pl; last sentence with special characters !@#$%^&*()

My regex:

/(www|http:|https:)+[^\s]+[\w]|[A-Z0-9]+/gmi

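I want to invert this regex. If I add ^ to the second part of the regex, (www|http:|https:)+[^\s]+[\w]|[^A-Z0-9]+, everything except the web addresses is inverted correctly. How can I invert the web addresses as well in this case?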

In the end I will use this in Google Apps Script, with var keywords = text.split(regex), to push all keywords and web addresses into an array.
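A minimal sketch of why the split approach struggles, using a shortened, made-up sample string rather than the question's text: an inverted separator class such as [^A-Z0-9]+ also matches the ":", "/", ".", "?" and "=" characters inside a URL, so the URL is cut into fragments.

```javascript
// Made-up sample, shortened from the question's text.
const sample = "fruits: https://www.a.com/x?v=1. more";

// Splitting on runs of non-alphanumeric characters (the "inverted" class)
// also splits inside the URL, because ":", "/", "." etc. are separators too.
const parts = sample.split(/[^A-Z0-9]+/i);
console.log(parts);
// ["fruits", "https", "www", "a", "com", "x", "v", "1", "more"]
```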

Edit: I added the regex flags.

Solution: Thank you, Ryszard Czech. "Match rather than split." It works perfectly:

var keywords = text.matchAll(/(?:www|https?:\/\/)\S*\b|[\p{L}0-9]+/gu);
keywords = Array.from(keywords, x => x[0]);

I changed A-Z to \p{L} to capture Polish letters, and added the "u" flag because \p{L} only works with it.
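The accepted pattern can be wrapped in a small function. This is a sketch: the name tokenize and the Polish test sentence are illustrative, not from the original post. It needs an engine that supports Unicode property escapes (ES2018) and String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: returns URLs and words (any Unicode letter or ASCII digit) in order.
function tokenize(text) {
  // Alternative 1: a URL starting with "www" or "http(s)://", trimmed so it ends on a word char.
  // Alternative 2: a run of letters (\p{L}, requires the "u" flag) or digits 0-9.
  const re = /(?:www|https?:\/\/)\S*\b|[\p{L}0-9]+/gu;
  return Array.from(text.matchAll(re), (m) => m[0]);
}

console.log(tokenize("Zażółć gęślą jaźń: www.example.pl/a."));
// ["Zażółć", "gęślą", "jaźń", "www.example.pl/a"]
```

If the input may contain decomposed characters (a base letter followed by a combining accent), calling text.normalize("NFC") first keeps letters such as ó in one token, since \p{L} does not match combining marks.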

Match instead of splitting.

Explanation:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    www                      'www'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    http                     'http'
--------------------------------------------------------------------------------
    s?                       's' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ://                      '://'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \S*                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (0 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
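To see what the trailing \b does, here is a small comparison on made-up input: without \b, \S* swallows trailing punctuation such as ";" or "."; with it, the engine backtracks until the match ends on a word character. One side effect is that a URL ending in "/" loses its final slash.

```javascript
const withB = /(?:www|https?:\/\/)\S*\b/g;
const withoutB = /(?:www|https?:\/\/)\S*/g;

const text = "See www.a.pl; and https://b.com/ now";
console.log(text.match(withoutB)); // ["www.a.pl;", "https://b.com/"]
console.log(text.match(withB));    // ["www.a.pl", "https://b.com"]
```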

JavaScript:

const string = "Some text with apples, oranges and other fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. Another piece of text www.address_nr2.pl; last sentence with special characters !@#$%^&*()";
const results = string.matchAll(/(?:www|https?:\/\/)\S*\b/g);
console.log(Array.from(results, x => x[0]));
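As a variation: with the g flag, String.prototype.match already returns an array of whole matches, so the Array.from(..., x => x[0]) step is not needed. Unlike matchAll, it returns null when nothing matches, hence the ?? [] fallback. extractUrls is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: list every URL-like token in the text.
function extractUrls(text) {
  // match() with the g flag returns all full matches, or null if there are none.
  return text.match(/(?:www|https?:\/\/)\S*\b/g) ?? [];
}

console.log(extractUrls("Fruits: https://www.address_nr1.com/watch?v=dQw4w9WgXcQ. More www.address_nr2.pl;"));
console.log(extractUrls("no links here")); // []
```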