JavaScript: extracting specific strings from an array of strings


I am trying to understand the following code:

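function extractLinks(input) {
    // Join the array of lines into one string, so a tag may span lines.
    var html = input.join('\n');
    var regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;
    var match;
    // With the g flag, each exec() call resumes after the previous match
    // and returns null once there are no more matches.
    while ((match = regex.exec(html)) !== null) {
        // Only one of groups 3, 4 and 5 is set, depending on the quoting.
        var hrefValue = match[3];        // href='...'
        if (hrefValue === undefined) {
            hrefValue = match[4];        // href="..."
        }
        if (hrefValue === undefined) {
            hrefValue = match[5];        // href=... (unquoted)
        }
        console.log(hrefValue);
    }
}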

To understand what this regular expression is doing, I added inline comments to it, which you can view. I have also copied it here:

<a\s+            # Look for '<a' followed by whitespace
([^>]+\s+)?      # Look for anything else that isn't 'href='
                 # such as 'class=' or 'id='
href\s*=\s*      # locate the 'href=' with any whitespace around the '=' character
(
  '([^']*)'      # Look for '...'
|                # ...or...
  "([^"]*)       # Look for "..."
|                # ...or...
  ([^\s>]+)      # Look for anything that is NOT '>' or whitespace
)
[^>]*>           # Match anything else up to the closing '>'


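This just breaks it down so that you can see what each of those parts is doing. As for your question about match, I don't fully understand what you are asking.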
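The commented layout above is for reading only: JavaScript regular expressions have no free-spacing mode, so the whitespace and `#` comments cannot be pasted into a regex literal as they stand. As a minimal sketch (the names `parts`, `composed` and `original` are made up for illustration), the same annotations can be kept in source by composing the pattern from smaller regex literals and joining their `source` strings:

```javascript
// Each piece is a small regex literal carrying the comment from the breakdown.
const parts = [
  /<a\s+/,                           // '<a' followed by whitespace
  /([^>]+\s+)?/,                     // optional attributes before href (group 1)
  /href\s*=\s*/,                     // 'href=' with optional whitespace around '='
  /('([^']*)'|"([^"]*)|([^\s>]+))/,  // the value: '...' or "... or unquoted (groups 2 to 5)
  /[^>]*>/,                          // anything else up to the closing '>'
];

// Join the pattern text of every piece and re-apply the g flag.
const composed = new RegExp(parts.map((p) => p.source).join(''), 'g');

// The pattern from the question, for comparison.
const original = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;

console.log(composed.source === original.source);
```

Because the pieces are concatenated in order, the capture group numbers are the same as in the single-literal version.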

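OK, thanks for the regex, I'll take a look. My question about the while loop is: why do we use the third element of the match array and, if it is undefined, the fourth, and then the fifth?

I think what is happening here is that some parts of the URL are being "captured" that don't need to be kept. With a few changes, only the href= part is captured. That way, you can see the replacement at the bottom of that page.

Thank you, sir.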
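On the question of why the loop tries the third, fourth and fifth elements: capture groups are numbered by the position of their opening parenthesis, so group 1 is the optional attributes before href, group 2 is the whole value alternation, and groups 3, 4 and 5 are the single-quoted, double-quoted and unquoted alternatives. Only the alternative that matched is filled in; the other two are undefined. The sketch below illustrates this with the unchanged pattern. The helper `describeLinks` and the sample markup are made up for the illustration, and `??` assumes a runtime with nullish coalescing.

```javascript
// Same pattern as in the question, unchanged.
const regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;

// Hypothetical helper: reports which capture group held each href value.
function describeLinks(html) {
  const results = [];
  regex.lastIndex = 0; // a global regex keeps its position between exec() calls
  let match;
  while ((match = regex.exec(html)) !== null) {
    results.push({
      singleQuoted: match[3], // set only for href='...'
      doubleQuoted: match[4], // set only for href="..."
      unquoted: match[5],     // set only for href=...
      href: match[3] ?? match[4] ?? match[5], // first group that is defined
    });
  }
  return results;
}

// Illustrative input: one link per quoting style.
const sample = [
  "<a href='one.html'>1</a>",
  '<a class="x" href="two.html">2</a>',
  '<a href=three.html>3</a>',
].join('\n');

for (const link of describeLinks(sample)) {
  console.log(link);
}
```

Note that the double-quoted alternative in this pattern has no closing quote of its own; the trailing `[^>]*` consumes it, which is why group 4 still holds just the URL.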
