.net Deciphering a regular expression


Can someone please help me understand this regular expression? It is used to match the src attribute of the img tag in HTML.

src=(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))


src=                               this is easy
(?:(['""])(?<src>(?:(?!\1).)*)     ?: is unknown (['""]) matches either single or double quotes, followed by a named group "src" that matches unknown strings
\1                                 unknown
|                                  "or"
(?<src>[^\s>]+))                   named group "src" matches one or more of line start or whitespace
In short, what does ?: mean?

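So (?:…) is a non-capturing version of regular parentheses. It matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.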
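
The difference between capturing and non-capturing parentheses can be seen in a short sketch. It uses Python's re module purely for illustration (the question is about .NET, which supports the same (?:…) syntax); the sample string and variable names are made up for this example.

```python
import re

text = "src='img.jpg'"

# Capturing parentheses: the quote character is stored as group 1.
capturing = re.match(r"""src=(['"])""", text)
print(capturing.groups())      # ("'",)

# Non-capturing parentheses: the same text is matched,
# but no group is recorded, so there is nothing for \1 to refer to.
non_capturing = re.match(r"""src=(?:['"])""", text)
print(non_capturing.groups())  # ()
print(non_capturing.group(0))  # src='
```

Both patterns consume the same characters; only the bookkeeping of groups differs.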

Thanks @embratch

What does \1 mean?


Finally, does the exclamation mark have any special meaning here? (Negation?)

This may help you understand the regex:

(?:(['""])((?:(?!\1).)*)\1|([^\s>]+))


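1> It first captures either ' or " in group 1, i.e. (['""])

2> It then matches zero or more characters that are not the quote captured in group 1, i.e. (?:(?!\1).)*

3> It repeats step 2 until it reaches the quote captured in group 1, i.e. \1

Conceptually, the three steps above are similar to (['""])[^\1]*\1 (a backreference cannot actually be used inside a character class, which is why the lookahead is needed).

Or:

1> It matches all the characters after src= that are neither whitespace nor >, i.e. [^\s>]+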
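
The steps above can be tried out with a small sketch. It is written in Python for illustration only and uses the numbered-group form of the pattern, because Python's re module does not accept the same group name twice the way .NET does. The helper name extract_src and the sample tags are hypothetical, and the doubled "" of the original is written as a single " on the assumption that it is C# verbatim-string escaping.

```python
import re

# Group 1: opening quote, group 2: quoted value, group 3: unquoted value.
pattern = re.compile(r"""src=(?:(['"])((?:(?!\1).)*)\1|([^\s>]+))""")

def extract_src(tag):
    """Return the src value found in `tag`, or None if there is none."""
    m = pattern.search(tag)
    if m is None:
        return None
    # Group 1 is set only when the quoted alternative matched.
    return m.group(2) if m.group(1) else m.group(3)

print(extract_src('<img src="a b.png" alt="x">'))  # a b.png
print(extract_src("<img src='it\"s.png'>"))        # it"s.png
print(extract_src("<img src=logo.gif>"))           # logo.gif
```

The second call shows why the lookahead checks against \1 rather than against any quote: a double quote inside a single-quoted value is consumed as ordinary text.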


Note: I would use src=(['""]).*?\1

* is greedy: it matches as much as possible.

*? is lazy: it matches as little as possible.

For example, consider the string hello hi world

For the regex
^h.*l
the output would be
hello hi worl


For the regex
^h.*?l
the output would be
hel
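
The greedy and lazy behaviour can be checked with a few lines of Python (used here only as a convenient way to run the patterns; the quantifiers behave the same way in .NET). The img tag in the second half is a made-up sample.

```python
import re

text = "hello hi world"

greedy = re.match(r"^h.*l", text).group(0)
lazy = re.match(r"^h.*?l", text).group(0)
print(greedy)  # hello hi worl
print(lazy)    # hel

# The same difference applied to an attribute value:
tag = '<img src="a.png" alt="x">'
lazy_src = re.search(r"""src=(['"])(.*?)\1""", tag).group(2)
greedy_src = re.search(r"""src=(['"])(.*)\1""", tag).group(2)
print(lazy_src)    # a.png
print(greedy_src)  # a.png" alt="x
```

With the greedy quantifier the match runs on to the last quote in the tag, which is why the lazy form is the one suggested for the src value.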

I got this output using RegexBuddy:

Match the characters “src=” literally «src=»
Match the regular expression below «(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))»
   Match either the regular expression below (attempting the next alternative only if this one fails) «(['""])(?<src>(?:(?!\1).)*)\1»
      Match the regular expression below and capture its match into backreference number 1 «(['""])»
         Match a single character present in the list “'"” «['""]»
      Match the regular expression below and capture its match into backreference with name “src” «(?<src>(?:(?!\1).)*)»
         Match the regular expression below «(?:(?!\1).)*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
            Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!\1)»
               Match the same text as most recently matched by capturing group number 1 «\1»
            Match any single character that is not a line break character «.»
      Match the same text as most recently matched by capturing group number 1 «\1»
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?<src>[^\s>]+)»
      Match the regular expression below and capture its match into backreference with name “src” «(?<src>[^\s>]+)»
         Match a single character NOT present in the list below «[^\s>]+»
            Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
            A whitespace character (spaces, tabs, line breaks, etc.) «\s»
            The character “>” «>»

This regex is pretty bad for what you describe. src="" is a valid input.
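
One common alternative to a regex for this task is an HTML parser. The sketch below is not something this answer proposes; it is only an illustration using Python's standard html.parser module, and the class name ImgSrcCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every img start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)

collector = ImgSrcCollector()
collector.feed('<img src="a.png"><IMG SRC=b.jpg alt=x><a data-src="no">link</a>')
print(collector.sources)  # ['a.png', 'b.jpg']
```

Unlike a bare src= pattern, this only looks at attributes named src on img tags, so the data-src attribute on the link is ignored.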

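For example, consider src="img.jpg" as the text being parsed. In regular expressions, \1 refers to the first capture group. In this particular case, the first capture group is (['""]). In our example, the (?:(['""])(?<src>(?:(?!\1).)*) section is the start of a non-capturing group, and it matches "img.jpg. In particular, (['""]) matches either quote character. Then (?!\1) is a negative lookahead for the quote character matched by the first group, so (?:(?!\1).) matches any character that is not the quote character matched by the first group, and (?<src>(?:(?!\1).)*) captures, in a named group, the sequence of characters before the closing quote character. The \1 that follows then matches the closing quote character.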
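
The walk-through above can be reproduced with a short Python sketch (for illustration only; it uses a numbered group instead of the named src group, and the sample text and the helper consume_until_quote are made up). The helper is a hand-written stand-in for (?:(?!\1).)*, assuming the default behaviour where . does not match a line break.

```python
import re

text = 'src="img.jpg" alt="logo"'

m = re.search(r"""src=(['"])((?:(?!\1).)*)\1""", text)
print(m.group(1))  # "
print(m.group(2))  # img.jpg
print(m.group(0))  # src="img.jpg"

def consume_until_quote(s, start, quote):
    """Mimic (?:(?!\\1).)*: take characters until the opening quote recurs."""
    i = start
    while i < len(s) and s[i] != quote and s[i] != "\n":
        i += 1
    return s[start:i]

# Start scanning right after the opening quote captured in group 1.
manual = consume_until_quote(text, m.end(1), m.group(1))
print(manual)      # img.jpg
```

The loop makes the lookahead's role explicit: before each character is consumed, it is compared with the quote that opened the value.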

src=      # matches literal "src="
(?:       # the ?: suppresses capturing. generally a good practice if capturing
          # is not explicitly necessary
  (['"])  # matches either ' or ", and captures what was matched in group 1
          # (because this is the first set of parentheses where capturing is not
          # suppressed)
  (?<src> # start another (named) capturing group with the name "src"
    (?:   # start non-capturing group
      (?!\1)
          # a negative lookahead, if its contents match, the lookahead causes the
          # pattern to fail
          # the \1 is a backreference and matches what was matched in capturing
          # group no. 1
    .)*   # match any character, end of non-capturing group, repeat
          # summary of this non-capturing group: for each character, check that
          # it is not the kind of quote we matched at the start. if it's not,
          # then consume it. repeat as long as possible.

  )       # end of capturing group "src"
  \1      # again a backreference to what was matched inside capturing group 1
          # i.e. match the same kind of quote that started the attribute value
|         # or
  (?<src> # again a capturing group with the name "src"
    [^\s>]+
          # match as many non-space, non-> character as possible (at least one)
  )       # end of capturing group. this case treats unquoted attribute values.
)         # end of non-capturing group (which was used to group the alternation)
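
A commented layout like the one above can also be kept in real code. The sketch below uses Python's re.VERBOSE flag for illustration; .NET offers RegexOptions.IgnorePatternWhitespace for the same purpose. Because Python's re module does not allow two groups with the same name, the two src groups are renamed quoted and unquoted here; those names, SRC_PATTERN, and find_sources are assumptions made for this example.

```python
import re

SRC_PATTERN = re.compile(r"""
    src=                          # literal "src="
    (?:                           # group the two alternatives, no capture
        (['"])                    # opening quote, captured as group 1
        (?P<quoted>(?:(?!\1).)*)  # any characters that are not that quote
        \1                        # the same quote closes the value
      |
        (?P<unquoted>[^\s>]+)     # unquoted value: no whitespace, no >
    )
""", re.VERBOSE)

def find_sources(html):
    """Return every src value found in `html`, quoted or not."""
    results = []
    for m in SRC_PATTERN.finditer(html):
        quoted = m.group("quoted")
        results.append(quoted if quoted is not None else m.group("unquoted"))
    return results

sample = '<img src="a.png"><img src=\'b c.jpg\'><img src=d.gif>'
print(find_sources(sample))  # ['a.png', 'b c.jpg', 'd.gif']
```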