.NET regex word matching

.net regex

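How can I match words, rather than letters, in a culture-independent way?

\w matches letters as well as digits, but I want to ignore digits. So \w\s will not work on "111 or this".

I want to get only "or this". I guess {^[A-Za-z]+$} is not the solution because, for example, the German alphabet has some extra letters.

I think the regex should be [^\d\s]+, i.e. anything that is not a digit or a whitespace character.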

This should work for matching words:

\b[^\d\s]+\b

Breakdown:

\b  -  word boundary
[   -  start of character class
^   -  negation within character class
\d  -  numerals
\s  -  whitespace
]   -  end of character class
+   -  repeat the preceding character class one or more times
\b  -  word boundary

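This will match any run of characters delimited by word boundaries that contains no digits or whitespace, so it will also match "words" like "aa?aa!aa".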
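The behaviour described above can be checked with a small sketch. It assumes C# 9 or later (top-level statements); `FindAll` is a hypothetical helper introduced only for this illustration and is not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): returns every match of a pattern as a string.
static string[] FindAll(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

const string NotDigitOrSpace = @"\b[^\d\s]+\b";

// The digits are skipped, so only the two words are returned.
Console.WriteLine(string.Join(" | ", FindAll("111 or this", NotDigitOrSpace)));

// '?' and '!' are neither digits nor whitespace, so the whole run is returned as a single match.
Console.WriteLine(string.Join(" | ", FindAll("aa?aa!aa", NotDigitOrSpace)));
```

The second call shows the limitation: the negated class excludes only digits and whitespace, so punctuation inside a run is swallowed into the match.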

Alternatively, if you want to exclude these as well, you can use:

\b[\p{L}\p{M}]+\b

Breakdown:

\b    -  word boundary
[     -  start of character class
\p{L} -  single code point in the category "letter"
\p{M} -  code point that is a combining mark (such as diacritics)
]     -  end of character class
+     -  repeat the preceding character class one or more times
\b    -  word boundary

I would suggest using this:

foundMatch = Regex.IsMatch(SubjectString, @"\b[\p{L}\p{M}]+\b");

It will match only Unicode letters.
Although @Oded's answer could also work, it also matches this:
p+ü+ü++ü++ü++ü
which is not exactly a word.
Explanation:

"
\b             # Assert position at a word boundary
[\p{L}\p{M}]   # Match a single character present in the list below
               #   A character with the Unicode property “letter” (any kind of letter from any language)
               #   A character with the Unicode property “mark” (a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.))
   +           # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b             # Assert position at a word boundary
"
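To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the sample string from this answer. It assumes C# 9 or later (top-level statements); `FindAll` is a hypothetical helper used only for this illustration, and ü is written as the escape \u00FC so the result does not depend on the source file encoding.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answers): returns every match of a pattern as a string.
static string[] FindAll(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// The sample "p+ü+ü++ü++ü++ü", with ü written as the precomposed code point U+00FC.
const string Sample = "p+\u00FC+\u00FC++\u00FC++\u00FC++\u00FC";

string[] broad = FindAll(Sample, @"\b[^\d\s]+\b");
string[] lettersOnly = FindAll(Sample, @"\b[\p{L}\p{M}]+\b");

// The negated class accepts '+', so the whole string comes back as one match.
Console.WriteLine(broad.Length);

// The letter/mark class stops at every '+', so each letter is its own match.
Console.WriteLine(lettersOnly.Length);

// On the input from the question, only the two words are returned.
Console.WriteLine(string.Join(" | ", FindAll("111 or this", @"\b[\p{L}\p{M}]+\b")));
```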

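Use this expression: \b[\p{L}\p{M}]+\b. It uses a lesser-known notation for matching Unicode characters (code points) of a specified category. \p{L} matches all letters and \p{M} matches all combining marks. The latter is needed because an accented character is sometimes encoded as two code points (the letter itself plus a combining mark), in which case \p{L} alone would match only one of them.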
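The effect of leaving out \p{M} can be seen on a decomposed string. The sketch below assumes C# 9 or later (top-level statements) and builds "über" from 'u' followed by U+0308 COMBINING DIAERESIS. The first two patterns omit \b on purpose, to isolate what the character class itself consumes.

```csharp
using System;
using System.Text.RegularExpressions;

// "über" with the umlaut stored as two code points: 'u' followed by U+0308 COMBINING DIAERESIS.
string decomposed = "u\u0308" + "ber";

int lettersOnly = Regex.Matches(decomposed, @"\p{L}+").Count;
int lettersAndMarks = Regex.Matches(decomposed, @"[\p{L}\p{M}]+").Count;
Match whole = Regex.Match(decomposed, @"\b[\p{L}\p{M}]+\b");

// Without \p{M} the combining mark interrupts the run, splitting the word into "u" and "ber".
Console.WriteLine(lettersOnly);

// With \p{M} the mark is consumed as part of the word, so there is a single run.
Console.WriteLine(lettersAndMarks);

// The full pattern from the answer returns the entire decomposed word.
Console.WriteLine(whole.Value == decomposed);
```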

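Also note that this is a general expression for matching words that may contain international characters. If, for example, you need to match several words at once or allow words that end in digits, you will have to modify the pattern accordingly.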
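As one possible modification, the sketch below allows a word to end in digits by appending \d* to the character class. This variation is an assumption made for illustration and does not come from the original answers; `FindAll` is again a hypothetical helper, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answers): returns every match of a pattern as a string.
static string[] FindAll(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Assumed variation: one or more letters or marks, optionally followed by digits.
const string WordWithTrailingDigits = @"\b[\p{L}\p{M}]+\d*\b";

// "mark1" is now accepted as a word, while the standalone "1" is still rejected
// because a match must start with a letter or a mark.
Console.WriteLine(string.Join(" | ", FindAll("mark1 is 1", WordWithTrailingDigits)));
```

Whether digits should be allowed at the end, in the middle, or not at all depends on what counts as a word in the application, so the pattern above is only a starting point.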
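Good call. I have never used word boundaries before. Now I will. :)
This will also match words like "aaa?", "aaa!", "aaa#".
@mifki - Punctuation will not match. You need to use something other than \b to include it.
Yes, sorry, it will match the whole "aa?aa!aa" as a single word, as @FailedDev mentions below.
Not only will [^\d\s] match punctuation characters, it will match them from the entire Unicode repertoire. It will also match control characters, box-drawing characters, dingbats, and any other character that is not a digit or whitespace (including letters, of course). I don't think that is what the OP had in mind.
Should "or this" be treated as one match or two?
I would like to get matches for the pattern "word1 word2". Note that "mark1 is 1" should give one match, "mark1 is". Also, "My Birthday is 11 Aug 2000" should give matches for "My Birthday" and "Birthday is" (the date should not match).
You also need to include \p{M}, because accents may be encoded as separate code points.
OK, but why? You should always explain why your solution works when the OP's does not. Drive-by answers like this are frowned upon here.
@Alan Moore: I explained it in a comment on FailedDev's answer. I will update my answer as well.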