Getting rid of a few entities using regex in Python


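I am new to regex. Given the phrase below, I want to get rid of the standalone I and of the extra fields that appear because my pattern combines two regex alternatives.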

text= "I have a problem in Regex, How do I get rid of the Capital I's provided I want to retain words occurring together as logical entity with a Capital letter in the beginning of each word like International Business Machine "

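For example, I want "International Business Machine" to be kept as "International Business Machine", but "Capital I's" should be kept as "Capital", not as "Capital I's".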

I used the following regex:

re.findall('([A-Z][\w\']*(?:\s+[A-Z][\w|\']*)+)|([A-Z][\w]*)', text)

The output I get is:

[('', 'I'),
 ('', 'Regex'),
 ('', 'How'),
 ('', 'I'),
 ("Capital I's", ''),
 ('', 'I'),
 ('', 'Capital'),
 ('International Business Machine', '')]

However, I would like my output to be:

[('Regex'),
 ('How'),
 ("Capital"),
 ('Capital'),
 ('International Business Machine')]

How do I eliminate the "I" entries and the extra fields that appear because of the two regex alternatives?

Thanks

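Just match a word that starts with a capital letter followed by one or more word characters, then add a pattern that matches further words of the same form (each starting with a capital letter, separated by whitespace), and let that pattern repeat zero or more times. Because \w+ requires at least one word character after the capital, a standalone I (and the I in I's) is skipped. This way it matches strings like

Foo

or

Foo Bar Buzz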

>>> text= "I have a problem in Regex, How do I get rid of the Capital I's provided I want to retain words occurring together as logical entity with a Capital letter in the beginning of each word like International Business Machine "
>>> import re
>>> re.findall(r'\b[A-Z]\w+(?:\s+[A-Z]\w+)*', text)
['Regex', 'How', 'Capital', 'Capital', 'International Business Machine']
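As for the "extra fields" in the question: re.findall returns one tuple per match whenever the pattern contains more than one capturing group, with an empty string for each group that did not take part. The following is only a small sketch of that behaviour; the sample string and the simplified two-group pattern are made up for illustration and are not the asker's exact input.

```python
import re

# Made-up sample string, used only for illustration.
sample = "I like International Business Machine and Regex"

# Two capturing groups joined by "|": findall returns a 2-tuple per match,
# and the group that did not participate shows up as ''.
with_groups = re.findall(r"([A-Z]\w+(?:\s+[A-Z]\w+)+)|([A-Z]\w+)", sample)

# Only non-capturing groups: findall returns the whole match as a string.
without_groups = re.findall(r"\b[A-Z]\w+(?:\s+[A-Z]\w+)*", sample)

print(with_groups)
print(without_groups)
```

Dropping the capturing parentheses (or turning them into (?:...)) is what removes the empty fields.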


If you also want to match apostrophes (as in your example), you could try:

(?:[A-Z](?:[\w]|(?<=\w\w)\')+\s?)+

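This also gives the desired words, except that each match keeps the whitespace consumed by \s? (including the trailing space at the end of the text):

['Regex', 'How ', 'Capital ', 'Capital ', 'International Business Machine ']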
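Since matches from this pattern can carry trailing whitespace (see 'How ' above), one possible way to clean them up is to strip each result. This is a sketch, not part of the original answer; the "Bob's car" string is a made-up input that shows when the apostrophe branch actually applies (only after at least two word characters).

```python
import re

text = ("I have a problem in Regex, How do I get rid of the Capital I's "
        "provided I want to retain words occurring together as logical entity "
        "with a Capital letter in the beginning of each word like "
        "International Business Machine ")

# Pattern copied from the answer above.
pattern = r"(?:[A-Z](?:[\w]|(?<=\w\w)\')+\s?)+"

# Strip the whitespace that the trailing \s? may have consumed.
matches = [m.strip() for m in re.findall(pattern, text)]
print(matches)

# Hypothetical input: the apostrophe is accepted because two word
# characters ("ob") precede it, unlike in "I's".
possessive = [m.strip() for m in re.findall(pattern, "Bob's car")]
print(possessive)
```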


@AvinashRaj Yes, true, but apostrophe matching seems to be decisive for the OP.

[A-Z]\B\w*

means the same as

[A-Z]\w+

I think you should remove the old solution and add an explanation for the new one.
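The equivalence claimed in this comment follows from \B: after a word character such as [A-Z], a non-boundary can only occur if the next character is also a word character, so \w* must then match at least one character. A quick, non-exhaustive check on a few made-up strings could look like this:

```python
import re

# Made-up sample strings, chosen to cover lone capitals, apostrophes,
# digits, and multi-word input.
samples = ["I", "Regex", "I's", "A1", "Foo Bar", "x Y z"]

def both_agree(s):
    """Return True if both patterns find the same matches in s."""
    return re.findall(r"[A-Z]\B\w*", s) == re.findall(r"[A-Z]\w+", s)

results = {s: both_agree(s) for s in samples}
print(results)
```

This only spot-checks the samples listed; it is not a proof of equivalence.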
