Getting rid of a few entities using regex (Python)

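I am new to regex. Given the phrase below, I want to get rid of the standalone "I" matches and the extra fields that appear because I am using two regex operations.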

text= "I have a problem in Regex, How do I get rid of the Capital I's provided I want to retain words occurring together as logical entity with a Capital letter in the beginning of each word like International Business Machine "
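For example, I want "International Business Machine" to be kept as "International Business Machine", but "Capital I's" should come out as "Capital", not "Capital I's".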

I used the following regular expression:

re.findall('([A-Z][\w\']*(?:\s+[A-Z][\w|\']*)+)|([A-Z][\w]*)', text)  
The output I get is:

[('', 'I'),
 ('', 'Regex'),
 ('', 'How'),
 ('', 'I'),
 ("Capital I's", ''),
 ('', 'I'),
 ('', 'Capital'),
 ('International Business Machine', '')]
However, I want my output to be:

[('Regex'),
 ('How'),
 ("Capital"),
 ('Capital'),
 ('International Business Machine')] 
How do I eliminate the "I" and the extra fields that appear because two regex operations are used?


Thanks

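Just match a word that starts with a capital letter followed by one or more word characters, then add a pattern that matches a following word of the same kind (also starting with a capital letter), and let that pattern repeat zero or more times. Because \w+ requires at least one more word character after the capital, a standalone I is never matched. The pattern matches strings like
Foo
Foo Bar Buzz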

>>> text= "I have a problem in Regex, How do I get rid of the Capital I's provided I want to retain words occurring together as logical entity with a Capital letter in the beginning of each word like International Business Machine "
>>> import re
>>> re.findall(r'\b[A-Z]\w+(?:\s+[A-Z]\w+)*', text)
['Regex', 'How', 'Capital', 'Capital', 'International Business Machine']
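
On the extra empty fields in the question's output: `re.findall` returns one tuple per match whenever the pattern contains more than one capturing group, with an empty string for each group that did not take part. The pattern above only uses a non-capturing group `(?:...)`, so plain strings come back. A minimal sketch of the difference, where `find_capitalized_phrases` and the sample strings are made-up names for illustration:

```python
import re

# Only non-capturing groups here, so findall returns plain strings.
PHRASE = re.compile(r"\b[A-Z]\w+(?:\s+[A-Z]\w+)*")


def find_capitalized_phrases(text):
    """Return runs of capitalized words, each word at least two characters long."""
    return PHRASE.findall(text)


# Two capturing groups joined by "|": every match becomes a 2-tuple,
# and the group that did not match is filled with ''.
two_groups = re.findall(r"([A-Z]\w+)|([a-z]+)", "Ab cd")
print(two_groups)  # [('Ab', ''), ('', 'cd')]

sample = "I met Ada Lovelace in London, and I's is odd"
print(find_capitalized_phrases(sample))  # ['Ada Lovelace', 'London']
```

If a pattern really needs alternation, wrapping each branch in `(?:...)` instead of `(...)` keeps the result a flat list.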

If you also want to match apostrophes (as in your example), you can try:

(?:[A-Z](?:[\w]|(?<=\w\w)\')+\s?)+
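For the text in the question, this gives the following result (a match can carry a trailing space, because \s? sits inside the repeated group):

['Regex', 'How ', 'Capital ', 'Capital ', 'International Business Machine ']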
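
Since the optional `\s?` is part of each repeated word, matches from this pattern can end in a space. One way to tidy that up is to strip each match afterwards. The following is a rough sketch, not part of the original answer; `find_phrases` and the second sample string are hypothetical:

```python
import re

# Same pattern as above. The lookbehind (?<=\w\w) only admits an apostrophe
# when it is preceded by two word characters, so "I's" is rejected
# while "Regex's" is accepted.
APOSTROPHE_PHRASE = re.compile(r"(?:[A-Z](?:\w|(?<=\w\w)')+\s?)+")


def find_phrases(text):
    """Return capitalized phrases with the trailing whitespace removed."""
    return [match.strip() for match in APOSTROPHE_PHRASE.findall(text)]


text = ("I have a problem in Regex, How do I get rid of the Capital I's "
        "provided I want to retain words occurring together as logical entity "
        "with a Capital letter in the beginning of each word like "
        "International Business Machine ")
print(find_phrases(text))
# ['Regex', 'How', 'Capital', 'Capital', 'International Business Machine']

print(find_phrases("Regex's Power is here"))  # ["Regex's Power"]
```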

@AvinashRaj Yes, but matching the apostrophe seems to be decisive for the OP.
[A-Z]\B\w*
means
[A-Z]\w+
I think you should remove the old solution and add an explanation for the new one.
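
The equivalence claimed in the comment can be spot-checked: right after a capital letter, `\B` succeeds only if the next character is also a word character, which is exactly what `\w+` demands. A small sketch with made-up sample strings (a spot check, not a proof):

```python
import re

samples = [
    "I",
    "Is",
    "I's",
    "Regex, How",
    "A1 b C_d",
    "the Capital I's provided I want",
]

# Collect the results of both patterns for each sample string.
with_boundary = [re.findall(r"[A-Z]\B\w*", s) for s in samples]
with_plus = [re.findall(r"[A-Z]\w+", s) for s in samples]

for s, a, b in zip(samples, with_boundary, with_plus):
    print(repr(s), a, b)

print(with_boundary == with_plus)  # True
```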