Getting rid of a few entities using regex in Python
I am new to regex. Given the phrase below, I want to get rid of the standalone I's and of the extra fields that appear because I am using two regex operations.
text= "I have a problem in Regex, How do I get rid of the Capital I's provided I want to retain words occurring together as logical entity with a Capital letter in the beginning of each word like International Business Machine "
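For example, I want "International Business Machine" kept as "International Business Machine", while "Capital I's" should be kept as "Capital" rather than "Capital I's".
I used the following regular expression: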
re.findall('([A-Z][\w\']*(?:\s+[A-Z][\w|\']*)+)|([A-Z][\w]*)', text)
The output I get is:
[('', 'I'),
('', 'Regex'),
('', 'How'),
('', 'I'),
("Capital I's", ''),
('', 'I'),
('', 'Capital'),
('International Business Machine', '')]
However, I want my output to be:
[('Regex'),
('How'),
("Capital"),
('Capital'),
('International Business Machine')]
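How can I eliminate the "I" entries and the extra fields that appear because I am using two regex operations?
Thanks
Just match a word that starts with a capital letter followed by one or more word characters, then add a pattern that matches a following word of the same kind (also starting with a capital letter) and let that pattern repeat zero or more times. That way it matches strings like Foo or Foo Bar Buzz.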
>>> text= "I have a problem in Regex, How do I get rid of the Capital I's provided I want to retain words occurring together as logical entity with a Capital letter in the beginning of each word like International Business Machine "
>>> import re
>>> re.findall(r'\b[A-Z]\w+(?:\s+[A-Z]\w+)*', text)
['Regex', 'How', 'Capital', 'Capital', 'International Business Machine']
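The empty fields in the question come from how `re.findall` reports groups: with more than one capturing group it returns one tuple per match, with one slot per group, and with no capturing groups it returns the matched strings directly. A minimal sketch of that difference, using a shortened sample sentence and variable names of my own choosing (the first pattern is a simplified version of the one in the question):

```python
import re

sample = "I use Regex at International Business Machine"

# Two capturing groups: findall returns a tuple per match, one slot per group.
# The slot of the alternative that did not match is an empty string.
grouped = re.findall(r"([A-Z]\w*(?:\s+[A-Z]\w*)+)|([A-Z]\w*)", sample)
print(grouped)

# No capturing groups: findall returns each whole match as a plain string.
# \w+ needs at least one character after the capital, so a lone "I" is skipped.
flat = re.findall(r"\b[A-Z]\w+(?:\s+[A-Z]\w+)*", sample)
print(flat)
```

Using the non-capturing form `(?:...)` for the repeated part is what keeps the result a flat list.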
If you also want to match apostrophes (as in the example), you could try:
(?:[A-Z](?:[\w]|(?<=\w\w)\')+\s?)+
which gives the result:
['Regex', 'How ', 'Capital ', 'Capital ', 'International Business Machine']
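Because the optional `\s?` sits inside the repeated group, a match can end with a space. One way to tidy that up is to strip each match afterwards. This is only a sketch: the helper name `capitalized_phrases` and the sample sentence are mine, not part of the answer.

```python
import re

# Same pattern as above, written as a raw string.
# The lookbehind only accepts an apostrophe when two word characters precede it,
# so "John's" is kept whole while "I's" is not matched at all.
PATTERN = re.compile(r"(?:[A-Z](?:\w|(?<=\w\w)')+\s?)+")

def capitalized_phrases(text):
    """Return capitalized words or phrases with trailing whitespace removed."""
    return [match.strip() for match in PATTERN.findall(text)]

print(capitalized_phrases("Ask John's dog about International Business Machine"))
```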
@AvinashRaj Yes, but apostrophe matching seems to be decisive for the OP.
[A-Z]\B\w* means [A-Z]\w+. I think you should remove the old solution and add an explanation for the new one.
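The equivalence mentioned in the comment follows from what `\B` asserts: directly after a capital letter (a word character), "not a word boundary" can only hold if the next character is also a word character, which is the same requirement as `\w+`. A quick check on a sample string of my own:

```python
import re

sample = "I saw Regex in A1 and B_ but not C, D."

# \B after the capital letter forces the next character to be a word character.
with_nonboundary = re.findall(r"[A-Z]\B\w*", sample)

# \w+ states the same requirement directly.
with_plus = re.findall(r"[A-Z]\w+", sample)

print(with_nonboundary)
print(with_nonboundary == with_plus)
```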