Python shortest regex match does not seem to give the correct answer


For the text:

In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere <TIMEX3 tid="t5" type="DATE" value="2013-03-21">24 hours</TIMEX3> - women remain an anomaly in the upper chamber.

I would greatly appreciate any guidance on my mistake.

Replace .*? with [^>]+:

import re

text = '''
In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere <TIMEX3 tid="t5" type="DATE" value="2013-03-21">24 hours</TIMEX3> - women remain an anomaly in the upper chamber.
'''
print(re.sub(r"<TIMEX3 [^>]+>24 hours</TIMEX3>", "24 hours", text))
Output:

 In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere 24 hours - women remain an anomaly in the upper chamber.
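To see why the character class helps, the sketch below compares what two patterns actually match on the sample text. The lazy pattern is only an assumption about what the question used (the original pattern is not shown in this thread), and the variable names are illustrative.

```python
import re

text = ('In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> '
        'since Rebecca Felton of Georgia became the first woman in the United States '
        'Senate - sworn in for a mere <TIMEX3 tid="t5" type="DATE" '
        'value="2013-03-21">24 hours</TIMEX3> - women remain an anomaly in the upper chamber.')

# Assumed pattern from the question: lazy "any character" inside the opening tag.
lazy_pattern = r"<TIMEX3 .*?>24 hours</TIMEX3>"
# Pattern from this answer: the attribute part may not contain ">".
class_pattern = r"<TIMEX3 [^>]+>24 hours</TIMEX3>"

lazy_match = re.search(lazy_pattern, text).group(0)
class_match = re.search(class_pattern, text).group(0)

# The lazy match starts at the first <TIMEX3 and runs across both tags.
print(lazy_match.count("<TIMEX3"))   # 2
# The character-class match is confined to the second tag.
print(class_match.count("<TIMEX3"))  # 1
```

Because [^>] can never consume the ">" that ends an opening tag, the attempt that starts at the first tag fails on "the 90 years" and the engine moves on to the second tag.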

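You are not getting the correct output because the substring to be replaced is specified incorrectly.

You must use the any-character operator ('.') between the two ends of the substring.

Updated code:


I hope this helps.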

Your pattern is missing the '/' in the closing </TIMEX3> tag. You can also optimize it to match only the shortest tag:

r"<TIMEX3[^>]+?>24 hours</TIMEX3>"
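[^>]+? matches only the shortest run of characters after <TIMEX3, and because it excludes '>', it cannot run past the end of the opening tag.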

Output:

In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere 24 hours - women remain an anomaly in the upper chamber.
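If the goal is to unwrap the tag around a given piece of text without retyping that text in the replacement, a capture group can carry it over. The helper below is only a sketch: unwrap_timex3 is a hypothetical name, not part of any library, and it assumes the tags are not nested and that attribute values contain no ">".

```python
import re

def unwrap_timex3(text, inner):
    """Remove the TIMEX3 tag pair that directly wraps `inner`, keeping `inner`."""
    # re.escape guards against regex metacharacters in the inner text.
    pattern = r"<TIMEX3[^>]*>(" + re.escape(inner) + r")</TIMEX3>"
    # \1 re-inserts the captured inner text in place of the whole tag pair.
    return re.sub(pattern, r"\1", text)

sample = 'a <TIMEX3 tid="t4">the 90 years</TIMEX3> b <TIMEX3 tid="t5">24 hours</TIMEX3> c'
print(unwrap_timex3(sample, "24 hours"))
# a <TIMEX3 tid="t4">the 90 years</TIMEX3> b 24 hours c
```

Only the tag pair whose content equals the given text is removed; the other tag is left untouched.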

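The input does not match the regex? What is the input, and what output do you want? It seems to be the same as what you started with, so what is the point?
.*? does not mean "shortest possible match". That is an oversimplification, and it leads to exactly this misunderstanding.
@Booboo: I basically want to pull the 24 hours out of its tag.
@BarunPatra: You're welcome, glad this helped. Feel free to mark the answer as accepted,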
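The comment about .*? can be checked on a tiny string. The regex engine first settles on the leftmost position where a match is possible; laziness only affects how far the match extends from that position. The string and names below are made up for illustration.

```python
import re

s = "<a><b>x"

# Lazy: starts at the first "<" (index 0) and grows until ">x" can follow.
lazy = re.search(r"<.*?>x", s)
print(lazy.span(), lazy.group(0))          # (0, 7) <a><b>x

# Negated class: cannot cross a ">", so the attempt at index 0 fails
# and the match is found at the second "<".
confined = re.search(r"<[^>]*>x", s)
print(confined.span(), confined.group(0))  # (3, 7) <b>x
```

The shorter candidate "<b>x" exists in the string, yet the lazy pattern does not return it, because a match starting earlier is available.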
import re   #import regex library

#define sample text
text = 'In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere <TIMEX3 tid="t5" type="DATE" value="2013-03-21">24 hours</TIMEX3> - women remain an anomaly in the upper chamber.'

#performing substitution
result_text = re.sub(r"<TIMEX3 .*.>24 hours</TIMEX3>", "24 hours", text)
print(result_text)    #displaying resulting text
In 24 hours - women remain an anomaly in the upper chamber.
text = """In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere <TIMEX3 tid="t5" type="DATE" value="2013-03-21">24 hours</TIMEX3> - women remain an anomaly in the upper chamber."""

import re

r = re.sub(r"<TIMEX3[^>]+?>24 hours</TIMEX3>", "24 hours", text)

print(r)
In <TIMEX3 tid="t4" type="DATE" value="2013-03-21">the 90 years</TIMEX3> since Rebecca Felton of Georgia became the first woman in the United States Senate - sworn in for a mere 24 hours - women remain an anomaly in the upper chamber.