Python regular expressions: cleaning up strings with re.sub
I'm having some trouble using regex sub to remove numbers from a string. The input strings can look like this:
"The Term' means 125 years commencing on and including 01 October 2015."
"125 years commencing on 25th December 1996"
"the term of 999 years from the 1st January 2011"
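What I want to do is remove the number and the word 'years'. I'm also using DateFinder to parse the date strings, but DateFinder interprets the number as a date, which is why I want to remove it.
Any ideas for a regex expression that removes the number and the word 'years'?
Try this to remove the numbers and the word years: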
re.sub(r'\s+\d+|\s+years', '', text)
For example:
text = "The Term' means 125 years commencing on and including 01 October 2015."
Then the output will be:
"The Term' means commencing on and including October."
I think this is what you're after:
import re

my_list = ["The Term' means 125 years commencing on and including 01 October 2015.",
           "125 years commencing on 25th December 1996",
           "the term of 999 years from the 1st January 2011",
           ]
for item in my_list:
    # Remove only a number that is directly followed by the word "years"
    new_item = re.sub(r"\d+\syears", "", item)
    print(new_item)
Result:
The Term' means  commencing on and including 01 October 2015.
 commencing on 25th December 1996
the term of  from the 1st January 2011
Note that you end up with some extra whitespace (maybe you want that?). But you can also add this to the "clean-up":
new_item = re.sub(r"\s+", " ", new_item)
And because I love regex: new_item = re.sub(r"^\s+|\s+$", "", new_item)
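Both clean-up steps (collapsing runs of whitespace and trimming the ends) can also be done without regex. A minimal sketch, assuming plain str methods are acceptable here; the sample string is the second input from the question:

```python
import re

item = "125 years commencing on 25th December 1996"

# Removing the number and "years" leaves a stray leading space.
new_item = re.sub(r"\d+\syears", "", item)

# str.split() with no arguments splits on runs of whitespace and ignores
# leading/trailing whitespace, so re-joining with a single space
# normalises the spacing in one step.
new_item = " ".join(new_item.split())
print(new_item)
```

This does the work of both re.sub clean-up calls; whether it reads better than the regex version is a matter of taste.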
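What is your expected output? What have you tried?
re.sub(r'\d+\s+years?', '', string)?
@FHTMitchell What's wrong with that? It works fine.
Thank you very much. This partly achieves what I am after, but it also removes the numbers in the date. Is there a way to remove only the number in front of the word 'years'?
Aha, then this should be an option: re.sub(r'\s+\d+\s+years', '', text)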
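Isn't the last line of code just new_item.strip()? To avoid needing the "clean-up" at all, you could add a \s to the regex, before the number or after years.
Ah yes, a simple new_item.strip() would probably be better. I got carried away by the fun of regex :)
This is perfect and solved the problem. Unfortunately, datefinder is a bit overeager in interpreting numbers as dates, so I need to get rid of the numbers that are not part of a date.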
new_item = new_item.strip()
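One thing to watch with the pattern r'\s+\d+\s+years' from the comments: it requires whitespace before the number, so it will not match an input that starts with the number, such as the second sample string. A possible variant is sketched below, using a word boundary instead of leading whitespace. The helper name strip_year_counts and the exact pattern are assumptions for illustration, not something posted in the thread:

```python
import re

# \b lets the number sit at the very start of the string; the trailing \s*
# swallows the space after "years" so no double space is left behind.
YEARS_RE = re.compile(r"\b\d+\s+years?\b\s*")


def strip_year_counts(text):
    """Remove '<number> year(s)' but leave the digits of the date alone."""
    return YEARS_RE.sub("", text).strip()


samples = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

for sample in samples:
    print(strip_year_counts(sample))
```

The date digits survive because each is followed by punctuation, a suffix such as "th", or a month name rather than by whitespace and "years", so the cleaned strings can still be handed to the date parser.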