Python regex: cleaning a string with re.sub


I'm having some trouble using regex sub to remove numbers from a string. The input strings can look like this:

"The Term' means 125 years commencing on and including 01 October 2015."

"125 years commencing on 25th December 1996"

"the term of 999 years from the 1st January 2011"

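What I want to do is remove the number and the word 'years'. I'm also using DateFinder to parse the date strings, but DateFinder interprets the numbers as dates, so I want to remove them first.

Any ideas for a regex that removes the number and the word 'years'?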

Try this to remove the numbers and the word years:

re.sub(r'\s+\d+|\s+years', '', text)

For example:

text="The Term' means 125 years commencing on and including 01 October 2015."

The output would then be:

"The Term' means commencing on and including October."
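A quick way to check this pattern against all three sample strings from the question is sketched below. The names samples, pattern, and cleaned are illustrative and not part of the original answer.

```python
import re

# The three sample strings from the question.
samples = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

# Same pattern as above: whitespace + digits, or whitespace + "years".
pattern = re.compile(r'\s+\d+|\s+years')

cleaned = [pattern.sub('', s) for s in samples]
for line in cleaned:
    print(line)
```

On the second and third strings this prints "125 commencing onth December" and "the term of from thest January": the leading 125 survives because no whitespace precedes it, and the digits inside the dates are stripped as well.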

I think this is what you want:

import re

my_list = ["The Term' means 125 years commencing on and including 01 October 2015.",
"125 years commencing on 25th December 1996",
"the term of 999 years from the 1st January 2011",
]

for item in my_list:
    # Remove only a number that is directly followed by whitespace and "years"
    new_item = re.sub(r"\d+\syears", "", item)
    print(new_item)

Result:

The Term' means  commencing on and including 01 October 2015.
 commencing on 25th December 1996
the term of  from the 1st January 2011

Note that you end up with some extra whitespace (maybe you want that?). You can add this to clean it up:

new_item = re.sub(r"\s+", " ", new_item)

And because I love regex: new_item = re.sub(r"^\s+|\s+$", "", new_item)
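Put together, the removal and the two whitespace clean-up steps might look like the following minimal sketch. The helper name clean_term is hypothetical and was chosen here for illustration; the patterns are the ones from this answer.

```python
import re

def clean_term(text):
    """Drop '<digits> years', then tidy the leftover whitespace.

    clean_term is a hypothetical helper name used only for this sketch.
    """
    # Remove a number directly followed by one whitespace character and "years".
    without_term = re.sub(r"\d+\syears", "", text)
    # Collapse each run of whitespace into a single space.
    collapsed = re.sub(r"\s+", " ", without_term)
    # Trim leading and trailing whitespace.
    return re.sub(r"^\s+|\s+$", "", collapsed)

results = [
    clean_term("The Term' means 125 years commencing on and including 01 October 2015."),
    clean_term("125 years commencing on 25th December 1996"),
    clean_term("the term of 999 years from the 1st January 2011"),
]
for line in results:
    print(line)
```

The date parts (01 October 2015, 25th December 1996, 1st January 2011) are left untouched, which is what a date parser needs to see.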

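What is your expected output? What have you tried?

re.sub(r'\d+\s+years?', '', string)?

@FHTMitchell What's wrong with that? It looks fine.

Thank you very much. This does part of what I want, but it also removes the numbers in the dates. Is there a way to remove only the number before the word 'years'?

Aha, then this should be an option: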

re.sub(r'\s+\d+\s+years', '', text)
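One caveat with requiring leading whitespace: a string that starts with the number, such as the second sample, is left unchanged. The sketch below compares that pattern with a variant using \s* instead of \s+; the variant and the names strict and relaxed are assumptions made here, not something proposed in the thread.

```python
import re

samples = [
    "The Term' means 125 years commencing on and including 01 October 2015.",
    "125 years commencing on 25th December 1996",
    "the term of 999 years from the 1st January 2011",
]

# Pattern from the comment: needs whitespace before the number.
strict = re.compile(r'\s+\d+\s+years')
# Assumed variant: the leading whitespace is optional.
relaxed = re.compile(r'\s*\d+\s+years')

strict_out = [strict.sub('', s) for s in samples]
# strip() removes the space left behind when the match is at the start.
relaxed_out = [relaxed.sub('', s).strip() for s in samples]

for before, after in zip(strict_out, relaxed_out):
    print(repr(before), '->', repr(after))
```

With the strict pattern the second string keeps its "125 years"; with the relaxed one it becomes "commencing on 25th December 1996". Both leave the dates intact.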

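Isn't the last line of code just new_item.strip()? To avoid the need for the "clean-up", you could add a \s to the regex before the number of years.

Ah yes, a simple new_item.strip() would probably be better. I got carried away by the fun of regex :)

This is perfect, it solved the problem. Unfortunately, datefinder is a bit overeager about interpreting numbers as dates, so I need to strip out the numbers that are not part of a date.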

new_item = new_item.strip()