Python: remove date substrings based on a mask
我有以下案文:Python 基于掩码删除日期子字符串,python,regex,Python,Regex,我有以下案文: Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety 我需要将日期+幻灯片替换为(点),以获得以下结果: Filling a gap. Small parts example. Padded details. Adds to safety 可能可以使用掩码来标识要删除的文本: {month} {day}, {
Filling a gap December 6, 2018 Slide 6 Small parts example. Padded details May 22, 2020 Slide 21 Adds to safety
I need to replace each date + slide part with "." (a dot) to get the following result:
Filling a gap. Small parts example. Padded details. Adds to safety
It might be possible to use a mask to identify the text to remove:
{month} {day}, {year} {Slide} {slide number}
I can remove the month with a regex like this:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
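As a quick check of that month pattern on its own, here is a minimal sketch (the names MONTH and s are only for illustration). It shows that the pattern finds just the month names, so substituting it leaves the day, year and slide number behind:

```python
import re

# The month regex from the question (one capturing group)
MONTH = (r"(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?"
         r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")

# With one capturing group, findall returns just the month names
print(re.findall(MONTH, s))  # ['December', 'May']

# Replacing only the month leaves the day, year and "Slide N" in place
print(re.sub(MONTH, ".", s))
# Filling a gap . 6, 2018 Slide 6 Small parts example. Padded details . 22, 2020 Slide 21 Adds to safety
```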
But how do I define the mask and put it all together?
I'm not sure whether regex is the right tool here or overkill.

Try this:
(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+
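When this pattern is used with re.sub and a plain "." replacement, the spaces that surrounded each date stay in place, giving "gap ." instead of "gap.". The following minimal sketch keeps the pattern unchanged and adds a second clean-up substitution that removes whitespace directly before a dot. That clean-up step is an addition for illustration, not part of the answer:

```python
import re

# Pattern from the answer above, unchanged (split across lines for readability)
PATTERN = re.compile(
    r"(?:\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May"
    r"|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)"
    r"\D?(?:\d{1,2}\D?)?\D?(?:(?:19[7-9]\d|20\d{2})|\d{2}) Slide \d+"
)

s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")

# A plain "." replacement keeps the spaces that surrounded each date
step1 = PATTERN.sub(".", s)
# Extra clean-up pass (an assumption): drop whitespace sitting right before a dot
result = re.sub(r"\s+\.", ".", step1)

print(step1)   # Filling a gap . Small parts example. Padded details . Adds to safety
print(result)  # Filling a gap. Small parts example. Padded details. Adds to safety
```

The clean-up pass would also remove whitespace before any dot already present in the text. Matching the surrounding whitespace inside the main pattern avoids that side effect.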
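Match the day as 1 to 31 to make it a bit more specific, followed by Slide and one or more digits. If you also match the whitespace before and after the date and replace each match with a dot and a space (". "), you won't get double-space gaps: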
\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?) \b(?:[1-9]|[12]\d|3[01])\b,\s+\d{4} Slide \d+\s*
Output:
Filling a gap. Small parts example. Padded details. Adds to safety
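The same pattern can be written with re.VERBOSE so that each piece lines up with a field of the question's mask. In this sketch, the function name remove_date_slides is hypothetical. Literal spaces are written as [ ] because verbose mode ignores bare whitespace. Otherwise the pattern is equivalent to the one above:

```python
import re

# Same pattern as above, laid out with re.VERBOSE; [ ] keeps a literal space
DATE_SLIDE = re.compile(
    r"""
    \s*                                        # whitespace before the date
    (?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?
      |Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)  # {month}
    [ ]                                        # one space
    \b(?:[1-9]|[12]\d|3[01])\b                 # {day}, 1 to 31
    ,\s+                                       # comma and whitespace
    \d{4}                                      # {year}
    [ ]Slide[ ]                                # literal " Slide "
    \d+                                        # {slide number}
    \s*                                        # whitespace after the slide number
    """,
    re.VERBOSE,
)


def remove_date_slides(text: str) -> str:
    """Replace each date + Slide N run and its surrounding whitespace with a dot and a space."""
    return DATE_SLIDE.sub(". ", text)


s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")
print(remove_date_slides(s))  # Filling a gap. Small parts example. Padded details. Adds to safety
```

If a date + slide run is the last thing in a string, this leaves a trailing ". ", which a final .rstrip() would tidy up.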
What's the problem here? Isn't it just a matter of adding \d+ in a few places? Appending \s\d+,\s\d{4}\sSlide\s\d+ to the month regex above works fine:
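import re
regex_pat = r"\s(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s\d+,\s\d{4}\sSlide\s\d+"  # leading \s also removes the space before the date
pat = re.compile(regex_pat)
re.sub(pat, ".", s)  # s is the input text; returns the cleaned string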
Filling a gap. Small parts example. Padded details. Adds to safety
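The question's mask idea can also be taken literally: compile the mask into a regex by replacing each placeholder with a sub-pattern and escaping the literal text between placeholders. This is a sketch under several assumptions. mask_to_regex and PARTS are hypothetical names. Placeholders must be single words, so {Slide} {slide number} is written here as Slide {slide}. Each space in the literal text is loosened to \s+:

```python
import re

# Hypothetical sub-pattern for each placeholder name (non-capturing groups only)
PARTS = {
    "month": (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?"
              r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"),
    "day": r"(?:[1-9]|[12]\d|3[01])",
    "year": r"\d{4}",
    "slide": r"\d+",
}


def mask_to_regex(mask: str, parts: dict) -> re.Pattern:
    """Compile a mask such as "{month} {day}, {year} Slide {slide}" into a regex.

    Placeholders are replaced by parts[name]; literal text is escaped and every
    space in it is relaxed to one-or-more whitespace characters.
    """
    pieces = []
    # The capturing group makes re.split keep the "{name}" tokens
    for token in re.split(r"(\{\w+\})", mask):
        if token.startswith("{") and token.endswith("}"):
            pieces.append(parts[token[1:-1]])
        else:
            pieces.append(r"\s+".join(re.escape(word) for word in token.split(" ")))
    return re.compile(r"\b" + "".join(pieces) + r"\b")


pattern = mask_to_regex("{month} {day}, {year} Slide {slide}", PARTS)

s = ("Filling a gap December 6, 2018 Slide 6 Small parts example. "
     "Padded details May 22, 2020 Slide 21 Adds to safety")
print(pattern.findall(s))  # ['December 6, 2018 Slide 6', 'May 22, 2020 Slide 21']

# Absorb surrounding whitespace so the dot attaches to the preceding word
cleaned = re.sub(r"\s*(?:" + pattern.pattern + r")\s*", ". ", s)
print(cleaned)  # Filling a gap. Small parts example. Padded details. Adds to safety
```

Keeping the sub-patterns in a dictionary means another mask (for example "{day} {month} {year}") can reuse the same pieces without rewriting one long regex by hand.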