How do I remove substrings with a repeating pattern in Python?


python regex string


So, I have the following list:

texts = ["Vol. 1, No. 2, 2020 other text yes bla bla",
        "Vol. 2, No. 2, 2020 another text",
        "Vol. 1, No. 1, 2020 yet another one"]

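As you can see, I want to get "other text yes bla bla", "another text", etc., by removing the "Vol. x, No. x, 2020" substring. How can I do that with a regular expression? I thought using {} would help me remove it, but it seems I don't really understand how it works: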

def remove_header_footer(text):
    pattern1 = "Vol. {}, No. {}, 2020"
    temp = text.replace(pattern1, text, "")

I get an error. Does anyone know why? Thanks.
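For reference, a minimal reproduction of where the error likely comes from: str.replace takes (old, new[, count]), so the third argument "" lands in count, which must be an integer, and Python raises a TypeError. Even with valid arguments, str.replace matches "{}" literally, so it would not find the digits either.

```python
text = "Vol. 1, No. 2, 2020 other text yes bla bla"
pattern1 = "Vol. {}, No. {}, 2020"

try:
    # The third positional argument is str.replace's count, which must be an int
    text.replace(pattern1, text, "")
except TypeError as exc:
    print("TypeError:", exc)

# With valid arguments, "{}" is still taken literally, so nothing is replaced
unchanged = text.replace(pattern1, "")
print(unchanged == text)  # True
```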

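Assuming the values after Vol. and No. are single digits, you can try the pattern 'Vol. \d, No. \d, 2020'. For multi-digit numbers, use \d+ instead. (An unescaped . matches any character; write \. to match only a literal dot.)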

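import re

texts = ["Vol. 1, No. 2, 2020 other text yes bla bla",
         "Vol. 2, No. 2, 2020 another text",
         "Vol. 1, No. 1, 2020 yet another one"]
for text in texts:
    # Raw string keeps \d intact; \. is a literal dot, \d+ allows multi-digit numbers,
    # and the trailing space removes the space after the year
    new_text = re.sub(r'Vol\. \d+, No\. \d+, 2020 ', '', text)
    print(new_text)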
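A slightly more defensive variant of the regex above, as a sketch assuming the header always sits at the start of each string: ^ anchors the match there, \d+ accepts multi-digit volume/issue numbers, \d{4} accepts any four-digit year (an assumption beyond the question, which only shows 2020), and \s* consumes the whitespace after the year. The name HEADER and the fourth, multi-digit input line are made up for illustration.

```python
import re

# Hypothetical compiled pattern: header at the very start of the string,
# multi-digit volume/issue numbers, any four-digit year, trailing whitespace
HEADER = re.compile(r"^Vol\. \d+, No\. \d+, \d{4}\s*")

texts = ["Vol. 1, No. 2, 2020 other text yes bla bla",
         "Vol. 2, No. 2, 2020 another text",
         "Vol. 1, No. 1, 2020 yet another one",
         "Vol. 12, No. 34, 2021 multi-digit example"]  # made-up extra line

cleaned = [HEADER.sub("", t) for t in texts]
for line in cleaned:
    print(line)
```

Compiling once and reusing the pattern object keeps the regex in one place if the list grows.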

If the format is always the same, you can try this simple approach:

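result = []
for text in texts:
    # Split on single spaces: "Vol.", "1,", "No.", "2,", "2020", "other", ...
    text_split = text.split(" ")
    # Keep everything after the first five header tokens
    result.append(text_split[5:])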

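This splits each string at every space, and the first five tokens (the "Vol. x, No. x, 2020" header) are skipped when appending to the result list. If you like, you can flatten the list: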
flat_result = [item for sublist in result for item in sublist]
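Note that result holds lists of words rather than strings. If strings are wanted instead, the remaining tokens can be joined back with a space; a small sketch of the same split-and-skip idea (the name joined is just illustrative):

```python
texts = ["Vol. 1, No. 2, 2020 other text yes bla bla",
         "Vol. 2, No. 2, 2020 another text",
         "Vol. 1, No. 1, 2020 yet another one"]

# Drop the five header tokens ("Vol.", "1,", "No.", "2,", "2020")
# and glue the remaining words back together with single spaces
joined = [" ".join(text.split(" ")[5:]) for text in texts]
print(joined)  # ['other text yes bla bla', 'another text', 'yet another one']
```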

If the text is always formatted this way, you could split it once at " 2020 " and keep the last part, i.e.:
texts = ["Vol. 1, No. 2, 2020 other text yes bla bla",
        "Vol. 2, No. 2, 2020 another text",
        "Vol. 1, No. 1, 2020 yet another one"]
for t in texts:
    print(t.split(" 2020 ", 1)[-1])

Output:
other text yes bla bla
another text
yet another one

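Note that I split on " 2020 " (space, 2020, space) rather than "2020 ", and only once (the second argument to split is 1), so it is not a problem if 2020 appears again later in the text.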
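To illustrate that note with a made-up string where 2020 also appears later in the body: with maxsplit=1 only the first " 2020 " acts as the separator, so the later occurrence survives, whereas an unlimited split cuts the body as well.

```python
t = "Vol. 3, No. 1, 2020 results from 2020 onward"  # made-up example

once = t.split(" 2020 ", 1)[-1]  # split at the first " 2020 " only
every = t.split(" 2020 ")[-1]    # split at every " 2020 "

print(once)   # results from 2020 onward
print(every)  # onward
```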

What is the expected output list here? 1) str.replace does not support regex-based replacement; use re.sub for that. 2) {} does not match digits; \d+ does.
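Following up on point 2: if you want to keep the question's {} template, one possible sketch is to escape the template with re.escape (so "." and the braces are taken literally) and then swap each escaped {} for \d+. The helper name template_to_regex is hypothetical, not a library function.

```python
import re

def template_to_regex(template):
    """Hypothetical helper: turn "Vol. {}, No. {}, 2020" into a regex
    where every {} placeholder matches one or more digits."""
    escaped = re.escape(template)            # "." -> "\.", "{" -> "\{", "}" -> "\}"
    return escaped.replace(r"\{\}", r"\d+")  # escaped placeholder -> digits

pattern = template_to_regex("Vol. {}, No. {}, 2020")

texts = ["Vol. 1, No. 2, 2020 other text yes bla bla",
         "Vol. 2, No. 2, 2020 another text",
         "Vol. 1, No. 1, 2020 yet another one"]

# \s* also removes the space that follows the header
cleaned = [re.sub(pattern + r"\s*", "", t) for t in texts]
print(cleaned)  # ['other text yes bla bla', 'another text', 'yet another one']
```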