How to delete the text between two specific words in Python

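I have parsed a URL with the Beautiful Soup package to get its text. I want to remove all the text in the terms and conditions section, i.e. everything in the paragraph "Key Terms; ... T&Cs apply".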

Here is what I tried:

import re

#"text" is part of the text contained in the url
text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""
# to remove words between "Key" and "T&Cs"
rex=re.compile(r'Key\ (.*?)T&Cs.')
terms_and_cons=rex.findall(text)
text=re.sub("|".join(terms_and_cons)," ",text)
#I also tried: text=re.sub(terms_and_cons[0]," ",text)
print(text)
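
The code above leaves the string "text" unchanged, even though the list terms_and_cons is not empty. How can I successfully remove the words between "Key" and "T&Cs"? Please help. I have been stuck on this supposedly simple piece of code for quite a while and it is really frustrating. Thanks.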

Your regular expression is missing the flag that makes the dot match newlines (re.DOTALL).
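
To illustrate the point about the flag, here is a minimal sketch on a shortened two-line sample (the sample string is made up for this example, not taken from the question). It shows that "." stops at a line break unless re.DOTALL is passed.

```python
import re

# Shortened, hypothetical sample with the section split across two lines.
text = "Key Terms; Single bets only.\nBonus T&Cs apply.\n"

# Without re.DOTALL, "." does not match "\n", so the pattern cannot span lines.
print(re.search(r"Key\s(.*?)T&Cs", text))

# With re.DOTALL, "." also matches "\n" and the match crosses the line break.
match = re.search(r"Key\s(.*?)T&Cs", text, re.DOTALL)
print(repr(match.group(1)))
```

The first call finds nothing because "Key" and "T&Cs" sit on different lines; the second captures everything between them, newline included.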

Method 1: using re.sub

import re

text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""

# re.DOTALL lets "." match newlines; the greedy (.*) runs to the last "T&Cs"
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
text = rex.sub("Key T&Cs", text)
print(text)

Method 2: using groups

Match the text with a capture group, then remove that group's text from the original string.

import re

text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""

rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
matches = rex.search(text)
# group(1) is the text captured between "Key " and the last "T&Cs"
text = text.replace(matches.group(1), "")
print(text)
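
Method 2 removes the captured text with str.replace, which treats it as a literal string. The attempt in the question passed the captured text to re.sub instead, where it is interpreted as a pattern. That may be a second reason the text came back unchanged, because characters such as "(", ")" and "." have special meaning in a regex. A small sketch of the difference, using a shortened sample string that is made up for this example:

```python
import re

# Shortened, hypothetical sample containing regex metacharacters.
text = "Key Terms; odds of 5.00 (4/1). Bonus T&Cs apply."
captured = re.search(r"Key\s(.*)T&Cs", text).group(1)

# Used as a pattern, "(4/1)" is a group that matches "4/1" without the
# parentheses, so the captured text no longer matches itself.
unescaped = re.sub(captured, "", text)

# re.escape makes every metacharacter literal, so the substitution succeeds.
escaped = re.sub(re.escape(captured), "", text)
print(unescaped)
print(escaped)
```

If re.sub is preferred over str.replace, wrapping the captured text in re.escape is the usual way to match it literally.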

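Can you add a "^" at the start of the regex to do a negative lookahead, so that the terms_and_cons variable only gets what the regex does not filter out? Thanks for your help.

What if I have two paragraphs that each contain terms and conditions, with a paragraph of useful text between them? In that case the code would also delete the paragraph in between. How can this be solved?

You can create two capture groups:
"Key\s(.*)WORD(.*)T&Cs"
where WORD is some regex that matches just before the useful text begins. You can then do two replacements using group(1) and group(2), as in Method 2.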
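
As an alternative to two capture groups (not suggested in the thread, only a sketch), a lazy quantifier can remove each section on its own. This assumes every terms section ends with the phrase "T&Cs apply.", as in the sample text; the three-line input below is made up for illustration.

```python
import re

# Hypothetical input: two terms sections with a useful paragraph between them.
text = (
    "Key Terms; Single bets only. Bonus T&Cs and General T&Cs apply.\n"
    "Useful paragraph that must be kept.\n"
    "Key Terms; Free bets expire at midnight. General T&Cs apply.\n"
)

# ".*?" is lazy: each match ends at the first "T&Cs apply." after its "Key",
# so the text between the two sections is left untouched.
rex = re.compile(r"Key\s.*?T&Cs apply\.", re.DOTALL)
cleaned = rex.sub("", text)
print(cleaned)
```

With the greedy (.*) from Methods 1 and 2, the match would run from the first "Key" to the last "T&Cs" and swallow the useful paragraph as well.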