How to remove text inside parentheses using regex in Python?


I referred to …, but it doesn't work. How can I fix my problem?

import re

def clean_text(text):
    pattern = r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)'  # e-mail addresses
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = r'(http|ftp|https)://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'  # URLs
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = r'([ㄱ-ㅎㅏ-ㅣ]+)'  # standalone Hangul jamo such as ㅋㅋ
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = r'<[^>]*>'  # HTML tags
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = r'[^\w\s]'  # punctuation
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = r'\([^)]*\)'  ## not working!!
    text = re.sub(pattern=pattern, repl='', string=text)
    return text

text = '(abc_def) 좋은글! (이것도 지워조) http://1234.com 감사합니다. aaa@goggle.comㅋㅋ<H1>thank you</H1>'
print(clean_text(text))

The result is abc_def 좋은글 이것도 지워조 감사합니다 thank you

My goal is 좋은글 감사합니다 thank you
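The ordering problem can be reproduced in isolation. This is a minimal sketch on a shortened, hypothetical version of the sample string: running '[^\w\s]' first deletes the parentheses themselves, so '\([^)]*\)' has nothing to match afterwards, while running the parenthesis pattern first gives the intended result.

```python
import re

# Shortened, hypothetical version of the question's sample text
sample = "(abc_def) 좋은글! (이것도 지워조) 감사합니다."

# Question's order: strip punctuation first; '(' and ')' count as punctuation, so they vanish here
punct_first = re.sub(r"[^\w\s]", "", sample)
punct_first = re.sub(r"\([^)]*\)", "", punct_first)  # no parentheses left, so nothing matches
print(repr(punct_first))  # 'abc_def 좋은글 이것도 지워조 감사합니다'

# Swapped order: remove "(...)" groups (and the whitespace before them) first, then punctuation
parens_first = re.sub(r"\s*\([^)]*\)", "", sample)
parens_first = re.sub(r"[^\w\s]", "", parens_first).strip()
print(repr(parens_first))  # '좋은글 감사합니다'
```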

Try this:

    import re

    def clean_text(text):
        pattern = r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)'  # e-mail addresses
        text = re.sub(pattern=pattern, repl='', string=text)
        pattern = r'(http|ftp|https)://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'  # URLs
        text = re.sub(pattern=pattern, repl='', string=text)
        pattern = r'([ㄱ-ㅎㅏ-ㅣ]+)'  # standalone Hangul jamo such as ㅋㅋ
        text = re.sub(pattern=pattern, repl='', string=text)
        pattern = r'<[^>]*>'  # HTML tags
        text = re.sub(pattern=pattern, repl='', string=text)
        pattern = r'\([^)]*\)\s'  # "(...)" plus the following whitespace, before punctuation is stripped
        text = re.sub(pattern=pattern, repl='', string=text)
        pattern = r'[^\w\s]'  # remaining punctuation
        text = re.sub(pattern=pattern, repl='', string=text)
        pattern = r'\s{2,}'  # collapse runs of whitespace into one space
        text = re.sub(pattern=pattern, repl=' ', string=text)
        return text

The result will be exactly 좋은글 감사합니다 thank you
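Two details of the pattern above can be checked on a hypothetical input string. The negated class [^)]* cannot cross a ')', so each parenthesised group is removed separately, whereas a greedy .* (shown only for contrast) would run from the first '(' to the last ')' and delete 좋은글 as well. Removing a group can also leave two spaces side by side, which is what the '\s{2,}' step collapses.

```python
import re

# Hypothetical input with two parenthesised groups
text = "(abc_def) 좋은글 (이것도 지워조) 감사합니다"

# Greedy '.*' matches from the first '(' to the LAST ')', swallowing '좋은글' too
greedy = re.sub(r"\(.*\)", "", text)
print(repr(greedy))   # ' 감사합니다'

# '[^)]*' stops at the first ')', so each group is removed on its own
bounded = re.sub(r"\([^)]*\)", "", text)
print(repr(bounded))  # ' 좋은글  감사합니다' (note the double space)

# Collapse whitespace runs as the answer does, then trim the ends
cleaned = re.sub(r"\s{2,}", " ", bounded).strip()
print(repr(cleaned))  # '좋은글 감사합니다'
```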

Your [^\w\s] re.sub call removes the parentheses, so the last regex has nothing left to match. You can swap the last two re.sub calls and use:

import re
def clean_text(text):
    pattern = r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)' 
    text = re.sub(pattern=pattern, repl='', string=text) 
    pattern = r'(?:http|ftp|https)://(?:[-\w.]|(?:%[\da-fA-F]{2}))+' 
    text = re.sub(pattern=pattern, repl='', string=text) 
    pattern = r'[ㄱ-ㅎㅏ-ㅣ]+' 
    text = re.sub(pattern=pattern, repl='', string=text) 
    pattern = r'<[^>]*>' 
    text = re.sub(pattern=pattern, repl='', string=text)  
    pattern = r'\s*\([^)]*\)' 
    text = re.sub(pattern=pattern, repl='', string=text)
    pattern = r'[^\w\s]' 
    text = re.sub(pattern=pattern, repl='', string=text)
    return text.strip()

text = '(abc_def) 좋은글! (이것도 지워조) http://1234.com 감사합니다. aaa@goggle.comㅋㅋ<H1>thank you</H1>' 
print(clean_text(text))

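I suggest using raw string literals (note the r prefix) and removing unnecessary whitespace with text.strip(). The \s* in r'\s*\([^)]*\)' removes zero or more whitespace characters before the parentheses.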
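A small sketch of both suggestions on hypothetical input strings: a raw string hands the backslashes to the regex engine unchanged, the leading \s* consumes the space before the parenthesis so no double space is left behind, and strip() cleans up the space that remains when a group sits at the very start.

```python
import re

# In a raw string the backslashes are kept literally: r"\s" is the two characters '\' and 's'
assert r"\s*\([^)]*\)" == "\\s*\\([^)]*\\)"

text = "좋은글! (이것도 지워조) 감사합니다"

# Without '\s*' the space before '(' survives next to the space after ')'
without_ws = re.sub(r"\([^)]*\)", "", text)
print(repr(without_ws))  # '좋은글!  감사합니다'

# With '\s*' the preceding whitespace is consumed together with the group
with_ws = re.sub(r"\s*\([^)]*\)", "", text)
print(repr(with_ws))     # '좋은글! 감사합니다'

# A group at the very start leaves the space that followed it; strip() trims it
leading = re.sub(r"\s*\([^)]*\)", "", "(abc_def) 좋은글")
print(repr(leading.strip()))  # '좋은글'
```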

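The expected value in your question doesn't seem to match. How do you want text to be cleaned? Please update your "goal".

Swap the last two re.sub calls. First use text = re.sub(pattern=r'\([^)]*\)', repl='', string=text), then the '[^\w\s]' regex replacement.

Thank you so much. You're a genius!!!

I've posted an answer below; it took a while since I'm on my phone.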