Removing characters from an expression using regex in Python

python regex text

My code is:

text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'
alphabets = set(string.ascii.lowercase)
punctuation = ['!', ',', '.', ':', ';', '?']
allowed_chars = alphbets.union(punctuation, ' ')
regex = re.compile('[^allowed_string]')
text = regex.sub(' ', text)

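As I understand it, the code above should remove every character from the given text except lowercase ASCII letters and the listed punctuation.

But when I run it, the result is: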

is e es s e e li    ses and redo inates t e w ole o  er se

What am I doing wrong?

Thanks.

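First, string.ascii.lowercase is not valid. I think you meant string.ascii_lowercase.

Second, you cannot use a variable inside the pattern passed to re.compile like that. Inside the quotes it is just a regular string, so the regex sees the literal characters of the name rather than the variable's contents.

Here is a better solution:

>>> import re
>>> text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'
>>> re_cmp = re.compile("[^a-z!,.:;?]+")
>>> re_cmp.sub(' ', text)
'his eyes she eclip ses and predominates the whole of her sex'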
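If you would rather keep the allowed characters in a variable, as the question tried to do, the pattern has to be built from that variable explicitly. The following is only a minimal sketch, assuming the same allowed set as in the question. The names char_class, pattern, cleaned and collapsed are illustrative and do not come from the original post.

```python
import re
import string

text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'

# Allowed characters: lowercase ASCII letters, some punctuation, and the space.
allowed_chars = set(string.ascii_lowercase) | set('!,.:;?') | {' '}

# Build the negated character class from the variable's contents.
# sorted() only makes the pattern deterministic; re.escape guards any
# character that could be special inside a character class.
char_class = ''.join(re.escape(c) for c in sorted(allowed_chars))
pattern = re.compile('[^' + char_class + ']+')

# Each run of disallowed characters becomes one space.
cleaned = pattern.sub(' ', text)

# The space is allowed here, so the spaces around the removed symbols
# survive; collapse them if a single separator is wanted.
collapsed = ' '.join(cleaned.split())
print(collapsed)
```

Because the space is part of the allowed set in this variant, the symbols are replaced but the surrounding spaces remain, which is why the extra split/join step is included.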

Can you tell us your expected result?

There is no need to escape all these characters inside a character class. Additionally, use a quantifier, so that you end up with re_cmp = re.compile("[^a-z!,.:;?]+"). That is more readable and more efficient. +1 for the regex solution, though.

@Jan Thanks for the suggestion.

Thanks for the solution @Himal, that solved the problem. Now I also understand what my code was doing: it replaced every character other than those in the string "allowed_string" (not the variable) with a space.
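The two points made in these comments can be checked directly. The sketch below is illustrative only. The names literal_class, per_char and per_run are made up for the example. It shows that the quoted name is treated as a set of literal characters, and what the + quantifier changes.

```python
import re

text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'

# The name inside the quotes is never looked up as a variable: this class
# simply lists the literal characters a, l, o, w, e, d, _, s, t, r, i, n, g.
literal_class = re.compile('[^allowed_string]')
mangled = literal_class.sub(' ', text)
print(mangled)  # letters such as 'h', 'y' and 'p' have been turned into spaces

# Without a quantifier, every rejected character gets its own space ...
per_char = re.compile('[^a-z!,.:;?]').sub(' ', text)
print(per_char)

# ... while with '+' a whole run of rejected characters becomes one space.
per_run = re.compile('[^a-z!,.:;?]+').sub(' ', text)
print(per_run)
```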