Python: remove special characters from a string, keeping letters, digits, and punctuation


I am trying to remove all special characters from a string while keeping every other character, including punctuation.

mystring = "Q18. On a scale from 0 to 10 where 0 means ‘not at all interested' and 10 means ‘very interested', how interested are you in helping to address problems that affect poor people in poor countries?"
My attempt so far:

newlabel = re.sub('[^A-Za-z0-9]+', ' ', mystring)
Output:

Q18 On a scale from 0 to 10 where 0 means not at all interested and 10 means very interested how interested are you in helping to address problems that affect poor people in poor countries 
Desired output:

Q18. On a scale from 0 to 10 where 0 means not at all interested' and 10 means very interested', how interested are you in helping to address problems that affect poor people in poor countries?
How can I keep the punctuation with the regex I currently have, or is there a better solution?

Solved:

print(mystring.decode('unicode_escape').encode('ascii', 'ignore'))
Output:

Q18. On a scale from 0 to 10 where 0 means not at all interested' and 10 means very interested', how interested are you in helping to address problems that affect poor people in poor countries?
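
The call above relies on str.decode, which exists in Python 2 only. A minimal Python 3 sketch of the same idea, assuming the goal is simply to drop every non-ASCII character (strip_non_ascii is a hypothetical helper name, not part of the original post):

```python
# Hypothetical helper: remove every character that ASCII cannot represent.
# ASCII punctuation such as . , ? ' is left untouched.
def strip_non_ascii(text):
    # errors="ignore" silently discards the characters that cannot be encoded
    return text.encode("ascii", "ignore").decode("ascii")


mystring = "0 means ‘not at all interested' and 10 means ‘very interested', how?"
print(strip_non_ascii(mystring))
```

Note that this also removes legitimate non-ASCII text such as accented letters, so it only fits input that is expected to be plain English.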

If all you need to change is to keep the period, adding it to the regex solves the problem:

re.sub(r'[^A-Za-z0-9.]+', ' ', mystring)
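
The same approach extends to other punctuation. A minimal sketch, assuming the only punctuation to keep is the period, comma, question mark, and straight apostrophe seen in the sample string (keep_basic_punctuation is a hypothetical name):

```python
import re


# Hypothetical helper: keep letters, digits, and the punctuation . , ? '
# Inside [...] these four characters are literal, so no backslash is needed.
def keep_basic_punctuation(text):
    # Every run of other characters (spaces, curly quotes, ...) becomes one space
    return re.sub(r"[^A-Za-z0-9.,?']+", " ", text)


print(keep_basic_punctuation("10 means ‘very interested', how interested are you?"))
```

A raw string (r"...") avoids Python treating backslashes in the pattern as string escapes.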


Just add a backslash before each punctuation character you put in the regex.
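
Rather than escaping each character by hand, the escaping can be delegated to re.escape. A sketch under the assumption that "punctuation" means the ASCII set in string.punctuation (ALLOWED and keep_ascii_punctuation are hypothetical names):

```python
import re
import string

# Negated class: anything that is not an ASCII letter, digit, or punctuation mark.
# re.escape adds the backslashes that characters like ], ^, - and \ need in a class.
ALLOWED = re.compile("[^A-Za-z0-9" + re.escape(string.punctuation) + "]+")


def keep_ascii_punctuation(text):
    # Each run of disallowed characters (including whitespace) becomes one space
    return ALLOWED.sub(" ", text)


print(keep_ascii_punctuation("Q18. 0 means ‘not at all interested', ok?"))
```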


Do you really have a Unicode character before "very interested"?

Yes. I don't know why or how, but it's there and I want to get rid of it :)

Did you refer to anything else to come up with the solution?

Once you said they were "unicodes", I did a quick search and found the answer here:

Great. I see this question as a duplicate of that one. This will be helpful to future readers.