Does Python have a way to remove special characters (\u0410) from a string?
I am trying to clean up a string, but I am having trouble removing the special characters. Lowercasing and stripping \n are already taken care of.
my_string = "\u0410\u041d\u041e\u0422\u0410\u0426\u0418\u042f: \n* Lectures \u2013 20 hours \n* Workshops \u2013 8 hours (these workshops are so designed as to provoke active student participation;"
Expected result:
"lectures 20 hours workshops 8 hours these workshops are so designed as to provoke active student participation"
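Is there a way to remove all the special characters (\u0410 and so on)?

Extract the words with a regular expression and join them with single spaces: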
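A minimal sketch of that step, assuming the pattern `[a-zA-Z0-9]+` and a hypothetical helper named `clean_text` (neither appears in the text above; they are illustrative only). It keeps runs of ASCII letters and digits, so the Cyrillic letters, the en dash, the asterisks, and the newlines all drop out:

```python
import re

my_string = (
    "\u0410\u041d\u041e\u0422\u0410\u0426\u0418\u042f: \n* Lectures \u2013 20 hours "
    "\n* Workshops \u2013 8 hours (these workshops are so designed as to "
    "provoke active student participation;"
)


def clean_text(text):
    # Hypothetical helper: collect every run of ASCII letters/digits,
    # join the runs with single spaces, then lowercase the result.
    words = re.findall(r"[a-zA-Z0-9]+", text)
    return " ".join(words).lower()


print(clean_text(my_string))
```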
Output:
lectures 20 hours workshops 8 hours these workshops are so designed as to provoke active student participation
Or better, as @tobias_k suggested, call lower() first so the pattern only needs a-z and 0-9:
result = " ".join(re.findall(r"[a-z0-9]+", my_string.lower()))