Python: replacing non-alphabetic characters in a column

python regex pandas

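The goal is to keep only the words and remove any non-alphabetic characters.

I start with a column containing strings wrapped in square brackets: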

(Pdb) test['userTweets'].head()
0    [the SELU function to verify that the mean/variance is ~ 0/1...
1    [trump is really @#$#@%@#@$@#
2    [Yo Hillary! should have @*&(@#$@ Trump...
3    [When are we going to see those memos?????...
...

Because the values contain square brackets but the column is not actually a column of lists, I strip the brackets as follows:

test['userTweets'] = test['userTweets'].str.extract(r'\[(.*)\]')

Then I use Python's regular expression functionality:

(Pdb) regex = re.compile('[^a-zA-Z]')
(Pdb) test['userTweets'] = test['userTweets'].str.replace(regex,'')

But I get:

*** TypeError: object of type '_sre.SRE_Pattern' has no len()

Yet the regex compiled successfully:

(Pdb) regex
<_sre.SRE_Pattern object at 0x11159f6a8>

Is there a better way to apply a regex function to a column of strings to replace or remove any non-alphabetic characters?

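import string
# Filter each tweet on its own; iterating over the Series directly yields whole strings, not characters.
test['userTweets'] = test['userTweets'].apply(lambda s: "".join(c for c in s if c in string.ascii_letters))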

I have done something similar to the above.

Your code may look different, but you get the basic idea.
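
For reference, a minimal self-contained sketch of the per-character filtering idea. The helper name keep_letters and the sample Series are made up for illustration and are not from the question. Like the regex [^a-zA-Z], this drops spaces as well.

```python
import string

import pandas as pd

# Set lookup is faster than scanning the string.ascii_letters string each time.
LETTERS = set(string.ascii_letters)


def keep_letters(text):
    """Return text with every character outside a-z / A-Z dropped (hypothetical helper)."""
    return "".join(c for c in text if c in LETTERS)


# Hypothetical sample data standing in for test['userTweets'].
tweets = pd.Series([
    "trump is really @#$#@%@#@$@#",
    "When are we going to see those memos?????",
])

# apply() calls the helper once per row, so each tweet is filtered separately.
cleaned = tweets.apply(keep_letters)
```

If the column may contain missing values, the helper would need a guard, since it assumes every entry is a string.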

Try passing the pattern as a string rather than a compiled regex object. Also, add the regex=True argument to the replace call.
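
A short sketch of that suggestion, assuming a pandas version whose Series.str.replace accepts the regex keyword. The sample frame is hypothetical and only stands in for the asker's data. The second variant is an addition that is not part of the answer: it also keeps whitespace, which may be closer to the stated goal of keeping whole words.

```python
import pandas as pd

# Hypothetical sample data standing in for test['userTweets'].
test = pd.DataFrame({
    "userTweets": [
        "trump is really @#$#@%@#@$@#",
        "Yo Hillary! should have @*&(@#$@ Trump...",
    ]
})

# Pass the pattern as a plain string and state regex=True explicitly.
letters_only = test["userTweets"].str.replace(r"[^a-zA-Z]", "", regex=True)

# Variant (an assumption about the intent): keep whitespace so words stay
# separated, then collapse runs of whitespace and trim the ends.
words_only = (
    test["userTweets"]
    .str.replace(r"[^a-zA-Z\s]", "", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
```

The first result matches what the original [^a-zA-Z] pattern would produce, with all letters run together. The second keeps one space between words.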