Python: How do I remove numbers from a DataFrame and implement CountVectorizer?


I have data in the following format:

    author  text
0   garyvee     A lot of people misunderstand Gary’s message o...
1   jasonfried  "I can’t remember having a goal. An actual goa...
2   biz         "Tools that can create media that looks and so...

I tried the following to clean the text:

text_data.loc[:,"text"] = text_data.text.apply(lambda x : str.lower(x))
text_data.loc[:,"text"] = text_data.text.apply(lambda x : " ".join(re.findall('[\w]+',x)))
I got the output below, but it still contains numbers, which I do not want in my text analysis:

0    a lot of people misunderstand gary s message o...
1    i can t remember having a goal an actual goal ...
2    tools that can create media that looks and sou...
Name: text, dtype: object
But when I tried to remove the numbers from the text strings:

text_data.loc[:,"text"] = text_data.text.apply(lambda x : " ".join(re.sub('^[0-9\.]*$','',x)))
I got this output:

0    a l o t o f p e o p l e m i s u n d e r s t a ...
1    i c a n t r e m e m b e r h a v i n g a g o a ...
2    t o o l s t h a t c a n c r e a t e m e d i a ...
Name: text, dtype: object

How can I avoid this? And how do I implement CountVectorizer?

I did indeed make a mistake at this step:

text_data.loc[:,"text"] = text_data.text.apply(lambda x : " ".join(re.sub('^[0-9\.]*$','',x)))
It should be:

text_data.loc[:,"text"] = text_data.text.apply(lambda x : re.sub('^[0-9\.]*$','',x))
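The difference between the two lines above can be reproduced on a single string. This is a minimal sketch, not the original poster's final solution: the sample sentence and the helper name `drop_digit_tokens` are made up for illustration. It shows that `" ".join` applied to a string iterates over its characters, that the pattern `'^[0-9\.]*$'` is anchored to the whole string and so leaves ordinary sentences unchanged, and one possible pattern (an assumption, not taken from the thread) that drops every word containing a digit.

```python
import re

# Hypothetical sample sentence, already lowercased and stripped of punctuation.
text = "in my 20s i read 5000 pages the 1st time"

# re.sub returns a str, and " ".join over a str iterates character by character,
# which is why every letter ends up separated by a space.
spaced = " ".join(re.sub(r"^[0-9\.]*$", "", text))

# '^...$' anchors the pattern to the whole string, so it only matches when the
# entire string consists of digits and dots. This sentence is left unchanged.
unchanged = re.sub(r"^[0-9\.]*$", "", text)


def drop_digit_tokens(s):
    """Remove every word that contains at least one digit (illustrative helper)."""
    without_digits = re.sub(r"\b\w*\d\w*\b", "", s)
    # Collapse the gaps left behind by the removed words.
    return " ".join(without_digits.split())


cleaned = drop_digit_tokens(text)

print(spaced)
print(unchanged)
print(cleaned)
```

Under the same assumptions, the helper could be applied to the column with `text_data.text.apply(drop_digit_tokens)`.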

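Why are you using " ".join?
I removed it, but there are still numbers in the text data, although all the words are separate now.
Is your regex correct? Check manually whether your regex is correct.
'000','100','12','16','1st','20','200','20s','2nd','30s','3rd','50','5000','503c','52','57','a12zracs8z': how do I remove these words?
Oh, figured it out, np.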