Python: Removing stop words using pandas


python pandas dataframe text


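I want to remove stop words from a column of my DataFrame. The column contains text that needs to be split into words.

For example, my DataFrame looks like this: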

ID   Text
1    eat launch with me
2    go outside have fun


I want to apply stop-word removal to the Text column, so the text has to be split into words.

I tried this:

for item in cached_stop_words:
    if item in df_from_each_file[['text']]:
        print(item)
        df_from_each_file['text'] = df_from_each_file['text'].replace(item, '')

So my output should look like this:


ID   Text
1    eat launch 
2    go fun

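That is, the stop words have been removed. But my code does not work correctly. I also tried the other way around, converting my DataFrame to a Series and then looping through it, but that did not work either.

Thanks for your help.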

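replace (by itself) is not a good fit here, because by default it only matches whole cell values, and you want to perform partial string replacement. You need a regex-based replacement.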
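To illustrate the difference, here is a minimal sketch. The Series s and the single stop word "with" are made up for the demonstration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data: one sentence and one cell that is exactly a stop word.
s = pd.Series(["eat launch with me", "with"])

# Default Series.replace only swaps cells whose whole value equals "with".
whole = s.replace("with", "")

# With regex=True the pattern is searched for inside every string.
partial = s.replace(r"\bwith\b", "", regex=True)

print(whole.tolist())    # ['eat launch with me', '']
print(partial.tolist())  # ['eat launch  me', '']
```

Note that the regex version leaves a double space where the word used to be.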

A simple solution, when you have a manageable number of stop words, is to use str.replace:

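import re
# \b limits matches to whole words; regex=True is needed on pandas 2.0+, where the default is a literal match
p = re.compile(r"\b({})\b".format('|'.join(map(re.escape, cached_stop_words))))
df['Text'] = df['Text'].str.lower().str.replace(p, '', regex=True)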

df
   ID               Text
0   1       eat launch  
1   2   outside have fun
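The regex approach leaves stray spaces where words were removed, as the output above shows. One possible way to tidy that up is to collapse whitespace afterwards. This is only a sketch: the stop_words list below is a hypothetical stand-in for cached_stop_words, chosen to reproduce the output shown.

```python
import re
import pandas as pd

df = pd.DataFrame({"ID": [1, 2],
                   "Text": ["eat launch with me", "go outside have fun"]})

# Hypothetical stop-word list, standing in for cached_stop_words.
stop_words = ["with", "me", "go"]

# Alternation of escaped stop words, anchored on word boundaries.
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, stop_words)))

cleaned = (
    df["Text"].str.lower()
    .str.replace(pattern, "", regex=True)   # remove the stop words
    .str.replace(r"\s+", " ", regex=True)   # collapse runs of whitespace
    .str.strip()                            # drop leading/trailing spaces
)

print(cleaned.tolist())  # ['eat launch', 'outside have fun']
```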

If performance matters, use a list comprehension:

cached_stop_words = set(cached_stop_words)
df['Text'] = [' '.join([w for w in x.lower().split() if w not in cached_stop_words]) 
    for x in df['Text'].tolist()]

df
   ID              Text
0   1        eat launch
1   2  outside have fun
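If the original casing of the remaining words should be kept while still treating Me and me as the same stop word, one option is to lowercase each token only for the membership test. The sketch below assumes a hypothetical helper remove_stop_words and a hypothetical stop-word set picked so that the result matches the output requested in the question.

```python
import pandas as pd

# Hypothetical stop-word set, chosen to reproduce the question's expected output.
stop_words = {"with", "me", "outside", "have"}

def remove_stop_words(text, stop_words):
    """Drop tokens whose lowercase form is a stop word; keep the rest as written."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)

df = pd.DataFrame({"ID": [1, 2],
                   "Text": ["eat launch with Me", "go outside have fun"]})

# Same list-comprehension pattern as above, with the per-row logic in a helper.
df["Text"] = [remove_stop_words(x, stop_words) for x in df["Text"].tolist()]

print(df["Text"].tolist())  # ['eat launch', 'go fun']
```

Because the text is split on whitespace and re-joined, no stray spaces are left behind.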

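What is your expected output for this?

Thanks for your comment, I updated the question :)

Thank you so much, it worked. Could you also apply lower, so that variants such as Me and me are treated the same?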