
Python: removing stop words from a column of words


I have a dataframe with an object column and more than 100,000 rows, like this:

    df['words']
 0 the
 1 to
 2 of
 3 a
 4 with
 5 as
 6 job
 7 mobil
 8 market
 9 think
 10....
Desired output with the stop words removed:

   df['words']
 0 way
 1 http
 2 internet
 3 car
 4 do
 5 want
 6 work
 7 uber
 8....
Is there a way to iterate over a column and remove the common stop words using gensim, spacy, or nltk?

I tried:

from gensim.parsing.preprocessing import remove_stopwords
stopwords.words('english')

df['words'] = df['words'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(" ".join(x)))
But the result is:

TypeError: can only join an iterable

Remove the stop words with nltk. Import the packages:

import pandas as pd
from nltk.corpus import stopwords
Create the stop-word list:

stop_words = stopwords.words('english')
stop_words[:10]
Then:
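The answer is cut off at this point; one minimal way to finish the step is to filter the column against that list. The stop-word set below is a small illustrative stand-in for `stopwords.words('english')` (the real list has roughly 180 entries), so the sketch runs without downloading NLTK data:

```python
import pandas as pd

# Illustrative stand-in for stop_words = stopwords.words('english')
stop_words = {"the", "to", "of", "a", "with", "as"}

df = pd.DataFrame({"words": ["the", "to", "of", "a", "with", "as",
                             "job", "mobil", "market", "think"]})

# Keep only the rows whose word is not a stop word
df = df[~df["words"].isin(stop_words)].reset_index(drop=True)
```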


What type is x? — It's object, but I changed it to string; the original column is object dtype. — You can only join lists and other iterables, so you need to convert to a list first. — I can remove the stop words with:

    stop_words = set(stopwords.words('english'))
    for word in new_words:
        if word not in stop_words:
            print(word)
But how do I put that back into a new column of the df? @mousetail Running:

    df['words'].to_list()
    stop_words = stopwords.words('english')
    df['words'] = df_freq['words'].apply(lambda x: [word for word in x.split() if word not in stop_words])
gives:

    AttributeError: 'int' object has no attribute 'split'
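That AttributeError means some cells in the column are numbers, not strings, so `x.split()` fails on them. Casting every cell to `str` before splitting avoids the crash; a sketch with a hypothetical mixed column and an illustrative stop-word set:

```python
import pandas as pd

stop_words = {"the", "to", "of"}  # illustrative stand-in for the NLTK list

# A column that mixes strings and numbers triggers the AttributeError
df = pd.DataFrame({"words": ["the market", 42, "to work"]})

# Cast each cell to str first so int rows no longer crash on .split()
df["words"] = df["words"].astype(str).apply(
    lambda x: [word for word in x.split() if word not in stop_words]
)
```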
df['newword'] = list(map(lambda line: list(filter(lambda word: word not in stop_words, line)), df.words))
df
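For reference, this map/filter answer assumes each cell already holds a list of words (not a single string). On a small hypothetical frame, with an illustrative stand-in for the NLTK stop-word list, it behaves like this:

```python
import pandas as pd

stop_words = {"the", "to", "of"}  # illustrative stand-in for the NLTK list

# Each cell is a list of words, which is what the map/filter pattern expects
df = pd.DataFrame({"words": [["the", "job"], ["to", "work"], ["market"]]})

# Filter each row's word list, keeping only non-stop words
df["newword"] = list(
    map(lambda line: list(filter(lambda word: word not in stop_words, line)),
        df.words)
)
```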