How to remove stop words from a list of frequent words in Python

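I would like to know how to remove stop words from a list of the most frequent words. I only want to keep the actual words. A sample of the structure is shown below: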

sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086), ('a', 986), ('you', 912), 
     ('in', 754), ('with', 549), ('is', 536), ('for', 473), ('it', 461), ('book', 427), 
     ('how', 368), ('that', 347), ('as', 304), ('on', 301), ('this', 290), ('java', 289), 
     ('s', 267), ('your', 263), ('applications', 248), ('web', 231), ('can', 219), 
     ('new', 218), ('an', 206), ('are', 197), ('will', 187), ('from', 185), ('use', 185), ('ll', 183), 
     ('development', 182), ('code', 180), ('by', 177), ('programming', 172), ('application', 170), ('or', 169), 
     ('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132), 
     ('be', 132), ('data', 131), ('more', 118), ('like', 115), ('build', 110), ('into', 109), ('net', 106), ('language', 105)]
Any help is much appreciated.

You should first create a set of stop words, and then you can filter them out with something like the following:

>>> stopList = {'the','and','to','in'}
>>> [(word, count) for word, count in sentence if word not in stopList]
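Since the question asks for just the words, the same comprehension can also drop the counts. A minimal sketch, reusing a few pairs from the question's sample; the small stop set and the names stop_list, filtered and words_only are illustrative choices, not part of the original answer:

```python
# A few (word, count) pairs taken from the question's sample data.
sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086),
            ('a', 986), ('book', 427), ('java', 289)]

# A small hand-picked stop set for illustration; extend it as needed.
stop_list = {'the', 'and', 'to', 'of', 'a'}

# Keep the (word, count) pairs whose word is not a stop word.
filtered = [(word, count) for word, count in sentence if word not in stop_list]

# Keep only the words themselves, discarding the counts.
words_only = [word for word, _ in filtered]

print(filtered)
print(words_only)
```

Keeping the pairs and the bare words in separate lists means the counts remain available if they are needed later.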

A set gives O(1) membership lookups, and out_tup will hold the desired output:

in_tup = [('the', 2112), ('and', 1914), ('to', 1505)]
stop_list = {"to","the"}

out_tup = [i for i in in_tup if i[0] not in stop_list]
print(out_tup)
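To illustrate why a set is preferred over a list here, the sketch below times membership tests against both containers. The 10,000 placeholder tokens and all variable names are made up for the demonstration, and the absolute timings will vary by machine:

```python
import timeit

# Hypothetical stop-word collection: 10,000 made-up tokens, as a list and as a set.
stop_words_list = ['word%d' % i for i in range(10000)]
stop_words_set = set(stop_words_list)

in_tup = [('the', 2112), ('and', 1914), ('to', 1505), ('word9999', 7)]

# Both containers produce the same filtering result...
out_from_list = [pair for pair in in_tup if pair[0] not in stop_words_list]
out_from_set = [pair for pair in in_tup if pair[0] not in stop_words_set]

# ...but the list is scanned element by element (O(n)),
# while the set uses hashing (O(1) on average).
list_time = timeit.timeit(lambda: 'missing' in stop_words_list, number=1000)
set_time = timeit.timeit(lambda: 'missing' in stop_words_set, number=1000)

print(out_from_set)
print('list: %.5fs  set: %.5fs' % (list_time, set_time))
```

For a handful of stop words the difference is negligible; it starts to matter once the stop list has hundreds of entries and the input has many words.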

If you want a good, complete set of stop words, you can use the list from nltk as follows:

from nltk.corpus import stopwords

# Convert the list to a set so each membership test is O(1) rather than O(n).
stop_words = set(stopwords.words('english'))

sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086), ('a', 986), ('you', 912), 
     ('in', 754), ('with', 549), ('is', 536), ('for', 473), ('it', 461), ('book', 427), 
     ('how', 368), ('that', 347), ('as', 304), ('on', 301), ('this', 290), ('java', 289), 
     ('s', 267), ('your', 263), ('applications', 248), ('web', 231), ('can', 219), 
     ('new', 218), ('an', 206), ('are', 197), ('will', 187), ('from', 185), ('use', 185), ('ll', 183), 
     ('development', 182), ('code', 180), ('by', 177), ('programming', 172), ('application', 170), ('or', 169), 
     ('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132), 
     ('be', 132), ('data', 131), ('more', 118), ('like', 115), ('build', 110), ('into', 109), ('net', 106), ('language', 105)]

sentence = [(word, count) for word, count in sentence if word not in stop_words]     

print(sentence)
This gives the following value for sentence:

[('book', 427), ('java', 289), ('applications', 248), ('web', 231), ('new', 218), ('use', 185), ('development', 182), ('code', 180), ('programming', 172), ('application', 170), ('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132), ('data', 131), ('like', 115), ('build', 110), ('net', 106), ('language', 105)]
You can get the library with pip install nltk. You may then need to install the stop words first, as follows:

import nltk

nltk.download()
This opens a download utility that lets you fetch the stopwords.
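If the interactive utility is inconvenient, nltk.download('stopwords') fetches only the stopwords corpus. The sketch below goes one step further and falls back to a tiny built-in set when NLTK or its corpus is unavailable. FALLBACK_STOP_WORDS and load_stop_words are hypothetical helpers written for this example, not part of NLTK:

```python
# Tiny fallback set (an assumption for this sketch, far smaller than NLTK's list).
FALLBACK_STOP_WORDS = {'the', 'and', 'to', 'of', 'a', 'in', 'is', 'it'}


def load_stop_words(language='english'):
    """Return NLTK's stop words as a set, or the fallback set if unavailable."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        # ImportError: nltk is not installed.
        # LookupError: the corpus is missing; nltk.download('stopwords') would fetch it.
        return set(FALLBACK_STOP_WORDS)


stop_words = load_stop_words()
sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('book', 427), ('java', 289)]
filtered = [(word, count) for word, count in sentence if word not in stop_words]
print(filtered)
```

The fallback only keeps the example runnable everywhere; for real text processing the full NLTK list is the better choice.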

What do you mean by "stop words"? You need a list of stop words, and then you can filter them out.

Also, @Larissa, if you intend to do natural language processing, I suggest you take a look at nltk.

nltk has a built-in list containing a few hundred stop words in several languages.

You should create a set, for O(1) lookup time rather than O(n).

@acushner Sure, thanks! I have edited my answer.