How to remove stop words from a list of the most common words in Python
I would like to know how to remove stop words from a list of the most frequently used words. I only want the words that are left. A sample of the structure is shown below:
sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086), ('a', 986), ('you', 912),
('in', 754), ('with', 549), ('is', 536), ('for', 473), ('it', 461), ('book', 427),
('how', 368), ('that', 347), ('as', 304), ('on', 301), ('this', 290), ('java', 289),
('s', 267), ('your', 263), ('applications', 248), ('web', 231), ('can', 219),
('new', 218), ('an', 206), ('are', 197), ('will', 187), ('from', 185), ('use', 185), ('ll', 183),
('development', 182), ('code', 180), ('by', 177), ('programming', 172), ('application', 170), ('or', 169),
('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132),
('be', 132), ('data', 131), ('more', 118), ('like', 115), ('build', 110), ('into', 109), ('net', 106), ('language', 105)]
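Any help would be greatly appreciated.
You should first create a set of stop words; you can then filter them out with something like the following: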
>>> stopList = {'the','and','to','in'}
>>> [(word, count) for word, count in sentence if word not in stopList]
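As a self-contained sketch of the same idea, the snippet below uses a shortened copy of the list and a small hand-picked stop set (both are illustrative assumptions, not a complete stop-word list), then drops the counts for the case where only the words are needed:

```python
# Shortened copy of the question's data (illustrative only).
sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('book', 427), ('java', 289)]

# Hand-picked stop words; a real list would be much longer.
stop_list = {'the', 'and', 'to', 'in'}

# Keep the (word, count) pairs whose word is not a stop word.
filtered = [(word, count) for word, count in sentence if word not in stop_list]

# Drop the counts if only the words are needed.
words_only = [word for word, _ in filtered]

print(filtered)    # [('book', 427), ('java', 289)]
print(words_only)  # ['book', 'java']
```

The list comprehension preserves the original order, so the result stays sorted by frequency.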
Using a set gives O(1) average-time lookups, and out_tup will hold the desired output:
in_tup = [('the', 2112), ('and', 1914), ('to', 1505)]
stop_list = {"to","the"}
out_tup = [i for i in in_tup if i[0] not in stop_list]
print(out_tup)
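To get a rough feel for why a set is preferred over a list here, the following sketch times a membership test against both. The word list is synthetic and the sizes are arbitrary assumptions; absolute timings will vary by machine, so none are shown:

```python
import timeit

# Synthetic "stop words"; the names and sizes are made up for illustration.
words = [f"word{i}" for i in range(10_000)]
stop_as_list = words
stop_as_set = set(words)

# The last element is the worst case for a linear scan through the list.
probe = "word9999"

# Time 1,000 membership tests against each container.
list_time = timeit.timeit(lambda: probe in stop_as_list, number=1_000)
set_time = timeit.timeit(lambda: probe in stop_as_set, number=1_000)

print(f"list lookup: {list_time:.4f}s, set lookup: {set_time:.4f}s")
```

Both containers give the same answer to `probe in ...`; only the cost differs, since the list is scanned element by element while the set uses a hash lookup.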
If you would like a good, full set of stop words, you can use the list in nltk, as follows:
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086), ('a', 986), ('you', 912),
('in', 754), ('with', 549), ('is', 536), ('for', 473), ('it', 461), ('book', 427),
('how', 368), ('that', 347), ('as', 304), ('on', 301), ('this', 290), ('java', 289),
('s', 267), ('your', 263), ('applications', 248), ('web', 231), ('can', 219),
('new', 218), ('an', 206), ('are', 197), ('will', 187), ('from', 185), ('use', 185), ('ll', 183),
('development', 182), ('code', 180), ('by', 177), ('programming', 172), ('application', 170), ('or', 169),
('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132),
('be', 132), ('data', 131), ('more', 118), ('like', 115), ('build', 110), ('into', 109), ('net', 106), ('language', 105)]
sentence = [(word, count) for word, count in sentence if word not in stop_words]
print(sentence)
This would give you sentence as:
[('book', 427), ('java', 289), ('applications', 248), ('web', 231), ('new', 218), ('use', 185),
('development', 182), ('code', 180), ('programming', 172), ('application', 170), ('action', 163),
('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132),
('data', 131), ('like', 115), ('build', 110), ('net', 106), ('language', 105)]
You can get the library with pip install nltk. You may then need to install the stop words first, as follows:
import nltk
nltk.download()
This will display a download utility from which you can get the stopwords.
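Since stopwords.words() returns a list, it may be worth converting it to a set before filtering. The sketch below does that and also falls back to a tiny built-in set when nltk or its stopwords corpus is unavailable. The helper name load_stop_words and the fallback words are assumptions made for illustration; they are not part of nltk:

```python
def load_stop_words(language="english"):
    """Return stop words as a set, with a small fallback if nltk is unusable."""
    try:
        from nltk.corpus import stopwords
        # Convert the list to a set for fast membership tests.
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        # nltk is not installed, or the stopwords corpus has not been downloaded.
        # Hypothetical, deliberately tiny English-only fallback.
        return {'the', 'and', 'to', 'of', 'a', 'in', 'is', 'it'}


# Shortened copy of the question's data (illustrative only).
sentence = [('the', 2112), ('and', 1914), ('book', 427), ('java', 289)]

stop_words = load_stop_words()
filtered = [(word, count) for word, count in sentence if word not in stop_words]
print(filtered)  # [('book', 427), ('java', 289)]
```

The fallback only covers a handful of English words, so for real use the nltk corpus (or another complete list) is the better source.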
What do you mean by "stop words"? You need a list of stop words, and then you can filter them out. Also, @Larissa, if you intend to do natural language processing, I suggest you take a look at nltk.
nltk has a built-in list containing hundreds of stop words in multiple languages.
You should create a set: O(1) lookup time instead of O(n).
@acushner Sure, thanks! I have edited my answer.