How to remove stop words from a list of common words in Python

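I'd like to know how to remove stop words from a list of the most common words. I just want to get the words. Here is an example of the structure: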

sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086), ('a', 986), ('you', 912), 
     ('in', 754), ('with', 549), ('is', 536), ('for', 473), ('it', 461), ('book', 427), 
     ('how', 368), ('that', 347), ('as', 304), ('on', 301), ('this', 290), ('java', 289), 
     ('s', 267), ('your', 263), ('applications', 248), ('web', 231), ('can', 219), 
     ('new', 218), ('an', 206), ('are', 197), ('will', 187), ('from', 185), ('use', 185), ('ll', 183), 
     ('development', 182), ('code', 180), ('by', 177), ('programming', 172), ('application', 170), ('or', 169), 
     ('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132), 
     ('be', 132), ('data', 131), ('more', 118), ('like', 115), ('build', 110), ('into', 109), ('net', 106), ('language', 105)]

Thanks a lot for your help.

You should first create a set of stop words; then you can filter them out with something like this:

>>> stopList = {'the','and','to','in'}
>>> [(word, count) for word, count in sentence if word not in stopList]
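The same comprehension can be wrapped in a small reusable helper. This is a minimal sketch: the function name remove_stop_words and its words_only flag are hypothetical, not part of any library. It keeps the original order (and therefore the frequency ranking) and, if you only need the words themselves, can drop the counts:

```python
def remove_stop_words(pairs, stop_words, words_only=False):
    """Drop (word, count) pairs whose word is a stop word.

    pairs: list of (word, count) tuples, such as the question's `sentence`.
    stop_words: any iterable of stop words; converted to a set for fast lookups.
    words_only: if True, return only the remaining words, without counts.
    """
    stop_set = set(stop_words)  # set membership is O(1) on average
    kept = [(word, count) for word, count in pairs if word not in stop_set]
    if words_only:
        return [word for word, _ in kept]
    return kept


# A short slice of the question's data.
sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('book', 427), ('java', 289)]
print(remove_stop_words(sentence, {'the', 'and', 'to', 'in'}))
# [('book', 427), ('java', 289)]
print(remove_stop_words(sentence, {'the', 'and', 'to', 'in'}, words_only=True))
# ['book', 'java']
```

Because the helper converts whatever it receives into a set, callers can pass a list (for example the one returned by nltk) without paying O(n) per lookup.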

A set gives O(1) lookups on average, and out_tup will contain the desired output:

in_tup = [('the', 2112), ('and', 1914), ('to', 1505)]
stop_list = {"to","the"}

out_tup = [i for i in in_tup if i[0] not in stop_list]
print(out_tup)  # [('and', 1914)]
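A rough way to check the list-versus-set claim is to time both versions of the filter with timeit. This is only a sketch: the 200 filler stop words and the 1000-fold repetition are arbitrary assumptions chosen to make the difference measurable, and absolute timings depend on the machine, so none are quoted here.

```python
import timeit

# Part of the question's data, repeated so the timing is measurable.
in_tup = [('the', 2112), ('and', 1914), ('to', 1505), ('book', 427), ('java', 289)] * 1000

# Hypothetical stop-word collection: 200 filler entries plus the real ones,
# so that a list lookup has to scan many elements before answering.
stop_list_as_list = ['filler%d' % i for i in range(200)] + ['to', 'the']
stop_list_as_set = set(stop_list_as_list)


def filter_with(stop_collection):
    # Same comprehension as above; only the container type changes.
    return [i for i in in_tup if i[0] not in stop_collection]


# Both containers produce identical results; only the lookup cost differs.
assert filter_with(stop_list_as_list) == filter_with(stop_list_as_set)

print('list:', timeit.timeit(lambda: filter_with(stop_list_as_list), number=10))
print('set: ', timeit.timeit(lambda: filter_with(stop_list_as_set), number=10))
```

The set version should come out noticeably faster, because each "in" check on a list scans it element by element (O(n)), while a set uses hashing (O(1) on average).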

If you want a good, complete set of stop words, you can use the list from nltk like so:

from nltk.corpus import stopwords

# Convert nltk's list to a set so each membership test is O(1) on average
stop_words = set(stopwords.words('english'))

sentence = [('the', 2112), ('and', 1914), ('to', 1505), ('of', 1086), ('a', 986), ('you', 912), 
     ('in', 754), ('with', 549), ('is', 536), ('for', 473), ('it', 461), ('book', 427), 
     ('how', 368), ('that', 347), ('as', 304), ('on', 301), ('this', 290), ('java', 289), 
     ('s', 267), ('your', 263), ('applications', 248), ('web', 231), ('can', 219), 
     ('new', 218), ('an', 206), ('are', 197), ('will', 187), ('from', 185), ('use', 185), ('ll', 183), 
     ('development', 182), ('code', 180), ('by', 177), ('programming', 172), ('application', 170), ('or', 169), 
     ('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132), 
     ('be', 132), ('data', 131), ('more', 118), ('like', 115), ('build', 110), ('into', 109), ('net', 106), ('language', 105)]

sentence = [(word, count) for word, count in sentence if word not in stop_words]     

print(sentence)

This leaves sentence as:

[('book', 427), ('java', 289), ('applications', 248), ('web', 231), ('new', 218), ('use', 185), ('development', 182), ('code', 180), ('programming', 172), ('application', 170), ('action', 163), ('developers', 150), ('features', 141), ('examples', 139), ('learn', 135), ('using', 132), ('data', 131), ('like', 115), ('build', 110), ('net', 106), ('language', 105)]
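The question's data looks like the output of collections.Counter.most_common(), although the question does not say so. If that assumption holds, the stop words can also be removed from the Counter before ranking, so that most_common(n) returns n content words instead of fewer. A minimal sketch, with a hypothetical input text and a small hand-written stop-word set:

```python
from collections import Counter

# Hypothetical input text; in practice this would be your own tokenized text.
text = "the book and the code and the java book"
stop_words = {'the', 'and', 'to', 'in'}

counts = Counter(text.split())

# Remove stop words before ranking; pop() with a default skips words that never occurred.
for word in stop_words:
    counts.pop(word, None)

top = counts.most_common(2)
print(top)  # [('book', 2), ('code', 1)]
```

Ties in most_common() are returned in the order the words were first encountered, which is why 'code' comes before 'java' here.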

You can install nltk with

pip install nltk

Before the first use, you may also need to download the stop words, like so:

import nltk

nltk.download()

This will bring up a download utility that lets you fetch the stopwords corpus.
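As a non-interactive alternative, nltk.download also accepts a package identifier, so only the stopwords corpus has to be fetched. This is a hedged sketch, assuming nltk is installed and network access is available the first time it runs; the helper name load_english_stop_words is hypothetical:

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set, fetching the corpus if needed."""
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words('english')
    except LookupError:
        # The corpus is not installed locally: download only 'stopwords'
        # (this needs network access), then try again.
        nltk.download('stopwords', quiet=True)
        words = stopwords.words('english')
    return set(words)  # a set gives O(1) average-case membership tests
```

Usage: call stop_words = load_english_stop_words() once, then filter with the same comprehension as before.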

What are "stop words"? You need a list of stop words, and then you can filter them out. Also, @Larissa, if you intend to do natural language processing, I suggest checking out nltk.

nltk has a built-in list containing hundreds of stop words in several languages.

You should use a set, which gives O(1) lookup time, rather than a list, which gives O(n).

@acushner Sure, thanks! I've edited my answer.