Python: remove stop words from a text file without NLTK


python python-3.x


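I have two files, stopwords.txt and a.txt. I want to remove the stop words listed in stopwords.txt from a.txt, where the words are separated by spaces.

How can I do this? This is what I tried: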

def remove_stopwords(review_words):
    # Read all whitespace-separated stop words into a set for fast lookups
    with open('stopwords.txt') as stopfile:
        stopwords = set(stopfile.read().split())
    # Keep every word that is not a stop word
    return [word for word in review_words if word not in stopwords]


# Read the words of a.txt and print them without the stop words, space-separated
with open('a.txt') as workfile:
    data = workfile.read().split()
print(' '.join(remove_stopwords(data)))

Thanks in advance.

a.txt:

good great bad

good bad

stopwords.txt:

good great bad

good bad

Maybe:

with open('a.txt', 'r') as f, open('stopwords.txt', 'r') as f2:
    a = f.read().split()
    # Lower-case the stop words once so the comparison is case-insensitive
    b = {x.lower() for x in f2.read().split()}
    print(' '.join(i for i in a if i.lower() not in b))
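The snippet above prints everything in a.txt on a single line. The sketch below keeps the original line breaks and still compares words case-insensitively. The function name `remove_stopwords_from_text` is hypothetical. It takes plain strings, for example the result of `f.read()`, so you can try it without creating files:

```python
def remove_stopwords_from_text(text, stopword_text):
    """Remove stop words from text, case-insensitively, keeping line breaks.

    Both arguments are plain strings (e.g. the result of f.read()).
    """
    # Build the lower-cased stop-word set once
    stopwords = {w.lower() for w in stopword_text.split()}
    filtered_lines = []
    for line in text.splitlines():
        # Keep the words of this line that are not stop words
        kept = [w for w in line.split() if w.lower() not in stopwords]
        filtered_lines.append(' '.join(kept))
    return '\n'.join(filtered_lines)


# With the files from the question:
# with open('a.txt') as f, open('stopwords.txt') as f2:
#     print(remove_stopwords_from_text(f.read(), f2.read()))
print(remove_stopwords_from_text("Good weather today\nthe bad news", "good bad the"))
```

A line made up entirely of stop words comes out as an empty line. This preserves the line count of the input.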

Here is an example:

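k = []
z = []
# Collect every whitespace-separated word in the stop-word file
with open('stopwords.txt', 'r') as f:
    for line in f:
        k.extend(line.split())

# Collect every word in a.txt
with open('a.txt', 'r') as f_obj:
    for line in f_obj:
        z.extend(line.split())

# Keep only the words from a.txt that are not stop words
p = [t for t in z if t not in k]
print(p)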

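Iterate over each word in the stop-word file and append it to a list, then iterate over each word in the other file. A list comprehension then drops every word that appears in the stop-word list.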
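The check `t not in k` scans the whole list `k` for every word. The comprehension therefore performs roughly len(z) × len(k) comparisons. If you convert the stop words to a set, the comprehension stays the same, but each lookup takes constant time on average.

Here is a small sketch of that swap. The function name `filter_words` is hypothetical:

```python
def filter_words(words, stopwords):
    """Return the words not in stopwords, keeping their order and duplicates."""
    stopword_set = set(stopwords)  # average O(1) membership tests
    return [w for w in words if w not in stopword_set]


words = "the cat and the dog".split()
print(filter_words(words, ["the", "and"]))  # ['cat', 'dog']
```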

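Possible duplicate of … Welcome! Please check out … It would be good if you at least showed what you have already tried yourself.

Hi, I looked at that link and tried its solution. However, I only have 2 files, and it doesn't work.

Just a general point (not really an answer to your question, but at least a comment you may want to include): when you remove stop words without a framework such as NLTK, there are a few things to consider. 1) Consider stemming, i.e. you may want to handle plurals, so that if "dog" is a stop word, "dogs" is removed as well. There are several ways to do this:

- strip any trailing "s" from each word;
- duplicate your stop words, appending an "s" to each copy;
- use len() to check whether the slice of a word starting at index 0 is an exact match for a stop word.

OK, that's a lot of things; edit my answer if you want and I'll approve it.

The second thing you'll want to consider (and this is the best approach) is to remove all capitalization. In Python, "Hello" == "hello" returns False, so the two are not treated as the same word. Another thing you could try is removing all short words.

Edited my …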
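The suggestions in that comment (lower-casing, handling plurals, dropping short words) can be combined into one normalization step. This is a rough sketch, not real stemming:

- It uses the "strip a trailing s" option. As a result it mangles words such as "glass" (which becomes "glas").
- The length guard of more than 3 characters and the `min_length` cutoff of 3 are arbitrary assumptions.
- Both function names are hypothetical.

```python
def normalize(word):
    """Lower-case a word and strip one trailing 's' (a crude plural rule)."""
    word = word.lower()
    # Only strip from longer words so short ones like "is" or "bus" survive
    if len(word) > 3 and word.endswith('s'):
        word = word[:-1]
    return word


def remove_stopwords_normalized(words, stopwords, min_length=3):
    """Drop very short words and stop words (compared after normalization)."""
    stop_set = {normalize(s) for s in stopwords}
    return [
        w for w in words
        if len(w) >= min_length and normalize(w) not in stop_set
    ]


print(remove_stopwords_normalized("Dogs and cats chase a Dog".split(), ["dog"]))
# ['and', 'cats', 'chase']
```

The kept words are returned in their original form; only the comparison uses the normalized version.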