Stemming of csv files in Python
Okay, I wrote this code in Python, which imports two csv files. The first csv file is named "claims" (one column, many rows) and the other is named "sexualharsament" (one column, many rows). The program checks every row of "claims" to see whether it contains any word from "sexualharsament". If it does, it writes that row to a new csv file named "output". It also removes certain stop words that I chose.

Iterate over each word in the row and call the stem method on it. How exactly does that work? @PadraicCunningham
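```python
from nltk import PorterStemmer

# Current NLTK releases name the method stem(); stem_word() comes from older versions
PorterStemmer().stem('discriminated')  # returns 'discrimin' (Porter also strips the -ate suffix)
```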
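Calling stem on each word means splitting the row into words, stemming each one, and joining the results back together. The sketch below is a minimal illustration that assumes whitespace tokenization is good enough. The helper name `stem_row` is hypothetical, and the stemmer is passed in as a plain function so that any stemmer can be plugged in, for example NLTK's `PorterStemmer().stem`.

```python
from typing import Callable


def stem_row(row: str, stem: Callable[[str], str]) -> str:
    """Lowercase a row, split it on whitespace, stem every word, and rejoin."""
    # str.split() with no argument splits on runs of whitespace and drops
    # empty strings, so repeated spaces never produce empty "words".
    return " ".join(stem(word) for word in row.lower().split())


if __name__ == "__main__":
    # Hypothetical toy stemmer for illustration only: strips a trailing
    # "ed" or "s" from longer words. It is NOT the Porter algorithm.
    def toy_stem(word: str) -> str:
        for suffix in ("ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    print(stem_row("She was Discriminated against  twice", toy_stem))
    # prints: she was discriminat against twice
```

With NLTK installed, `stem_row(row, PorterStemmer().stem)` applies the real Porter stemmer to every word instead of the toy one.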
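```python
import csv

# Python 3: open the output in text mode with newline="" for the csv module
with open("claims.csv") as file1, open("masterlist.csv") as file2, \
        open("stopwords.csv") as file3, open("output.csv", "w", newline="") as file4:
    writer = csv.writer(file4)
    # Lowercase the key words to match the lowercased rows; skip blank lines,
    # because '' is contained in every string and would match every row.
    key_words = [word.strip().lower() for word in file2 if word.strip()]
    # file3 (stopwords.csv) is opened but unused; the stop words are hard-coded.
    # They are padded with spaces so only whole words are replaced, which means
    # a stop word at the very start or end of a row is not removed.
    stop_words = [' also ', ' although ', ' always ', ' and ', ' any ', ' are ', ' as ', ' at ',
                  ' around ', ' be ', ' by ', ' for ', ' from ', ' has ', ' on ', ' that ', ' were ',
                  ' will ', ' with ', ' can ', ' cannot ', ' if ', ' it ', ' the ', ' there ', ' which ',
                  ' in ', ' is ', ' its ', ' me ', ' of ', ' was ', ' then ', ' a ', ' an ', ' to ',
                  ' when ', ' however ', '"', ',', '.', '-', '?', '!', '(', ')']
    for row in file1:
        row = row.strip().lower()
        for stopword in stop_words:
            # Replace each stop word or punctuation mark with a single space
            row = row.replace(stopword, " ")
        for key in key_words:
            if key in row:
                writer.writerow([key, row])
                break  # record only the first matching key word per row
```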
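To make the key-word check tolerant of inflected forms, both the key words and the words of each row can be reduced to stems before comparing them. The sketch below assumes that each key word is a single word (a multi-word key would need a phrase match instead), that stop words are given as bare words rather than space-padded strings, and that the stemmer is passed in as a function. The helper names `stem_set` and `filter_claims` are hypothetical. Unlike the loop above, this version writes the original row rather than the lowercased, stop-word-stripped one.

```python
import csv
import io
from typing import Callable, Iterable, Set, TextIO

PUNCTUATION = '".,-?!()'


def stem_set(text: str, stem: Callable[[str], str], stop_words: Set[str]) -> Set[str]:
    """Return the set of stems of the non-stop-words in text."""
    # Split on whitespace, then strip surrounding punctuation from each token.
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return {stem(w) for w in words if w and w not in stop_words}


def filter_claims(claims: Iterable[str], key_words: Iterable[str],
                  stop_words: Set[str], stem: Callable[[str], str],
                  out: TextIO) -> int:
    """Write [key, row] for each claim whose stems include a key word's stem.

    Returns the number of rows written.
    """
    # Stem every (non-blank) key word once, remembering its original spelling.
    stemmed_keys = [(k.strip(), stem(k.strip().lower()))
                    for k in key_words if k.strip()]
    writer = csv.writer(out)
    written = 0
    for row in claims:
        row = row.strip()
        row_stems = stem_set(row, stem, stop_words)
        for key, key_stem in stemmed_keys:
            if key_stem in row_stems:
                writer.writerow([key, row])
                written += 1
                break  # first matching key word only, as in the original loop
    return written


if __name__ == "__main__":
    # Hypothetical toy stemmer standing in for PorterStemmer().stem
    def toy_stem(word: str) -> str:
        return word[:-2] if word.endswith("ed") else word

    buf = io.StringIO()
    n = filter_claims(["He harassed a coworker", "The parking lot was full"],
                      ["harass"], {"a", "the"}, toy_stem, buf)
    print(n, repr(buf.getvalue()))
```

With real files, the same function could be called as `filter_claims(file1, file2, {"the", "and"}, PorterStemmer().stem, file4)` inside the `with` block above, assuming NLTK is installed.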