Python: comparing a list of tokenized words with a set of words

I want to know whether a review is relevant to a topic, so I built a set of topic-related words:

effi_set = {"reminders", "medication", "Alarm",
    "diet", "carbohydrate", "nutrition", "weight", "IBM", "sport", "activity", "fitbit", "blood", "insulin",
    "Hb1ac", "data exportation", "feedback", "monitoring", "recording", "monitor", "record",
    "passwords", "security", "backup", "protection",
    "information", "education", "complication", "risk", "prevent", "contact", "consultation",
    "facebook", "twitter", "social media", "mail", "FAQ", "doctor",
    "data", "offline", "language", "location", "region", "country",
    "devise", "glucometer", "bluetooth", "automation", "carb", "barcode", "food", "syncronize", "PHR", "import"}
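A long hand-written set literal like this is easy to get subtly wrong. The short sketch below illustrates two pitfalls: Python concatenates adjacent string literals when a comma is missing, and entries containing whitespace can never equal a single-word token. The helper name `unmatchable_entries` and the sample values are hypothetical, chosen only for illustration.

```python
# Adjacent string literals are concatenated by Python, so a missing comma
# silently merges two intended entries into one.
merged = {"medication", "Alarm" "diet", "carbohydrate"}
print(sorted(merged))  # "Alarmdiet" appears as a single element


def unmatchable_entries(vocabulary):
    """Return entries containing whitespace (hypothetical helper).

    A word-level tokenizer yields tokens without spaces, so such
    entries can never be equal to a single token.
    """
    return sorted(w for w in vocabulary if any(ch.isspace() for ch in w))


sample = {"recording ", "data exportation", "monitor"}
print(unmatchable_entries(sample))
```

Matching is also case-sensitive, so lowercasing both the vocabulary and the tokens before comparing is worth considering.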
I tokenize each review so that I can compare its tokens with the topic set:

for line in df["content"]:
    tokenized_words =word_tokenize(line)
    for item in tokenized_words:
        if item not in effi_set:
            df["efficient"] = False
        else:
            df["efficient"] = True
The result is that every review is marked False, which is not correct.

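df["efficient"] = False
sets the entire column, not just the current row, so whichever assignment runs last decides the value for every review.

You have to modify one row at a time: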
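The difference between assigning to a whole column and assigning to a single cell can be seen in a minimal sketch. The three-row frame here is made-up sample data, not the asker's.

```python
import pandas as pd

# Made-up sample data standing in for the real reviews
df = pd.DataFrame({"content": ["first review", "second review", "third review"]})

# Assigning a scalar to a column broadcasts it to every row.
df["efficient"] = True
df["efficient"] = False  # overwrites all three rows at once
all_false = df["efficient"].tolist()

# df.at targets one cell, identified by row label and column name.
df.at[1, "efficient"] = True
one_true = df["efficient"].tolist()

print(all_false)
print(one_true)
```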

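df["efficient"] = False  # default for every row
for index, line in df["content"].items():  # .iteritems() was removed in pandas 2.0
    tokenized_words = word_tokenize(line)
    for item in tokenized_words:
        if item in effi_set:
            df.at[index, "efficient"] = True
            break  # one match is enough; move on to the next review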
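As an alternative to the explicit loop, the same check can be written with `Series.apply` and a set operation. This is only a sketch under a few assumptions: `simple_tokenize` is a regex stand-in for `word_tokenize` so the example runs without NLTK data, `topic_words` is a shortened sample of the vocabulary, and tokens are lowercased, which the original code does not do.

```python
import re

import pandas as pd

# Shortened sample vocabulary (assumption: all entries lowercase)
topic_words = {"medication", "diet", "insulin", "glucometer"}


def simple_tokenize(text):
    """Stand-in for nltk.word_tokenize: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mentions_topic(text):
    """True if at least one token of the text is in topic_words."""
    return not topic_words.isdisjoint(simple_tokenize(text))


# Made-up reviews for illustration
df = pd.DataFrame({"content": [
    "Great app for tracking insulin doses",
    "Crashes every time I open it",
    "Diet logging is easy",
]})

# One boolean per row, computed independently for each review
df["efficient"] = df["content"].apply(mentions_topic)
print(df)
```

Because each row's value is computed from that row alone, there is no shared column assignment for a later review to overwrite.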

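If you are just updating the boolean indicator with df["efficient"] = False, how do you know that all reviews are False?
df.head() and df.to_csv("Final_comments.csv", index=False)
It doesn't work. TypeError: expected string or bytes-like object
Edited! Forgot the index in the loop.
But the result is the same, the efficient column is all False.
OK. What is your code supposed to do? If just one tokenized word is found in effi_set, should efficient be True? Because here it keeps looping, so if the last element of tokenized_words is not found, it sets efficient to False even when at least one of the tokenized_words was found in effi_set.
Yes, efficient should be True: as soon as one word is found, set efficient to True and move on to the next tokenized_words list.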
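The TypeError quoted in the comments is the message Python's regex functions raise when they are handed something that is not a string. One possible cause, which is an assumption here since the thread does not show the data, is a missing or numeric value in the content column. A minimal guard might look like the following, where `safe_tokenize` is a hypothetical helper and a plain regex stands in for `word_tokenize`.

```python
import re


def safe_tokenize(value):
    """Tokenize strings; return an empty list for anything else.

    Hypothetical helper: a regex stands in for nltk.word_tokenize.
    """
    if not isinstance(value, str):
        return []  # None, NaN, numbers: nothing to tokenize
    return re.findall(r"\w+", value)


# Made-up column values, including the kinds that are not strings
rows = ["insulin reminder works", None, float("nan"), 42]
tokens = [safe_tokenize(r) for r in rows]
print(tokens)
```

Rows that yield no tokens simply keep the default value of False.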