Python: using the chi-squared test to list all words in a corpus that reject the null hypothesis [python, scikit-learn, nlp, chi-squared]

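I have a script that lists the top n words (the words with the highest chi-squared values). However, instead of extracting a fixed number of words, I want to extract every word whose p-value is below 0.05, i.e. every word for which the null hypothesis is rejected.

Here is my code: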

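from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

# vectorize the top 100000 terms (unigrams to trigrams)
tfidf = TfidfVectorizer(max_features=100000, ngram_range=(1, 3))
X_tfidf = tfidf.fit_transform(df.review_text)
y = df.label
chi2score = chi2(X_tfidf, y)[0]  # [0] keeps the statistics and drops the p-values
# get_feature_names_out() needs scikit-learn >= 1.0; older releases use get_feature_names()
scores = list(zip(tfidf.get_feature_names_out(), chi2score))
# named sorted_scores rather than chi2 so the imported function is not shadowed
sorted_scores = sorted(scores, key=lambda x: x[1])
allchi2 = list(zip(*sorted_scores))

# list the top 20 words
allchi2 = allchi2[0][-20:]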
So in this case, instead of listing the top 20 words, I want all the words that reject the null hypothesis, i.e. all the words in the reviews that depend on the sentiment class (positive or negative).

Comment: The question is unrelated to keras; please do not spam irrelevant tags (removed).

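from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

# vectorize the top 100000 terms (unigrams to trigrams)
tfidf = TfidfVectorizer(max_features=100000, ngram_range=(1, 3))
X_tfidf = tfidf.fit_transform(df.review_text)
y = df.label
# chi2 returns two arrays: the test statistics and their p-values
chi2_score, pval_score = chi2(X_tfidf, y)
# keep every (feature, p-value) pair with p < 0.05
# get_feature_names_out() needs scikit-learn >= 1.0; older releases use get_feature_names()
feature_pval_items = filter(lambda x: x[1] < 0.05, zip(tfidf.get_feature_names_out(), pval_score))
# sort the significant features by ascending p-value
you_want_feature_pval_items = sorted(feature_pval_items, key=lambda x: x[1])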