
Python: getting the TF-IDF values


I have this TF-IDF code for my news dataset:

# Use this one
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# text: list of document strings from the news dataset
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(text)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
terms = vectorizer.get_feature_names_out()

# sum the TF-IDF weight of each term across all documents
sums = vectors.sum(axis=0)

# pair each term with its summed TF-IDF weight
data = []
for col, term in enumerate(terms):
    data.append((term, sums[0, col]))

ranking = pd.DataFrame(data, columns=['term', 'rank'])
#print(ranking.sort_values('rank', ascending=False))
tfrank = ranking.sort_values('rank', ascending=False)
tf = tfrank['term'].values.tolist()
tflist = tfrank.values.tolist()
tflist  # displays the list in a notebook
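The summing step above adds up each term's TF-IDF weight over all documents (a column sum of the document-term matrix) and then ranks terms by that total. A minimal stdlib sketch of just this aggregation, assuming the per-document weights are already available as hypothetical {term: weight} dicts (the numbers are made up for illustration, not taken from the dataset):

```python
from collections import Counter


def rank_terms(doc_weights):
    """Sum each term's TF-IDF weight across documents and sort descending.

    doc_weights: list of {term: weight} dicts, one per document
    (a hypothetical stand-in for the rows of the sparse matrix).
    """
    totals = Counter()
    for weights in doc_weights:
        totals.update(weights)  # Counter.update adds the values per key
    # Highest total first; ties broken alphabetically for a stable order
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [[term, score] for term, score in ordered]


# Illustrative weights only
docs = [
    {"yang": 0.5, "di": 0.25},
    {"yang": 0.25, "fakta": 0.5},
]
print(rank_terms(docs))
```

The tie-breaking rule here is this sketch's own choice; pandas' sort_values does not promise alphabetical order for equal scores.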
The result looks like this:

[['kompascom', 1.4017648244641259],
 ['yang', 1.3134948993732996],
 ['lembaga', 0.9450441338264206],
 ['sumber', 0.9201203935242713],
 ['di', 0.8774768633619345],
 ['fakta', 0.7941379426583972],
 ['dan', 0.7820675768624364],
 ['ini', 0.7721892264143173],
 ['bisa', 0.7215355604434974],
 ['informasi', 0.7038273489379546],
 ['hoaks', 0.6443546898427824],
 ['ifcn', 0.6310537233704365],
 ['atau', 0.6094359873139008],
 ['penguji', 0.5945524698582002],
 ['internasional', 0.5945524698582002],
 ['rubrik', 0.5534905743539935],
 ['khusus', 0.5534905743539935],
 ['masyarakat', 0.5473499161901632],
 ['dalam', 0.5325014351825453],...]
I have three sets and I am computing their intersection:

LDA_set = set(ldasort)
NMF_set = set(nmsort)
TFIDF_set = set(tf)
itrsect = LDA_set.intersection(NMF_set, TFIDF_set)
itrsect
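set.intersection accepts several iterables at once, so the call above is equivalent to chaining the & operator. The result is a plain set of terms: it has no defined order and carries no scores. A small sketch with hypothetical term lists standing in for ldasort, nmsort and tf:

```python
# Hypothetical term lists standing in for ldasort, nmsort and tf
lda_terms = ["banjir", "hoaks", "2018", "acara"]
nmf_terms = ["2018", "acara", "banjir", "fakta"]
tfidf_terms = ["acara", "banjir", "2018", "yang"]

# Both forms keep only the terms present in all three collections
common = set(lda_terms).intersection(nmf_terms, tfidf_terms)
same = set(lda_terms) & set(nmf_terms) & set(tfidf_terms)

# A set has no defined order, so sort it when a stable order is needed
print(sorted(common))
```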
The intersection looks like this:

{'14',
 '2018',
 '23',
 '49',
 'acara',
 'ada',
 'adalah',
 'agar',
 'antara',
 'atas',
 'atau',
 'awal',
 'banjir',
 'baru',
 'belum',
 'beredar',
 'berisi',
 'beritanya',
 'berpartisipasi',...}
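What I actually want is for the intersection to keep the TF-IDF scores as well. How do I do this with a loop? The expected output would be, for example: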

[['14', 1.4017648244641259],
 ['2018', 1.3134948993732996],
 ['23', 0.9450441338264206],
 ['49', 0.9201203935242713],
 ['acara', 0.8774768633619345],
 ['ada', 0.7941379426583972],
 ['adalah', 0.7820675768624364],
 ['agar', 0.7721892264143173],
 ['atas', 0.7215355604434974],
 ['atau', 0.7038273489379546],
 ['awal', 0.6443546898427824],
 ['banjir', 0.6310537233704365],
 ['baru', 0.6094359873139008],
 ['belum', 0.5945524698582002],
 ['beredar', 0.5945524698582002],
 ['berisi', 0.5534905743539935],
 ['beritanya', 0.5534905743539935],
 ['berpartisipasi', 0.5473499161901632],...]
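Because the intersection is only a set of terms, one way to get the scores back is to filter the already-ranked tflist and keep the rows whose term is in itrsect. A minimal sketch, assuming tflist is a list of [term, score] pairs sorted by score (as produced above) and itrsect is a set of strings; scores_for is a hypothetical helper name and the sample values are made up:

```python
def scores_for(terms, ranked):
    """Return the [term, score] rows of `ranked` whose term is in `terms`.

    ranked: list of [term, score] pairs, already sorted by score.
    terms:  set of terms, e.g. the three-way intersection.
    Iterating over the ranked list (not the set) preserves the TF-IDF order,
    and set membership keeps each check O(1).
    """
    return [[term, score] for term, score in ranked if term in terms]


# Illustrative data only
tflist = [["kompascom", 1.40], ["yang", 1.31], ["atau", 0.61], ["banjir", 0.35]]
itrsect = {"banjir", "atau", "2018"}

print(scores_for(itrsect, tflist))
```

Terms of the intersection that are missing from the ranked list are skipped. With the real data that cannot happen, because itrsect is a subset of TFIDF_set, which was built from the same tfrank.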

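What do you have in ldasort? set() can't take lists as elements. If it is a dictionary, set() gets only its keys(), so you may need to convert it to a list of tuples yourself with ldasort.items().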
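The point about what set() accepts can be checked directly. A short sketch with made-up values:

```python
# Lists are unhashable, so they cannot be set elements
try:
    set([["14", 1.0], ["2018", 0.5]])
except TypeError as exc:
    print("TypeError:", exc)

scores = {"14": 1.0, "2018": 0.5}

# Building a set from a dict keeps only its keys
keys_only = set(scores)

# .items() yields (key, value) tuples, which are hashable
pairs = set(scores.items())

print(sorted(keys_only))
print(sorted(pairs))
```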
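BTW: if one dataset has ("14", 1.0) and another has ("14", 0.1), set() treats them as different elements and intersection() will drop them. Maybe you should keep the current intersection and use it to get the elements from ldasort.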
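The tuple problem described in the comment, and the idea of intersecting on the terms alone and looking the scores up afterwards, can be sketched like this (hypothetical scores):

```python
# The same term carries different scores in two models
lda_pairs = {("14", 1.0), ("banjir", 0.3)}
tfidf_pairs = {("14", 0.1), ("banjir", 0.3)}

# Tuples are compared as a whole, so ("14", 1.0) != ("14", 0.1)
print(lda_pairs & tfidf_pairs)  # only the identical pair survives

# Intersect on the terms only, then look up the TF-IDF score afterwards
lda_scores = dict(lda_pairs)
tfidf_scores = dict(tfidf_pairs)
common_terms = lda_scores.keys() & tfidf_scores.keys()
result = {term: tfidf_scores[term] for term in common_terms}
print(result)
```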
@furas ldasort is a list. Maybe use itrsect in a for-loop to get the scores from tfrank? Could you give detailed code? @furas
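Looping over the rows and pulling the scores for the terms in itrsect out of tfrank could look roughly like the sketch below. It assumes tfrank is the pandas DataFrame from the question, with columns 'term' and 'rank' and already sorted by 'rank'; a tiny stand-in frame with made-up values is used here. The explicit loop and the vectorized Series.isin filter return the same rows:

```python
import pandas as pd

# Stand-in for the question's tfrank (already sorted by 'rank')
tfrank = pd.DataFrame(
    {"term": ["kompascom", "yang", "atau", "banjir"],
     "rank": [1.40, 1.31, 0.61, 0.35]}
)
itrsect = {"atau", "banjir"}

# Explicit for-loop over the rows, keeping terms found in the intersection
loop_result = []
for row in tfrank.itertuples(index=False):
    if row.term in itrsect:
        loop_result.append([row.term, row.rank])

# Vectorized equivalent: boolean mask built with Series.isin
isin_result = tfrank[tfrank["term"].isin(itrsect)].values.tolist()

print(loop_result)
print(isin_result)
```

Both versions keep tfrank's descending-score order; the isin mask is usually preferable on a large vocabulary because it avoids a Python-level loop.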