Python: similarity between words

I have the following problem:

Suppose I have a given string:

A = "Today, the weather is fine. The sun is shining and we'd love to go swimming. Since the water is cold, we only walk around. We love weekends."
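The set of words in A defines the basis of my word vectors.

Suppose we have another set of words for which I want to compute cosine similarities: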

B = "Friday, morning, tomorrow, today, sun, moon, fun, swim"
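I then use pointwise mutual information (PMI) to compute the weights of the features in the vectors. (Let's assume these weights are given.)

How can I compute similarity scores between the words in B? The result should be a B x B matrix.

To compute the cosine similarity, I have done the following so far: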

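import math

def counter_cosine_similarity(c1, c2):
    # Cosine similarity of two sparse vectors stored as Counter/dict objects.
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    if magA == 0 or magB == 0:
        return 0.0  # an all-zero vector has no direction; treat similarity as 0
    return dotprod / (magA * magB)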
where c1 and c2 are Counter objects.
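For reference, this is roughly how the function gets called on whole strings. It is a minimal sketch: the `tokenize` helper is a hypothetical stand-in (the actual tokenization is not specified here), and it uses raw word counts instead of PMI weights.

```python
import math
import re
from collections import Counter


def counter_cosine_similarity(c1, c2):
    # Same computation as the function in the question.
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0) ** 2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0) ** 2 for k in terms))
    return dotprod / (magA * magB)


def tokenize(text):
    # Hypothetical helper: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


A = ("Today, the weather is fine. The sun is shining and we'd love to go "
     "swimming. Since the water is cold, we only walk around. We love weekends.")
B = "Friday, morning, tomorrow, today, sun, moon, fun, swim"

counts_a = Counter(tokenize(A))  # raw counts, not PMI weights
counts_b = Counter(tokenize(B))

# One single score for the whole of A against the whole of B.
score = counter_cosine_similarity(counts_a, counts_b)
print(score)
```

This yields one number for the pair of texts, not one number per pair of words.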

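However, as far as the cosine computation itself goes, my function is correct. But how do I compute the similarity between the individual words in B? With the solution above, I can only compute the similarity of one whole string/list against another string/list.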
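One possible shape for the desired output, sketched under the assumption that every word in B already has its own weight vector keyed by context features drawn from A. The `word_vectors` mapping and all numbers in it are made up purely for illustration; they are not real PMI values, and only three of the words in B are shown.

```python
import math
from collections import Counter


def counter_cosine_similarity(c1, c2):
    # Cosine similarity of two sparse vectors stored as Counter/dict objects.
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0) ** 2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0) ** 2 for k in terms))
    if magA == 0 or magB == 0:
        return 0.0  # avoid division by zero for empty vectors
    return dotprod / (magA * magB)


# Hypothetical per-word vectors: one Counter per word in B,
# keyed by features (words from A), with invented weights.
word_vectors = {
    "today": Counter({"weather": 1.2, "fine": 0.8}),
    "sun": Counter({"shining": 1.5, "weather": 0.4}),
    "swim": Counter({"water": 1.1, "cold": 0.7}),
}

words = list(word_vectors)

# matrix[i][j] is the similarity between words[i] and words[j].
matrix = [
    [counter_cosine_similarity(word_vectors[w1], word_vectors[w2]) for w2 in words]
    for w1 in words
]

for word, row in zip(words, matrix):
    print(word, [round(value, 3) for value in row])
```

The key difference from the whole-string call is that each word gets its own vector, so the same cosine function can be applied to every pair of words.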

Any help would be greatly appreciated.