Python: similarity between words
I have the following problem. Suppose I have a given string:
A = "Today, the weather is fine. The sun is shining and we'd love to go swimming. Since the water is cold, we only walk around. We love weekends."
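The set of words in A defines the basis of my word vectors.
Suppose we have another set of words, for which I want to compute the cosine similarity: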
B = "Friday, morning, tomorrow, today, sun, moon, fun, swim"
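Then I use pointwise mutual information (PMI) to compute the weights of the features in the vectors. (Let's assume they are given.)
How can I compute the similarity scores between the words in B? The result should be a B x B matrix.
To compute the cosine similarity, I have done the following: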
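import math

def counter_cosine_similarity(c1, c2):
    # Treat both Counters as sparse vectors over the union of their keys
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    if magA == 0 or magB == 0:
        return 0.0  # avoid division by zero for an empty vector
    return dotprod / (magA * magB)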
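where c1 and c2 are Counter objects.
However: as far as the cosine computation itself goes, my function is correct. But how do I compute the similarity for each word in B?
With the given solution, I can only compute the similarity of a whole string/list with another whole string/list.
Thank you very much for your help.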
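One possible way to get the B x B matrix is to apply the same function to every pair of words. This is only a minimal sketch: it assumes each word in B already has its own Counter of PMI weights over the basis taken from A. The `vectors` mapping, its weights, the shortened word list and the helper name `pairwise_similarity` are all made up for illustration and are not part of the original question.

```python
import math
from collections import Counter

def counter_cosine_similarity(c1, c2):
    """Cosine similarity between two sparse vectors stored as Counters."""
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    mag1 = math.sqrt(sum(c1.get(k, 0) ** 2 for k in terms))
    mag2 = math.sqrt(sum(c2.get(k, 0) ** 2 for k in terms))
    if mag1 == 0 or mag2 == 0:
        return 0.0  # a word without any features is treated as dissimilar
    return dotprod / (mag1 * mag2)

def pairwise_similarity(words, vectors):
    """Return a len(words) x len(words) matrix (list of lists) of similarities."""
    empty = Counter()  # fallback for words that have no vector at all
    return [
        [counter_cosine_similarity(vectors.get(w1, empty), vectors.get(w2, empty))
         for w2 in words]
        for w1 in words
    ]

# Hypothetical PMI weights over basis words from A (illustration only)
B = ["today", "sun", "swim"]
vectors = {
    "today": Counter({"weather": 1.2, "fine": 0.8}),
    "sun": Counter({"weather": 0.9, "shining": 1.5}),
    "swim": Counter({"water": 1.1, "cold": 0.7}),
}

matrix = pairwise_similarity(B, vectors)
for word, row in zip(B, matrix):
    print(word, [round(x, 3) for x in row])
```

Row i, column j of `matrix` holds the similarity between the i-th and j-th word of B, so the matrix is symmetric. Its diagonal is 1 for every word that has a non-empty vector. For a large vocabulary it would probably be worth stacking the weights into a NumPy array instead of looping over pairs.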