Python 3.x: Does the NLTK Python interface to WordNet include any measure of semantic relatedness?


I know I can use

sim=wn.synset(name_1).path_similarity(wn.synset(name_2))

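I also know that I can evaluate the semantic relatedness of words with a vector space model and a co-occurrence matrix, but I could not find any such solution in the NLTK interface.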

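NLTK's WordNet interface has many word-similarity algorithms based on the WordNet taxonomy, although none of them is based on a vector space model or a co-occurrence matrix: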

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

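# WordNet information content file, computed from the Brown corpus
# (requires nltk.download('wordnet') and nltk.download('wordnet_ic'))
brown_ic = wordnet_ic.ic('ic-brown.dat')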

# Take the first listed sense of each word (cat.n.01 and dog.n.01)
cat = wn.synsets('cat')[0]
dog = wn.synsets('dog')[0]


'''
Path Similarity:
Return a score denoting how similar two word senses are,
based on the shortest path that connects the senses
in the is-a (hypernym/hyponym) taxonomy.
The score is in the range 0 to 1.
'''
print(wn.path_similarity(cat, dog))
# 0.2

'''
Leacock-Chodorow Similarity:
Return a score denoting how similar two word senses are,
based on the shortest path that connects the senses (as above)
and the maximum depth of the taxonomy in which the senses occur.
The relationship is given as -log(p/2d)
where p is the shortest path length and d the taxonomy depth.
'''
print(wn.lch_similarity(cat, dog))
# 2.0281482472922856

'''
Wu-Palmer Similarity:
Return a score denoting how similar two word senses are,
based on the depth of the two senses in the taxonomy
and that of their Least Common Subsumer (most specific ancestor node).
'''
print(wn.wup_similarity(cat, dog))
# 0.8571428571428571

'''
Lin Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer
and that of the two input Synsets.
The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).
'''
print(wn.lin_similarity(cat, dog, ic=brown_ic))
# 0.8768009843733973

'''
Resnik Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer.
Note that for any similarity measure that uses information content,
the result is dependent on the corpus used to generate the information content
and the specifics of how the information content was created.
'''
print(wn.res_similarity(cat, dog, ic=brown_ic))
# 7.911666509036577

'''
Jiang-Conrath Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer
and that of the two input Synsets.
The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
'''
print(wn.jcn_similarity(cat, dog, ic=brown_ic))
# 0.4497755285516739
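
To make the formulas quoted in the docstrings above concrete, here is a small sketch that evaluates them by hand on a toy taxonomy. Everything in it is hypothetical: the PARENT table, the IC values, and the helper names are made up for illustration and are neither NLTK's API nor real WordNet data, and the node-counting conventions are assumptions that may differ from NLTK's internals.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not real WordNet data.
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "bird": "animal",
    "cat": "mammal",
    "dog": "mammal",
}

# Hypothetical information-content values (higher = more specific concept).
IC = {"entity": 0.0, "animal": 1.0, "mammal": 2.0,
      "bird": 3.0, "cat": 4.0, "dog": 4.0}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(node))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)


def path_length(a, b):
    """Number of is-a edges on the shortest path between a and b."""
    common = depth(lcs(a, b))
    return (depth(a) - common) + (depth(b) - common)


def path_sim(a, b):
    # 1 / (edges + 1): identical nodes score 1, distant nodes approach 0.
    return 1.0 / (path_length(a, b) + 1)


def lch_sim(a, b, max_depth):
    # -log(p / 2d); assumption: p is counted in nodes (edges + 1)
    # so that identical nodes do not produce log(0).
    return -math.log((path_length(a, b) + 1) / (2.0 * max_depth))


def wup_sim(a, b):
    # 2 * depth(lcs) / (depth(s1) + depth(s2))
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def res_sim(ic, a, b):
    # IC of the least common subsumer
    return ic[lcs(a, b)]


def lin_sim(ic, a, b):
    # 2 * IC(lcs) / (IC(s1) + IC(s2))
    return 2.0 * ic[lcs(a, b)] / (ic[a] + ic[b])


def jcn_sim(ic, a, b):
    # 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)); undefined when the two nodes coincide
    return 1.0 / (ic[a] + ic[b] - 2.0 * ic[lcs(a, b)])


if __name__ == "__main__":
    print("lcs :", lcs("cat", "dog"))
    print("path:", path_sim("cat", "dog"))
    print("lch :", lch_sim("cat", "dog", max_depth=4))
    print("wup :", wup_sim("cat", "dog"))
    print("res :", res_sim(IC, "cat", "dog"))
    print("lin :", lin_sim(IC, "cat", "dog"))
    print("jcn :", jcn_sim(IC, "cat", "dog"))
```

All six scores depend only on the taxonomy (and, for the last three, on the IC table), which is why none of them captures corpus-derived relatedness.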

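This is a great answer; I didn't know NLTK had all of these measures! However, as far as I understand, the notion of relatedness is a bit different. Am I missing something?

WordNet does not provide the kind of similarity you are after, because its relations are not statistically derived but hard-coded into the taxonomy/database.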
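
A statistically derived measure of the kind the question describes would therefore have to be built outside the WordNet interface. As a rough sketch only, and not something NLTK ships, the following plain-Python example builds co-occurrence vectors from a tiny corpus and compares words by cosine similarity. The function names, the window size, and the three-sentence corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real model needs far more text.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]

vectors = cooccurrence_vectors(corpus)

if __name__ == "__main__":
    print("cat ~ dog:", cosine(vectors["cat"], vectors["dog"]))
    print("cat ~ car:", cosine(vectors["cat"], vectors["car"]))
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "car", so the first score comes out higher. With a realistic corpus the raw counts would usually be reweighted (for example with PMI or TF-IDF) before comparison.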