Q: Does the WordNet interface in Python NLTK include any measure of semantic relatedness?

Stack Overflow user

Asked 2020-08-21 08:25:16

Answers: 1 · Views: 452 · Followers: 0 · Votes: 1

I know I can compute semantic similarity through the NLTK interface:

sim=wn.synset(name_1).path_similarity(wn.synset(name_2))

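I also know that vector space models and co-occurrence matrices can be used to assess the semantic relatedness of words, but I could not find anything like that in the NLTK interface.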
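For readers unfamiliar with the co-occurrence approach mentioned here, the following is a minimal sketch of the idea in plain Python, independent of NLTK. The helper names `cooccurrence_vectors` and `cosine`, the window size, and the toy corpus are all assumptions made for illustration; they are not part of any library.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, already tokenized
corpus = [
    "the cat chased the mouse".split(),
    "the dog chased the ball".split(),
    "stocks fell on the market".split(),
]

vectors = cooccurrence_vectors(corpus)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_market = cosine(vectors["cat"], vectors["market"])
print(cat_dog, cat_market)
```

On this toy corpus "cat" and "dog" share exactly the same contexts, so they score higher than "cat" and "market". With raw counts, frequent words such as "the" dominate the vectors, which is why real systems usually reweight the counts (for example with TF-IDF or PMI) and use far larger corpora.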

nltk

wordnet

python-3.x

nlp

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-08-21 09:18:59

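NLTK's WordNet interface offers a number of word similarity measures based on the WordNet taxonomy, but none of them is based on a vector space model or a co-occurrence matrix.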

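# Requires the NLTK data packages 'wordnet' and 'wordnet_ic',
# e.g. via nltk.download('wordnet') and nltk.download('wordnet_ic').
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# WordNet information content file derived from the Brown corpus
brown_ic = wordnet_ic.ic('ic-brown.dat')

# Take the first listed sense of each word
cat = wn.synsets('cat')[0]
dog = wn.synsets('dog')[0]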


'''
Path Similarity:
Return a score denoting how similar two word senses are,
based on the shortest path that connects the senses
in the is-a (hypernym/hyponym) taxonomy.
The score is in the range 0 to 1.
'''
print(wn.path_similarity(cat, dog))
# 0.2

'''
Leacock-Chodorow Similarity:
Return a score denoting how similar two word senses are,
based on the shortest path that connects the senses (as above)
and the maximum depth of the taxonomy in which the senses occur.
The relationship is given as -log(p / (2 * d))
where p is the shortest path length and d the taxonomy depth.
'''
print(wn.lch_similarity(cat, dog))
# 2.0281482472922856

'''
Wu-Palmer Similarity:
Return a score denoting how similar two word senses are,
based on the depth of the two senses in the taxonomy
and that of their Least Common Subsumer (most specific ancestor node).
'''
print(wn.wup_similarity(cat, dog))
# 0.8571428571428571

'''
Lin Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer
and that of the two input Synsets.
The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).
'''
print(wn.lin_similarity(cat, dog, ic=brown_ic))
# 0.8768009843733973

'''
Resnik Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer.
Note that for any similarity measure that uses information content,
the result is dependent on the corpus used to generate the information content
and the specifics of how the information content was created.
'''
print(wn.res_similarity(cat, dog, ic=brown_ic))
# 7.911666509036577

'''
Jiang-Conrath Similarity:
Return a score denoting how similar two word senses are,
based on the Information Content (IC) of the Least Common Subsumer
and that of the two input Synsets.
The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
'''
print(wn.jcn_similarity(cat, dog, ic=brown_ic))
# 0.4497755285516739
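As a sanity check on the formulas quoted in the docstrings above, the printed scores can be reproduced from one another with plain arithmetic. This is only a sketch: the helper names are hypothetical, the path length of 4 edges and the taxonomy depth of 19 are values inferred from the printed output rather than documented constants, and the "+ 1" reflects my understanding that NLTK counts nodes rather than edges on the path.

```python
import math


def path_score(distance):
    """Path similarity from the number of edges between two senses."""
    return 1.0 / (distance + 1)


def lch_score(distance, depth):
    """Leacock-Chodorow score: -log(p / (2 * d)) with p = distance + 1."""
    return -math.log((distance + 1) / (2.0 * depth))


def jcn_from_res_and_lin(res, lin):
    """Recover the Jiang-Conrath score from the Resnik and Lin scores.

    Resnik gives IC(lcs); Lin = 2 * IC(lcs) / (IC(s1) + IC(s2)),
    so IC(s1) + IC(s2) = 2 * res / lin.
    """
    ic_sum = 2.0 * res / lin
    return 1.0 / (ic_sum - 2.0 * res)


distance = 4  # assumption: inferred from the path similarity of 0.2
depth = 19    # assumption: inferred from the printed Leacock-Chodorow score

print(path_score(distance))
print(lch_score(distance, depth))
print(jcn_from_res_and_lin(7.911666509036577, 0.8768009843733973))
```

The three results should land very close to the path, Leacock-Chodorow, and Jiang-Conrath values printed above, which shows that the information-content measures are different arrangements of the same three IC quantities.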

Votes: 1

Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/63514884
