WordNet Python word similarity

Stack Overflow user
Asked on 2017-01-22 17:14:46
Answers: 1 | Views: 10.1K | Votes: 6

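I am trying to find a reliable way to measure the semantic similarity of two terms. A first metric could be the path distance on the hyponym/hypernym graph (eventually, a linear combination of 2-3 metrics might work better).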

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print(dog.path_similarity(cat))
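  • I still don't understand what n.01 means and why it is necessary.
  • Is there a way to visually show the computed path between two terms?
  • Which other semantic metrics can I use?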

1 Answer

Stack Overflow user

Accepted answer

Posted on 2017-01-22 18:36:44

1. I still don't understand what n.01 means and why it is necessary.

The nltk source shows that the name has the form "WORD.PART-OF-SPEECH.SENSE-NUMBER".

Quoting the source:

Create a Lemma from a "<word>.<pos>.<number>.<lemma>" string where:
<word> is the morphological stem identifying the synset
<pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
<number> is the sense number, counting from 0.
<lemma> is the morphological form of interest

The n stands for noun. I also recommend reading about the WordNet dataset.
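To make the naming scheme concrete, here is a minimal sketch that splits a synset name into its three parts. The helper `parse_synset_name` and the `POS_NAMES` table are my own illustration, not part of nltk; the single-letter tags are the ones WordNet uses for its parts of speech.

```python
# Single-letter part-of-speech tags used in WordNet synset names.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}

def parse_synset_name(name):
    """Split 'WORD.POS.SENSE' into (word, pos, sense number).

    Hypothetical helper for illustration. It splits from the right so
    that a word containing dots would still be handled.
    """
    word, pos, sense = name.rsplit(".", 2)
    return word, pos, int(sense)

word, pos, sense = parse_synset_name("dog.n.01")
print(word, POS_NAMES[pos], sense)
```

So 'dog.n.01' reads as: the word "dog", used as a noun, first sense.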

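2. Is there a way to visually show the computed path between two terms?

Take a look at the similarity section of the nltk wordnet documentation. For path-based algorithms you have several options (you can try mixing a few of them).

A few examples from the nltk documentation: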

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.path_similarity(cat))
print(dog.lch_similarity(cat))
print(dog.wup_similarity(cat))
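If you want to mix several of these measures, one simple option is a weighted average. The sketch below is only an illustration: `combine_scores` is a hypothetical helper, the scores are made-up numbers, and the weights are something you would have to tune for your own data. It skips `None` values because the similarity methods can return `None` when no path is found.

```python
def combine_scores(scores, weights):
    """Weighted average of the scores, ignoring any measure that returned None.

    Hypothetical helper for illustration; not part of nltk.
    """
    usable = [(s, w) for s, w in zip(scores, weights) if s is not None]
    total_weight = sum(w for _, w in usable)
    if total_weight == 0:
        return None  # nothing left to combine
    return sum(s * w for s, w in usable) / total_weight

# Made-up scores for one pair of terms, standing in for two measures.
print(combine_scores([0.2, 0.8], [1.0, 1.0]))
```

Keep in mind that lch_similarity is not confined to the 0 to 1 range, so it would probably need rescaling before being averaged with the other two.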

For visualization, you can build a distance matrix M[i,j], where:

M[i,j] = word_similarity(i, j)

and use the following stackoverflow answer to plot the visualization.
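Building the matrix itself could look roughly like this. It is a sketch under assumptions: `similarity_matrix` is a hypothetical helper, and `toy_similarity` is a stand-in (character overlap between strings) so the example runs without WordNet. It is not a semantic measure.

```python
def similarity_matrix(terms, similarity):
    """Return M as a list of rows, with M[i][j] = similarity(terms[i], terms[j])."""
    return [[similarity(a, b) for b in terms] for a in terms]

def toy_similarity(a, b):
    """Stand-in measure: share of distinct characters the two strings have in common."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)

terms = ["dog", "cat", "cow"]
M = similarity_matrix(terms, toy_similarity)

# Print one row per term.
for term, row in zip(terms, M):
    print(term, [round(v, 2) for v in row])
```

With nltk you would pass a list of synsets as `terms` and something like `lambda a, b: a.path_similarity(b)` as the similarity function.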

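3. Which other nltk semantic metric can I use?

As mentioned above, there are several ways to compute word similarity. I also suggest looking into gensim. I used its word2vec implementation for word similarity and it worked well for me.

If you need any help choosing an algorithm, please provide more information about the problem you are facing.

Update:

More information on the meaning of the sense number can be found here

Senses in WordNet are generally ordered from most to least frequently used, with the most common sense numbered 1.

The problem is that "dog" is ambiguous and you have to pick the correct meaning.

You might choose the first sense as a naive approach, or find your own algorithm for choosing the right sense based on your application or research.

To get all available definitions of a word from wordnet (called synsets in WordNet), simply call wn.synsets(word).

I encourage you to dig into the metadata contained in these synsets for each definition.

The code below shows a simple example that gets the metadata and prints it nicely.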
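The naive first-sense approach could be sketched like this. `first_sense` is a hypothetical helper, and the list is simply the synset names that wn.synsets('dog') returned for me, hard-coded so the example runs without WordNet. The optional part-of-speech filter is my own addition.

```python
# Synset names for "dog", in the order WordNet returned them.
DOG_SENSES = [
    "dog.n.01", "frump.n.01", "dog.n.03", "cad.n.01",
    "frank.n.02", "pawl.n.01", "andiron.n.01", "chase.v.01",
]

def first_sense(names, pos=None):
    """Return the first name in the list, optionally restricted to one part of speech.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    for name in names:
        _, name_pos, _ = name.rsplit(".", 2)
        if pos is None or name_pos == pos:
            return name
    return None

print(first_sense(DOG_SENSES))       # first sense overall
print(first_sense(DOG_SENSES, "v"))  # first verb sense
```

With nltk itself, the same idea is just taking the first element of wn.synsets(word), after checking that the list is not empty.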

from nltk.corpus import wordnet as wn

dog_synsets = wn.synsets('dog')

for i, syn in enumerate(dog_synsets):
    print('%d. %s' % (i, syn.name()))
    print('alternative names (lemmas): "%s"' % '", "'.join(syn.lemma_names()))
    print('definition: "%s"' % syn.definition())
    if syn.examples():
        print('example usage: "%s"' % '", "'.join(syn.examples()))
    print('\n')

Code output:

0. dog.n.01
alternative names (lemmas): "dog", "domestic_dog", "Canis_familiaris"
definition: "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds"
example usage: "the dog barked all night"


1. frump.n.01
alternative names (lemmas): "frump", "dog"
definition: "a dull unattractive unpleasant girl or woman"
example usage: "she got a reputation as a frump", "she's a real dog"


2. dog.n.03
alternative names (lemmas): "dog"
definition: "informal term for a man"
example usage: "you lucky dog"


3. cad.n.01
alternative names (lemmas): "cad", "bounder", "blackguard", "dog", "hound", "heel"
definition: "someone who is morally reprehensible"
example usage: "you dirty dog"


4. frank.n.02
alternative names (lemmas): "frank", "frankfurter", "hotdog", "hot_dog", "dog", "wiener", "wienerwurst", "weenie"
definition: "a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll"


5. pawl.n.01
alternative names (lemmas): "pawl", "detent", "click", "dog"
definition: "a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward"


6. andiron.n.01
alternative names (lemmas): "andiron", "firedog", "dog", "dog-iron"
definition: "metal supports for logs in a fireplace"
example usage: "the andirons were too hot to touch"


7. chase.v.01
alternative names (lemmas): "chase", "chase_after", "trail", "tail", "tag", "give_chase", "dog", "go_after", "track"
definition: "go after with the intent to catch"
example usage: "The policeman chased the mugger down the alley", "the dog chased the rabbit"
Votes: 12
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/41793842
