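I want to create a set of alternative words for a given word. The alternatives must be suitably different: replacing 'dog' with 'dalmatian' would be too similar, whereas I would like to replace 'dog' with 'cat'. Although it isn't foolproof, I think I can do this by taking a word's hypernym, then that hypernym's hypernym (i.e. the grandparent synset), and finally collecting all the grandchild words of that grandparent.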
I hope that makes sense. In pseudocode, it would read:
for each i as hypernym (synset)
    for each j as i.hypernym
        get all the holonyms for j as s
        for each s get all the holonyms as x
            print x
Is this feasible?
Answered 2014-06-28 02:16:07
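from itertools import chain
from collections import defaultdict
from nltk.corpus import wordnet as wn

# Map each lemma of a "grandfather" synset to the part-holonym lemmas
# of the synsets two levels below it.
gflemma_holonym = defaultdict(set)
for ss in wn.all_synsets():
    # Keep only synsets with part holonyms, a hypernym, and a hypernym of that hypernym.
    if ss.part_holonyms() and ss.hypernyms() and ss.hypernyms()[0].hypernyms():
        grandfather = ss.hypernyms()[0].hypernyms()[0]  # grandfather concept.
        holonyms = list(chain(*[i.lemma_names() for i in ss.part_holonyms()]))
        for lemma in grandfather.lemma_names():
            gflemma_holonym[lemma].update(holonyms)

# The output below is from Python 2; Python 3 prints the sets as {...}.
print(gflemma_holonym[u'edible_nut'])
print('')
print(gflemma_holonym[u'geographical_area'])
Output: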
set([u'black_hickory', u'black_walnut', u'Juglans_nigra', u'black_walnut_tree'])
set([u'battlefield', u'fair', u'infield', u'field_of_honor', u'field_of_battle', u'battleground', u'city', u'bowl', u'field', u'stadium', u'funfair', u'outfield', u'diamond', u'urban_area', u'populated_area', u'desert', u'arena', u'carnival', u'baseball_diamond', u'sports_stadium', u'ball_field', u'baseball_field'])
Note that WordNet's inventory is limited, especially when you are looking for relations between concepts/lemmas (e.g. from a synset's grandfather to that synset's holonyms).
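If what the question is after is the literal walk "up two hypernym levels, then down two levels again", a minimal sketch could look like the following. It assumes that the "grandchildren" are hyponyms of the grandparent synset (the pseudocode says holonyms, so this is an interpretation), and the helper names `grandchild_lemmas` and `alternatives_for` are made up for illustration. The first function only relies on the `hypernyms()`, `hyponyms()` and `lemma_names()` methods that NLTK synsets provide.

```python
def grandchild_lemmas(synset):
    """Return lemma names found two hyponym levels below each grandparent of `synset`.

    Assumption: "grandchildren" means hyponyms of hyponyms of the grandparent.
    The synset's own lemma names are excluded from the result.
    """
    own_names = set(synset.lemma_names())
    found = set()
    for parent in synset.hypernyms():              # one level up
        for grandparent in parent.hypernyms():     # two levels up
            for child in grandparent.hyponyms():   # one level back down
                for grandchild in child.hyponyms():  # two levels back down
                    found.update(grandchild.lemma_names())
    return found - own_names


def alternatives_for(word):
    """Union of grandchild lemmas over every WordNet synset of `word`.

    Hypothetical convenience wrapper; needs NLTK and its WordNet corpus,
    so the import is done lazily inside the function.
    """
    from nltk.corpus import wordnet as wn
    found = set()
    for synset in wn.synsets(word):
        found |= grandchild_lemmas(synset)
    return found
```

Calling `alternatives_for('dog')` requires the WordNet corpus to be downloaded. Siblings of the original word (other hyponyms of its direct hypernym) are included too, so the result will usually need further filtering before it counts as "suitably different".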
Answered 2014-06-14 21:37:48
You can do this with either a list or a dictionary (a dictionary is more Pythonic). With a dictionary, for example:
dictionnary={"dog": {"dalmatian","stuff"}, "singer": {"rihanna","eminem"}, "country": {"United states","England"}}
print(dictionnary['dog'])
https://stackoverflow.com/questions/24217776