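I would like to know how to compute unigram, bigram, cooc and wordcount from the list training_data.
I am new to Python, so please be patient with me. Thanks!
The task statement says: "You need to implement two parts of the HMM postagger."
Here is training_dataset, followed by the expected results: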
# The tiny example.
training_dataset = [
    (['dog', 'chase', 'cat'], ['NN', 'VV', 'NN']),
    (['I', 'chase', 'dog'], ['PRP', 'VV', 'NN']),
    (['cat', 'chase', 'mouse'], ['NN', 'VV', 'NN']),
]
hmm = HMM(training_data=training_dataset)
# Test whether the parameters are correctly estimated.
assert hmm.unigram['NN'] == 5
assert hmm.bigram['VV', 'NN'] == 3
assert hmm.bigram['NN', 'VV'] == 2
assert hmm.cooc['dog', 'NN'] == 2
Posted on 2015-08-04 11:33:13
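Using Counter() on lists is very straightforward. Counter.update() takes any iterable and increments the count of each element it yields, which is exactly what you need here.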
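from nltk.util import bigrams
...
for words, tags in training_data:
    # Each update() call adds the counts for one sentence.
    self.unigram.update(tags)            # tag counts
    self.bigram.update(bigrams(tags))    # (previous tag, tag) pairs
    self.cooc.update(zip(words, tags))   # (word, tag) pairs
    self.wordcount.update(words)         # word counts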
...
https://stackoverflow.com/questions/31666958
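To see the answer's loop in context, here is a minimal, self-contained sketch of how it might sit inside the class. The parts the answer elides with "..." are not known, so the class layout and the four Counter attributes created in __init__ are assumptions. To avoid depending on NLTK, the sketch also swaps nltk.util.bigrams for zip(tags, tags[1:]), which yields the same adjacent tag pairs for a list.

```python
from collections import Counter


class HMM:
    """Count-based statistics for a toy HMM POS tagger (illustrative sketch)."""

    def __init__(self, training_data):
        # Assumed layout: one Counter per statistic named in the question.
        self.unigram = Counter()    # tag -> count
        self.bigram = Counter()     # (previous tag, tag) -> count
        self.cooc = Counter()       # (word, tag) -> count
        self.wordcount = Counter()  # word -> count

        for words, tags in training_data:
            self.unigram.update(tags)
            # Adjacent tag pairs; stands in for nltk.util.bigrams(tags).
            self.bigram.update(zip(tags, tags[1:]))
            self.cooc.update(zip(words, tags))
            self.wordcount.update(words)


training_dataset = [
    (['dog', 'chase', 'cat'], ['NN', 'VV', 'NN']),
    (['I', 'chase', 'dog'], ['PRP', 'VV', 'NN']),
    (['cat', 'chase', 'mouse'], ['NN', 'VV', 'NN']),
]

hmm = HMM(training_data=training_dataset)
print(hmm.unigram['NN'])        # 5
print(hmm.bigram['VV', 'NN'])   # 3
print(hmm.wordcount['chase'])   # 3
```

Because a Counter returns 0 for missing keys, lookups such as hmm.cooc['mouse', 'VV'] do not raise a KeyError, which can be convenient when turning these counts into probabilities later.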