I have a DataFrame DF that looks like this:
index posts
0 <div class="content">A number of <br/><br/>three ... </div>
1 <div class="content">Stack ... <br/><br/>overflow ... </div>
...
Then I tried to tokenize each post into sentences with the following:
sentences=[]
for post in DF["posts"]:
    sentences += utility.tosentences(post, tokenizer)  # utility.tosentences is my own helper (not shown)
Then I ran Word2Vec with the following code:
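import logging
from gensim.models import word2vec

# INFO-level logging makes gensim print its progress while building the vocabulary and training
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

num_features = 100     # dimensionality of the word vectors
min_word_count = 7     # ignore words that occur fewer than 7 times in total
num_workers = 2        # number of worker threads
context = 5            # context window size
downsampling = 1e-5    # threshold for randomly downsampling very frequent words

print("Training model...")
# Note: size= was renamed vector_size= in gensim 4.0
model = word2vec.Word2Vec(sentences, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context, sample=downsampling)
# Keep only the L2-normalized vectors (saves memory; the model cannot be trained further afterwards)
model.init_sims(replace=True)

model_name = "what"
model.save(model_name)  # reload later with word2vec.Word2Vec.load(model_name)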
print("finished")
Then I tested the following:
model.doesnt_match("travel no Warning health".split())
But it produced no output at all.
I also don't understand what all the output I got above means. Why doesn't this work?
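utility.tosentences is the asker's own helper and its code isn't shown. For readers following along, here is a minimal hypothetical stand-in (to_sentences, using only the standard library instead of a tokenizer object) that turns one HTML post into a list of sentences, each a list of lowercase words, which is the input shape Word2Vec expects. Treating <br/> as a sentence boundary and splitting on . ! ? are assumptions for illustration.

```python
import re

# Hypothetical stand-in for the asker's utility.tosentences(post, tokenizer);
# the real helper is not shown in the question.
BREAK_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
SENTENCE_END_RE = re.compile(r"[.!?]+")
WORD_RE = re.compile(r"[a-z0-9']+")

def to_sentences(post):
    """Split one HTML post into sentences, each a list of lowercase words."""
    # Treat <br/> as a sentence boundary, then strip all remaining tags.
    text = BREAK_RE.sub(". ", post)
    text = TAG_RE.sub(" ", text)
    sentences = []
    for chunk in SENTENCE_END_RE.split(text):
        words = WORD_RE.findall(chunk.lower())
        if words:  # skip empty chunks, e.g. between consecutive <br/> tags
            sentences.append(words)
    return sentences

posts = [
    '<div class="content">A number of <br/><br/>three ... </div>',
    '<div class="content">Stack ... <br/><br/>overflow ... </div>',
]
sentences = []
for post in posts:
    sentences += to_sentences(post)
print(sentences)
```

Note that this sketch lowercases every word, so a later lookup of "Warning" would only succeed if the real tokenizer preserves case.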
Posted on 2016-09-16 00:17:53
The function model.doesnt_match() doesn't print anything; it returns a value. Print the return value to see the result.
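To make the "returns, doesn't print" point concrete, here is a rough pure-Python sketch of what doesnt_match computes, based on my reading of gensim's implementation: normalize each word vector to unit length, average them, and return the word whose vector has the lowest cosine similarity to that mean. The function below is a hypothetical standalone version, not gensim's API, and the 2-D embeddings are made-up toy values.

```python
import math

def unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def doesnt_match(words, vectors):
    """Return (not print) the word least similar to the mean of the group.

    vectors: dict mapping word -> list of floats (hypothetical toy embeddings).
    Words missing from `vectors` are skipped, as gensim skips out-of-vocabulary words.
    """
    used = [w for w in words if w in vectors]
    if not used:
        raise ValueError("none of the words are in the vocabulary")
    units = [unit(vectors[w]) for w in used]
    dim = len(units[0])
    mean = unit([sum(v[i] for v in units) / len(units) for i in range(dim)])
    # Cosine similarity of each unit word vector to the unit mean vector.
    sims = [sum(a * b for a, b in zip(v, mean)) for v in units]
    return min(zip(sims, used))[1]

# Hypothetical toy embeddings: "no" points in a different direction from the rest.
vectors = {
    "travel": [1.0, 0.1],
    "health": [0.9, 0.2],
    "warning": [0.8, 0.3],
    "no": [-0.2, 1.0],
}
result = doesnt_match("travel no warning health".split(), vectors)
print(result)  # nothing is shown unless the return value is printed
```

With the model trained in the question, the equivalent fix is print(model.doesnt_match("travel no Warning health".split())).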
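If you are copy-pasting from this word2vec tutorial: it shows the output you would see when running these commands in an interactive console, which echoes the value of each expression automatically; a script does not. (It also assumes you understand what you are doing.)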
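A related thing worth checking with the settings in the question: min_word_count = 7 drops every word seen fewer than 7 times from the vocabulary, and in the gensim versions I'm familiar with, doesnt_match skips words that aren't in the vocabulary (logging a warning) rather than failing. Lookups are also case-sensitive, so "Warning" only matches if the tokenizer kept that capitalization. The hypothetical build_vocab below is a simplified illustration of the min_count filter only, not gensim's actual vocabulary-building code, and the corpus is made up.

```python
from collections import Counter

def build_vocab(sentences, min_count):
    """Keep only words occurring at least `min_count` times across all sentences.

    Simplified illustration of Word2Vec's min_count filter (hypothetical helper);
    gensim's real vocabulary step does additional bookkeeping.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}

# Hypothetical toy corpus: "travel" appears 7 times, "health" only 3 times.
corpus = [["travel", "health"]] * 3 + [["travel"]] * 4
vocab = build_vocab(corpus, min_count=7)

query = "travel no Warning health".split()
missing = [w for w in query if w not in vocab]
print(sorted(vocab))
print(missing)
```

Checking which query words survive the filter this way (against the real model, `word in model` in older gensim) can explain a surprising or missing result.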
https://stackoverflow.com/questions/39380424