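Suppose I have two text files. File 1 contains the training set, which is mainly used to define the vocabulary. File 2 contains the words entered by the user.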
d1 = (
"Project 1 details on Machine learning",
"Project 2 detail on machine learning and statics",
"Project 3 is on mach learn as well"
)
d2 = (
"Projects related to machine learning",  # trailing comma makes this a one-element tuple, not a plain string
)
Now, using sklearn, we compute the TF-IDF of d1:
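from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from d1, then vectorize d1
tfidf_matrix = tfidf_vectorizer.fit_transform(d1)
print(tfidf_matrix.shape)
Now, for the query d2, I want to compute its TF-IDF vector based on the vocabulary learned from d1. How can I do that?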
Posted on 2019-08-20 11:09:45
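As with any transformer in scikit-learn, once you have called .fit on the training set (here, .fit_transform(d1)), you can transform the test set with tfidf_vectorizer.transform(d2). The result uses the vocabulary and IDF weights learned from d1; words in d2 that never appeared in d1 are ignored.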
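A minimal sketch of the fit-then-transform pattern described above, using the question's data. Two things here are assumptions rather than part of the original post: d1 and d2 are written as lists, and the names train_matrix and query_matrix are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Training documents: these define the vocabulary and the IDF weights
d1 = [
    "Project 1 details on Machine learning",
    "Project 2 detail on machine learning and statics",
    "Project 3 is on mach learn as well",
]
# Query documents: must be an iterable of strings, not a bare string
d2 = ["Projects related to machine learning"]

tfidf_vectorizer = TfidfVectorizer()
train_matrix = tfidf_vectorizer.fit_transform(d1)  # fit on d1 only
query_matrix = tfidf_vectorizer.transform(d2)      # reuse what was learned

# Both matrices have one column per vocabulary term learned from d1
print(train_matrix.shape)
print(query_matrix.shape)

# Terms absent from d1 (for example "projects") get no column at all
print("projects" in tfidf_vectorizer.vocabulary_)
```

Because transform never updates the fitted state, the same vectorizer can be reused for any number of later queries.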
Posted on 2019-08-20 11:29:47
You can pass the vocabulary_ attribute of the first vectorizer as the vocabulary parameter of the second vectorizer:
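from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer1 = TfidfVectorizer()
# Learn the vocabulary from d1
vectorizer1.fit_transform(d1)
# Reuse that vocabulary so d2 is mapped onto the same columns
vectorizer2 = TfidfVectorizer(vocabulary=vectorizer1.vocabulary_)
# Note: fit_transform re-learns the IDF weights from d2; only the vocabulary is shared
vectorizer2.fit_transform(d2)
https://stackoverflow.com/questions/57572184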
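One point about the second approach is worth checking: passing vocabulary= fixes the columns, but calling fit_transform on d2 recomputes the IDF weights from d2 alone. The sketch below compares the two fitted vectorizers; the names fitted, refit, same_columns and same_idf are illustrative and not from the original answers, and d1 and d2 are written as lists.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

d1 = [
    "Project 1 details on Machine learning",
    "Project 2 detail on machine learning and statics",
    "Project 3 is on mach learn as well",
]
d2 = ["Projects related to machine learning"]

# Approach 1: fit on d1, then only transform d2
fitted = TfidfVectorizer().fit(d1)
via_transform = fitted.transform(d2)

# Approach 2: share the vocabulary, but fit a second vectorizer on d2
refit = TfidfVectorizer(vocabulary=fitted.vocabulary_)
via_refit = refit.fit_transform(d2)

# The term-to-column mapping is identical in both approaches
same_columns = refit.vocabulary_ == fitted.vocabulary_
# The IDF weights are not: one set comes from d1, the other from d2
same_idf = np.allclose(fitted.idf_, refit.idf_)

print(via_transform.shape, via_refit.shape)
print(same_columns, same_idf)
```

If the query should be weighted by how rare each term is in the training set, the fit-then-transform approach is the one that preserves those weights.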