I'm trying to use TfidfVectorizer on a corpus, but every time I run into this error:
File "sparsefuncs.pyx", line 117, in sklearn.utils.sparsefuncs.inplace_csr_row_normalize_l2 (sklearn\utils\sparsefuncs.c:2328)
ValueError: Buffer dtype mismatch, expected 'int' but got 'long long'
This is my code:
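import csv
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = []
testCorpus = []
trainType = []
testType = []

# Read the training rows; the 'sku' column is used for both text and label.
with open("stone_sku.csv") as f:
    cr = csv.DictReader(f)
    for row in cr:
        corpus.append(row['sku'])
        trainType.append(row['sku'])

# Read the test rows from the same file.
with open("stone_sku.csv") as f:
    crTest = csv.DictReader(f)
    for row in crTest:
        testCorpus.append(row['sku'])
        testType.append(row['sku'])

# Character 2-grams and 3-grams, TF-IDF weighted.
cv = TfidfVectorizer(min_df=1, analyzer='char', ngram_range=(2,3))
trainCounts = cv.fit_transform(corpus)

It works fine with CountVectorizer, and the same error also occurs if I try to transform the data with TfidfTransformer.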
Posted on 2014-04-01 07:37:19
Are you running 64-bit Windows? This may be caused by a known issue that was recently fixed in the master branch.
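To answer the question about the platform, a minimal standard-library sketch like the one below may help. The helper names describe_platform and is_64bit_windows are made up for illustration and are not part of scikit-learn. It reports the operating system, the bitness of the running Python interpreter (which can differ from the bitness of Windows itself), and the widths of the C types named in the error message.

```python
import ctypes
import platform
import struct


def describe_platform():
    """Collect facts relevant to the 'int' vs 'long long' mismatch (hypothetical helper)."""
    return {
        # "Windows", "Linux", "Darwin", ...
        "os": platform.system(),
        # Pointer size of the running interpreter, in bits.
        # This is the bitness of Python, not necessarily of the OS.
        "python_bits": struct.calcsize("P") * 8,
        # Widths of the two C types mentioned in the error message.
        "c_int_bits": ctypes.sizeof(ctypes.c_int) * 8,
        "c_longlong_bits": ctypes.sizeof(ctypes.c_longlong) * 8,
    }


def is_64bit_windows(info):
    """True when a 64-bit Python is running on Windows (hypothetical helper)."""
    return info["os"] == "Windows" and info["python_bits"] == 64


info = describe_platform()
print(info)
print("64-bit Python on Windows:", is_64bit_windows(info))
```

If the last line prints True, the setup matches the case the answer asks about, and trying a scikit-learn build that includes the fix from the master branch would be the natural next step.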
https://stackoverflow.com/questions/22775997