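I want to understand how to work with sparse matrices. I have this code to generate a multi-label classification dataset as sparse matrices:
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(sparse=True, n_labels=20, return_indicator='sparse', allow_unlabeled=False)
This code gives X in the following format: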
<100x20 sparse matrix of type '<class 'numpy.float64'>'
with 1797 stored elements in Compressed Sparse Row format>
and y is:
<100x5 sparse matrix of type '<class 'numpy.int64'>'
with 471 stored elements in Compressed Sparse Row format>
Now I need to split X and y into X_train, X_test, y_train and y_test so that the training set contains 70% of the samples. How can I do that?
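This is what I tried:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X.toarray(), y, stratify=y, test_size=0.3)
and I got this error message:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.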
Posted on 2019-09-09 20:34:51
The error message itself seems to suggest the solution: both X and y need to be converted to dense arrays.
Do the following:
X = X.toarray()
y = y.toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)
Posted on 2019-11-03 08:07:14
The problem is caused by stratify=y. If you look at the documentation for train_test_split, you can see:
*arrays:
stratify:
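As a minimal sketch of what this implies, assuming a scikit-learn version in which make_multilabel_classification still accepts the arguments used in the question: when stratify is left out, train_test_split can take the sparse X and y as they are, with no call to toarray(). The random_state=0 values are an assumption added here only to make the run repeatable; they are not part of the question.

```python
from scipy.sparse import issparse
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split

# Same dataset as in the question; random_state is an added assumption
# so that the example is repeatable.
X, y = make_multilabel_classification(
    sparse=True,
    n_labels=20,
    return_indicator="sparse",
    allow_unlabeled=False,
    random_state=0,
)

# No stratify argument: the sparse inputs are split row-wise as they are.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The pieces stay sparse; 100 samples give 70 training and 30 test rows.
print(issparse(X_train), issparse(y_train))
print(X_train.shape, X_test.shape)
```

This split is purely random, so it gives no guarantee about how the label combinations are distributed between the two sets.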
Unfortunately, even after converting it to a dense array, this dataset does not work well with stratify:
>>> X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y.toarray(), test_size=0.3)
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
https://stackoverflow.com/questions/57860726