文章/答案/技术大牛

发布

社区首页 >问答首页 >问题提交分类问题

问问题提交分类问题
EN

Data Science用户

提问于 2019-11-05 21:00:42

回答 1查看 171关注 0票数 0

我正在尝试提交，所以我有一个没有标签的测试集，我正在尝试测试我的分类模型。特别是，我还必须以csv的形式提交这一预测。我有以下没有标签的测试集，这是pd.read_json()的输出，所以它是来自测试数据集的输出：

分类问题的重点是从指令中预测压路机的类型。分类问题已经解决了，我只需要提交。

因此，我必须从测试集预测这些指令，但是如果我尝试这样做的话：

test = pd.read_json('test_dataset_blind.jsonl',lines = True)
test

X_new = test['instructions']
new_pred_class = clf.predict(X_new)

clf是我的模型，在这种情况下，我使用的是随机森林。

我收到以下错误消息：

ValueError: setting an array element with a sequence.

有人能帮帮我吗？谢谢已经提前了。

编辑完全错误跟踪如下：

    ValueError                                Traceback (most recent call last)
<ipython-input-21-ba881bb9e0fe> in <module>
----> 1 new_pred_class = clf.predict(X_new)

~\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in predict(self, X)
    543             The predicted classes.
    544         """
--> 545         proba = self.predict_proba(X)
    546 
    547         if self.n_outputs_ == 1:

~\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in predict_proba(self, 
X)
    586         check_is_fitted(self, 'estimators_')
    587         # Check data
--> 588         X = self._validate_X_predict(X)
    589 
    590         # Assign chunk of trees to jobs

~\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in 
_validate_X_predict(self, X)
    357                                  "call `fit` before exploiting the 
model.")
    358 
--> 359         return self.estimators_[0]._validate_X_predict(X, 
check_input=True)
    360 
    361     @property

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in _validate_X_predict(self, 
X, check_input)
    389         """Validate X whenever one tries to predict, apply, 
predict_proba"""
    390         if check_input:
--> 391             X = check_array(X, dtype=DTYPE, accept_sparse="csr")
    392             if issparse(X) and (X.indices.dtype != np.intc or
    393                                 X.indptr.dtype != np.intc):

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, 
accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, 
ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, 
estimator)
    494             try:
    495                 warnings.simplefilter('error', ComplexWarning)
--> 496                 array = np.asarray(array, dtype=dtype, order=order)
    497             except ComplexWarning:
    498                 raise ValueError("Complex data not supported\n"

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

~\Anaconda3\lib\site-packages\pandas\core\series.py in __array__(self, dtype)
    946             warnings.warn(msg, FutureWarning, stacklevel=3)
    947             dtype = "M8[ns]"
--> 948         return np.asarray(self.array, dtype)
    949 
    950     # ------------------------------------------------------------------

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

~\Anaconda3\lib\site-packages\pandas\core\arrays\numpy_.py in __array__(self, 
dtype)
    164 
    165     def __array__(self, dtype=None):
--> 166         return np.asarray(self._ndarray, dtype=dtype)
    167 
    168     _HANDLED_TYPES = (np.ndarray, numbers.Number)

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

ValueError: setting an array element with a sequence.

编辑2带有标签的数据集如下：

我所做的是：

我只考虑了操作人员的推动.然后我用熊猫创建了以下数据集：

在这样做后，我只考虑了值，我使用了tfμ-以色列国防军矢量器。然后我将数据分割成；

X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, 
    test_size=0.2, random_state=15)

并以支持向量机为模型。

编辑3已经消除了错误，并且作为输出，我实现了：

array(['icc', 'gcc', 'gcc', ..., 'clang', 'clang', 'clang'], dtype=object)

现在我做以下工作：

pd.DataFrame({'instructions': test['instructions'],'compiler':new_pred_class})

我得到了错误信息：

ValueError                                Traceback (most recent call last)
<ipython-input-41-da853bce8ce2> in <module>
----> 1 pd.DataFrame({'instructions': 
test['instructions'],'compiler':new_pred_class})

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, 
index, columns, dtype, copy)
    409             )
    410         elif isinstance(data, dict):
--> 411             mgr = init_dict(data, index, columns, dtype=dtype)
    412         elif isinstance(data, ma.MaskedArray):
    413             import numpy.ma.mrecords as mrecords

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in 
init_dict(data, index, columns, dtype)
255             arr if not is_datetime64tz_dtype(arr) else arr.copy() for  
arr in arrays
    256         ]
--> 257     return arrays_to_mgr(arrays, data_names, index, columns, 
dtype=dtype)
    258 
    259 

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in 
arrays_to_mgr(arrays, arr_names, index, columns, dtype)
 75     # figure out the index, if necessary
    76     if index is None:
---> 77         index = extract_index(arrays)
    78     else:
    79         index = ensure_index(index)

~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in 
extract_index(data)
379                         "length {idx_len}".format(length=lengths[0], 
idx_len=len(index))
    380                     )
--> 381                     raise ValueError(msg)
    382             else:
    383                 index = ibase.default_index(lengths[0])

ValueError: array length 30000 does not match index length 3000

显然，new_pred_class有30000个元素，而测试数据集是3000行。在这种情况下我该怎么办？谢谢已经提前了。

classification

machine-learning

回答 1

Data Science用户

回答已采纳

发布于 2019-11-06 12:32:57

您要加载到模型中的数据格式不正确。

通常，当数组中的所有元素共享相同的大小时，就会出现此错误。您需要确保每个行/数组包含相同数量的元素(否则，您无法从数据中形成2D数组)。也许读数据有问题吗？

票数 0

页面原文内容由Data Science提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://datascience.stackexchange.com/questions/62734

复制

相似问题

问问题提交分类问题
EN

回答 1

Data Science用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问问题提交分类问题EN

回答 1

Data Science用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问问题提交分类问题
EN