Basically, I'm trying to use this function (mutual_info_classif [1]) with the script below to learn something about my discrete features:
from sklearn.feature_selection import mutual_info_classif
mutual_info_classif(r1[to_consider].values, r1['Y'].values, discrete_features='True')
However, it raises an error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-52-c20422a21616> in <module>()
1 from sklearn.feature_selection import mutual_info_classif
2
----> 3 mutual_info_classif(np.array(r1[to_consider].values), np.array(r1['Y'].values), discrete_features='True')
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\feature_selection\mutual_info_.py in mutual_info_classif(X, y, discrete_features, n_neighbors, copy, random_state)
448 check_classification_targets(y)
449 return _estimate_mi(X, y, discrete_features, True, n_neighbors,
--> 450 copy, random_state)
~\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\feature_selection\mutual_info_.py in _estimate_mi(X, y, discrete_features, discrete_target, n_neighbors, copy, random_state)
259 if discrete_features.dtype != 'bool':
260 discrete_mask = np.zeros(n_features, dtype=bool)
--> 261 discrete_mask[discrete_features] = True
262 else:
263 discrete_mask = discrete_features
IndexError: arrays used as indices must be of integer (or boolean) type
[1]: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html
I'd like to understand why this happens. Here is a preview of my data (hot-encoded):
r1[to_consider].values
array([[ 0, 7, 1, 1, 2, 5],
[ 1, 0, 1, 0, 0, 5],
[ 0, 0, 1, 1, 6, 5],
...,
[ 0, 0, 1, 1, 6, 3],
[ 3, 11, 2, 2, 10, 5],
[ 0, 0, 1, 1, 9, 0]], dtype=int64)
and:
r1['Y'].values
array([0, 0, 1, ..., 0, 0, 0], dtype=int8)

Answered on 2018-04-22 14:43:26
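I assume your r1 is a DataFrame. Try passing the DataFrame and Series directly instead of their .values. Also note that discrete_features must be the boolean True rather than the string 'True': the traceback shows the string being treated as an index array, which is what raises the IndexError.
from sklearn.feature_selection import mutual_info_classif
mutual_info_classif(r1[to_consider], r1['Y'], discrete_features=True)
https://stackoverflow.com/questions/49238160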
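To see the difference between the boolean and the string form of discrete_features, here is a minimal sketch. It assumes NumPy and a reasonably recent scikit-learn are installed, and it uses synthetic integer data as a stand-in for r1[to_consider] and r1['Y'] (the asker's real data is not available). The exact exception type raised for the string differs between scikit-learn versions, so the sketch only records its name.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for r1[to_consider] and r1['Y'] (not the asker's data).
rng = np.random.default_rng(0)
X = rng.integers(0, 12, size=(200, 6))
y = (X[:, 0] > 5).astype(np.int8)  # the label depends only on column 0

# Boolean True: every column is treated as discrete.
mi_all = mutual_info_classif(X, y, discrete_features=True, random_state=0)

# A list of column indices marks only those columns as discrete;
# the remaining columns are treated as continuous.
mi_some = mutual_info_classif(X, y, discrete_features=[0, 2, 3], random_state=0)

# The string 'True' is neither a bool nor a valid mask, so the call fails.
try:
    mutual_info_classif(X, y, discrete_features='True')
    string_error = None
except Exception as exc:
    string_error = type(exc).__name__

print(mi_all.shape, mi_some.shape, string_error)
```

If only some columns are categorical, the index-list form (or a boolean mask of length n_features) is usually the better fit than a blanket True.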