Q: Sklearn + Pipeline = TypeError

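Stack Overflow user

Asked on 2021-03-19 22:57:27

Answers: 3, Views: 591, Followers: 0, Score: 1

I am trying to use Pipeline and ColumnTransformer from sklearn correctly, but I always end up with an error. I have reproduced it in the example below.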

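import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Data to reproduce the error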
X = pd.DataFrame([[1,  2 , 3,  1 ],
                  [1, '?', 2,  0 ],
                  [4,  5 , 6, '?']],
                 columns=['A', 'B', 'C', 'D'])

# SimpleImputer to replace the '?' values with the column mode
impute = SimpleImputer(missing_values='?', strategy='most_frequent')

# Simple one-hot encoder (in scikit-learn >= 1.2 the `sparse` argument is named `sparse_output`)
ohe = OneHotEncoder(handle_unknown='ignore', sparse=False)

col_transfo = ColumnTransformer(transformers=[
    ('missing_vals', impute, ['B', 'D']),
    ('one_hot', ohe, ['A', 'B'])],
    remainder='passthrough'
)

The transformer is then called as follows:

col_transfo.fit_transform(X)

which returns the following error:

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['int', 'str']

Tags: pandas, scikit-learn, preprocessor, one-hot-encoding

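3 Answers

Stack Overflow user

Accepted answer

Posted on 2021-03-22 14:53:58

ColumnTransformer applies its transformers in parallel, not sequentially. Each transformer receives the original columns, so the OneHotEncoder sees the un-imputed column B and balks at the mixed types.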
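The parallel behaviour can be seen in a small sketch. The toy frame and the helper `add_100` are made up for illustration and are not part of the question; two transformers are pointed at the same column, and the second one still receives the untouched values rather than the output of the first.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer


def add_100(df):
    """Hypothetical helper: shift every value so the change is easy to spot."""
    return df + 100


# Toy data, not the data from the question.
X_demo = pd.DataFrame({'B': [2, 3, 5]})

ct = ColumnTransformer(transformers=[
    ('first', FunctionTransformer(add_100), ['B']),
    # FunctionTransformer() with no function is the identity transform.
    ('second', FunctionTransformer(), ['B']),
])

out = np.asarray(ct.fit_transform(X_demo))
print(out)
# Column 0 holds B + 100; column 1 holds the original B, which shows that
# 'second' was fed the input frame and not the result of 'first'.
```

If the transformers were chained, the second column would also contain the shifted values.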

In your example, it is probably fine to just impute all of the columns first, and then encode A and B.

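from sklearn.pipeline import Pipeline

# SimpleImputer returns a NumPy array by default, so the column names are gone
# by the time the encoder runs: select A and B by position instead of by name.
encoder = ColumnTransformer(transformers=[
    ('one_hot', ohe, [0, 1])],
    remainder='passthrough'
)
preproc = Pipeline(steps=[
    ('impute', impute),
    ('encode', encoder),
    # optionally, just throw the model here...
])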

If it is important that future missing values in A or C raise an error, then similarly wrap impute in its own ColumnTransformer.
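A minimal sketch of that variant follows, assuming the data from the question. The names `impute_bd` and `encode_ab` are made up here. One detail to watch: a ColumnTransformer emits its transformed columns first and the passthrough columns last, so after the imputing step the order is B, D, A, C, and the encoder has to pick A and B at positions 2 and 0. The encoder's sparse-output argument is left at its default because its name differs between scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame([[1,  2 , 3,  1 ],
                  [1, '?', 2,  0 ],
                  [4,  5 , 6, '?']],
                 columns=['A', 'B', 'C', 'D'])

# Step 1: impute only B and D; A and C pass through untouched.
# Output column order: B, D, A, C.
impute_bd = ColumnTransformer(transformers=[
    ('missing_vals',
     SimpleImputer(missing_values='?', strategy='most_frequent'),
     ['B', 'D'])],
    remainder='passthrough'
)

# Step 2: the input is now a NumPy array, so columns are selected by position.
# A sits at index 2 and B at index 0; D and C (indices 1 and 3) pass through.
encode_ab = ColumnTransformer(transformers=[
    ('one_hot', OneHotEncoder(handle_unknown='ignore'), [2, 0])],
    remainder='passthrough'
)

preproc = Pipeline(steps=[
    ('impute', impute_bd),
    ('encode', encode_ab),
])

result = preproc.fit_transform(X)
# Densify defensively in case a sparse matrix is returned.
result = result.toarray() if hasattr(result, 'toarray') else np.asarray(result)
print(result.shape)
print(result)
```

The result has six columns: two one-hot columns for A, two for B, then D and C.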

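See also: Apply multiple preprocessing steps to a column in sklearn pipeline

Score: 1

Stack Overflow user

Posted on 2021-03-19 23:12:15

This gives you an error because OneHotEncoder accepts only one data format per column. In your case, it is a mix of numbers and objects. To get around this, you can split the pipeline between the imputer and the OneHotEncoder, so that you can call the astype method on the imputer's output. Something like: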

# `impute` is the SimpleImputer defined in the question
ohe.fit_transform(impute.fit_transform(X[['A','B']]).astype(float))

Score: 0

Stack Overflow user

Posted on 2021-03-20 00:18:07

The error does not come from ColumnTransformer, but from the OneHotEncoder object. The imputing step on its own works:

col_transfo = ColumnTransformer(transformers=[
    ('missing_vals', impute, ['B', 'D'])],
    remainder='passthrough'
)

col_transfo.fit_transform(X)

array([[2, 1, 1, 3],
       [2, 0, 1, 2],
       [5, 0, 4, 6]], dtype=object)

ohe.fit_transform(X)

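TypeError: argument must be a string or number

OneHotEncoder throws this error because it is given a mix of value types (int + string) to encode within the same column. You need to cast the numeric columns to strings before applying it.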
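A minimal sketch of the cast, assuming the data from the question. Note that this is a different treatment from imputing: '?' is not replaced, it simply becomes one more category in columns B and D.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame([[1,  2 , 3,  1 ],
                  [1, '?', 2,  0 ],
                  [4,  5 , 6, '?']],
                 columns=['A', 'B', 'C', 'D'])

# Cast every value to str so that each column holds a single, uniform type.
X_str = X.astype(str)

ohe = OneHotEncoder(handle_unknown='ignore')
encoded = ohe.fit_transform(X_str)
# The encoder returns a SciPy sparse matrix by default; densify for display.
encoded = encoded.toarray() if hasattr(encoded, 'toarray') else encoded

# Number of categories learned per column, in the order A, B, C, D.
n_categories = [len(c) for c in ohe.categories_]
print(n_categories)
print(encoded.shape)
```

Whether keeping '?' as a category is acceptable depends on the model; if the missing marker should be filled in, impute before encoding instead.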

Score: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/66716586
