
Q: Error: The data used by step_impute_linear() did not have any rows where the imputation values were all complete.

Stack Overflow user

Asked 2022-01-25 15:35:19

Answers: 1 · Views: 87 · Followers: 0 · Votes: 0

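I am using the recipes package, and I get an error when I use step_impute_linear() from recipes to impute NAs. Note that step_impute_median or step_impute_mean work without problems. Also, if I use:

step_impute_linear(all_predictors()) or,
step_impute_linear(all_numeric(),.) etc.

none of these combinations work.

Other methods, such as:

step_impute_knn(all_nominal(),impute_with = all_predictors(),-has_role("ID"))

fail as well.

I also checked the data: not every row contains missing data, and neither does every column.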

library(tidymodels)
library(lubridate)  # provides is.Date() and decimal_date()

dt_rec <- recipe(
  OFFER_STATUS ~ ., data = dt_training) %>%
  # 1. Define Role
  update_role(MO_ID, new_role = "ID") %>%
  update_role(SO_ID, new_role = "ID") %>%
  # turn dates into decimals
  step_mutate_at(where(is.Date), fn = decimal_date) %>%
  # 2. Impute
  # impute all numeric columns with their median (this variant works)
  # step_impute_median(all_numeric(), -has_role("ID")) %>%
  # the call below is the one that raises the error
  step_impute_linear(all_numeric(), impute_with = ., -has_role("ID")) %>%
  # ignoring novel factors
  # 3. Handle factor levels
  step_novel(all_predictors(), -all_numeric()) %>%
  # impute all other nominal (character + factor) columns with the value "none"
  step_unknown(all_nominal(), new_level = "none") %>%
  step_string2factor(all_nominal(), -all_outcomes(), -has_role("ID")) %>%
  # remove constant columns
  step_zv(all_predictors()) %>%
  # 4. Discretize
  # remove variables that have a high correlation with each other
  # as this will lead to multicollinearity
  step_corr(all_numeric(), threshold = 0.99) %>%
  # normalization --> centering and scaling numeric variables
  # mean = 0 and sd = 1
  step_normalize(all_numeric()) %>%
  # 5. Dummy variables
  # creating dummy variables for nominal predictors
  step_dummy(all_nominal(), -all_outcomes()) %>%
  # 6. Normalization
  # 7. Multivariate transformation
  step_pca(all_numeric_predictors())

dt_rec

dt_rec %>% summary()

Tags: machine-learning, linear-regression, recipe, tidymodels

1 Answer

Stack Overflow user

Answered 2022-01-28 20:31:16

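When you use a function like step_impute_linear(), you are saying "impute the values of my variable using these other variables." If some of those other variables also have missing data, the model cannot be fit successfully. If you have a set of variables, say x, y, and z, that all have missing data and you want them to be used to impute each other, then I recommend that you:

first impute one or more of the variables with a method that depends only on that variable itself (e.g. x), such as the median or similar; then impute the remaining variables using only predictors that are now complete and have no missing data (e.g. impute y and z based on x).

Setting up a set of linear models for variables that all use each other will not work when all of those variables have missing data.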
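The two-stage approach described above might look roughly like the sketch below. It does not use the asker's data: the data frame toy and its columns x, y, z and outcome are made up for illustration, and the NA patterns are chosen so that no row is complete across x, y and z, which is the situation the error message describes. Choosing the median for x is only one possible choice for the first stage.

```r
library(recipes)

# Hypothetical toy data (not from the question): x, y and z all have NAs.
set.seed(1)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)
toy$z <- -toy$x + rnorm(n, sd = 0.1)
toy$outcome <- rnorm(n)
toy$x[1:40] <- NA
toy$y[30:70] <- NA
toy$z[60:100] <- NA

# Every row is missing at least one of x, y, z, so linear models in which
# the three variables predict each other have no complete rows to fit on.
n_complete <- sum(complete.cases(toy[, c("x", "y", "z")]))
n_complete  # 0 for this toy data

rec <- recipe(outcome ~ ., data = toy) %>%
  # Stage 1: fill x using only x itself (its median).
  step_impute_median(x) %>%
  # Stage 2: x is now complete, so it can be the sole predictor in the
  # linear models that impute y and z.
  step_impute_linear(y, z, impute_with = imp_vars(x))

imputed <- bake(prep(rec), new_data = NULL)
colSums(is.na(imputed))
```

In the asker's recipe the same idea would mean replacing the single step_impute_linear(all_numeric(), ...) call with a simple per-variable step for some columns, followed by step_impute_linear() calls whose impute_with argument names only columns that are already complete at that point.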

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/70851445
