> impute_sample <- function(x) {
ifelse(is.na(x),
sample(x[!is.na(x)], size = sum(is.na(x)), replace = T),
x
)
}
> dd <- tibble(date=as.Date(c("2010-2-1",NA,"2020-3-2")),
value = c(1,NA,2),
grp = c("df","s",NA))
> dd
# A tibble: 3 x 3
date value grp
<date> <dbl> <chr>
1 2010-02-01 1 df
2 NA NA s
3 2020-03-02 2 NA
> dd%>%modify(impute_sample)
# A tibble: 3 x 3
date value grp
<dbl> <dbl> <chr>
1 14641 1 df
2 18323 1 s
3 18323 2 s
> dd%>%map_df(impute_sample)
# A tibble: 3 x 3
date value grp
<dbl> <dbl> <chr>
1 14641 1 df
2 18323 2 s
3 18323 2 df

The other column types are preserved just fine, but the date column is converted to dbl. How can I keep the Date type after the sampling step?
Posted on 2020-03-19 05:47:29
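The problem here is that impute_sample() relies on ifelse, which is notorious for producing unpredictable output: it will happily and quietly convert data types without any explicit sanction from the user. If we simply replace ifelse with dplyr::if_else, the routine works as expected. As the help file states: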
Compared to the base ifelse(), this function is more strict. It checks that true and false are the same type. This strictness makes the output type more predictable, and makes it somewhat faster.
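To isolate the effect on a single vector, here is a minimal sketch. The names `d` and `fill` are made up for illustration and do not appear in the question; the sketch assumes dplyr is installed.

```r
library(dplyr)

d <- as.Date(c("2010-02-01", NA, "2020-03-02"))
fill <- as.Date("2020-03-02")

# base::ifelse() builds its result from the logical test vector,
# so the Date class is dropped and only the day counts remain
base_result <- ifelse(is.na(d), fill, d)
class(base_result)
#> [1] "numeric"

# dplyr::if_else() requires `true` and `false` to share a type
# and returns a vector of that type
strict_result <- if_else(is.na(d), fill, d)
class(strict_result)
#> [1] "Date"
```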
You can see this difference in the reprex below.
# load required libraries
library(tidyverse)
# define original function (base::ifelse)
impute_sample <- function(x) {
  ifelse(is.na(x),
         sample(x[!is.na(x)], size = sum(is.na(x)), replace = TRUE),
         x)
}
# define type-consistent version of function (dplyr::if_else)
impute_sample_consistent <- function(x) {
  if_else(is.na(x),
          sample(x[!is.na(x)], size = sum(is.na(x)), replace = TRUE),
          x)
}
# define data
dd <- tibble(
  date = as.Date(c("2010-2-1", NA, "2020-3-2")),
  value = c(1, NA, 2),
  grp = c("df", "s", NA)
)
# apply original version using purrr::modify
dd %>% modify(impute_sample)
#> # A tibble: 3 x 3
#> date value grp
#> <dbl> <dbl> <chr>
#> 1 14641 1 df
#> 2 18323 1 s
#> 3 18323 2 df
# apply type-consistent version using purrr::modify
dd %>% modify(impute_sample_consistent)
#> # A tibble: 3 x 3
#> date value grp
#> <date> <dbl> <chr>
#> 1 2010-02-01 1 df
#> 2 2020-03-02 1 s
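#> 3 2020-03-02 2 s

Finally, although you didn't ask about this explicitly, it may be worth highlighting something basic about purrr::modify, since it tends to trip people up in practice. The help file states: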
...the modify() family always returns the same type as the input object.
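Given an object as input, there is no reason to expect purrr::modify to preserve the types of the individual elements of that input; it preserves only the type of the input itself. In other words, if you pass a data frame of type x (e.g., tibble, data.table, etc.) to purrr::modify, it will return a data frame of type x. It is, however, entirely possible (and consistent with the definition of purrr::modify) to change the types of the individual columns within that data frame.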
# it is possible to use modify to convert column types
dd %>% modify(type.convert)
#> # A tibble: 3 x 3
#> date value grp
#> <fct> <int> <fct>
#> 1 2010-02-01 1 df
#> 2 <NA> NA s
#> 3 2020-03-02 2 <NA>
# but the class of the input and output to modify remains the same
class(dd)
#> [1] "tbl_df" "tbl" "data.frame"
class(dd %>% modify(type.convert))
#> [1] "tbl_df" "tbl" "data.frame"

https://stackoverflow.com/questions/60610550
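One caveat about the if_else version: if_else expects `true` to have either length one or the same length as the condition. With `size = sum(is.na(x))`, that holds in the example only because each column has exactly one NA; a column with, say, two NAs out of five values would be expected to raise a length error. The sketch below is one possible alternative, not part of the original answer. The helper name `impute_sample_indexed` is hypothetical. It uses base R subset assignment, which keeps the class of the vector and works for any number of NAs.

```r
# Hypothetical helper (not from the original answer): fill NAs by
# sampling from the observed values, using subset assignment so the
# vector keeps its class (Date, character, numeric, ...)
impute_sample_indexed <- function(x) {
  is_missing <- is.na(x)
  observed <- which(!is_missing)
  # nothing to do when there are no NAs, or nothing to sample from
  if (!any(is_missing) || length(observed) == 0) {
    return(x)
  }
  # sample positions rather than values: this avoids sample()'s
  # special case where a single number n is treated as 1:n
  picks <- observed[sample.int(length(observed), sum(is_missing), replace = TRUE)]
  x[is_missing] <- x[picks]
  x
}

d <- as.Date(c("2010-02-01", NA, NA, "2020-03-02", NA))
filled <- impute_sample_indexed(d)
class(filled)
#> [1] "Date"
anyNA(filled)
#> [1] FALSE
```

Because it takes and returns a single vector, it should drop into the same pipeline as the other versions, e.g. `dd %>% modify(impute_sample_indexed)`.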