Consider the following example, which uses a dplyr summarise pipeline to build a data frame identifying the minimum DATE associated with each CHAR.
library('tidyverse')
library('lubridate')
temp <- data.frame(
  CHAR = c(
    'A',
    'B',
    'C'
  ),
  DATE = c(
    '20090101',
    '20100101',
    NA
  ) %>% ymd(), # Turn character strings to dates
  stringsAsFactors = FALSE
) %>% group_by(
  CHAR
) %>% summarise(
  DATE = min(DATE, na.rm = TRUE) # Extract minimum date
) %>% ungroup()
Whether the minimum is NA is then tested with is.na.
temp %>% mutate(
  DATE_lgl = DATE %>% is.na() # Identify dates that are missing/NA
)
Output:
# A tibble: 3 x 3
  CHAR  DATE       DATE_lgl
  <chr> <date>     <lgl>
1 A     2009-01-01 FALSE
2 B     2010-01-01 FALSE
3 C     NA         FALSE
DATE_lgl incorrectly shows FALSE where DATE is NA. Why does this happen?
Removing na.rm = TRUE fixes this, but that is not an option for the following configuration, where na.rm = TRUE is needed to drop the missing entry:
temp <- data.frame(
  CHAR = c(
    'A',
    'B',
    'C',
    'C'
  ),
  DATE = c(
    '20090101',
    '20100101',
    NA,
    '20110101'
  ) %>% ymd(), # Turn character strings to dates
  stringsAsFactors = FALSE
) %>% group_by(
  CHAR
) %>% summarise(
  DATE = min(DATE, na.rm = TRUE) # Extract minimum date
) %>% ungroup()
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 lubridate_1.7.4 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.5 purrr_0.2.5
[7] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 cellranger_1.1.0 pillar_1.2.3 compiler_3.5.0 plyr_1.8.4 bindr_0.1.1
[7] tools_3.5.0 jsonlite_1.5 nlme_3.1-137 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1
[13] rlang_0.2.1 psych_1.8.4 cli_1.0.0 rstudioapi_0.7 yaml_2.1.19 parallel_3.5.0
[19] haven_1.1.1 xml2_1.2.0 httr_1.3.1 hms_0.4.2 grid_3.5.0 tidyselect_0.2.4
[25] glue_1.2.0 R6_2.2.2 readxl_1.1.0 foreign_0.8-70 modelr_0.1.2 reshape2_1.4.3
[31] magrittr_1.5 scales_0.5.0 rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[37] utf8_1.1.4 stringi_1.1.7 lazyeval_0.2.1 munsell_0.4.3 broom_0.4.4 crayon_1.3.4
Posted on 2018-06-08 18:17:09
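The problem is that you are evaluating
min(NA, na.rm=TRUE)
# Inf
for row 3 (nothing is left once na.rm = TRUE drops the group's only value), which results in
dput(temp$DATE[3])
# structure(Inf, class = "Date")
Add is.finite to the mutate: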
temp %>%
  mutate(DATE_lgl = is.finite(DATE) | is.na(DATE)) # FALSE only for infinite dates (the Inf from min())
# A tibble: 3 x 3
#   CHAR  DATE       DATE_lgl
#   <chr> <date>     <lgl>
# 1 A     2009-01-01 TRUE
# 2 B     2010-01-01 TRUE
# 3 C     NA         FALSE
That it prints as NA is probably a limitation of how the Date class is printed.
as.Date(Inf, origin="1970-01-01")
# NA
dput(as.Date(Inf, origin="1970-01-01"))
# structure(Inf, class = "Date")
Posted on 2018-06-08 18:19:28
A workaround is to convert the Date column to character and then check whether it is NA.
temp %>% mutate(
  DATE_lgl = is.na(as.character(DATE))
)
# # A tibble: 3 x 3
#   CHAR  DATE       DATE_lgl
#   <chr> <date>     <lgl>
# 1 A     2009-01-01 FALSE
# 2 B     2010-01-01 FALSE
# 3 C     NA         TRUE
https://stackoverflow.com/questions/50766089
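Both answers detect the infinite date after it has been produced. As a complementary sketch, the Inf could be avoided at the source by returning a real NA when a group has no non-missing dates. The helper name safe_min_date is hypothetical (it is not part of base R, dplyr, or lubridate), and the example uses only base R so it runs on its own:

```r
# Hypothetical helper (an assumption for illustration, not a library function):
# return the minimum of a Date vector, or a genuine NA Date when every value
# is missing, instead of the Inf that min(..., na.rm = TRUE) would give.
safe_min_date <- function(x) {
  if (all(is.na(x))) {
    return(as.Date(NA))
  }
  min(x, na.rm = TRUE)
}

all_missing  <- as.Date(NA)                   # like group 'C' in the first example
some_missing <- as.Date(c(NA, "2011-01-01"))  # like group 'C' in the second example

is.na(safe_min_date(all_missing))                     # TRUE
safe_min_date(some_missing) == as.Date("2011-01-01")  # TRUE
```

In the pipeline from the question, this would presumably be used as summarise(DATE = safe_min_date(DATE)) in place of min(DATE, na.rm = TRUE), after which a plain is.na(DATE) should behave as expected in both configurations.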