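I want to get the year of the latest available data for each country (iso3), indicator (gho_id), sex (sex) and type of residence (location). The values are stored in "gho_value". My dataset looks like this: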
# A tibble: 16 x 6
iso3 gho_id sex location year gho_value
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 AFG BP_04 FMLE <NA> 2000 28.6
2 AFG BP_04 FMLE <NA> 2001 28.8
3 AFG BP_04 FMLE <NA> 2002 29
4 AFG BP_04 FMLE <NA> 2003 29.2
5 AFG BP_04 FMLE <NA> 2004 29.4
6 AFG BP_04 FMLE <NA> 2005 29.5
7 AFG BP_04 FMLE <NA> 2006 29.7
8 AFG BP_04 FMLE <NA> 2007 29.8
9 AFG BP_04 FMLE <NA> 2008 29.9
10 AFG BP_04 FMLE <NA> 2009 30.1
11 AFG BP_04 FMLE <NA> 2010 30.2
12 AFG BP_04 FMLE <NA> 2011 30.3
13 AFG BP_04 FMLE <NA> 2012 30.4
14 AFG BP_04 FMLE <NA> 2013 30.5
15 AFG BP_04 FMLE <NA> 2014 NA
16 AFG BP_04 FMLE <NA> 2015 NA
I would like to get this:
# A tibble: 16 x 7
iso3 gho_id sex location year gho_value last_year
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 AFG BP_04 FMLE <NA> 2000 28.6 2013
2 AFG BP_04 FMLE <NA> 2001 28.8 2013
3 AFG BP_04 FMLE <NA> 2002 29 2013
4 AFG BP_04 FMLE <NA> 2003 29.2 2013
5 AFG BP_04 FMLE <NA> 2004 29.4 2013
6 AFG BP_04 FMLE <NA> 2005 29.5 2013
7 AFG BP_04 FMLE <NA> 2006 29.7 2013
8 AFG BP_04 FMLE <NA> 2007 29.8 2013
9 AFG BP_04 FMLE <NA> 2008 29.9 2013
10 AFG BP_04 FMLE <NA> 2009 30.1 2013
11 AFG BP_04 FMLE <NA> 2010 30.2 2013
12 AFG BP_04 FMLE <NA> 2011 30.3 2013
13 AFG BP_04 FMLE <NA> 2012 30.4 2013
14 AFG BP_04 FMLE <NA> 2013 30.5 2013
15 AFG BP_04 FMLE <NA> 2014 NA 2013
16 AFG BP_04 FMLE <NA> 2015 NA 2013
I tried using the tidyverse (dplyr) and did this:
ungho2 <- left_join(ungho,
ungho %>%
arrange(iso3,gho_id,sex,location,year) %>%
group_by(iso3,gho_id,sex,location) %>%
filter(!(is.na(gho_value))) %>%
mutate(latest_year_gho = last(year)) %>%
ungroup(iso3,gho_id,sex,location))
My problem is that I get this (with left_join I get NA instead of 2013):
# A tibble: 16 x 7
iso3 gho_id sex location year gho_value last_year
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 AFG BP_04 FMLE <NA> 2000 28.6 2013
2 AFG BP_04 FMLE <NA> 2001 28.8 2013
3 AFG BP_04 FMLE <NA> 2002 29 2013
4 AFG BP_04 FMLE <NA> 2003 29.2 2013
5 AFG BP_04 FMLE <NA> 2004 29.4 2013
6 AFG BP_04 FMLE <NA> 2005 29.5 2013
7 AFG BP_04 FMLE <NA> 2006 29.7 2013
8 AFG BP_04 FMLE <NA> 2007 29.8 2013
9 AFG BP_04 FMLE <NA> 2008 29.9 2013
10 AFG BP_04 FMLE <NA> 2009 30.1 2013
11 AFG BP_04 FMLE <NA> 2010 30.2 2013
12 AFG BP_04 FMLE <NA> 2011 30.3 2013
13 AFG BP_04 FMLE <NA> 2012 30.4 2013
14 AFG BP_04 FMLE <NA> 2013 30.5 2013
15 AFG BP_04 FMLE <NA> 2014 NA NA
16 AFG BP_04 FMLE <NA> 2015 NA NA
What can I do?
Thank you very much for your help!
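A likely reason for the NA rows: when `left_join()` is called without `by`, it matches on every column the two tables share, which here includes `year` and `gho_value`. The right-hand table has had the rows with missing `gho_value` filtered out, so the 2014 and 2015 rows find no match. Below is a minimal sketch of one way around this, assuming the data frame is called `ungho` as in the attempt above. The sample data is a stand-in rebuilt from the printed tibble, and the name `latest` is made up for illustration. The idea is to collapse to one row per group first, then join on the grouping keys only.

```r
library(dplyr)

# Stand-in for `ungho`, rebuilt from the tibble printed in the question
ungho <- tibble(
  iso3 = "AFG",
  gho_id = "BP_04",
  sex = "FMLE",
  location = NA_character_,
  year = 2000:2015,
  gho_value = c(28.6, 28.8, 29, 29.2, 29.4, 29.5, 29.7, 29.8,
                29.9, 30.1, 30.2, 30.3, 30.4, 30.5, NA, NA)
)

# One row per group: the latest year that has a non-missing value
latest <- ungho %>%
  filter(!is.na(gho_value)) %>%
  group_by(iso3, gho_id, sex, location) %>%
  summarise(last_year = max(year)) %>%
  ungroup()

# Join on the grouping keys only, so every row of a group gets the same last_year
ungho2 <- left_join(ungho, latest,
                    by = c("iso3", "gho_id", "sex", "location"))
```

With dplyr's default join behaviour, an NA key in `location` matches an NA key on the other side, so the rows in this example still line up. A group with no valid value at all would be absent from `latest` and would get NA for `last_year`.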
Posted on 2019-12-05 12:21:36
Since you are already using dplyr, here is a simple solution:
df <- data.frame(iso3 = rep("AFG", 16),
gho_id = rep("BP_04", 16),
sex = rep("FMLE", 16),
location = rep(NA, 16),
year = seq.int(2000, 2015),
gho_value = as.numeric(c(28.6, 28.8, 29, 29.2, 29.4, 29.5, 29.7, 29.8,
29.9, 30.1, 30.2, 30.3, 30.4, 30.5, NA, NA)))
library(dplyr)
df %>%
# Group by the variables that define each series
group_by(iso3, gho_id, sex, location) %>%
# Get the max year when `gho_value` is not NA for each group defined above
mutate(last_year = max(year[!is.na(gho_value)]))
Posted on 2019-12-05 10:54:11
Base R solution:
df <- data.frame(do.call("rbind", lapply(split(df, paste0(df$iso3, df$gho_id, df$sex, df$location)),
function(x){x$last_year <- max(x$year[!(is.na(x$gho_value))]); return(x)})), row.names = NULL)
Tidyverse solution (handles the case where an entire series is NA):
require(dplyr)
df %>%
group_by(iso3, gho_id, sex, location) %>%
mutate(last_year = ifelse(all(is.na(gho_value)), NA, max(year[!is.na(gho_value)]))) %>%
ungroup()
Data:
df <- data.frame(iso3 = rep("AFG", 16),
gho_id = rep("BP_04", 16),
sex = rep("FMLE", 16),
location = rep(NA, 16),
year = seq.int(2000, 2015),
gho_value = as.numeric(c(28.6, 28.8, 29, 29.2, 29.4, 29.5, 29.7, 29.8,
29.9, 30.1, 30.2, 30.3, 30.4, 30.5, NA, NA)))
Posted on 2019-12-05 11:03:26
Using base R, I think the following code might help (assuming df is the data.frame holding your data):
df$last_year <- df$year[tail(which(rowSums(!is.na(df[-4]))==ncol(df[-1])),1)]
This gives:
> df
iso3 gho_id sex location year gho_value last_year
1 AFG BP_04 FMLE <NA> 2000 28.6 2013
2 AFG BP_04 FMLE <NA> 2001 28.8 2013
3 AFG BP_04 FMLE <NA> 2002 29.0 2013
4 AFG BP_04 FMLE <NA> 2003 29.2 2013
5 AFG BP_04 FMLE <NA> 2004 29.4 2013
6 AFG BP_04 FMLE <NA> 2005 29.5 2013
7 AFG BP_04 FMLE <NA> 2006 29.7 2013
8 AFG BP_04 FMLE <NA> 2007 29.8 2013
9 AFG BP_04 FMLE <NA> 2008 29.9 2013
10 AFG BP_04 FMLE <NA> 2009 30.1 2013
11 AFG BP_04 FMLE <NA> 2010 30.2 2013
12 AFG BP_04 FMLE <NA> 2011 30.3 2013
13 AFG BP_04 FMLE <NA> 2012 30.4 2013
14 AFG BP_04 FMLE <NA> 2013 30.5 2013
15 AFG BP_04 FMLE <NA> 2014 NA 2013
16 AFG BP_04 FMLE <NA> 2015 NA 2013
Data:
df <- structure(list(iso3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AFG", class = "factor"),
gho_id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "BP_04", class = "factor"),
sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "FMLE", class = "factor"),
location = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "<NA>", class = "factor"),
year = 2000:2015, gho_value = c(28.6, 28.8, 29, 29.2, 29.4,
29.5, 29.7, 29.8, 29.9, 30.1, 30.2, 30.3, 30.4, 30.5, NA,
NA)), row.names = c(NA, -16L), class = "data.frame")
https://stackoverflow.com/questions/59192876
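One caveat on the base R one-liner above: it takes the last complete row of the whole data frame, which fits the single-series example but not data with several countries or indicators. Here is a rough per-group sketch using `ave()`. The toy values, the placeholder country code "XXX", and the helper names `loc` and `valid_year` are all made up for illustration. Missing grouping keys are recoded first, on the assumption that `ave()` would otherwise not assign those rows to any group.

```r
# Toy data (hypothetical values): two series with different last valid years
df <- data.frame(
  iso3      = c("AFG", "AFG", "AFG", "XXX", "XXX", "XXX"),
  gho_id    = "BP_04",
  sex       = "FMLE",
  location  = NA_character_,
  year      = c(2013, 2014, 2015, 2013, 2014, 2015),
  gho_value = c(30.5, NA, NA, 1, 2, NA),
  stringsAsFactors = FALSE
)

# Recode missing locations so they form a regular group
loc <- ifelse(is.na(df$location), "(missing)", df$location)

# Blank out the years that have no value
valid_year <- ifelse(is.na(df$gho_value), NA, df$year)

# Per-group maximum of the remaining years; NA if a group has no valid value
df$last_year <- ave(
  valid_year, df$iso3, df$gho_id, df$sex, loc,
  FUN = function(y) if (all(is.na(y))) NA else max(y, na.rm = TRUE)
)
```

Unlike the `split()`/`rbind()` approach, `ave()` returns a vector in the original row order, so the rows of `df` are not rearranged.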