I have a data frame containing information about incidents in countries around the world. Its structure is similar to the following example:
    a <- data.frame(country = c("AAA", "BBB", "CCC"),
                    incident = rep("disaster", times = 3),
                    'start year' = c(1990, 1995, 2011),
                    'end year' = c(1993, 1995, 2012))

This gives the data frame a:
      country incident start.year end.year
    1     AAA disaster       1990     1993
    2     BBB disaster       1995     1995
    3     CCC disaster       2011     2012

I would like to transform this so that each row holds the incident for a single year rather than for an interval. Ideally, it would look like this:
      country incident year
    1     AAA disaster 1990
    2     AAA disaster 1991
    3     AAA disaster 1992
    4     AAA disaster 1993
    5     BBB disaster 1995
    6     CCC disaster 2011
    7     CCC disaster 2012

Is there an efficient way to expand each row so that it covers every year from the start year to the end year?
Posted on 2022-08-31 18:12:35
We can use map2 to build the sequence between the two columns as a list column, and then unnest that list column:
    library(dplyr)
    library(purrr)
    library(tidyr)
    a %>%
      transmute(country, incident, year = map2(start.year, end.year, `:`)) %>%
      unnest(year)

Output:
    # A tibble: 7 × 3
      country incident  year
      <chr>   <chr>    <int>
    1 AAA     disaster  1990
    2 AAA     disaster  1991
    3 AAA     disaster  1992
    4 AAA     disaster  1993
    5 BBB     disaster  1995
    6 CCC     disaster  2011
    7 CCC     disaster  2012

If the country column is unique, we can also use group_by/summarise, or expand row by row with rowwise:
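    # Note: since dplyr 1.1.0, returning several rows per group from
    # summarise() is deprecated; reframe() is the suggested replacement.
    a %>%
      group_by(country) %>%
      summarise(incident, year = start.year:end.year, .groups = 'drop')
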
    # A tibble: 7 × 3
      country incident  year
      <chr>   <chr>    <int>
    1 AAA     disaster  1990
    2 AAA     disaster  1991
    3 AAA     disaster  1992
    4 AAA     disaster  1993
    5 BBB     disaster  1995
    6 CCC     disaster  2011
    7 CCC     disaster  2012

Or use uncount to expand the data:
    a %>%
      uncount(end.year - start.year + 1) %>%
      group_by(country) %>%
      mutate(year = start.year + row_number() - 1, .keep = 'unused',
             end.year = NULL) %>%
      ungroup()

Posted on 2022-08-31 18:36:54
Here is another option using pivot_longer, fill and complete:
    library(dplyr)
    library(tidyr)
    a %>%
      pivot_longer(cols = ends_with("year"),
                   values_to = "year") %>%
      group_by(country) %>%
      complete(year = full_seq(min(year):max(year), 1)) %>%
      fill(c(incident)) %>%
      select(-name)

      country  year incident
      <chr>   <dbl> <chr>
    1 AAA      1990 disaster
    2 AAA      1991 disaster
    3 AAA      1992 disaster
    4 AAA      1993 disaster
    5 BBB      1995 disaster
    6 BBB      1995 disaster
    7 CCC      2011 disaster
    8 CCC      2012 disaster

https://stackoverflow.com/questions/73560132
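In the pivot_longer output above, BBB appears twice: its start and end year are both 1995, so pivoting produces two identical 1995 rows. A minimal sketch of one way to drop the duplicate is to finish the pipeline with distinct(). This is an illustrative variation, not part of the original answer. The name result is hypothetical, full_seq(year, 1) is used as a shorter form of the original call, and the columns are written directly as start.year and end.year.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question (column names written out directly)
a <- data.frame(country = c("AAA", "BBB", "CCC"),
                incident = rep("disaster", times = 3),
                start.year = c(1990, 1995, 2011),
                end.year = c(1993, 1995, 2012))

result <- a %>%
  # one row per (country, start/end) pair, with the year in a single column
  pivot_longer(cols = ends_with("year"), values_to = "year") %>%
  group_by(country) %>%
  # add the missing years between each country's first and last year
  complete(year = full_seq(year, 1)) %>%
  # carry the incident label down into the newly added rows
  fill(incident) %>%
  ungroup() %>%
  select(-name) %>%
  # BBB has start.year == end.year, so its 1995 row would otherwise appear twice
  distinct()

print(result)
```

With the example data this should leave one row per country and year, matching the layout requested in the question.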