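I want to create a line plot in ggplot2 for 350 breweries. I want to count how many breweries were active in each year. I only have the start and end dates of each brewery's activity. tidyverse answers preferred.
begin_datum_jaar is the year the brewery was founded. eind_datum_jaar is the year the brewery closed.
Example data frame: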
library(tidyverse)
# A tibble: 4 x 3
brouwerijnaam begin_datum_jaar eind_datum_jaar
<chr> <int> <int>
1 Brand 1340 2019
2 Heineken 1592 2019
3 Grolsche 1615 2019
4 Bavaria 1719 2010
dput:
df <- structure(list(brouwerijnaam = c("Brand", "Heineken", "Grolsche",
"Bavaria"), begin_datum_jaar = c(1340L, 1592L, 1615L, 1719L),
eind_datum_jaar = c(2019L, 2019L, 2019L, 2010L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
Desired output, where etc. is a placeholder:
# A tibble: 13 x 2
year n
<chr> <dbl>
1 1340 1
2 1341 1
3 1342 1
4 1343 1
5 etc. 1
6 1592 2
7 1593 2
8 etc. 2
9 1625 3
10 1626 3
11 1627 3
12 1628 3
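13 etc. 3
Answered 2019-03-11 11:07:41
We can use map2 to build the sequence from the start year to the end year for each pair of corresponding elements, unnest the resulting list column to expand it into rows, and use count to get the frequency of each year.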
library(tidyverse)
df %>%
transmute(year = map2(begin_datum_jaar, eind_datum_jaar, `:`)) %>%
unnest(year) %>%
count(year)
# A tibble: 680 x 2
# year n
# <int> <int>
# 1 1340 1
# 2 1341 1
# 3 1342 1
# 4 1343 1
# 5 1344 1
# 6 1345 1
# 7 1346 1
# 8 1347 1
# 9 1348 1
#10 1349 1
# … with 670 more rows
Or, using base R:
table(unlist(do.call(Map, c(f = `:`, df[-1]))))
Answered 2019-03-11 11:02:27
You can try:
library(tidyverse)
df %>%
rowwise %>%
do(data.frame(brouwerij = .$brouwerijnaam,
Year = seq(.$begin_datum_jaar, .$eind_datum_jaar, by = 1))) %>%
count(Year, name = "Active breweries") %>%
ggplot(aes(x = Year, y = `Active breweries`)) +
geom_line() +
theme_minimal()
Or try expand for the first part:
df %>%
group_by(brouwerijnaam) %>%
expand(Year = begin_datum_jaar:eind_datum_jaar) %>%
ungroup() %>%
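count(Year, name = "Active breweries")
Note, however, that the rowwise, do, and expand steps are resource-intensive and may take a long time. If that happens, I would rather use data.table to expand the data frame and then continue as follows: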
library(data.table)
library(tidyverse)
df <- setDT(df)[, .(Year = seq(begin_datum_jaar, eind_datum_jaar, by = 1)), by = brouwerijnaam]
df %>%
count(Year, name = "Active breweries") %>%
ggplot(aes(x = Year, y = `Active breweries`)) +
geom_line() +
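theme_minimal()
The above gives the plot directly. If you want to save the result to a data frame first (and do the ggplot2 part afterwards), this is the main part (I use data.table for the expansion because, in my experience, it is much faster):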
library(data.table)
library(tidyverse)
df <- setDT(df)[
, .(Year = seq(begin_datum_jaar, eind_datum_jaar, by = 1)),
by = brouwerijnaam] %>%
count(Year, name = "Active breweries")
Output:
# A tibble: 680 x 2
Year `Active breweries`
<dbl> <int>
1 1340 1
2 1341 1
3 1342 1
4 1343 1
5 1344 1
6 1345 1
7 1346 1
8 1347 1
9 1348 1
10 1349 1
# ... with 670 more rows
Answered 2019-03-11 11:01:07
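library(tidyverse)
df1 <- data.frame(year = 1000:2020) # Enter the range of years of your choice
df1 %>%
rowwise() %>%
# Inclusive bounds, so the founding and closing years count as active
mutate(cnt = nrow(df %>%
filter(begin_datum_jaar <= year & eind_datum_jaar >= year)))
https://stackoverflow.com/questions/55100098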
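As a variation on the answers above, the per-year count can also be obtained without expanding every brewery into one row per year. This is only a sketch and not one of the posted answers: it records +1 in each founding year and -1 in the year after each closing year, then takes a running total. The names events, delta, and active are made up for illustration, and the sketch assumes both year columns are integers with no missing values.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- tibble::tibble(
  brouwerijnaam    = c("Brand", "Heineken", "Grolsche", "Bavaria"),
  begin_datum_jaar = c(1340L, 1592L, 1615L, 1719L),
  eind_datum_jaar  = c(2019L, 2019L, 2019L, 2010L)
)

# One row per opening (+1) and one per closing (-1).
# The closing event is placed in the year AFTER the last active year,
# so the closing year itself still counts as active.
events <- bind_rows(
  transmute(df, year = begin_datum_jaar, delta = 1L),
  transmute(df, year = eind_datum_jaar + 1L, delta = -1L)
)

active <- events %>%
  count(year, wt = delta, name = "delta") %>%                      # net change per year
  complete(year = min(year):max(year), fill = list(delta = 0L)) %>% # add years with no change
  arrange(year) %>%
  mutate(n = cumsum(delta)) %>%                                     # running total = active breweries
  filter(year <= max(df$eind_datum_jaar)) %>%                       # drop the trailing closing-event year
  select(year, n)
```

The resulting active data frame has the same year and n columns as the count() output above, so it can be passed to ggplot(aes(x = year, y = n)) + geom_line() in the same way. The work scales with the number of breweries plus the span of years, rather than with the total number of brewery-years.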