My data:
Chemical date concentration limit
A 01-01-2016 0.2 0.01
A 01-02-2016 0.2 0.01
A 01-01-2017 0.005 0.01
A 01-02-2017 0.2 0.01
B 01-01-2016 0.3 0.1
B 01-02-2016 0.05 0.1
B 01-01-2017 0.2 0.1
B 01-02-2017 0.2 0.1
C 01-01-2016 1.2 1
C 01-02-2016 0.8 1
C 01-01-2017 0.9 1
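C 01-02-2017 0.9 1
I want to show, for each chemical, the percentage of measurements per year that exceed the limit (note that each chemical has a different limit). So I would like something like this: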
Year A B C
2016 100% 50% 50%
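2017 50% 100% 0%
I already have code that counts how many times each chemical exceeds its limit per year, but I get it wrong when I compute the percentage.
This is what I have for counting the exceedances: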
library(tidyverse)
counts <- data %>%
  group_by(Chemical, grp = format(date, format = '%Y')) %>%
  mutate(exceed = concentration >= limit) %>% # TRUE/FALSE
  summarise(tot_exceed = sum(exceed)) %>% # count each T/F
  spread(Chemical, tot_exceed, fill = 0)
So I get:
Year A B C
2016 2 1 1
2017 1 2 0
For the percentage, I tried this:
percentage_exceed <- data %>%
  group_by(Chemical, grp = format(date, format = '%Y')) %>%
  mutate(exceed = concentration >= limit, countconc = length(concentration)) %>%
  summarise(percent = (sum(exceed)/countconc)*100) %>%
  spread(Chemical, percent, fill = 0)
But I don't get the result I want. Can you help me?
Answered 2018-12-17 11:02:08
dt = read.table(text = "
Chemical date concentration limit
A 01-01-2016 0.2 0.01
A 01-02-2016 0.2 0.01
A 01-01-2017 0.005 0.01
A 01-02-2017 0.2 0.01
B 01-01-2016 0.3 0.1
B 01-02-2016 0.05 0.1
B 01-01-2017 0.2 0.1
B 01-02-2017 0.2 0.1
C 01-01-2016 1.2 1
C 01-02-2016 0.8 1
C 01-01-2017 0.9 1
C 01-02-2017 0.9 1
", header=T)
library(tidyverse)
library(lubridate)
dt %>%
mutate(year = year(dmy(date))) %>%
group_by(year, Chemical) %>%
summarise(Total = n(),
Num_exceed = sum(concentration >= limit)) %>%
ungroup() %>%
mutate(Prc = paste0(Num_exceed / Total * 100,"%")) %>%
select(year, Chemical, Prc) %>%
spread(Chemical, Prc)
# # A tibble: 2 x 4
# year A B C
# <dbl> <chr> <chr> <chr>
# 1 2016 100% 50% 50%
# 2 2017 50% 100% 0%
Answered 2018-12-17 11:01:05
Using tidyverse:
library(tidyverse)
library(lubridate)
data %>%
  mutate(yr = year(mdy(date))) %>%
  group_by(Chemical, yr) %>%
  # count exceedances and total observations per chemical and year
  summarise(tot_exceed = sum(concentration >= limit),
            total = n()) %>%
  ungroup() %>%
  # divide by the number of observations, not by the largest count
  mutate(proc = tot_exceed / total * 100) %>%
  select(-tot_exceed, -total) %>%
  spread(Chemical, proc)
# A tibble: 2 x 4
#      yr     A     B     C
#   <dbl> <dbl> <dbl> <dbl>
# 1  2016   100    50    50
# 2  2017    50   100     0
Answered 2018-12-17 11:04:12
Your approach is fine; just replace sum with mean and multiply by 100:
data %>% group_by(Chemical, grp = format(date, format = '%Y')) %>%
mutate(exceed = concentration >= limit) %>%
summarise(tot_exceed = mean(exceed) * 100) %>%
spread(Chemical, tot_exceed, fill = 0)
# A tibble: 2 x 4
# grp A B C
# <chr> <dbl> <dbl> <dbl>
# 1 2016 100 50 50
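# 2 2017 50 100 0
In your attempt,
summarise(percent = (sum(exceed)/countconc) * 100)
is almost right: the error is that countconc is a whole column rather than a single value (which is what summarise needs). Since it is constant within each group, you could write, for example,
summarise(percent = (sum(exceed)/countconc[1]) * 100)
But given the preceding
mutate(exceed = concentration >= limit, countconc = length(concentration))
this ends up being just a mean, which brings us back to the code at the start of my answer.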
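To make the point about countconc concrete, here is a minimal base R sketch for a single group. It assumes the two 2016 rows for chemical B from the question; the names per_row, single and via_mean are only illustrative, and rep() stands in for the way mutate stores one copy of the group size per row.

```r
# One group only: chemical B in 2016, taken from the question's data
concentration <- c(0.3, 0.05)
limit <- c(0.1, 0.1)
exceed <- concentration >= limit   # TRUE FALSE

# Stand-in for the mutate() step: the group size repeated once per row
countconc <- rep(length(concentration), length(concentration))

# Dividing by the whole column gives one value per row, not one per group
per_row <- sum(exceed) / countconc * 100

# Taking the first element collapses it to a single value
single <- sum(exceed) / countconc[1] * 100

# The mean of a logical vector is the share of TRUE values, so this is equivalent
via_mean <- mean(exceed) * 100

print(per_row)
print(single)
print(via_mean)
```

Here single and via_mean agree, which is why the mean-based version is the shorter way to write the same calculation.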
Also note that with lubridate you could write the first line as
data %>% group_by(Chemical, Year = year(date)) %>%
Something very concise, though perhaps not in the format you want:
data %>% group_by(Chemical, Year = year(date)) %>%
summarise(Percentage = mean(concentration > limit) * 100)
# A tibble: 6 x 3
# Groups: Chemical [?]
# Chemical Year Percentage
# <fct> <dbl> <dbl>
# 1 A 2016 100
# 2 A 2017 50
# 3 B 2016 50
# 4 B 2017 100
# 5 C 2016 50
# 6 C 2017 0
https://stackoverflow.com/questions/53813647