I have a data frame, combined_data, that looks like this (this is just an example):
Year state_name VoS_thousUSD industry
2008 Alabama 100 Shipping
2009 Alabama 100 Shipping
2008 Alabama 200 Shipping
2010 Alabama 100 Shipping
2010 Alabama 50 Shipping
2010 Alabama 100 Shipping
2008 Alabama 100 Shipping
There are multiple Year, state_name and industry values with associated VoS_thousUSD values, plus other columns I no longer need.
I'm trying to produce this:
Year state_name VoS_thousUSD industry
2008 Alabama 400 Shipping
2009 Alabama 100 Shipping
2010 Alabama 250 Shipping
Here the data are grouped by Year, state_name and industry, and VoS_thousUSD is the sum within each group.
So far I have:
combined_data %>%
group_by(Year, state_name, GCAM_industry) %>%
summarise() -> VoS_thousUSD_state_ind
but I don't know how or where to add the sum of VoS_thousUSD. I'd like to do this in a dplyr pipe.
Posted on 2020-06-08 18:12:23
We can use
aggregate(VoS_thousUSD ~ ., combined_data, FUN = sum)
or, with dplyr:
library(dplyr)
combined_data %>%
group_by(Year, state_name, industry) %>%
summarise(VoS_thousUSD = sum(VoS_thousUSD))
# A tibble: 3 x 4
# Groups: Year, state_name [3]
# Year state_name industry VoS_thousUSD
# <int> <chr> <chr> <int>
#1 2008 Alabama Shipping 400
#2 2009 Alabama Shipping 100
#3 2010 Alabama Shipping 250
Data
combined_data <- structure(list(Year = c(2008L, 2009L, 2008L, 2010L, 2010L, 2010L,
2008L), state_name = c("Alabama", "Alabama", "Alabama", "Alabama",
"Alabama", "Alabama", "Alabama"), VoS_thousUSD = c(100L, 100L,
200L, 100L, 50L, 100L, 100L), industry = c("Shipping", "Shipping",
"Shipping", "Shipping", "Shipping", "Shipping", "Shipping")),
class = "data.frame", row.names = c(NA,
-7L))
https://stackoverflow.com/questions/62268561
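The aggregate() call in the answer uses VoS_thousUSD ~ ., where . stands for every other column. If the real data still contain the extra columns mentioned in the question, those would become grouping variables too, and rows would probably not collapse as intended. A minimal base-R sketch that names the grouping columns explicitly, assuming a hypothetical extra column notes standing in for the unneeded columns:

```r
# Example data from the question, plus a hypothetical extra column
# ("notes") standing in for the columns the asker no longer needs.
combined_data <- data.frame(
  Year         = c(2008L, 2009L, 2008L, 2010L, 2010L, 2010L, 2008L),
  state_name   = rep("Alabama", 7),
  VoS_thousUSD = c(100L, 100L, 200L, 100L, 50L, 100L, 100L),
  industry     = rep("Shipping", 7),
  notes        = letters[1:7]
)

# List the grouping columns explicitly instead of using `.`, so the
# extra "notes" column is ignored rather than used as a grouping variable.
VoS_thousUSD_state_ind <- aggregate(
  VoS_thousUSD ~ Year + state_name + industry,
  data = combined_data,
  FUN  = sum
)

print(VoS_thousUSD_state_ind)
```

The dplyr version already avoids this, because group_by() only uses the columns listed in it. Since dplyr 1.0.0, adding .groups = "drop" to summarise() returns an ungrouped tibble instead of one still grouped by Year and state_name.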