Suppose I have a dataset like the following:
Country, Sold, Model
China, 100, Toyota
China, 200, Honda
China, 200, Suzuki
USA, 100, Tesla
USA, 50, Shevi
USA, 50, Lambo

I want to get the following output:
China, Toyota[20%]; Honda[40%]; Suzuki[40%]
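USA, Tesla[50%]; Shevi[25%]; Lambo[25%]

That is, the data are grouped by country, and each model's share of that country's total sales is shown next to the model name. Is it possible to do this in R?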
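A minimal base R sketch of the requested output, assuming the data sit in a data frame named `dat` (a name chosen here for illustration) and that each Country/Model pair appears only once, as in the sample. `ave()` puts each country's total on every row, and `tapply()` then collapses the labels country by country:

```r
# Sample data from the question (the name `dat` is an assumption)
dat <- data.frame(
  Country = c("China", "China", "China", "USA", "USA", "USA"),
  Sold    = c(100, 200, 200, 100, 50, 50),
  Model   = c("Toyota", "Honda", "Suzuki", "Tesla", "Shevi", "Lambo"),
  stringsAsFactors = FALSE
)

# Each row's share of its country's total; ave() returns the per-country sum on every row
share <- 100 * dat$Sold / ave(dat$Sold, dat$Country, FUN = sum)

# Label each model with its share, e.g. "Toyota[20%]"
labels <- paste0(dat$Model, "[", share, "%]")

# Join the labels within each country, keeping the rows' original order
by_country <- tapply(labels, dat$Country, paste, collapse = "; ")

# Prefix each joined string with its country name
result <- paste(names(by_country), by_country, sep = ", ")
print(result)
```

Countries come out in alphabetical order because `tapply()` groups by the sorted country values, while models keep the order they have in the data. If a model could appear on several rows for the same country, sum those rows first (as the dplyr answer does with `summarize()`).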
Posted on 2019-02-14 03:42:28
Edit: Sorry, this is super hacky, but it's the best I could come up with. I'm sure there are better ways, and hopefully someone will show you one soon.
library(dplyr)
df <- tribble(
~Country, ~Sold, ~Model,
"China", 100, "Toyota",
"China", 200, "Honda",
"China", 200, "Suzuki",
"USA", 100, "Tesla",
"USA", 50, "Shevi",
"USA", 50, "Lambo"
)
model_by_country <- df %>%
  group_by(Country, Model) %>%
  summarize(Total_Sold = sum(Sold)) %>%  # total sold per country/model pair
  group_by(Country) %>%
  mutate(Percent_Sold = Total_Sold / sum(Total_Sold)) %>%  # share within each country
  select(-Total_Sold) %>%
  ungroup()
model_by_country
## Country Model Percent_Sold
## <chr> <chr> <dbl>
## 1 China Honda 0.4
## 2 China Suzuki 0.4
## 3 China Toyota 0.2
## 4 USA Lambo 0.25
## 5 USA Shevi 0.25
## 6 USA Tesla 0.5
# EDITS begin here
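format_country_per <- function(country) {
  model_by_country %>%
    filter(Country == country) %>%
    # Label each model with its percentage, e.g. "Honda[40%]"
    mutate(Model_Percent_Sold = paste0(Model, "[", 100 * Percent_Sold, "%]")) %>%
    pull(Model_Percent_Sold) %>%
    # Join the labels into one string and prefix it with the country name
    paste(collapse = "; ") %>%
    paste(country, ., sep = ", ")
}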
format_country_per("China")
## [1] "China, Honda[40%]; Suzuki[40%]; Toyota[20%]"
format_country_per("USA")
## [1] "USA, Lambo[25%]; Shevi[25%]; Tesla[50%]"

Posted on 2019-02-14 04:05:25
It looks like you want the row percentages of a Country-by-Model table. This gives a table with every possible combination of the two factors:
100*prop.table( # multiply proportions to get percentages
with(dat, tapply(Sold, list(Country,Model), sum, default=0)), #apply sum in categories
1) # the "1" indicates these should be row proportions
      Honda Lambo Shevi Suzuki Tesla Toyota
China    40     0     0     40     0     20
USA       0    25    25      0    50      0

https://stackoverflow.com/questions/54678021
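To turn that matrix into the strings asked for in the question, one option is to drop the zero cells in each row and paste the rest together. A sketch, assuming `dat` holds the question's data (recreated below, since the answer does not define it); models come out in alphabetical (column) order, and a model with genuinely zero sales in a country would be dropped too:

```r
# Question data; the name `dat` matches the answer above but its definition is assumed
dat <- data.frame(
  Country = c("China", "China", "China", "USA", "USA", "USA"),
  Sold    = c(100, 200, 200, 100, 50, 50),
  Model   = c("Toyota", "Honda", "Suzuki", "Tesla", "Shevi", "Lambo"),
  stringsAsFactors = FALSE
)

# Row percentages of the Country x Model table, as in the answer
pct <- 100 * prop.table(
  with(dat, tapply(Sold, list(Country, Model), sum, default = 0)),
  1
)

# For each country (row), keep non-zero cells and format them as "Model[pct%]"
rows <- apply(pct, 1, function(p) {
  p <- p[p > 0]
  paste0(names(p), "[", p, "%]", collapse = "; ")
})

# Prefix each string with its country (the row name)
result <- paste(names(rows), rows, sep = ", ")
print(result)
```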