I have a dataset with the following columns: flavor, flavorid and unitsold.
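flavor flavorid unitsold
beans 350 6
creamy 460 2
...

I want to find the top ten flavors and then calculate the market share of each one. My logic is: market share of a flavor = units sold of that flavor divided by total units sold.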
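How can I do this? For the output, I only need two columns: flavorid and the corresponding market share. Do I need to save the top ten flavors in a separate table first?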
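The rule in the question can be written as a formula (the symbols are ours, not the asker's). Let $u_i$ be the total units sold of flavor $i$, summed over all of that flavor's rows, and let $n$ be the number of flavors:

$$
\text{share}_i = \frac{u_i}{\sum_{j=1}^{n} u_j}
$$

Because the denominator is the same for every flavor, the ten flavors with the largest $\text{share}_i$ are exactly the ten with the largest $u_i$, so ranking by units sold or by share gives the same top ten.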
Posted on 2015-01-24 23:02:58
One way is to use the dplyr package.
An example dataset:
flavor <- rep(letters[1:15],each=5)
flavorid <- rep(1:15,each=5)
unitsold <- 1:75
df <- data.frame(flavor,flavorid,unitsold)
> df
flavor flavorid unitsold
1 a 1 1
2 a 1 2
3 a 1 3
4 a 1 4
5 a 1 5
6 b 2 6
7 b 2 7
8 b 2 8
9 b 2 9
...

Solution:
library(dplyr)

df %>%
  select(flavorid, unitsold) %>%                  # select the columns you want
  group_by(flavorid) %>%                          # group by flavorid
  summarise(total = sum(unitsold)) %>%            # sum the units sold per id
  mutate(marketshare = total / sum(total)) %>%    # market share per id (share of all units sold)
  arrange(desc(marketshare)) %>%                  # order by market share, descending
  head(10)                                        # keep the first 10 rows
# append %>% select(flavorid, marketshare) if you only want those two columns

Output:
Source: local data frame [10 x 3]
flavorid total marketshare
1 15 365 0.12807018
2 14 340 0.11929825
3 13 315 0.11052632
4 12 290 0.10175439
5 11 265 0.09298246
6 10 240 0.08421053
7 9 215 0.07543860
8 8 190 0.06666667
9 7 165 0.05789474
10 6 140 0.04912281

https://stackoverflow.com/questions/28126642
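For readers who prefer not to load dplyr, here is a base-R sketch of the same computation. It swaps the answer's pipeline for `aggregate()`, `order()` and `head()`; it is an alternative, not the answer's code. It rebuilds the example data and should reproduce the table above (for example, flavorid 15 has 365 of 2850 units, and 365/2850 ≈ 0.128). It keeps only flavorid and marketshare, as the question asks. The last step rests on an assumption about the question: if "total units sold" is meant to cover only the top ten, the denominator must be computed after filtering. The column name `share_within_top10` is hypothetical.

```r
# Base-R sketch (no dplyr): per-flavorid market share, top ten only.

# Example data, built the same way as in the answer
flavor   <- rep(letters[1:15], each = 5)
flavorid <- rep(1:15, each = 5)
unitsold <- 1:75
df <- data.frame(flavor, flavorid, unitsold)

# 1. Total units sold per flavorid
totals <- aggregate(unitsold ~ flavorid, data = df, FUN = sum)
names(totals)[2] <- "total"

# 2. Share of ALL units sold (same definition as total / sum(total) above)
totals$marketshare <- totals$total / sum(totals$total)

# 3. Sort by share, descending, and keep the first ten rows
top10 <- head(totals[order(-totals$marketshare), ], 10)

# Only the two columns the question asks for
result <- top10[, c("flavorid", "marketshare")]
rownames(result) <- NULL
print(result)

# Variant (assumption about the intended denominator): share measured
# against the top-ten total instead of the grand total.
# share_within_top10 is a hypothetical column name.
top10$share_within_top10 <- top10$total / sum(top10$total)
```

Computing the share before taking the top ten means the ten shares do not sum to 1. Computing it after filtering makes them sum to 1. Neither approach needs an intermediate table.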