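I have a table with the variables OrderPostingYear, OrderPostingMonth, ProductsFamily, Sales, and QTY. I want to create a data frame whose rows are the ProductsFamily values (grouped), whose columns are the OrderPostingYear & OrderPostingMonth combinations (grouped), and whose values are the sum of Sales. How can I do this?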
>ProductTable
OrderPostingYear OrderPostingMonth ProductsFamily Sales QTY
2008 1 R1 5234 1
2008 1 R2 223 2
2009 1 R3 34 1
2008 2 R1 1634 3
2010 4 R3 224 1
The result should be:
>PFTable
2008-1 2008-2 2009-1 2010-4
R1 5234 1634 0 0
R2 223 0 0 0
R3 0 0 34 224
I tried to do this in dplyr with group_by and summarise_each, but without success. Please help! Thanks!
PFTable<-data.frame(ProductTable%>%
group_by(ProductFamily) %>% summarise_each(.,funs(sum(SalesVolume,na.rm=TRUE)),group_by(OrderPostingYear,OrderPostingMonth)))
Posted on 2015-10-01 19:13:57
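We can use acast to reshape the data from 'long' to 'wide' format. Setting fun.aggregate=sum makes repeated year/month rows for the same ProductsFamily add up (acast would otherwise default to counting them with length).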
library(reshape2)
acast(ProductTable, ProductsFamily~OrderPostingYear+OrderPostingMonth,
      value.var='Sales', fun.aggregate=sum, fill=0)
# 2008_1 2008_2 2009_1 2010_4
#R1 5234 1634 0 0
#R2 223 0 0 0
#R3      0      0     34    224
If we want to use dplyr/tidyr, we can unite 'OrderPostingYear' and 'OrderPostingMonth' into a single column, drop 'QTY' with select, and spread the result from 'long' to 'wide' format.
library(dplyr)
library(tidyr)
unite(ProductTable, OrderMonth, OrderPostingYear, OrderPostingMonth, sep="-") %>%
  select(-QTY) %>%
  spread(OrderMonth, Sales, fill=0)
https://stackoverflow.com/questions/32895100
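Since the question specifically asks for group_by plus a sum, here is a minimal sketch of that route. It assumes dplyr >= 1.0.0 (for summarise(.groups =)) and tidyr >= 1.1.0 (for a scalar values_fill); summarise_each() has since been deprecated and pivot_wider() supersedes spread(). Sales is summed per ProductsFamily and year/month before reshaping, so duplicate rows are aggregated instead of making spread() stop with a duplicate-identifier error. The sample data is rebuilt from the question.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
ProductTable <- data.frame(
  OrderPostingYear  = c(2008, 2008, 2009, 2008, 2010),
  OrderPostingMonth = c(1, 1, 1, 2, 4),
  ProductsFamily    = c("R1", "R2", "R3", "R1", "R3"),
  Sales             = c(5234, 223, 34, 1634, 224),
  QTY               = c(1, 2, 1, 3, 1),
  stringsAsFactors  = FALSE
)

PFTable <- ProductTable %>%
  # One row per ProductsFamily / year / month, with Sales summed
  group_by(ProductsFamily, OrderPostingYear, OrderPostingMonth) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE), .groups = "drop") %>%
  # Chronological order, so pivot_wider() creates the month columns in that order
  arrange(OrderPostingYear, OrderPostingMonth) %>%
  # Build "year-month" labels such as "2008-1" (QTY is dropped by summarise)
  unite(OrderMonth, OrderPostingYear, OrderPostingMonth, sep = "-") %>%
  # Long -> wide; combinations with no sales become 0
  pivot_wider(names_from = OrderMonth, values_from = Sales, values_fill = 0) %>%
  arrange(ProductsFamily)

print(PFTable)
```

Unlike the acast() version, ProductsFamily stays a regular column instead of becoming row names, which is usually more convenient if you keep working in dplyr.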