I have a dataset containing information about customers and how much money each one spent. Each customer appears only once:
customer<-c("Andy","Bobby","Oscar","Oliver","Jane","Cathy","Emma","Chris")
age<-c(25,34,20,35,23,35,34,22)
gender<-c("male","male","male","male","female","female","female","female")
moneyspent<-c(100,100,200,200,400,400,500,200)
data<-data.frame(customer=customer,age=age,gender=gender,moneyspent=moneyspent)

If I want to calculate the mean amount spent by male and female customers, I can use tapply:

tapply(moneyspent,gender,mean)

which gives:
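female   male
   375    150

However, I now want the mean amount spent by gender and age group. The result I am aiming for is:

Male Age 20-30   Female Age 20-30   Male Age 30-40   Female Age 30-40
           150                300              150                450

How can I modify the code to produce these results?

Thanks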
Answered 2014-12-10 18:03:33

You may need to use cut to split age into groups.
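To see what cut does on its own before it is combined with tapply, here is a minimal base R sketch using the ages from the question. The names bins and labelled are illustrative, and the labels argument is an optional addition that is not part of the original answer; it gives group names closer to the ones the question asks for.

```r
age <- c(25, 34, 20, 35, 23, 35, 34, 22)

# cut() assigns each age to an interval. With include.lowest = TRUE the first
# interval is closed on the left, so an age of exactly 20 lands in [20,30].
bins <- cut(age, breaks = c(20, 30, 40), include.lowest = TRUE)
table(bins)

# Optional: supply one readable label per interval.
labelled <- cut(age, breaks = c(20, 30, 40),
                labels = c("Age 20-30", "Age 30-40"),
                include.lowest = TRUE)
levels(labelled)
```

Without include.lowest = TRUE, the customer aged exactly 20 would fall outside (20,30] and be assigned NA. The code that follows passes the same cut() call to tapply as a second grouping factor.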
mat <- tapply(moneyspent, list(gender, age=cut(age, breaks=c(20,30,40),
              include.lowest=TRUE)), mean)
nm1 <- outer(rownames(mat), colnames(mat), FUN=paste)
setNames(c(mat), nm1)
#female [20,30]   male [20,30] female (30,40]   male (30,40]
#           300            150            450            150

Other options include
library(dplyr)
data %>%
    group_by(gender, age=cut(age, breaks=c(20,30,40),
                             include.lowest=TRUE)) %>%
    summarise(moneyspent=mean(moneyspent))

or
library(data.table)
setDT(data)[, list(moneyspent=mean(moneyspent)),
    by=list(gender, age=cut(age, breaks= c(20,30,40), include.lowest=TRUE))]

Answered 2019-02-18 19:18:54

Using the plyr package
library(plyr)
ddply(data,.(gender, age=cut(age, breaks=c(20,30,40),
      include.lowest=TRUE)), summarize, moneyspent=mean(moneyspent))

will also give the same result.
Note: summarize and summarise perform the same function.
Warning: loading plyr masks dplyr's summarise! You need to detach plyr before using dplyr functions like summarise again.
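A minimal sketch of two ways around the masking, assuming both dplyr and plyr are installed and that plyr was attached after dplyr. The data frame is rebuilt from the question; the names by_group, res and res2 are illustrative.

```r
library(dplyr)
library(plyr)   # attached second, so plyr's summarise masks dplyr's

data <- data.frame(
  age        = c(25, 34, 20, 35, 23, 35, 34, 22),
  gender     = c("male", "male", "male", "male",
                 "female", "female", "female", "female"),
  moneyspent = c(100, 100, 200, 200, 400, 400, 500, 200)
)

# group_by is only exported by dplyr, so it is not affected by the masking.
by_group <- group_by(data, gender,
                     age = cut(age, breaks = c(20, 30, 40),
                               include.lowest = TRUE))

# Option 1: name the package explicitly so the dplyr version is used.
res <- dplyr::summarise(by_group, moneyspent = mean(moneyspent))

# Option 2: detach plyr so that plain summarise() refers to dplyr again.
detach("package:plyr")
res2 <- summarise(by_group, moneyspent = mean(moneyspent))
```

Qualifying the call with dplyr:: keeps plyr available for the rest of the session, while detaching is simpler when plyr is no longer needed.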
https://stackoverflow.com/questions/27407726