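Using tapply() and sapply(), I am trying to sum count measures by the multiple (two) indices I give tapply(). The problem is that the returned matrix loses the column names I gave tapply(). I end up converting the matrix to a data frame with data.frame() to feed it into ggplot, and I have to add the variable names in a more manual way, but I would like to retain them through the two apply() functions. When I use only one index in tapply(), the metric/variable names are retained, so I don't know why two indices lose them.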
library(reshape2)  # provides melt()
Fc_desc. <- rep(c(rep("Local",10),rep("Collector",10),rep("Arterial",10)),2)
Year. <- c(rep(seq(2000,2008,2),12))
df.. <- data.frame(Fc_desc = Fc_desc., Year = Year., Tot_ped_fatal_cnt = sample(length(Year.)),Tot_ped_inj_lvl_a_cnt = sample(length(Year.)))
#Define metrics(columns) of interest
Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")
#Summarize into long data frame
Ped_FcSv.. <- melt(sapply(Metrics., function(x){
  tapply(df..[,x], list(df..$Year, df..$Fc_desc), sum, na.rm = T)
}), varnames = c("Fc_desc","Year","Injury_Severity"), value.name = "Count")

Posted on 2018-08-01 18:03:12
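As for why two indices lose the names: with two indices, each tapply() call returns a Year x Fc_desc matrix, and sapply()'s default simplification (simplify = TRUE) flattens each matrix into one column of a 15 x 2 matrix. The matrices' dimnames are dropped and only the Metrics. names survive as column names. With a single index, tapply() returns a 1-D array whose names become the row names, which is why they are kept. The sketch below, assuming base R only, passes simplify = "array" so the matrices are stacked into a 3-D array that keeps all dimnames, and names the index list (Year =, Fc_desc =) so the dimensions are labelled. set.seed(), Ped_arr, and the as.table()/as.data.frame() step are illustrative choices standing in for melt().

```r
# Sample data from the question; set.seed() is added only for reproducibility
set.seed(1)
Fc_desc. <- rep(c(rep("Local", 10), rep("Collector", 10), rep("Arterial", 10)), 2)
Year. <- rep(seq(2000, 2008, 2), 12)
df.. <- data.frame(Fc_desc = Fc_desc., Year = Year.,
                   Tot_ped_fatal_cnt = sample(length(Year.)),
                   Tot_ped_inj_lvl_a_cnt = sample(length(Year.)))
Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")

# simplify = "array" keeps each Year x Fc_desc matrix intact and stacks them
# along a third dimension labelled with Metrics.; the named index list gives
# the first two dimensions their names
Ped_arr <- sapply(Metrics., function(x) {
  tapply(df..[, x], list(Year = df..$Year, Fc_desc = df..$Fc_desc),
         sum, na.rm = TRUE)
}, simplify = "array")
names(dimnames(Ped_arr))[3] <- "Injury_Severity"

# Long data frame: one row per Year x Fc_desc x Injury_Severity cell
Ped_FcSv.. <- as.data.frame(as.table(Ped_arr), responseName = "Count",
                            stringsAsFactors = FALSE)
str(Ped_FcSv..)
```

Because the array is three-dimensional, a three-element varnames vector also fits it, so reshape2's melt(Ped_arr, varnames = c("Year", "Fc_desc", "Injury_Severity"), value.name = "Count") should produce the same long table; note the dimension order follows the order of the tapply() index list (Year first).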
My initial solution used a loop and a list:
Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")
TempList_ <- list()
for(metric in Metrics.){
TempList_[[metric]] <- tapply(df..[,metric],list(df..$Year, df..$Fc_desc),
sum)
}
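TempList_YrSv <- melt(TempList_, varnames = c("Year","Fc_desc"), value.name =
"Count")
names(TempList_YrSv)[names(TempList_YrSv) == "L1"] <- "Injury_Severity"  # melt() stores the list names in column "L1"

It uses 6 lines of code and takes 0.46 seconds on my 717,000 rows of actual data.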
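For comparison, a base-R sketch of the same loop-and-list idea, assuming the question's df.. and Metrics.: lapply() plus setNames() builds the same named list without an explicit for loop, and each matrix is flattened with as.table() so the Injury_Severity column is added by name rather than renamed by position. reshape2's melt() on a list puts the list names in a column called L1, so renaming that column by name, not by index, is the safer option when melt() is kept. The set.seed() call and the helper variable long are illustrative.

```r
# Sample data from the question; set.seed() is added only for reproducibility
set.seed(1)
Fc_desc. <- rep(c(rep("Local", 10), rep("Collector", 10), rep("Arterial", 10)), 2)
Year. <- rep(seq(2000, 2008, 2), 12)
df.. <- data.frame(Fc_desc = Fc_desc., Year = Year.,
                   Tot_ped_fatal_cnt = sample(length(Year.)),
                   Tot_ped_inj_lvl_a_cnt = sample(length(Year.)))
Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")

# Same named list as the for loop: one Year x Fc_desc matrix per metric
TempList_ <- setNames(lapply(Metrics., function(metric) {
  tapply(df..[, metric], list(Year = df..$Year, Fc_desc = df..$Fc_desc), sum)
}), Metrics.)

# Flatten each matrix to long rows and tag the rows with their metric name
TempList_YrSv <- do.call(rbind, lapply(names(TempList_), function(metric) {
  long <- as.data.frame(as.table(TempList_[[metric]]), responseName = "Count",
                        stringsAsFactors = FALSE)
  long$Injury_Severity <- metric
  long
}))
head(TempList_YrSv)
```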
I modified and applied Aosmith's solution:
Cols. <- c(Metrics., "Year","Fc_desc")
#Transpose data to long form
df_long <- melt(df..[,Cols.], measure.vars = Metrics., variable.name = c("Injury_Severity"), value.name = "Count")
#Apply aggregate() to sum Count on 3 indices
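Ped_YrSv.. <- aggregate(Count ~ Fc_desc + Year + Injury_Severity, data = df_long, FUN = sum, na.rm = TRUE)

This solution takes 3.9 seconds but is only 3 lines. I realize I'm splitting hairs, but I was trying to be more elegant and get rid of the list and the loop, so this was helpful. I think I can be happy with this. Thanks, everyone.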
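A possible variation, not timed on the real data: aggregate() accepts several response columns through cbind() on the left-hand side of the formula, so both metrics can be summed on the wide data first and only the small summary reshaped to long form. The long table then has 2 x (number of Fc_desc x Year groups) rows instead of 2 x the raw row count, which may make it faster than melting first; that speed-up is an untested assumption. na.action = na.pass keeps a row whose other metric is NA (the formula method's default na.omit would drop it), and na.rm = TRUE is forwarded to sum(). The final rbind() step is a base-R stand-in for melt(); Ped_wide and set.seed() are illustrative.

```r
# Sample data from the question; set.seed() is added only for reproducibility
set.seed(1)
Fc_desc. <- rep(c(rep("Local", 10), rep("Collector", 10), rep("Arterial", 10)), 2)
Year. <- rep(seq(2000, 2008, 2), 12)
df.. <- data.frame(Fc_desc = Fc_desc., Year = Year.,
                   Tot_ped_fatal_cnt = sample(length(Year.)),
                   Tot_ped_inj_lvl_a_cnt = sample(length(Year.)))
Metrics. <- c("Tot_ped_fatal_cnt", "Tot_ped_inj_lvl_a_cnt")

# Sum both metrics per Fc_desc x Year in one call on the wide data
Ped_wide <- aggregate(cbind(Tot_ped_fatal_cnt, Tot_ped_inj_lvl_a_cnt) ~ Fc_desc + Year,
                      data = df.., FUN = sum, na.rm = TRUE, na.action = na.pass)

# Reshape only the summary (15 rows here) into long form
Ped_YrSv.. <- do.call(rbind, lapply(Metrics., function(m) {
  data.frame(Ped_wide[c("Fc_desc", "Year")], Injury_Severity = m,
             Count = Ped_wide[[m]])
}))
head(Ped_YrSv..)
```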
https://stackoverflow.com/questions/51618742