In R, I have df1, df2, and df3, each representing a lightning storm. Each df has two columns, "city" and "injuries".
df1 = data.frame(city=c("atlanta", "new york"), injuries=c(5,8))
df2 = data.frame(city=c("chicago", "new york"), injuries=c(2,3))
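df3 = data.frame(city=c("los angeles", "atlanta"), injuries=c(1,7))
I would like to merge all three data frames with an outer-join type on the city column, so that every city shows up in the combined data frame and the injury counts are tallied per data frame, like so: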
combined.df
city         df1.freq  df2.freq  df3.freq
atlanta      5         0         7
new york     8         3         0
chicago      0         2         0
los angeles  0         0         1

Posted on 2013-12-04 04:30:29
An alternative to @flodel's version, using the base R reshape function:
dat <- list(df1,df2,df3)
intm <- data.frame(do.call(rbind,dat),val=rep(seq_along(dat),sapply(dat,nrow)))
reshape(intm, idvar="city", timevar="val", direction="wide")
#          city injuries.1 injuries.2 injuries.3
#1      atlanta          5         NA          7
#2     new york          8          3         NA
#3      chicago         NA          2         NA
#5  los angeles         NA         NA          1

Posted on 2013-12-04 04:18:28
This generalizes to any number of data.frames:
library(functional)
Reduce(Curry(merge, by = "city", all = TRUE), list(df1, df2, df3))
#          city injuries.x injuries.y injuries
# 1     atlanta          5         NA        7
# 2    new york          8          3       NA
# 3     chicago         NA          2       NA
# 4 los angeles         NA         NA        1

However, repeated merges can be slow. Another approach is to stack the data.frames into one long one:
df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
name = c("df1", "df2", "df3")))
#          city injuries name
# 1     atlanta        5  df1
# 2    new york        8  df1
# 3     chicago        2  df2
# 4    new york        3  df2
# 5 los angeles        1  df3
# 6     atlanta        7  df3

Then use xtabs to reshape the data, for example:
xtabs(injuries ~ city + name, df.long)
#              name
# city          df1 df2 df3
#   atlanta       5   0   7
#   new york      8   3   0
#   chicago       0   2   0
#   los angeles   0   0   1

(For the last step, the reshape function might also be useful, but I am not very familiar with it.)
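
For that last step, a minimal sketch with base R's reshape might look like the following. It rebuilds the example inputs so the snippet runs on its own; the name `wide` and the NA-to-0 replacement are additions for illustration and are not part of the original answer.

```r
# Rebuild the three example data frames from the question.
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))

# Stack them into one long data frame, tagging each row with its source.
df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
                              name = c("df1", "df2", "df3")))

# Spread to wide format: one row per city, one injuries column per source.
wide <- reshape(df.long, idvar = "city", timevar = "name", direction = "wide")

# Cities missing from a storm come back as NA; treat them as 0 injuries.
wide[is.na(wide)] <- 0
rownames(wide) <- NULL
wide
```

Unlike xtabs, this returns an ordinary data.frame with columns named injuries.df1, injuries.df2, and injuries.df3, which may be easier to pass on to other functions.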

Posted on 2013-12-04 03:50:10
merge is your friend. Type ?merge for more details.
> merge(merge(df1, df2, by = "city", all = TRUE), df3, by = "city", all = TRUE)
         city injuries.x injuries.y injuries
1     atlanta          5         NA        7
2     chicago         NA          2       NA
3 los angeles         NA         NA        1
4    new york          8          3       NA

Edit: While I like @flodel's solution, here is a simpler one that may be easier to understand:
Reduce(function(d1, d2) merge(d1, d2, all = TRUE, by = "city"), list(df1, df2, df3))

https://stackoverflow.com/questions/20366422
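
To get from a merged result to the layout asked for in the question, one possible sketch renames each injuries column before merging and then swaps NA for 0. The helper `rename_injuries` and the list name `dfs` are hypothetical names introduced here for illustration; they do not come from any of the answers.

```r
# Rebuild the three example data frames from the question.
df1 <- data.frame(city = c("atlanta", "new york"), injuries = c(5, 8))
df2 <- data.frame(city = c("chicago", "new york"), injuries = c(2, 3))
df3 <- data.frame(city = c("los angeles", "atlanta"), injuries = c(1, 7))
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical helper: rename "injuries" to "<name>.freq" so that the
# merged columns stay distinct without relying on merge's .x/.y suffixes.
rename_injuries <- function(d, nm) {
  names(d)[names(d) == "injuries"] <- paste0(nm, ".freq")
  d
}
renamed <- Map(rename_injuries, dfs, names(dfs))

# Full outer join on city across all data frames.
combined.df <- Reduce(function(d1, d2) merge(d1, d2, by = "city", all = TRUE),
                      renamed)

# A city absent from a storm shows up as NA; count it as 0 injuries.
combined.df[is.na(combined.df)] <- 0
combined.df
```

Renaming up front also sidesteps the suffix clashes that merge can run into once more than three data frames share the same column name.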