我有张这样的桌子,
> head(dt2)
Weight Height Fitted interval limit value
1 65.6 174.0 71.91200 pred lwr 53.73165
2 80.7 193.5 91.63237 pred lwr 73.33198
3 72.6 186.5 84.55326 pred lwr 66.31751
4 78.8 187.2 85.26117 pred lwr 67.02004
5 74.8 181.5 79.49675 pred lwr 61.29244
6 86.4 184.0 82.02501 pred lwr 63.80652我想让它像这样,
> head(reshape2::dcast(dt2,
Weight + Height + Fitted + interval ~ limit,
fun.aggregate = mean))
Weight Height Fitted interval lwr upr
1 42.0 153.4 51.07920 conf 49.15463 53.00376
2 42.0 153.4 51.07920 pred 32.82122 69.33717
3 43.2 160.0 57.75378 conf 56.35240 59.15516
4 43.2 160.0 57.75378 pred 39.54352 75.96404
5 44.8 149.5 47.13512 conf 44.87642 49.39382
6 44.8 149.5 47.13512 pred 28.83891 65.43133但是使用tidyr::spread,我如何才能做到呢?
我在用,
> tidyr::spread(dt2, limit, value)但一旦发现错误,
Error: Duplicate identifiers for rows (1052, 1056), (238, 242), (1209, 1218), (395, 404), (839, 1170), (25, 356), (1173, 1203, 1215), (359, 389, 401), (1001, 1200), (187, 386), (906, 907), (92, 93), (930, 1144), (116, 330), (958, 1171), (144, 357), (902, 1018), (88, 204), (960, 1008), (146, 194), (1459, 1463), (645, 649), (1616, 1625), (802, 811), (1246, 1577), (432, 763), (1580, 1610, 1622), (766, 796, 808), (1408, 1607), (594, 793), (1313, 1314), (499, 500), (1337, 1551), (523, 737), (1365, 1578), (551, 764), (1309, 1425), (495, 611), (1367, 1415), (553, 601)随机10行:
> dt[sample(nrow(dt), 10), ]
Weight Height Fitted interval limit value
1253 52.2 162.5 60.28203 conf upr 61.51087
426 49.1 158.8 56.54022 pred upr 74.75756
1117 78.4 184.5 82.53066 conf lwr 80.98778
1171 85.9 166.4 64.22611 conf lwr 63.21254
948 61.4 177.8 75.75494 conf lwr 74.66393
384 90.9 172.7 70.59731 pred lwr 52.41828
289 75.9 172.7 70.59731 pred lwr 52.41828
3 44.8 149.5 47.13512 pred lwr 28.83891
774 87.3 182.9 80.91258 pred upr 99.12445
772 86.4 175.3 73.22669 pred upr 91.40919发布于 2016-02-05 16:04:41
假设您从如下所示的数据开始:
mydf
# Weight Height Fitted interval limit value
# 1 42 153.4 51.0792 conf lwr 49.15463
# 2 42 153.4 51.0792 pred lwr 32.82122
# 3 42 153.4 51.0792 conf upr 53.00376
# 4 42 153.4 51.0792 pred upr 69.33717
# 5 42 153.4 51.0792 conf lwr 60.00000
# 6 42 153.4 51.0792 pred lwr 90.00000注意分组列(1到5)第5行和第6行中的重复。这本质上就是"tidyr“告诉你的。第一行和第五行是重复的,第二和第六行是重复的。
tidyr::spread(mydf, limit, value)
# Error: Duplicate identifiers for rows (1, 5), (2, 6)正如@Jaap所建议的,解决方案是首先“总结”数据。由于"tidyr“仅用于重塑数据(不像"reshape2",后者聚合和重新塑造),因此在更改数据表单之前,需要使用"dplyr”执行聚合。在这里,我为"value“列使用了summarise。
如果在summarise步骤停止执行,您会发现原始的6行数据集“缩小”到4行。现在,spread将如预期的那样工作。
mydf %>%
group_by(Weight, Height, Fitted, interval, limit) %>%
summarise(value = mean(value)) %>%
spread(limit, value)
# Source: local data frame [2 x 6]
#
# Weight Height Fitted interval lwr upr
# (dbl) (dbl) (dbl) (chr) (dbl) (dbl)
# 1 42 153.4 51.0792 conf 54.57731 53.00376
# 2 42 153.4 51.0792 pred 61.41061 69.33717这与dcast的预期输出与fun.aggregate = mean相匹配。
reshape2::dcast(mydf, Weight + Height + Fitted + interval ~ limit, fun.aggregate = mean)
# Weight Height Fitted interval lwr upr
# 1 42 153.4 51.0792 conf 54.57731 53.00376
# 2 42 153.4 51.0792 pred 61.41061 69.33717样本数据:
mydf <- structure(list(Weight = c(42, 42, 42, 42, 42, 42), Height = c(153.4,
153.4, 153.4, 153.4, 153.4, 153.4), Fitted = c(51.0792, 51.0792,
51.0792, 51.0792, 51.0792, 51.0792), interval = c("conf", "pred",
"conf", "pred", "conf", "pred"), limit = structure(c(1L, 1L,
2L, 2L, 1L, 1L), .Label = c("lwr", "upr"), class = "factor"),
value = c(49.15463, 32.82122, 53.00376, 69.33717, 60,
90)), .Names = c("Weight", "Height", "Fitted", "interval",
"limit", "value"), row.names = c(NA, 6L), class = "data.frame") 发布于 2016-02-05 16:42:30
下面是data.table替代dplyr的方法。使用阿南达答案中的mydf。
library(data.table)
library(magrittr)
library(tidyr)
DT <- data.table(mydf)首先,您可以使用by来计算每个极限的平均值。
DT[, .(lwr = mean(value[limit == "lwr"]),
upr = mean(value[limit == "upr"])),
by = .(Weight, Height, Fitted, interval)]如果这个limit == ...看起来太难编码了,那么您可以首先聚合成一个长格式,然后是spread。这是因为一旦聚合,就没有重复。
DT[, .(value = mean(value)), by = .(Weight, Height, Fitted, interval, limit)] %>%
spread(key = "limit", value = "value")两个人都了解你
# Weight Height Fitted interval lwr upr
#1: 42 153.4 51.0792 conf 54.57731 53.00376
#2: 42 153.4 51.0792 pred 61.41061 69.33717https://stackoverflow.com/questions/35225052
复制相似问题