很难用标题来解释这个问题。但是这里有一个数据框架,您可以看到我有3个流名。我有一个与每个流名相关联的3个唯一值。我希望从value列中找到的相应流的值中减去这些唯一值,然后将其附加到名为error的新列中的数据帧中。
stream n rates means column value
1 Brooks 3 3.0 0.9629152 1 0.42707006
2 Siouxon 3 3.0 0.5831929 1 0.90503736
3 Speelyai 3 3.0 0.6199235 1 0.08554021
4 Brooks 4 7.5 0.9722707 1 1.43338843
5 Siouxon 4 7.5 0.5865031 1 0.50574543
6 Speelyai 4 7.5 0.6118634 1 0.32252396
7 Brooks 5 10.0 0.9637475 1 0.88984211
8 Siouxon 5 10.0 0.5804420 1 0.47501800
9 Speelyai 5 10.0 0.5959238 1 0.15079491
10 Brooks 6 13.0 0.9486575 1 1.32422105
11 Siouxon 6 13.0 0.5846854 1 0.39479684
12 Speelyai 6 13.0 0.5597146 1 0.37005941下面是我希望从value列中减去的“唯一”值
> true.lwd.sp <- 0.583984402 (speelyai)
> true.lwd.sx <- 0.585852702 (souixon)
> true.lwd.br <- 0.944062036 (brooks)谢谢你的帮助。总有一天,我可能知道如何做所有这些简单的任务!
发布于 2015-08-06 16:42:54
我们可以创建一个新的数据集,并将“stream”列与新数据集中的相应列相匹配,获取数字索引以从‘df1 2’中获取相应的“值”,并从'df1‘或原始数据集中减去。
df1$error <- df1$value-df2$value[match(df1$stream, df2$stream)]
df1
# stream n rates means column value error
#1 Brooks 3 3.0 0.9629152 1 0.42707006 -0.51699198
#2 Siouxon 3 3.0 0.5831929 1 0.90503736 0.31918466
#3 Speelyai 3 3.0 0.6199235 1 0.08554021 -0.49844419
#4 Brooks 4 7.5 0.9722707 1 1.43338843 0.48932639
#5 Siouxon 4 7.5 0.5865031 1 0.50574543 -0.08010727
#6 Speelyai 4 7.5 0.6118634 1 0.32252396 -0.26146044
#7 Brooks 5 10.0 0.9637475 1 0.88984211 -0.05421993
#8 Siouxon 5 10.0 0.5804420 1 0.47501800 -0.11083470
#9 Speelyai 5 10.0 0.5959238 1 0.15079491 -0.43318949
#10 Brooks 6 13.0 0.9486575 1 1.32422105 0.38015901
#11 Siouxon 6 13.0 0.5846854 1 0.39479684 -0.19105586
#12 Speelyai 6 13.0 0.5597146 1 0.37005941 -0.21392499数据
df1 <- structure(list(stream = c("Brooks", "Siouxon", "Speelyai",
"Brooks",
"Siouxon", "Speelyai", "Brooks", "Siouxon", "Speelyai", "Brooks",
"Siouxon", "Speelyai"), n = c(3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L,
5L, 6L, 6L, 6L), rates = c(3, 3, 3, 7.5, 7.5, 7.5, 10, 10, 10,
13, 13, 13), means = c(0.9629152, 0.5831929, 0.6199235, 0.9722707,
0.5865031, 0.6118634, 0.9637475, 0.580442, 0.5959238, 0.9486575,
0.5846854, 0.5597146), column = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), value = c(0.42707006, 0.90503736, 0.08554021,
1.43338843, 0.50574543, 0.32252396, 0.88984211, 0.475018, 0.15079491,
1.32422105, 0.39479684, 0.37005941)), .Names = c("stream", "n",
"rates", "means", "column", "value"), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
df2 <- data.frame(stream=c('Brooks', 'Siouxon', 'Speelyai'),
value=c(0.944062036, 0.585852702, 0.583984402), stringsAsFactors=FALSE)发布于 2015-08-06 16:55:10
使用data.table使用@akrun的答案中的df2的另一个选项
library(data.table)
setDT(df1)[stream == df2$stream, error := value - df2$value]
# stream n rates means column value error
#1: Brooks 3 3.0 0.9629152 1 0.42707006 -0.51699198
#2: Siouxon 3 3.0 0.5831929 1 0.90503736 0.31918466
#3: Speelyai 3 3.0 0.6199235 1 0.08554021 -0.49844419
#4: Brooks 4 7.5 0.9722707 1 1.43338843 0.48932639
#5: Siouxon 4 7.5 0.5865031 1 0.50574543 -0.08010727
#6: Speelyai 4 7.5 0.6118634 1 0.32252396 -0.26146044
#7: Brooks 5 10.0 0.9637475 1 0.88984211 -0.05421993
#8: Siouxon 5 10.0 0.5804420 1 0.47501800 -0.11083470
#9: Speelyai 5 10.0 0.5959238 1 0.15079491 -0.43318949
#10: Brooks 6 13.0 0.9486575 1 1.32422105 0.38015901
#11: Siouxon 6 13.0 0.5846854 1 0.39479684 -0.19105586
#12: Speelyai 6 13.0 0.5597146 1 0.37005941 -0.21392499发布于 2015-08-06 17:02:33
Akrun是对的,但是您需要一些额外的代码来加入数据集。检查这个简单的例子(但类似的例子):
library(dplyr)
# your original dataset
dt1 = data.frame(stream = c("A","B","C","A","B","C"),
value = c(5,6,7,8,9,10))
# your dataset with the values for each case
dt2 = data.frame(stream = c("A","B","C"),
truevalue = c(0.58, 0.57, 0.56))
# join datasets and create the error variable
result = dt1 %>% left_join(dt2, by="stream") %>% mutate(error = value - truevalue)
result重要的是确保流的名称在两个数据集中匹配,因此连接将被正确执行。
https://stackoverflow.com/questions/31861246
复制相似问题