我正在尝试根据我的一列中的条件在data.table中执行滞后操作。
Variable Range Avg.Concentration
1 m21.021 (H[1]3O[16]1) blank 0.000000
2 m21.021 (H[1]3O[16]1) dms1 0.000000
3 m21.021 (H[1]3O[16]1) dms2 0.000000
4 m21.021 (H[1]3O[16]1) dms3 0.000000
5 m21.021 (H[1]3O[16]1) dms4 0.000000
6 m33.993 (O[16]2H[1]1) blank 0.103741
7 m33.993 (O[16]2H[1]1) dms1 0.107979
8 m33.993 (O[16]2H[1]1) dms2 0.108123
9 m33.993 (O[16]2H[1]1) dms3 0.104491
10 m33.993 (O[16]2H[1]1) dms4 0.104051
11 m39.032 (H[1]5O[16]2) blank 1.391382
12 m39.032 (H[1]5O[16]2) dms1 1.025274
13 m39.032 (H[1]5O[16]2) dms2 1.007375
14 m39.032 (H[1]5O[16]2) dms3 1.068320
15 m39.032 (H[1]5O[16]2) dms4 1.154035我需要做的是针对每一组变量(m21、m33、m39等)。我需要创建一个名为“计算浓度”的新列,其中列中的值等于从dms1、dms2 dms3等的平均浓度减去的空白的平均浓度。
例如,对于变量为m33的第7-10行,在这个新的计算浓度列中,我需要:
7: 0.107979 - 0.103741 (m33的dms1 1-空白值)
8: 0.108123 - 0.103741 (m33的dms2 2-空白值)
9: 0.104491 - 0.103741 ( dms3 -m33空白值)
10: 0.104051 - 0.103741 ( dms4 -blank for m33)
对于每个唯一的变量,空白的值都会发生变化,这就是为什么我不能将空白设置为常量值的原因。
当我只有两个范围选项(dms和空白)时,我只能使用延迟来减去Avg,从它上面的行中减去浓度,如下所示:
Variable Range Avg.Concentration
1 m21.021 (H[1]3O[16]1) blank 0.000000
2 m21.021 (H[1]3O[16]1) dms 0.000000
3 m33.993 (O[16]2H[1]1) blank 0.103741
4 m33.993 (O[16]2H[1]1) dms 0.107979
5 m39.032 (H[1]5O[16]2) blank 1.391382
6 m39.032 (H[1]5O[16]2) dms 1.025274#calculate concentration (average - blank)
df[, Cal.Concentration := Avg.Concentration - shift(Avg.Concentration, fill = first(Avg.Concentration))]
理想情况下,我想要的是某种函数,比如Range == "dms1“从上面的行中减去Avg.Concentration的值,如果Range == 'dms2‘减去Avg的值。以上两行的浓度,如果范围为== 'dms3‘减去Avg的值。以上三行的浓度,如果范围为== 'dms4‘,则减去Avg的值。浓度在4行以上。
我不必使用data.table,我愿意接受建议。我已经找到了一个暂时的解决方案,但它确实很费时,因为它涉及到创建一个名为“空白”的新列来存储每个唯一变量的空白值,然后我只需减去数组。
发布于 2022-11-10 11:16:58
这行得通吗?
quux[, Cal.Concentration := Avg.Concentration - Avg.Concentration[Range == "blank"],
by = .(Variable)]
quux
# Variable Range Avg.Concentration Cal.Concentration
# <char> <char> <num> <num>
# 1: m21.021 (H[1]3O[16]1) blank 0.000000 0.000000
# 2: m21.021 (H[1]3O[16]1) dms1 0.000000 0.000000
# 3: m21.021 (H[1]3O[16]1) dms2 0.000000 0.000000
# 4: m21.021 (H[1]3O[16]1) dms3 0.000000 0.000000
# 5: m21.021 (H[1]3O[16]1) dms4 0.000000 0.000000
# 6: m33.993 (O[16]2H[1]1) blank 0.103741 0.000000
# 7: m33.993 (O[16]2H[1]1) dms1 0.107979 0.004238
# 8: m33.993 (O[16]2H[1]1) dms2 0.108123 0.004382
# 9: m33.993 (O[16]2H[1]1) dms3 0.104491 0.000750
# 10: m33.993 (O[16]2H[1]1) dms4 0.104051 0.000310
# 11: m39.032 (H[1]5O[16]2) blank 1.391382 0.000000
# 12: m39.032 (H[1]5O[16]2) dms1 1.025274 -0.366108
# 13: m39.032 (H[1]5O[16]2) dms2 1.007375 -0.384007
# 14: m39.032 (H[1]5O[16]2) dms3 1.068320 -0.323062
# 15: m39.032 (H[1]5O[16]2) dms4 1.154035 -0.237347数据
quux <- setDT(structure(list(Variable = c("m21.021 (H[1]3O[16]1)", "m21.021 (H[1]3O[16]1)", "m21.021 (H[1]3O[16]1)", "m21.021 (H[1]3O[16]1)", "m21.021 (H[1]3O[16]1)", "m33.993 (O[16]2H[1]1)", "m33.993 (O[16]2H[1]1)", "m33.993 (O[16]2H[1]1)", "m33.993 (O[16]2H[1]1)", "m33.993 (O[16]2H[1]1)", "m39.032 (H[1]5O[16]2)", "m39.032 (H[1]5O[16]2)", "m39.032 (H[1]5O[16]2)", "m39.032 (H[1]5O[16]2)", "m39.032 (H[1]5O[16]2)"), Range = c("blank", "dms1", "dms2", "dms3", "dms4", "blank", "dms1", "dms2", "dms3", "dms4", "blank", "dms1", "dms2", "dms3", "dms4"), Avg.Concentration = c(0, 0, 0, 0, 0, 0.103741, 0.107979, 0.108123, 0.104491, 0.104051, 1.391382, 1.025274, 1.007375, 1.06832, 1.154035), Cal.Concentration = c(0, 0, 0, 0, 0, 0, 0.00423800000000001, 0.004382, 0.000750000000000001, 0.000310000000000005, 0, -0.366108, -0.384007, -0.323062, -0.237347)), row.names = c(NA, -15L), class = c("data.table", "data.frame")))https://stackoverflow.com/questions/74384479
复制相似问题