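I have three samples (replicates) per group, and I want to use a t-test to compare the values (MappedReadsCPM) between groups. However, I have 4000 values to compare one at a time (identified by PeakNumber). The line below is close, but it does not tell R to compare only peak_1, then only peak_2, and so on.
t.test(MappedReadsCPM~Group, data=subset(data2, Group %in% c("1", "2")))$p.value
I don't want to print 4000 p-values; ideally, I could add them to a data frame.
pvalues <- t.test(MappedReadsCPM~Group, data=subset(data2, Group %in% c("1", "2")))$p.value
data2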
PeakNumber Sample Group MappedReadsCPM
peak_1 A 1 43.53819
peak_2 A 1 49.20722
peak_3 A 1 38.54943
peak_4 A 1 99.09472
peak_1 B 2 105.21728
peak_2 B 2 42.63114
peak_3 B 2 78.00591
peak_4 B 2 74.37773
peak_1 C 2 509.30606
peak_2 C 2 101.36234
peak_3 C 2 25.17051
peak_4 C 2 32.8804
peak_1 D 1 35.37478
peak_2 D 1 89.11722
peak_3 D 1 112.24688
peak_4 D 1 386.40139
peak_1 E 3 631.07692
peak_2 E 3 162.58791
peak_3 E 3 46.93961
peak_4 E 3 56.69035
peak_1 F 2 38.7762
peak_2 F 2 261.45587
peak_3 F 2 43.99171
peak_4 F 2 72.11012
peak_1 G 1 118.5962
peak_2 G 1 250.1178
peak_3 G 1 84.35
peak_4 G 1 386.40139

Posted on 2020-01-16 02:24:18
You can use sapply to loop over all unique peaks in the data, subsetting the data to that specific peak each time:
pvalues <- sapply(unique(data2$PeakNumber), function(peak){
t.test(MappedReadsCPM~Group, data=subset(data2, Group %in% c("1", "2") & PeakNumber == peak))$p.value
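})

Posted on 2020-01-16 03:22:50
In your data, the test apparently cannot be run for Group == 3 (in the sample shown, that group contains a single sample, E). So I first subset the data to keep only groups 1 and 2.
# df1 here is the data frame the question calls data2
df_12 <- subset(df1, Group != 3)
Now split by PeakNumber and then lapply the test over the pieces. The output is a list of test results.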
sp <- split(df_12, df_12$PeakNumber)
t_list <- lapply(sp, function(DF){
t.test(MappedReadsCPM ~ Group, data = DF)
})
This extracts the p-values from the results above.
pvals <- sapply(t_list, '[[', 'p.value')
pvals
# peak_1 peak_2 peak_3 peak_4
#0.4105493 0.9526529 0.3357703 0.1348856
Finally, clean up.
rm(df_12, sp)

https://stackoverflow.com/questions/59757184
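Neither answer shows the last step the question asks for, which is getting the p-values into a data frame. The following is a minimal sketch of one way to do that, using the sample data from the question. The names d12, by_peak, peak_pvalues and data2_with_p are made up for this illustration, and vapply is swapped in for sapply only so that the result is guaranteed to be a numeric vector.

```r
# Sample data from the question (7 samples x 4 peaks).
data2 <- data.frame(
  PeakNumber = rep(paste0("peak_", 1:4), times = 7),
  Sample = rep(c("A", "B", "C", "D", "E", "F", "G"), each = 4),
  Group = rep(c("1", "2", "2", "1", "3", "2", "1"), each = 4),
  MappedReadsCPM = c(
    43.53819, 49.20722, 38.54943, 99.09472,     # A
    105.21728, 42.63114, 78.00591, 74.37773,    # B
    509.30606, 101.36234, 25.17051, 32.8804,    # C
    35.37478, 89.11722, 112.24688, 386.40139,   # D
    631.07692, 162.58791, 46.93961, 56.69035,   # E
    38.7762, 261.45587, 43.99171, 72.11012,     # F
    118.5962, 250.1178, 84.35, 386.40139        # G
  ),
  stringsAsFactors = FALSE
)

# Keep only the two groups being compared.
d12 <- subset(data2, Group %in% c("1", "2"))

# One t-test per peak; the result is a numeric vector named by peak.
by_peak <- split(d12, d12$PeakNumber)
pvals <- vapply(by_peak, function(d) {
  t.test(MappedReadsCPM ~ Group, data = d)$p.value
}, numeric(1))

# One row per peak, then joined back onto the per-sample rows by PeakNumber.
peak_pvalues <- data.frame(PeakNumber = names(pvals), p.value = unname(pvals))
data2_with_p <- merge(d12, peak_pvalues, by = "PeakNumber")
```

Joining on PeakNumber rather than relying on position matters with 4000 peaks, because split orders the peaks alphabetically (peak_1, peak_10, peak_100, ...) rather than numerically.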