我的数据集由3个时间点上记录的基因表达值组成,我正在尝试应用带tukey校正的anova测试,以寻找基因在不同时间点的差异表达。所以对于每个基因,我想要一个比较:基因a时间点1 vs 2基因a时间点2 vs 3基因a时间点3 vs 1
我的数据格式如下:
> head(rf)
gene expn timepoint rep
2 EG620009 // EG620009 8.428851 x0hr 0
3 LYPLA1 10.386500 x0hr 0
21 EG620009 // EG620009 8.582346 x0hr 1
31 LYPLA1 10.379710 x0hr 1
22 EG620009 // EG620009 8.566248 x0hr 2
32 LYPLA1 10.399080 x0hr 2
> tail(rf)
gene expn timepoint rep
23 EG620009 // EG620009 8.561409 x24hr 0
33 LYPLA1 10.233400 x24hr 0
24 EG620009 // EG620009 8.750639 x24hr 1
34 LYPLA1 10.023780 x24hr 1
25 EG620009 // EG620009 8.560267 x24hr 2
35 LYPLA1 10.025980 x24hr 2如果我这样做:
TukeyHSD(aov(rf$expn ~ rf$timepoint * rf$gene))这会给我一个所有基因的每个时间点之间的比较。包括诸如基因a时间点1与基因b时间点2的比较
我一直在尝试解决如何将aov函数应用于按基因进行数据子集的问题。我定义了一个函数,将p值作为输出,并尝试使用by函数将其单独应用于每个基因,如下所示;
> gene.aov = function(x) {TukeyHSD(aov(expn ~ timepoint, data = x))}
> aov.pval = function(y) {y$timepoint[,4]}
> gene.pval = function(z) {aov.pval(gene.aov(z))}
> pvals = by(rf$expn,list(rf$gene),gene.pval)
> Error in eval(predvars, data, env) :
numeric 'envir' arg not of length one 有什么提示为什么这个不起作用吗?或者我应该用一种完全不同的方式来处理这个问题?谢谢!
发布于 2016-12-07 22:05:53
它不起作用,因为by期望它的第一个参数是data.frame或矩阵,你传递的是rf$exp,这是一个numeric向量。你可以这样做,它会工作得很好(为了可读性,我放弃了多个函数)。
by(rf, rf$gene, function(x) {TukeyHSD(aov(expn ~ timepoint, data = x))}, simplify = F)
rf$gene: EG620009
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = expn ~ timepoint, data = x)
$timepoint
diff lwr upr p adj
x24hr-x0hr 0.09829 -0.123391 0.319971 0.2857424
---------------------------------------------------------------------------
rf$gene: LYPLA1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = expn ~ timepoint, data = x)
$timepoint
diff lwr upr p adj
x24hr-x0hr -0.2940433 -0.4876756 -0.100411 0.0135193https://stackoverflow.com/questions/41017997
复制相似问题