我有一个200×102的数据矩阵叫做prostmat,它测量了男性癌症患者(病例,后52列)和健康男性(对照组,前50列)的基因活性。我想做一个置换测试,比较病例组和对照组的平均水平。我可以在每个排列中随机抽取病例/对照标签,然后计算各组之间均值的绝对差异。我要带B=1000去。这是我的方法,但我认为这里有问题。希望能在这方面提供帮助
T=mean(prostmat[1:200,51:102])-mean(prostmat[1:200,1:50])
B=1000
n=200
Ts <- rep(NaN, nrow=n, ncol=B)
for (i in 1:n){
for (b in 1:B){
tmpx=sample(prostmat[i,51:102])
tmpy=sample(prostmat[i,1:50])
Ts[b]=(mean(tmpx)-mean(tmpy))
p.val=sum(Ts<T)/B # HA: mean(x) < mean(y)
}
}发布于 2022-04-17 16:46:18
最好的方法是创建一个两列数据,其中第一列是控制/处理的指示符,第二列是整个数据。应该是尺寸为20400×2的数据,然后对此进行一次置换测试:
#Create Data
a <- data.frame(group = 'control', gene = c(as.matrix(prostmat[, 1:50])))
b <- data.frame(group = 'treatment', gene = c(as.matrix(prostmat[, 51:102])))
data <- rbind(a, b)
# Carry the permutation:
mean_diff <- with(data, diff(tapply(gene, group, mean)))
# Same as with(data, mean(gene[group == 'treatment'] - mean(gene[group == 'control']))
B <- 1000
distribution <- numeric(B)
for(i in seq(B){
distribution[i] <- with(data, diff(tapply(gene, sample(group), mean)))
}
# Calculate p_value
p_val <- mean(abs(distribution) >= abs(mean_diff)) # if the HA: mean(treatment) != mean(control)
p_val <- mean(distribution <= mean_diff) # # if the HA: mean(treatment) < mean(control)https://stackoverflow.com/questions/71902838
复制相似问题