首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何使用R包"metafor“一次对数千个基因进行meta分析

如何使用R包"metafor“一次对数千个基因进行meta分析
EN

Stack Overflow用户
提问于 2019-07-22 20:45:10
回答 2查看 135关注 0票数 0

我正在尝试使用R包"metafor“对许多基因进行meta分析,我知道如何一次只分析一个基因,但对数千个基因这样做是荒谬的。有没有人能帮帮我!感谢您的建议!

我有分别命名为'se_summary‘和'HR_summary’的所有基因的se和HR的所有结果。我需要使用来自五个研究"ICGC,TCGA,G71,G62,G8“的这些基因的se和HR作为输入进行meta分析。

我用来对单个基因进行meta分析的代码(以基因AAK1为例)是:

代码语言:javascript
复制
library(metafor) 
se.AAK1 <- as.numeric(se_summary[rownames(se_summary) == 'AAK1',][,-1])
HR.AAK1 <- as.numeric(HR_summary[rownames(HR_summary) == 'AAK1',][,-1])
beta.AAK1 <- log(HR.AAK1)

####First I need to use the random model to see if the test for Heterogeneity is significant or not.

pool.AAK1 <- rma(beta.AAK1, sei=se.AAK1) 
summary(pool.AAK1)
#### and this gives the following output:

#>Random-Effects Model (k = 5; tau^2 estimator: REML)
#>  logLik  deviance       AIC       BIC      AICc 
#> -2.5686    5.1372    9.1372    7.9098   21.1372   

#>tau^2 (estimated amount of total heterogeneity): 0.0870 (SE = 0.1176)
#>tau (square root of estimated tau^2 value):      0.2950
#>I^2 (total heterogeneity / total variability):   53.67%
#>H^2 (total variability / sampling variability):  2.16

#>Test for Heterogeneity:
#>Q(df = 4) = 8.5490, p-val = 0.0734

#>Model Results:
#>estimate      se     zval    pval    ci.lb   ci.ub 
#> -0.3206  0.1832  -1.7500  0.0801  -0.6797  0.0385  . 
#>---
#>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


####If the I^2 > 50%, we still use the Random-effect Model but if the I^2 <= 50%, we then use the Fixed-effect Model 

pool.AAK1 <- rma(beta.AAK1, sei=se.AAK1, method="FE") 
summary(pool.AAK1)
####this gives the following output:

#>Fixed-Effects Model (k = 5)
#> logLik  deviance       AIC       BIC      AICc 
#> -2.5793    8.5490    7.1587    6.7681    8.4920   

#>Test for Heterogeneity:
#>Q(df = 4) = 8.5490, p-val = 0.0734

#>Model Results:
#>estimate      se     zval    pval    ci.lb    ci.ub 
#> -0.2564  0.1191  -2.1524  0.0314  -0.4898  -0.0229  * 
#>---
#>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

如果我只得到一个基因,这很好用,但是我需要一次为所有这些基因做所有的工作,然后导出输出,包括“异质性p-val",以及所有模型结果"estimate,se,zval,pval,ci.lb,ci.ub”到一个.txt文件,每一行对应一个基因,输出应该是这样的:

代码语言:javascript
复制
Gene_symbol Heterogeneity_p-val estimate    se      zval    pval    ci.lb   ci.ub
AAK1        0.0734              -0.2564    0.1191   -2.1524 0.0314  -0.4898 -0.0229
A2M         0.9664               0.1688    0.1173    1.4388 0.1502  -0.0611 0.3987

如果需要,这里有一段示例数据"se_summary“

代码语言:javascript
复制
Gene_symbol ICGC_se TCGA_se G71_se  G62_se  G8_se
A1CF        0.312   0.21    0.219   0.292   0.381
A2M         0.305   0.21    0.219   0.292   0.387
A2ML1       0.314   0.211   0.222   0.289   0.389
A4GALT      0.305   0.21    0.225   0.288   0.388
A4GNT       0.306   0.211   0.222   0.288   0.385
AAAS        0.308   0.213   0.223   0.298   0.38
AACS        0.307   0.209   0.221   0.287   0.38
AADAC       0.302   0.212   0.221   0.293   0.404
AADAT       0.308   0.214   0.22    0.288   0.391
AAK1        0.304   0.209   0.22    0.303   0.438
AAMP        0.303   0.211   0.222   0.288   0.394

和一段样本数据"HR_summary“

代码语言:javascript
复制
Gene_symbol ICGC_HR TCGA_HR G71_HR  G62_HR  G8_HR
A1CF        1.689   1.427   0.864   1.884   1.133
A2M         1.234   1.102   1.11    1.369   1.338
A2ML1       0.563   0.747   0.535   1.002   0.752
A4GALT      0.969   0.891   0.613   0.985   0.882
A4GNT       1.486   0.764   1.051   1.317   1.465
AAAS        1.51    1.178   1.076   0.467   0.681
AACS        1.4     1.022   1.255   1.006   1.416
AADAC       0.979   0.642   1.236   1.581   1.234
AADAT       1.366   1.405   1.18    1.057   1.408
AAK1        1.04    0.923   0.881   0.469   0.329
AAMP        1.122   0.639   1.473   0.964   1.284
EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2019-07-23 01:06:07

要点1:如果你的数据是从不同的人群中收集的,你不应该使用固定效应模型。因为HR在你的人群中可能是不同的。

要点2:如果将HR转换为log(HR),则应为log(HR)计算SE。

您的数据:

代码语言:javascript
复制
  se_summary=data.frame(
    Gene_symbol=c("A1CF","A2M","A2ML1","A4GALT","A4GNT","AAAS","AACS","AADAC","AADAT","AAK1","AAMP"),
    ICGC_se=c(0.312,0.305,0.314,0.305,0.306,0.308,0.307,0.302,0.308,0.304,0.303),
    TCGA_se=c(0.21,0.21,0.211,0.21,0.211,0.213,0.209,0.212,0.214,0.209,0.211),
    G71_se=c(0.219,0.219,0.222,0.225,0.222,0.223,0.221,0.221,0.22,0.22,0.222),
    G62_se=c(0.292,0.292,0.289,0.288,0.288,0.298,0.287,0.293,0.288,0.303,0.288),
    G8_se=c(0.381,0.387,0.389,0.388,0.385,0.38,0.38,0.404,0.391,0.438,0.394))

代码语言:javascript
复制
 HR_summary=data.frame(
    Gene_symbol=c("A1CF","A2M","A2ML1","A4GALT","A4GNT","AAAS","AACS","AADAC","AADAT","AAK1","AAMP"),
    ICGC_HR=c(1.689,1.234,0.563,0.969,1.486,1.51,1.4,0.979,1.366,1.04,1.122),
    TCGA_HR=c(1.427,1.102,0.747,0.891,0.764,1.178,1.022,0.642,1.405,0.923,0.639),
    G71_HR=c(0.864,1.11,0.535,0.613,1.051,1.076,1.255,1.236,1.18,0.881,1.473),
    G62_HR=c(1.884,1.369,1.002,0.985,1.317,0.467,1.006,1.581,1.057,0.469,0.964),
    G8_HR=c(1.133,1.338,0.752,0.882,1.465,0.681,1.416,1.234,1.408,0.329,1.284))

1)合并数据

代码语言:javascript
复制
data=cbind(se_summary,log(HR_summary[,-1]))

2)计算meta-log HR函数

代码语言:javascript
复制
met=function(x) {
y=rma(as.numeric(x[7:11]), sei=as.numeric(x[2:6]))
y=c(y$b,y$beta,y$se,y$zval,y$pval,y$ci.lb,y$ci.ub,y$tau2,y$I2)
y
}

3)对所有行执行函数

代码语言:javascript
复制
results=data.frame(t(apply(data,1,met)))
rownames(results)=rownames(data)
colnames(results)=c("b","beta","se","zval","pval","ci.lb","ci.ub","tau2","I2")

4)结果

代码语言:javascript
复制
> results
                 b        beta        se       zval        pval
A1CF    0.27683114  0.27683114 0.1538070  1.7998601 0.071882735
A2M     0.16877042  0.16877042 0.1172977  1.4388214 0.150201136
A2ML1  -0.37676308 -0.37676308 0.1182825 -3.1852811 0.001446134
A4GALT -0.18975044 -0.18975044 0.1179515 -1.6087159 0.107678477
A4GNT   0.09500277  0.09500277 0.1392486  0.6822528 0.495079085
AAAS   -0.07012629 -0.07012629 0.2000932 -0.3504680 0.725987468
AACS    0.15333550  0.15333550 0.1170061  1.3104915 0.190029610
AADAC   0.04902471  0.04902471 0.1738017  0.2820727 0.777887764
AADAT   0.23785528  0.23785528 0.1181503  2.0131593 0.044097875
AAK1   -0.32062727 -0.32062727 0.1832183 -1.7499744 0.080122725
AAMP    0.02722082  0.02722082 0.1724461  0.1578512 0.874574077
              ci.lb       ci.ub       tau2       I2
A1CF   -0.024625107  0.57828740 0.04413257 37.89339
A2M    -0.061128821  0.39866965 0.00000000  0.00000
A2ML1  -0.608592552 -0.14493360 0.00000000  0.00000
A4GALT -0.420931120  0.04143024 0.00000000  0.00000
A4GNT  -0.177919527  0.36792508 0.02455208 25.35146
AAAS   -0.462301836  0.32204926 0.12145183 62.23915
AACS   -0.075992239  0.38266324 0.00000000  0.00000
AADAC  -0.291620349  0.38966978 0.07385974 50.18761
AADAT   0.006285038  0.46942552 0.00000000  0.00000
AAK1   -0.679728455  0.03847392 0.08700387 53.66905
AAMP   -0.310767314  0.36520895 0.07266674 50.07330
票数 3
EN

Stack Overflow用户

发布于 2019-07-23 00:56:02

将数据放在长格式中,将效果大小和se数据并排放在一起,然后使用split并对每个数据应用rma。您可以为rma对象创建自己版本的broomtidy函数。

代码语言:javascript
复制
library(metafor) 
library(reshape)
se_summary<-read.table(text="
Gene_symbol ICGC_se TCGA_se G71_se  G62_se  G8_se
AADAT       0.308   0.214   0.22    0.288   0.391
AAK1        0.304   0.209   0.22    0.303   0.438
AAMP        0.303   0.211   0.222   0.288   0.394
",header=T)

HR_summary<-read.table(text="
Gene_symbol ICGC_HR TCGA_HR G71_HR  G62_HR  G8_HR
AADAT       0.308   0.214   0.22    0.288   0.391
AAK1        0.304   0.209   0.22    0.303   0.438
AAMP        0.303   0.211   0.222   0.288   0.394
                       ",header=T)

HR_summary<-melt(HR_summary,id.vars = "Gene_symbol")%>%
  mutate(.,variable=sapply(strsplit(as.character(variable), split='_', fixed=TRUE), function(x) (x[1])))%>%
  rename(gene=variable)
se_summary<-melt(se_summary,id.vars = "Gene_symbol")%>%
  mutate(.,variable=sapply(strsplit(as.character(variable), split='_', fixed=TRUE), function(x) (x[1])))%>%
  rename(gene=variable)
HR_summary<-merge(HR_summary,se_summary,by=c("Gene_symbol","gene"),suffixes=c(".HR",".se"))



tidy.rma<-function(x) {
  return(data.frame(estimate=x$b,se=x$se,zval=x$zval,ci.lb=x$ci.lb,ci.ub=x$ci.ub,k=x$k,Heterog_pv=x$QEp#the main stuff: overall ES, etc
                    #variance components( random effects stuff): nlvls is n sites
  )) #test for heterogeneity q value and p-value
}



rbindlist(lapply(split(HR_summary, droplevels(HR_summary$Gene_symbol)),
                 function(x)with(x, tidy.rma(rma(yi=value.HR, sei=value.se,method="FE")))),idcol = "Gene_symbol2")
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/57146352

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档