大家好,我是邓飞,今天重新发一下之前写的TASSEL做GWAS分析的教程,前几天有幸和TASSEL和GAPIT软件包的作者张志武老师做了一个腾讯会议的视频沟通,张老师给我普及了很多GWAS相关的内容,受益良多。
张老师询问TASSEL软件在国内的使用情况,我说非常多,很多人因为这个软件顺利分析了自己的数据,完成了自己的毕业论文,发表了自己的文章,这个软件很强大,里面的功能目前还有很多人在使用。
张老师说现在GAPIT软件增加了很多功能,让我测试体验一下,我诚惶诚恐,后面一定要好好使用测试一下,写一下相关教程。

在和张老师的沟通中,张老师像我讲授了FDR和Power在GWAS中的应用,以及几种GWAS模型的特点,我受益良多,后面我会把会议中的收获总结几篇博客,一起发到公众号里面。
张老师每年都会在国内做GWAS相关的公益培训,下次我一定参加,当面向张老师请教。
下面是正文。
大家好,我是邓飞,之前写过TASSEL如何做GWAS的教程,最近小伙伴问得比较多,就再发一次,本次教程包括六个部分,都有超链接可以看。
教程有数据,代码,流程文件,跟着教程走一遍,能学个8-9分,然后用自己的数据跑起来,就没问题了。如果还有问题,可以加入飞哥的星球,提供技术答疑,非常666:
笔记计划分为六篇:
第一篇:读取plink基因型数据和表型数据
第二篇:对基因型数据质控:缺失质控,maf质控,hwe质控,样本质控
第三篇:基因型数据可视化:kingship,MDS,PCA
第四篇:一般线性模型进行GWAS分析(GLM模型)
第五篇:混合线性模型进行GWAS分析(MLM模型)
第六篇:TASSEL结果可视化:QQ plot,曼哈顿图
TASSEL配套数据和代码,公众号回复:tassel,获得下载链接:

质控后的plink数据和表型数据:

「GLM的GWAS分析结果:」

「MLM的GWAS分析结果:」

TASSEL有对结果进行可视化的模块,包括qq图和曼哈顿图,但是图不方便调整。这里用TASSEL的分析结果,使用R语言进行绘制qq图和曼哈顿图。




需要用到:
qqmantidyversedata.table下面代码,会判断是否有这三个包,如果没有,就自动安装。然后载入软件包。
if(!require(data.table)) install.packages("data.table")
if(!require(qqman)) install.packages("qqman")
if(!require(tidyverse)) install.packages("tidyverse")
library(qqman)
library(tidyverse)
library(data.table)
results_log = fread("glm-result.txt")
dim(results_log)
head(results_log)
select = dplyr::select
table(results_log$Trait)
结果:
> table(results_log$Trait)
dpoll EarDia EarHT
2460 2460 2460
数据中共有三个性状,可以选择一个性状,进行可视化。
d1 = results_log %>% filter(Trait == "dpoll") %>% select(Chr,Marker,Pos,p)
head(d1)
summary(d1)
d1 = d1 %>% drop_na(p)
summary(d1)
注意,有些P值是NA,在作图时会报错,这里将其移除。
整理后的结果:
> summary(d1)
Chr Marker Pos p
Min. : 1.0 Length:2460 Min. : 139753 Min. :0.0000
1st Qu.: 2.0 Class :character 1st Qu.: 43868061 1st Qu.:0.1236
Median : 4.0 Mode :character Median :128423374 Median :0.3911
Mean : 4.7 Mean :120382976 Mean :0.4165
3rd Qu.: 7.0 3rd Qu.:175628840 3rd Qu.:0.6743
Max. :10.0 Max. :298413352 Max. :0.9996
作图代码:
manhattan(d1,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
tiff("y1-曼哈顿图.tiff")
manhattan(d1,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
qq(d1$p, main = "Q-Q plot of GWAS p-values : log")
tiff("y1-QQ图.tiff")
qq(d1$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
「曼哈顿图:」

「QQ图:」

其它两个性状的作图代码:
d2 = results_log %>% filter(Trait == "EarDia") %>% select(Chr,Marker,Pos,p)
head(d2)
summary(d2)
d2 = d2 %>% drop_na(p)
summary(d2)
tiff("y2-曼哈顿图.tiff")
manhattan(d2,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y2-QQ图.tiff")
qq(d2$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
d3 = results_log %>% filter(Trait == "EarHT") %>% select(Chr,Marker,Pos,p)
head(d3)
summary(d3)
d3 = d3 %>% drop_na(p)
summary(d3)
tiff("y3-曼哈顿图.tiff")
manhattan(d3,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y3-QQ图.tiff")
qq(d3$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
将整理后的不同性状的结果保存到本地:
fwrite(d1,"y1_result.csv")
fwrite(d2,"y2_result.csv")
fwrite(d3,"y3_result.csv")
读取数据,提取性状,去掉P值为缺失的行:
library(qqman)
library(data.table)
results_log = fread("mlm-result.txt", head=TRUE)
dim(results_log)
head(results_log)
library(tidyverse)
select = dplyr::select
table(results_log$Trait)
d1 = results_log %>% filter(Trait == "dpoll") %>% select(Chr,Marker,Pos,p)
head(d1)
summary(d1)
d1 = d1 %>% drop_na(p)
summary(d1)
「曼哈顿图:」

「QQ图:」

其它两个作图代码:
d2 = results_log %>% filter(Trait == "EarDia") %>% select(Chr,Marker,Pos,p)
head(d2)
summary(d2)
d2 = d2 %>% drop_na(p)
summary(d2)
tiff("y2-曼哈顿图.tiff")
manhattan(d2,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y2-QQ图.tiff")
qq(d2$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
d3 = results_log %>% filter(Trait == "EarHT") %>% select(Chr,Marker,Pos,p)
head(d3)
summary(d3)
d3 = d3 %>% drop_na(p)
summary(d3)
tiff("y3-曼哈顿图.tiff")
manhattan(d3,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y3-QQ图.tiff")
qq(d3$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
「GLM的可视化代码:」
## 对TASSEL GLM 模型可视化
if(!require(data.table)) install.packages("data.table")
if(!require(qqman)) install.packages("qqman")
if(!require(tidyverse)) install.packages("tidyverse")
library(qqman)
library(tidyverse)
library(data.table)
results_log = fread("glm-result.txt")
dim(results_log)
head(results_log)
select = dplyr::select
table(results_log$Trait)
d1 = results_log %>% filter(Trait == "dpoll") %>% select(Chr,Marker,Pos,p)
head(d1)
summary(d1)
d1 = d1 %>% drop_na(p)
summary(d1)
manhattan(d1,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
tiff("y1-曼哈顿图.tiff")
manhattan(d1,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
qq(d1$p, main = "Q-Q plot of GWAS p-values : log")
tiff("y1-QQ图.tiff")
qq(d1$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
d2 = results_log %>% filter(Trait == "EarDia") %>% select(Chr,Marker,Pos,p)
head(d2)
summary(d2)
d2 = d2 %>% drop_na(p)
summary(d2)
tiff("y2-曼哈顿图.tiff")
manhattan(d2,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y2-QQ图.tiff")
qq(d2$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
d3 = results_log %>% filter(Trait == "EarHT") %>% select(Chr,Marker,Pos,p)
head(d3)
summary(d3)
d3 = d3 %>% drop_na(p)
summary(d3)
tiff("y3-曼哈顿图.tiff")
manhattan(d3,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y3-QQ图.tiff")
qq(d3$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
fwrite(d1,"y1_result.csv")
fwrite(d2,"y2_result.csv")
fwrite(d3,"y3_result.csv")
「MLM的可视化代码:」
## 对TASSEL GLM 模型可视化
library(qqman)
library(data.table)
results_log = fread("mlm-result.txt", head=TRUE)
dim(results_log)
head(results_log)
library(tidyverse)
select = dplyr::select
table(results_log$Trait)
d1 = results_log %>% filter(Trait == "dpoll") %>% select(Chr,Marker,Pos,p)
head(d1)
summary(d1)
d1 = d1 %>% drop_na(p)
summary(d1)
manhattan(d1,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
tiff("y1-曼哈顿图.tiff")
manhattan(d1,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
qq(d1$p, main = "Q-Q plot of GWAS p-values : log")
tiff("y1-QQ图.tiff")
qq(d1$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
d2 = results_log %>% filter(Trait == "EarDia") %>% select(Chr,Marker,Pos,p)
head(d2)
summary(d2)
d2 = d2 %>% drop_na(p)
summary(d2)
tiff("y2-曼哈顿图.tiff")
manhattan(d2,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y2-QQ图.tiff")
qq(d2$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
d3 = results_log %>% filter(Trait == "EarHT") %>% select(Chr,Marker,Pos,p)
head(d3)
summary(d3)
d3 = d3 %>% drop_na(p)
summary(d3)
tiff("y3-曼哈顿图.tiff")
manhattan(d3,chr="Chr",bp="Pos",p="p",snp="Marker", main = "Manhattan plot: logistic")
dev.off()
tiff("y3-QQ图.tiff")
qq(d3$p, main = "Q-Q plot of GWAS p-values : log")
dev.off()
fwrite(d1,"y1_result.csv")
fwrite(d2,"y2_result.csv")
fwrite(d3,"y3_result.csv")
前几篇:
使用TASSEL学习GWAS笔记(1/6):读取plink基因型数据和表型数据
使用TASSEL学习GWAS笔记(2/6):对基因型数据进行质控及导出基因型
使用TASSEL学习GWAS笔记(3/6):基因型数据可视化:kingship,PCA,MDS
使用TASSEL学习GWAS笔记(4/6):一般线性模型进行GWAS分析(GLM模型)
使用TASSEL学习GWAS笔记(5/6):混合线性模型进行GWAS分析(MLM模型)