❝本文翻译整理自:https://github.com/stefpeschel/NetCoMi ❞
从高通量测序数据中获得微生物关联网络已是一种常见的数据分析方法,使我们得以了解微生物群落在环境中的复杂相互作用。一般来说,网络分析工作流程包括几个步骤,包括零值处理,数据归一化以及计算微生物关联。另一方面,由于微生物之间的相互作用可能会在不同条件下发生变化(例如在健康个体和患者之间),因此识别两组之间的网络差异通常也是不可或缺的分析要点。
本文介绍的 NetCoMi (「Net」work 「Co」nstruction and Comparison for 「Mi」crobiome Data) R 包即是包括了构建,分析和比较微生物成分数据网络的功能。这样我们就可以研究单个分类单元,分类单元组或整个网络结构在两组之间是否发生了变化。NetCoMi 包含了构建差异网络的功能,从而可进一步探究一对微生物的关联在两组之间是否存在显著差异。此外,NetCoMi 还可以构建和分析微生物组样本的相异度网络,对整个微生物组样本的异质性进行可视化。
❝Stefanie Peschel, Christian L Müller, Erika von Mutius, Anne-Laure Boulesteix, Martin Depner (2020). NetCoMi: network construction and comparison for microbiome data in R. Briefings in Bioinformatics, bbaa290. https://doi.org/10.1093/bib/bbaa290. ❞

使用相同的布局,比较两种实验环境下的微生物网络关系
「相关性计算方法:」
cor() from stats package)cor() from stats package)bicor() from WGCNA packagesparcc() from SpiecEasi package)ccrepe package)SpiecEasi package)SPRING package)propr package)「相异度计算方法:」
vegdist() from vegan package)vegdist() from vegan package)KLD() from LaplacesDemon package)KLD() from LaplacesDemon package)KLD() from LaplacesDemon package)vegdist() and clr() from SpiecEasi package)「零值替换方法:」
multRepl from zCompositions package)lrEM from zCompositions package)cmultRepl from zCompositions package)「归一化计算方法:」
cumNormMat from metagenomeSeq package)rrarefy from vegan package)varianceStabilizingTransformation from DESeq2 package)clr() from SpiecEasi package))直接通过 Github 安装:
#install.packages("devtools")
devtools::install_github("stefpeschel/NetCoMi", dependencies = TRUE,
repos = c("https://cloud.r-project.org/",
BiocManager::repositories()))
但在国内,这个安装方式比较麻烦。从上面 NetCoMi 所包含林林总总的计算方法就知道,它包含了好多依赖包,且很多也都是需要从 Github 安装,跟何况最近不知怎么回事 Github 也老被墙。。。
不过,好在贴心的阿鲍为大家准备了 docker 镜像,大家直接 pull 即可:
docker pull zwbao/netcomi
创建容器:
docker run --rm -p 8787:8787 -e PASSWORD=yourpasswordhere zwbao/netcomi
我们使用 SpiecEasi 包的 American Gut 数据集作为示例。NetCoMi 的主要函数如下:
netConstruct()netAnalyze()netCompare()上面这三个功能必须按顺序依次执行。另一个函数是 diffnet(),用于构建差异关联网络,diffnet()基于 netConstruct() 生成的对象分析。
首先,载入 NetCoMi 以及示例数据:
library(NetCoMi)
data("amgut1.filt")
data("amgut2.filt.phy")
我们首先使用 SPRING 算法构造单个的关联网络,以研究 OTU 之间的关系。
用 netConstruct() 过滤数据,如下所示:
filtSamp 参数)。filtTax 参数)。measure 参数定义相关性或相异度计算方法,在示例中为 "spring"。其他参数通过 measurePar 传递到 SPRING()。将 nlambda 和 rep.num 设置为 10 以减少运行时间,但对于真实数据我们应将其设置为更高的数值。
因为在 SPRING() 算法内部已经执行了归一化以及零值处理。因此,我们将 normMethod 和 zeroMethod 设置为 "none"。
此外,我们将 sparsMethod 设置为 "none",因为 SPRING 返回的稀疏网络不需要执行额外的稀疏化步骤。
我们使用 signed 方法将关联转换为相异度(dissFunc 参数)。在这种情况下,强烈负相关的分类单元具有较高的相异度,反之亦然,这与网络中的边权重相对应。
net_single <- netConstruct(amgut1.filt,
filtTax = "highestFreq",
filtTaxPar = list(highestFreq = 100),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 1000),
measure = "spring",
measurePar = list(nlambda=10,
rep.num=10),
normMethod = "none",
zeroMethod = "none",
sparsMethod = "none",
dissFunc = "signed",
verbose = 3,
seed = 123456)
## Data filtering ...
## 27 taxa removed.
## 100 taxa and 289 samples remaining.
##
## Calculate 'spring' associations ...
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## Done.
NetCoMi 的 netAnalyze() 函数用于分析网络。
此处,我们将 centrLCC 设置为 TRUE 意味着仅针对最大连通分量(largest connected component,LCC)中的节点计算中心度。
使用贪婪模块优化算法来识别 clusters(通过 igraph 包中的 cluster_fast_greedy() 函数)。
Hubs 定义为特征向量中心度值高于网络中所有特征向量中心度的 95% 分位数的节点(hubPar 参数)。
weightDeg 和 normDeg 设置为 FALSE,以便将节点度(node degree)简单定义为与该节点相邻的节点数。
props_single <- netAnalyze(net_single,
centrLCC = TRUE,
clustMethod = "cluster_fast_greedy",
hubPar = "eigenvector",
weightDeg = FALSE, normDeg = FALSE)
#?summary.microNetProps
summary(props_single, numbNodes = 5L)
##
## Component sizes
## ```````````````
## size: 100
## #: 1
## ______________________________
## Global network properties
## `````````````````````````
##
## Number of components 1.00000
## Clustering coefficient 0.32020
## Moduarity 0.51909
## Positive edge percentage 92.01278
## Edge density 0.06323
## Natural connectivity 0.01454
## Vertex connectivity 1.00000
## Edge connectivity 1.00000
## Average dissimilarity* 0.97984
## Average path length** 2.22537
##
## *Dissimilarity = 1 - edge weight
## **Path length: Units with average dissimilarity
##
## ______________________________
## Clusters
## - In the whole network
## - Algorithm: cluster_fast_greedy
## ````````````````````````````````
##
## name: 1 2 3 4 5 6
## #: 15 25 14 21 20 5
##
## ______________________________
## Hubs
## - In alphabetical/numerical order
## - Based on empirical quantiles of centralities
## ```````````````````````````````````````````````
## 189396
## 191687
## 293896
## 364563
## 544358
##
## ______________________________
## Centrality measures
## - In decreasing order
## - Computed for the complete network
## ````````````````````````````````````
## Degree (unnormalized):
##
## 293896 15
## 544358 14
## 174012 13
## 364563 12
## 311477 12
##
## Betweenness centrality (normalized):
##
## 175617 0.14533
## 175537 0.11833
## 165261 0.10142
## 185451 0.09462
## 268332 0.08493
##
## Closeness centrality (normalized):
##
## 544358 0.71315
## 293896 0.69119
## 191687 0.67098
## 311477 0.66954
## 174012 0.66699
##
## Eigenvector centrality (normalized):
##
## 293896 1.00000
## 191687 0.87429
## 364563 0.86119
## 189396 0.83855
## 544358 0.81202
我们用 cluster 定义节点颜色,并用节点的特征向量中心度定义节点大小。
# help page
?plot.microNetProps
p <- plot(props_single,
nodeColor = "cluster",
nodeSize = "eigenvector",
title1 = "Network on OTU level with SPRING associations",
showTitle = TRUE,
cexTitle = 2.3)
legend(0.7, 1.1, cex = 2.2, title = "estimated association:",
legend = c("+","-"), lty = 1, lwd = 3, col = c("#009900","red"),
bty = "n", horiz = TRUE)

下面将演示如何使用 NetCoMi 比较两个网络。
使用 "SEASONAL_ALLERGIES" 变量将数据集分为两组,这里用到的 metagMisc 包可根据变量拆分 phyloseq 对象。接下来将两个 phyloseq 对象输入 NetCoMi。(我们选择方差最大的 50 个节点以节省时间)
# devtools::install_github("vmikk/metagMisc")
# Split the phyloseq object into two groups
amgut_split <- metagMisc::phyloseq_sep_variable(amgut2.filt.phy,
"SEASONAL_ALLERGIES")
# Network construction
net_season <- netConstruct(data = amgut_split$no,
data2 = amgut_split$yes,
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
measure = "spring",
measurePar = list(nlambda=10,
rep.num=10),
normMethod = "none",
zeroMethod = "none",
sparsMethod = "none",
dissFunc = "signed",
verbose = 3,
seed = 123456)
## Data filtering ...
## 95 taxa removed in each data set.
## 1 rows with zero sum removed in group 1.
## 1 rows with zero sum removed in group 2.
## 43 taxa and 162 samples remaining in group 1.
## 43 taxa and 120 samples remaining in group 2.
##
## Calculate 'spring' associations ...
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## Done.
##
## Calculate associations in group 2 ...
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## Done.
将包含两个网络的 netConstruct() 对象传递到 netAnalyze()。同时为两个网络计算网络属性。
以下参数仅仅是为了演示 netAnalyze() 的功能,所选参数并非最佳。
centrLCC = FALSE:为所有节点(不仅仅是 LCC)计算中心度。avDissIgnoreInf = TRUE:计算平均相异度时,具有无限相异度的节点将被忽略。sPathNorm = FALSE:最短路径将不用平均相异度进行归一化。hubPar = c("degree", "between", "closeness"): Hubs 定义为同时具有最高 degree,betweenness 和 closeness centrality 的节点。lnormFit = TRUE 和 hubQuant = 0.9:用对数正态分布拟合中心度值,以标识具有“最高”中心度值的节点。如果该节点的三个中心度度量都超过 90% 分位数,则将该节点标识为 hub。「注意:这里的参数必须根据研究问题仔细斟酌。并非所有情况下都适合 NetCoMi 的默认值!」
props_season <- netAnalyze(net_season,
centrLCC = FALSE,
avDissIgnoreInf = TRUE,
sPathNorm = FALSE,
clustMethod = "cluster_fast_greedy",
hubPar = c("degree", "between", "closeness"),
hubQuant = 0.9,
lnormFit = TRUE,
normDeg = FALSE,
normBetw = FALSE,
normClose = FALSE,
normEigen = FALSE)
summary(props_season)
##
## Component sizes
## ```````````````
## Group 1:
## size: 31 8 1
## #: 1 1 4
##
## Group 2:
## size: 41 1
## #: 1 2
## ______________________________
## Global network properties
## `````````````````````````
## Largest connected component (LCC):
## group '1' group '2'
## Relative LCC size 0.72093 0.95349
## Clustering coefficient 0.27184 0.29563
## Moduarity 0.51794 0.54832
## Positive edge percentage 100.00000 100.00000
## Edge density 0.11183 0.09512
## Natural connectivity 0.04296 0.03257
## Vertex connectivity 1.00000 1.00000
## Edge connectivity 1.00000 1.00000
## Average dissimilarity* 0.68091 0.67853
## Average path length** 2.23414 2.49352
##
## Whole network:
## group '1' group '2'
## Number of components 6.00000 3.00000
## Clustering coefficient 0.31801 0.29563
## Moduarity 0.62749 0.54832
## Positive edge percentage 100.00000 100.00000
## Edge density 0.06977 0.08638
## Natural connectivity 0.02979 0.03072
##
## *Dissimilarity = 1 - edge weight
## **Path length: Sum of dissimilarities along the path
##
## ______________________________
## Clusters
## - In the whole network
## - Algorithm: cluster_fast_greedy
## ````````````````````````````````
## group '1':
## name: 0 1 2 3 4
## #: 4 10 13 8 8
##
## group '2':
## name: 0 1 2 3 4 5
## #: 2 6 11 8 8 8
##
## ______________________________
## Hubs
## - In alphabetical/numerical order
## - Based on log-normal quantiles of centralities
## ```````````````````````````````````````````````
## group '1' group '2'
## 322235
##
## ______________________________
## Centrality measures
## - In decreasing order
## - Computed for the complete network
## ````````````````````````````````````
## Degree (unnormalized):
## group '1' group '2'
## 322235 7 9
## 364563 7 5
## 259569 7 5
## 184983 6 5
## 9715 5 4
## ______ ______
## 322235 7 9
## 363302 4 9
## 158660 3 6
## 90487 3 5
## 188236 4 5
##
## Betweenness centrality (unnormalized):
## group '1' group '2'
## 364563 148 82
## 188236 144 87
## 259569 122 39
## 331820 115 8
## 322235 106 226
## ______ ______
## 158660 0 317
## 470239 5 256
## 10116 0 233
## 322235 106 226
## 326792 0 161
##
## Closeness centrality (unnormalized):
## group '1' group '2'
## 364563 22.67516 25.82886
## 259569 22.19619 23.95399
## 322235 22.1629 30.0221
## 188236 21.1083 26.37517
## 184983 20.0581 22.58796
## ______ ______
## 322235 22.1629 30.0221
## 158660 17.64634 27.65421
## 363302 17.27059 27.61001
## 326792 17.95234 26.47248
## 188236 21.1083 26.37517
##
## Eigenvector centrality (unnormalized):
## group '1' group '2'
## 364563 0.30442 0.2137
## 184983 0.29042 0.26935
## 188236 0.2392 0.23412
## 516022 0.23484 0.14094
## 190464 0.22311 0.21337
## ______ ______
## 363302 0.21617 0.38157
## 322235 0.14538 0.29676
## 194648 0.17436 0.28061
## 184983 0.29042 0.26935
## 188236 0.2392 0.23412
在上述设置中,只有一个 hub 节点被识别出来。
首先,在两组中分别计算网络布局。
由于 SPRING 使用 mclr 变换作为归一化方法,因此根据 mclr 转换后的数据对节点大小进行了缩放。
节点颜色表示不同的 cluster。注意,默认情况下,如果两个 cluster 中至少有两个公共节点(sameColThresh = 2),则两个 cluster 的颜色相同。 将 sameClustCol 设置为 FALSE 可获得不同的 cluster 颜色。
plot(props_season,
sameLayout = FALSE,
nodeColor = "cluster",
nodeSize = "mclr",
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 2.5,
cexHubLabels = 3,
cexTitle = 3.7,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend("bottom", title = "estimated association:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)

使用不同的布局会为两组分别生成一个“漂亮的”网络图,但要想一眼就看出两组之间的差异还是不够直观。
因此,我们现在尝试在两个组中使用相同的网络布局。 在下面,为组1(左网络)计算布局,并为组2 分配该布局。
rmSingles 设置为 "inboth",因为如果使用相同的布局,则只能删除两个组中都未连接的节点。
plot(props_season,
sameLayout = TRUE,
layoutGroup = 1,
rmSingles = "inboth",
nodeSize = "mclr",
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 2.5,
cexHubLabels = 3,
cexTitle = 3.8,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend("bottom", title = "estimated association:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)

在上图中,我们可以看到两组之间的明显差异。 例如,“季节性过敏”组中的 OTU“ 322235”比非季节性过敏组中的联系更紧密,这就是为什么它是右侧 Hub 而不是左侧的原因。
由于简单地将一个组的布局接至另一个组通常会导致其中一个组生成的图比较难看,因此 NetCoMi(> = 1.0.2)提供了另一种选择(layoutGroup = "union"),两组的布局将结合起来使用。 这样,两个网络的节点都将尽可能均等地放置在最佳位置。
plot(props_season,
sameLayout = TRUE,
layoutGroup = "union",
rmSingles = "inboth",
nodeSize = "mclr",
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 2.5,
cexHubLabels = 3,
cexTitle = 3.8,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend("bottom", title = "estimated association:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)

这里为了加速运行时间,我们不使用置换检验,将 permTest 参数设置为 FALSE(但在实际分析中还是得加上这步)。
comp_season <- netCompare(props_season, permTest = FALSE, verbose = FALSE)
summary(comp_season,
groupNames = c("No allergies", "Allergies"),
showCentr = c("degree", "between", "closeness"),
numbNodes = 5)
##
## Comparison of Network Properties
## ----------------------------------
## CALL:
## netCompare(x = props_season, permTest = FALSE, verbose = FALSE)
##
## ______________________________
## Global network properties
## `````````````````````````
## Largest connected component (LCC):
## No allergies Allergies difference
## Relative LCC size 0.721 0.953 0.233
## Clustering coefficient 0.272 0.296 0.024
## Moduarity 0.518 0.548 0.030
## Positive edge percentage 100.000 100.000 0.000
## Edge density 0.112 0.095 0.017
## Natural connectivity 0.043 0.033 0.010
## Vertex connectivity 1.000 1.000 0.000
## Edge connectivity 1.000 1.000 0.000
## Average dissimilarity* 0.681 0.679 0.002
## Average path length** 2.234 2.494 0.259
##
## Whole network:
## No allergies Allergies difference
## Number of components 6.000 3.000 3.000
## Clustering coefficient 0.318 0.296 0.022
## Moduarity 0.627 0.548 0.079
## Positive edge percentage 100.000 100.000 0.000
## Edge density 0.070 0.086 0.017
## Natural connectivity 0.030 0.031 0.001
## -----
## *: Dissimilarity = 1 - edge weight
## **Path length: Sum of dissimilarities along the path
##
## ______________________________
## Jaccard index (similarity betw. sets of most central nodes)
## ``````````````````````````````````````````````````````````
## Jacc P(<=Jacc) P(>=Jacc)
## degree 0.286 0.475500 0.738807
## betweenness centr. 0.077 0.038537 * 0.994862
## closeness centr. 0.267 0.404065 0.790760
## eigenvec. centr. 0.667 0.996144 0.018758 *
## hub taxa 0.000 0.666667 1.000000
## -----
## Jaccard index ranges from 0 (compl. different) to 1 (sets equal)
##
## ______________________________
## Adjusted Rand index (similarity betw. clusterings)
## ``````````````````````````````````````````````````
## ARI p-value
## 0.327 0
## -----
## ARI in [-1,1] with ARI=1: perfect agreement betw. clusterings,
## ARI=0: expected for two random clusterings
## p-value: two-tailed test with null hypothesis ARI=0
##
## ______________________________
## Centrality measures
## - In decreasing order
## - Computed for the complete network
## ````````````````````````````````````
## Degree (unnormalized):
## No allergies Allergies abs.diff.
## 363302 4 9 5
## 549871 4 0 4
## 469709 1 4 3
## 158660 3 6 3
## 181016 0 3 3
##
## Betweenness centrality (unnormalized):
## No allergies Allergies abs.diff.
## 158660 0 317 317
## 470239 5 256 251
## 10116 0 233 233
## 326792 0 161 161
## 322235 106 226 120
##
## Closeness centrality (unnormalized):
## No allergies Allergies abs.diff.
## 181016 0.000 22.880 22.880
## 361496 0.000 22.253 22.253
## 549871 19.787 0.000 19.787
## 278234 0.000 15.093 15.093
## 10116 7.039 20.965 13.925
##
## _________________________________________________________
## Significance codes: ***: 0.001, **: 0.01, *: 0.05, .: 0.1
下面,我们将建立一个差异关联网络,如果两个节点在两个组之间具有显著差异的关联,则将两个节点连接在一起。
为了缩短运行时间,这里我们使用 Pearson 相关性来估计 OTUs 之间的关联。
Fisher's z 检验用于识别差异关联的 OTUs。使用 local false discovery rate 校正 P 值。
注意:这里 sparsMethod 设置为 "none",只是为了能够将所有差异关联包括在关联网络图中。 但是,差异网络始终基于稀疏化之前的估计关联矩阵。
net_season_pears <- netConstruct(data = amgut_split$no,
data2 = amgut_split$yes,
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
measure = "pearson",
normMethod = "clr",
sparsMethod = "none",
thresh = 0.2,
verbose = 3)
## Infos about changed arguments:
## Zero replacement needed for clr transformation. 'multRepl' used.
## Data filtering ...
## 95 taxa removed in each data set.
## 1 rows with zero sum removed in group 1.
## 1 rows with zero sum removed in group 2.
## 43 taxa and 162 samples remaining in group 1.
## 43 taxa and 120 samples remaining in group 2.
##
## Zero treatment in group 1:
## Execute multRepl() ... Done.
##
## Zero treatment in group 2:
## Execute multRepl() ... Done.
##
## Normalization in group 1:
## Execute clr(){SpiecEasi} ... Done.
##
## Normalization in group 2:
## Execute clr(){SpiecEasi} ... Done.
##
## Calculate 'pearson' associations ... Done.
##
## Calculate associations in group 2 ... Done.
# Differential network construction
diff_season <- diffnet(net_season_pears,
diffMethod = "fisherTest",
adjust = "lfdr")
## Adjust for multiple testing using 'lfdr' ...
## Execute fdrtool() ...
## Step 1... determine cutoff point
## Step 2... estimate parameters of null distribution and eta0
## Step 3... compute p-values and estimate empirical PDF/CDF
## Step 4... compute q-values and local fdr
## Done.
# Differential network plot
plot(diff_season,
cexNodes = 0.8,
cexLegend = 3,
cexTitle = 4,
mar = c(2,2,8,5),
legendGroupnames = c("group 'no'", "group 'yes'"),
legendPos = c(0.7,1.6))

在上面显示的差异网络中, edge 颜色代表两组中的关联方向。 例如,如果两个 OTU 在组1 中正相关,而在组2 中负相关(如“ 191541”和“ 188236”),则相应的 edge 为青色。
我们还通过构建仅包含差异关联的 OTU 的关联网络来进一步探究。
props_season_pears <- netAnalyze(net_season_pears,
clustMethod = "cluster_fast_greedy",
weightDeg = TRUE,
normDeg = FALSE)
# Identify the differentially associated OTUs
diffmat_sums <- rowSums(diff_season$diffAdjustMat)
diff_asso_names <- names(diffmat_sums[diffmat_sums > 0])
plot(props_season_pears,
nodeFilter = "names",
nodeFilterPar = diff_asso_names,
nodeColor = "gray",
highlightHubs = FALSE,
sameLayout = TRUE,
layoutGroup = "union",
rmSingles = FALSE,
nodeSize = "clr",
edgeTranspHigh = 20,
labelScale = FALSE,
cexNodes = 1.5,
cexLabels = 3,
cexTitle = 3.8,
groupNames = c("No seasonal allergies", "Seasonal allergies"),
hubBorderCol = "gray40")
legend(-0.15,-0.7, title = "estimated correlation:", legend = c("+","-"),
col = c("#009900","red"), inset = 0.05, cex = 4, lty = 1, lwd = 4,
bty = "n", horiz = TRUE)

我们可以看到,上述 OTUs 191541 与 188236 在左组为强正相关,而在右组则为负相关。
如果使用相异度构建网络,则是将每个样本作为节点。相异度被转换为相似度,用作 edge 权重,以便使具有相似微生物组成的样本在网络图中更接近。
下面,我们将使用适合于组成数据的 Aitchison 距离构建网络。由于 Aitchison 距离基于 clr 变换,因此需要替换掉数据中的零。这里用使用 knn 算法稀疏网络。
net_aitchison <- netConstruct(amgut1.filt,
measure = "aitchison",
zeroMethod = "multRepl",
sparsMethod = "knn",
kNeighbor = 3,
verbose = 3)
## Infos about changed arguments:
## Counts normalized to fractions for measure 'aitchison'.
## 127 taxa and 289 samples remaining.
##
## Zero treatment:
## Execute multRepl() ... Done.
##
## Normalization:
## Counts normalized by total sum scaling.
##
## Calculate 'aitchison' dissimilarities ... Done.
##
## Sparsify dissimilarities via 'knn' ... Done.
对于 cluster 检测,我们使用基于 average linkage 的层次聚类。函数内部,将 k=3 从 stats 包传递到 cutree(),从而将树分割成 3 个clusters。
props_aitchison <- netAnalyze(net_aitchison,
clustMethod = "hierarchical",
clustPar = list(method = "average", k = 3),
hubPar = "eigenvector")
plot(props_aitchison,
nodeColor = "cluster",
nodeSize = "eigenvector",
hubTransp = 40,
edgeTranspLow = 60,
charToRm = "00000",
mar = c(1, 3, 3, 5))
# get green color with 50% transparency
green2 <- colToTransp("#009900", 40)
legend(0.4, 1.1,
cex = 2.2,
legend = c("high similarity (low Aitchison distance)",
"low similarity (high Aitchison distance)"),
lty = 1,
lwd = c(3, 1),
col = c("darkgreen", green2),
bty = "n")

在这个相异度网络中, Hubs 定义为与数据集中许多其他样本具有相似的微生物组成的样本。