NetCoMi | 微生物组数据的网络比较

生信菜鸟团

发布于 2021-04-29 11:37:05

5.2K0

❝本文翻译整理自：https://github.com/stefpeschel/NetCoMi ❞

从高通量测序数据中获得微生物关联网络已是一种常见的数据分析方法，使我们得以了解微生物群落在环境中的复杂相互作用。一般来说，网络分析工作流程包括几个步骤，包括零值处理，数据归一化以及计算微生物关联。另一方面，由于微生物之间的相互作用可能会在不同条件下发生变化（例如在健康个体和患者之间），因此识别两组之间的网络差异通常也是不可或缺的分析要点。

本文介绍的 NetCoMi (「Net」work 「Co」nstruction and Comparison for 「Mi」crobiome Data) R 包即是包括了构建，分析和比较微生物成分数据网络的功能。这样我们就可以研究单个分类单元，分类单元组或整个网络结构在两组之间是否发生了变化。NetCoMi 包含了构建差异网络的功能，从而可进一步探究一对微生物的关联在两组之间是否存在显著差异。此外，NetCoMi 还可以构建和分析微生物组样本的相异度网络，对整个微生物组样本的异质性进行可视化。

❝Stefanie Peschel, Christian L Müller, Erika von Mutius, Anne-Laure Boulesteix, Martin Depner (2020). NetCoMi: network construction and comparison for microbiome data in R. Briefings in Bioinformatics, bbaa290. https://doi.org/10.1093/bib/bbaa290. ❞

Github： https://github.com/stefpeschel/NetCoMi

使用相同的布局，比较两种实验环境下的微生物网络关系

NetCoMi 中包含的计算方法

「相关性计算方法：」

Pearson coefficient (cor() from stats package)
Spearman coefficient (cor() from stats package)
Biweight Midcorrelation bicor() from WGCNA package
SparCC (sparcc() from SpiecEasi package)
CCLasso (R code on GitHub)
CCREPE (ccrepe package)
SpiecEasi (SpiecEasi package)
SPRING (SPRING package)
gCoda (R code on GitHub)
propr (propr package)

「相异度计算方法：」

Euclidean distance (vegdist() from vegan package)
Bray-Curtis dissimilarity (vegdist() from vegan package)
Kullback-Leibler divergence (KLD) (KLD() from LaplacesDemon package)
Jeffrey divergence (own code using KLD() from LaplacesDemon package)
Jensen-Shannon divergence (own code using KLD() from LaplacesDemon package)
Compositional KLD (own implementation following [Martín-Fernández et al., 1999])
Aitchison distance (vegdist() and clr() from SpiecEasi package)

「零值替换方法：」

Adding a predefined pseudo count
Multiplicative replacement (multRepl from zCompositions package)
Modified EM alr-algorithm (lrEM from zCompositions package)
Bayesian-multiplicative replacement (cmultRepl from zCompositions package)

「归一化计算方法：」

Total Sum Scaling (TSS) (own implementation)
Cumulative Sum Scaling (CSS) (cumNormMat from metagenomeSeq package)
Common Sum Scaling (COM) (own implementation)
Rarefying (rrarefy from vegan package)
Variance Stabilizing Transformation (VST) (varianceStabilizingTransformation from DESeq2 package)
Centered log-ratio (clr) transformation (clr() from SpiecEasi package))

安装 NetCoMi

直接通过 Github 安装：

#install.packages("devtools")

devtools::install_github("stefpeschel/NetCoMi", dependencies = TRUE,
                         repos = c("https://cloud.r-project.org/",
                                   BiocManager::repositories()))

但在国内，这个安装方式比较麻烦。从上面 NetCoMi 所包含林林总总的计算方法就知道，它包含了好多依赖包，且很多也都是需要从 Github 安装，跟何况最近不知怎么回事 Github 也老被墙。。。

不过，好在贴心的阿鲍为大家准备了 docker 镜像，大家直接 pull 即可：

docker pull zwbao/netcomi

创建容器：

docker run --rm -p 8787:8787 -e PASSWORD=yourpasswordhere zwbao/netcomi

基本使用方法

我们使用 SpiecEasi 包的 American Gut 数据集作为示例。NetCoMi 的主要函数如下：

用于构建网络的 netConstruct()
用于网络分析的 netAnalyze()
用于网络比较的 netCompare()

上面这三个功能必须按顺序依次执行。另一个函数是 diffnet()，用于构建差异关联网络，diffnet()基于 netConstruct() 生成的对象分析。

首先，载入 NetCoMi 以及示例数据：

library(NetCoMi)
data("amgut1.filt")
data("amgut2.filt.phy")

用 SPRING 算法构建单个关联网络

网络构建

我们首先使用 SPRING 算法构造单个的关联网络，以研究 OTU 之间的关系。

用 netConstruct() 过滤数据，如下所示：

仅包括 reads 总数至少为 1000 的样本（filtSamp 参数）。
仅包括频率最高的 100 个分类单元（filtTax 参数）。

measure 参数定义相关性或相异度计算方法，在示例中为 "spring"。其他参数通过 measurePar 传递到 SPRING()。将 nlambda 和 rep.num 设置为 10 以减少运行时间，但对于真实数据我们应将其设置为更高的数值。

因为在 SPRING() 算法内部已经执行了归一化以及零值处理。因此，我们将 normMethod 和 zeroMethod 设置为 "none"。

此外，我们将 sparsMethod 设置为 "none"，因为 SPRING 返回的稀疏网络不需要执行额外的稀疏化步骤。

我们使用 signed 方法将关联转换为相异度（dissFunc 参数）。在这种情况下，强烈负相关的分类单元具有较高的相异度，反之亦然，这与网络中的边权重相对应。

net_single <- netConstruct(amgut1.filt,
                           filtTax = "highestFreq",
                           filtTaxPar = list(highestFreq = 100),
                           filtSamp = "totalReads",
                           filtSampPar = list(totalReads = 1000),
                           measure = "spring",
                           measurePar = list(nlambda=10, 
                                             rep.num=10),
                           normMethod = "none", 
                           zeroMethod = "none",
                           sparsMethod = "none", 
                           dissFunc = "signed",
                           verbose = 3,
                           seed = 123456)

## Data filtering ...

## 27 taxa removed.

## 100 taxa and 289 samples remaining.

## 
## Calculate 'spring' associations ...

## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done

## Done.

网络分析

NetCoMi 的 netAnalyze() 函数用于分析网络。

此处，我们将 centrLCC 设置为 TRUE 意味着仅针对最大连通分量（largest connected component，LCC）中的节点计算中心度。

使用贪婪模块优化算法来识别 clusters（通过 igraph 包中的 cluster_fast_greedy() 函数）。

Hubs 定义为特征向量中心度值高于网络中所有特征向量中心度的 95％分位数的节点（hubPar 参数）。

weightDeg 和 normDeg 设置为 FALSE，以便将节点度（node degree）简单定义为与该节点相邻的节点数。

props_single <- netAnalyze(net_single, 
                           centrLCC = TRUE,
                           clustMethod = "cluster_fast_greedy",
                           hubPar = "eigenvector",
                           weightDeg = FALSE, normDeg = FALSE)

#?summary.microNetProps
summary(props_single, numbNodes = 5L)

## 
## Component sizes
## ```````````````         
## size: 100
##    #:   1
## ______________________________
## Global network properties
## `````````````````````````
##                                  
## Number of components      1.00000
## Clustering coefficient    0.32020
## Moduarity                 0.51909
## Positive edge percentage 92.01278
## Edge density              0.06323
## Natural connectivity      0.01454
## Vertex connectivity       1.00000
## Edge connectivity         1.00000
## Average dissimilarity*    0.97984
## Average path length**     2.22537
## 
##  *Dissimilarity = 1 - edge weight
## **Path length: Units with average dissimilarity
## 
## ______________________________
## Clusters
## - In the whole network
## - Algorithm: cluster_fast_greedy
## ```````````````````````````````` 
##                       
## name:  1  2  3  4  5 6
##    #: 15 25 14 21 20 5
## 
## ______________________________
## Hubs
## - In alphabetical/numerical order
## - Based on empirical quantiles of centralities
## ```````````````````````````````````````````````       
##  189396
##  191687
##  293896
##  364563
##  544358
## 
## ______________________________
## Centrality measures
## - In decreasing order
## - Computed for the complete network
## ````````````````````````````````````
## Degree (unnormalized):
##          
## 293896 15
## 544358 14
## 174012 13
## 364563 12
## 311477 12
## 
## Betweenness centrality (normalized):
##               
## 175617 0.14533
## 175537 0.11833
## 165261 0.10142
## 185451 0.09462
## 268332 0.08493
## 
## Closeness centrality (normalized):
##               
## 544358 0.71315
## 293896 0.69119
## 191687 0.67098
## 311477 0.66954
## 174012 0.66699
## 
## Eigenvector centrality (normalized):
##               
## 293896 1.00000
## 191687 0.87429
## 364563 0.86119
## 189396 0.83855
## 544358 0.81202

网络可视化

我们用 cluster 定义节点颜色，并用节点的特征向量中心度定义节点大小。

# help page
?plot.microNetProps

p <- plot(props_single, 
          nodeColor = "cluster", 
          nodeSize = "eigenvector",
          title1 = "Network on OTU level with SPRING associations", 
          showTitle = TRUE,
          cexTitle = 2.3)

legend(0.7, 1.1, cex = 2.2, title = "estimated association:",
       legend = c("+","-"), lty = 1, lwd = 3, col = c("#009900","red"), 
       bty = "n", horiz = TRUE)

网络比较

下面将演示如何使用 NetCoMi 比较两个网络。

网络构建

使用 "SEASONAL_ALLERGIES" 变量将数据集分为两组，这里用到的 metagMisc 包可根据变量拆分 phyloseq 对象。接下来将两个 phyloseq 对象输入 NetCoMi。（我们选择方差最大的 50 个节点以节省时间）

# devtools::install_github("vmikk/metagMisc")

# Split the phyloseq object into two groups
amgut_split <- metagMisc::phyloseq_sep_variable(amgut2.filt.phy, 
                                                "SEASONAL_ALLERGIES")

# Network construction
net_season <- netConstruct(data = amgut_split$no, 
                           data2 = amgut_split$yes,  
                           filtTax = "highestVar",
                           filtTaxPar = list(highestVar = 50),
                           measure = "spring",
                           measurePar = list(nlambda=10, 
                                             rep.num=10),
                           normMethod = "none", 
                           zeroMethod = "none",
                           sparsMethod = "none", 
                           dissFunc = "signed",
                           verbose = 3,
                           seed = 123456)

## Data filtering ...

## 95 taxa removed in each data set.

## 1 rows with zero sum removed in group 1.

## 1 rows with zero sum removed in group 2.

## 43 taxa and 162 samples remaining in group 1.

## 43 taxa and 120 samples remaining in group 2.

## 
## Calculate 'spring' associations ...

## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done

## Done.
## 
## Calculate associations in group 2 ...

## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done
## The input is identified as the covariance matrix.
## Conducting Meinshausen & Buhlmann graph estimation (mb)....done

## Done.

网络分析

将包含两个网络的 netConstruct() 对象传递到 netAnalyze()。同时为两个网络计算网络属性。

以下参数仅仅是为了演示 netAnalyze() 的功能，所选参数并非最佳。

centrLCC = FALSE：为所有节点（不仅仅是 LCC）计算中心度。
avDissIgnoreInf = TRUE：计算平均相异度时，具有无限相异度的节点将被忽略。
sPathNorm = FALSE：最短路径将不用平均相异度进行归一化。
hubPar = c("degree", "between", "closeness")： Hubs 定义为同时具有最高 degree，betweenness 和 closeness centrality 的节点。
lnormFit = TRUE 和 hubQuant = 0.9：用对数正态分布拟合中心度值，以标识具有“最高”中心度值的节点。如果该节点的三个中心度度量都超过 90％分位数，则将该节点标识为 hub。
非标准化中心度用于所有的四个度量。

「注意：这里的参数必须根据研究问题仔细斟酌。并非所有情况下都适合 NetCoMi 的默认值！」

props_season <- netAnalyze(net_season, 
                           centrLCC = FALSE,
                           avDissIgnoreInf = TRUE,
                           sPathNorm = FALSE,
                           clustMethod = "cluster_fast_greedy",
                           hubPar = c("degree", "between", "closeness"),
                           hubQuant = 0.9,
                           lnormFit = TRUE,
                           normDeg = FALSE,
                           normBetw = FALSE,
                           normClose = FALSE,
                           normEigen = FALSE)

summary(props_season)

## 
## Component sizes
## ```````````````
## Group 1:            
## size: 31 8 1
##    #:  1 1 4
## 
## Group 2:          
## size: 41 1
##    #:  1 2
## ______________________________
## Global network properties
## `````````````````````````
## Largest connected component (LCC):
##                          group '1' group '2'
## Relative LCC size          0.72093   0.95349
## Clustering coefficient     0.27184   0.29563
## Moduarity                  0.51794   0.54832
## Positive edge percentage 100.00000 100.00000
## Edge density               0.11183   0.09512
## Natural connectivity       0.04296   0.03257
## Vertex connectivity        1.00000   1.00000
## Edge connectivity          1.00000   1.00000
## Average dissimilarity*     0.68091   0.67853
## Average path length**      2.23414   2.49352
## 
## Whole network:
##                          group '1' group '2'
## Number of components       6.00000   3.00000
## Clustering coefficient     0.31801   0.29563
## Moduarity                  0.62749   0.54832
## Positive edge percentage 100.00000 100.00000
## Edge density               0.06977   0.08638
## Natural connectivity       0.02979   0.03072
## 
##  *Dissimilarity = 1 - edge weight
## **Path length: Sum of dissimilarities along the path
## 
## ______________________________
## Clusters
## - In the whole network
## - Algorithm: cluster_fast_greedy
## ```````````````````````````````` 
## group '1':                 
## name: 0  1  2 3 4
##    #: 4 10 13 8 8
## 
## group '2':                  
## name: 0 1  2 3 4 5
##    #: 2 6 11 8 8 8
## 
## ______________________________
## Hubs
## - In alphabetical/numerical order
## - Based on log-normal quantiles of centralities
## ```````````````````````````````````````````````
##  group '1' group '2'
##               322235
## 
## ______________________________
## Centrality measures
## - In decreasing order
## - Computed for the complete network
## ````````````````````````````````````
## Degree (unnormalized):
##         group '1' group '2'
##  322235         7         9
##  364563         7         5
##  259569         7         5
##  184983         6         5
##    9715         5         4
##            ______    ______
##  322235         7         9
##  363302         4         9
##  158660         3         6
##   90487         3         5
##  188236         4         5
## 
## Betweenness centrality (unnormalized):
##         group '1' group '2'
##  364563       148        82
##  188236       144        87
##  259569       122        39
##  331820       115         8
##  322235       106       226
##            ______    ______
##  158660         0       317
##  470239         5       256
##   10116         0       233
##  322235       106       226
##  326792         0       161
## 
## Closeness centrality (unnormalized):
##         group '1' group '2'
##  364563  22.67516  25.82886
##  259569  22.19619  23.95399
##  322235   22.1629   30.0221
##  188236   21.1083  26.37517
##  184983   20.0581  22.58796
##            ______    ______
##  322235   22.1629   30.0221
##  158660  17.64634  27.65421
##  363302  17.27059  27.61001
##  326792  17.95234  26.47248
##  188236   21.1083  26.37517
## 
## Eigenvector centrality (unnormalized):
##         group '1' group '2'
##  364563   0.30442    0.2137
##  184983   0.29042   0.26935
##  188236    0.2392   0.23412
##  516022   0.23484   0.14094
##  190464   0.22311   0.21337
##            ______    ______
##  363302   0.21617   0.38157
##  322235   0.14538   0.29676
##  194648   0.17436   0.28061
##  184983   0.29042   0.26935
##  188236    0.2392   0.23412

在上述设置中，只有一个 hub 节点被识别出来。

网络可视化比较

首先，在两组中分别计算网络布局。

由于 SPRING 使用 mclr 变换作为归一化方法，因此根据 mclr 转换后的数据对节点大小进行了缩放。

节点颜色表示不同的 cluster。注意，默认情况下，如果两个 cluster 中至少有两个公共节点（sameColThresh = 2），则两个 cluster 的颜色相同。将 sameClustCol 设置为 FALSE 可获得不同的 cluster 颜色。

plot(props_season, 
     sameLayout = FALSE, 
     nodeColor = "cluster",
     nodeSize = "mclr",
     labelScale = FALSE,
     cexNodes = 1.5, 
     cexLabels = 2.5,
     cexHubLabels = 3,
     cexTitle = 3.7,
     groupNames = c("No seasonal allergies", "Seasonal allergies"),
     hubBorderCol  = "gray40")

legend("bottom", title = "estimated association:", legend = c("+","-"), 
       col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4, 
       bty = "n", horiz = TRUE)

使用不同的布局会为两组分别生成一个“漂亮的”网络图，但要想一眼就看出两组之间的差异还是不够直观。

因此，我们现在尝试在两个组中使用相同的网络布局。在下面，为组1（左网络）计算布局，并为组2 分配该布局。

rmSingles 设置为 "inboth"，因为如果使用相同的布局，则只能删除两个组中都未连接的节点。

plot(props_season, 
     sameLayout = TRUE, 
     layoutGroup = 1,
     rmSingles = "inboth", 
     nodeSize = "mclr", 
     labelScale = FALSE,
     cexNodes = 1.5, 
     cexLabels = 2.5,
     cexHubLabels = 3,
     cexTitle = 3.8,
     groupNames = c("No seasonal allergies", "Seasonal allergies"),
     hubBorderCol  = "gray40")

legend("bottom", title = "estimated association:", legend = c("+","-"), 
       col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4, 
       bty = "n", horiz = TRUE)

在上图中，我们可以看到两组之间的明显差异。例如，“季节性过敏”组中的 OTU“ 322235”比非季节性过敏组中的联系更紧密，这就是为什么它是右侧 Hub 而不是左侧的原因。

由于简单地将一个组的布局接至另一个组通常会导致其中一个组生成的图比较难看，因此 NetCoMi（> = 1.0.2）提供了另一种选择（layoutGroup = "union"），两组的布局将结合起来使用。这样，两个网络的节点都将尽可能均等地放置在最佳位置。

plot(props_season, 
     sameLayout = TRUE, 
     layoutGroup = "union",
     rmSingles = "inboth", 
     nodeSize = "mclr", 
     labelScale = FALSE,
     cexNodes = 1.5, 
     cexLabels = 2.5,
     cexHubLabels = 3,
     cexTitle = 3.8,
     groupNames = c("No seasonal allergies", "Seasonal allergies"),
     hubBorderCol  = "gray40")

legend("bottom", title = "estimated association:", legend = c("+","-"), 
       col = c("#009900","red"), inset = 0.02, cex = 4, lty = 1, lwd = 4, 
       bty = "n", horiz = TRUE)

量化网络的比较

这里为了加速运行时间，我们不使用置换检验，将 permTest 参数设置为 FALSE（但在实际分析中还是得加上这步）。

comp_season <- netCompare(props_season, permTest = FALSE, verbose = FALSE)

summary(comp_season, 
        groupNames = c("No allergies", "Allergies"),
        showCentr = c("degree", "between", "closeness"), 
        numbNodes = 5)

## 
## Comparison of Network Properties
## ----------------------------------
## CALL: 
## netCompare(x = props_season, permTest = FALSE, verbose = FALSE)
## 
## ______________________________
## Global network properties
## `````````````````````````
## Largest connected component (LCC):
##                          No allergies   Allergies    difference
## Relative LCC size               0.721       0.953         0.233
## Clustering coefficient          0.272       0.296         0.024
## Moduarity                       0.518       0.548         0.030
## Positive edge percentage      100.000     100.000         0.000
## Edge density                    0.112       0.095         0.017
## Natural connectivity            0.043       0.033         0.010
## Vertex connectivity             1.000       1.000         0.000
## Edge connectivity               1.000       1.000         0.000
## Average dissimilarity*          0.681       0.679         0.002
## Average path length**           2.234       2.494         0.259
## 
## Whole network:
##                          No allergies   Allergies    difference
## Number of components            6.000       3.000         3.000
## Clustering coefficient          0.318       0.296         0.022
## Moduarity                       0.627       0.548         0.079
## Positive edge percentage      100.000     100.000         0.000
## Edge density                    0.070       0.086         0.017
## Natural connectivity            0.030       0.031         0.001
## -----
##  *: Dissimilarity = 1 - edge weight
## **Path length: Sum of dissimilarities along the path
## 
## ______________________________
## Jaccard index (similarity betw. sets of most central nodes)
## ``````````````````````````````````````````````````````````
##                     Jacc   P(<=Jacc)     P(>=Jacc)    
## degree             0.286    0.475500      0.738807    
## betweenness centr. 0.077    0.038537 *    0.994862    
## closeness centr.   0.267    0.404065      0.790760    
## eigenvec. centr.   0.667    0.996144      0.018758 *  
## hub taxa           0.000    0.666667      1.000000    
## -----
## Jaccard index ranges from 0 (compl. different) to 1 (sets equal)
## 
## ______________________________
## Adjusted Rand index (similarity betw. clusterings)
## ``````````````````````````````````````````````````
##    ARI       p-value
##  0.327             0
## -----
## ARI in [-1,1] with ARI=1: perfect agreement betw. clusterings,
##                    ARI=0: expected for two random clusterings
## p-value: two-tailed test with null hypothesis ARI=0
## 
## ______________________________
## Centrality measures
## - In decreasing order
## - Computed for the complete network
## ````````````````````````````````````
## Degree (unnormalized):
##        No allergies Allergies abs.diff.
## 363302            4         9         5
## 549871            4         0         4
## 469709            1         4         3
## 158660            3         6         3
## 181016            0         3         3
## 
## Betweenness centrality (unnormalized):
##        No allergies Allergies abs.diff.
## 158660            0       317       317
## 470239            5       256       251
## 10116             0       233       233
## 326792            0       161       161
## 322235          106       226       120
## 
## Closeness centrality (unnormalized):
##        No allergies Allergies abs.diff.
## 181016        0.000    22.880    22.880
## 361496        0.000    22.253    22.253
## 549871       19.787     0.000    19.787
## 278234        0.000    15.093    15.093
## 10116         7.039    20.965    13.925
## 
## _________________________________________________________
## Significance codes: ***: 0.001, **: 0.01, *: 0.05, .: 0.1

差异网络

下面，我们将建立一个差异关联网络，如果两个节点在两个组之间具有显著差异的关联，则将两个节点连接在一起。

为了缩短运行时间，这里我们使用 Pearson 相关性来估计 OTUs 之间的关联。

Fisher's z 检验用于识别差异关联的 OTUs。使用 local false discovery rate 校正 P 值。

注意：这里 sparsMethod 设置为 "none"，只是为了能够将所有差异关联包括在关联网络图中。但是，差异网络始终基于稀疏化之前的估计关联矩阵。

net_season_pears <- netConstruct(data = amgut_split$no, 
                                 data2 = amgut_split$yes, 
                                 filtTax = "highestVar",
                                 filtTaxPar = list(highestVar = 50),
                                 measure = "pearson", 
                                 normMethod = "clr",
                                 sparsMethod = "none", 
                                 thresh = 0.2,
                                 verbose = 3)

## Infos about changed arguments:

## Zero replacement needed for clr transformation. 'multRepl' used.

## Data filtering ...

## 95 taxa removed in each data set.

## 1 rows with zero sum removed in group 1.

## 1 rows with zero sum removed in group 2.

## 43 taxa and 162 samples remaining in group 1.

## 43 taxa and 120 samples remaining in group 2.

## 
## Zero treatment in group 1:

## Execute multRepl() ... Done.
## 
## Zero treatment in group 2:
## Execute multRepl() ... Done.
## 
## Normalization in group 1:
## Execute clr(){SpiecEasi} ... Done.
## 
## Normalization in group 2:
## Execute clr(){SpiecEasi} ... Done.
## 
## Calculate 'pearson' associations ... Done.
## 
## Calculate associations in group 2 ... Done.

# Differential network construction
diff_season <- diffnet(net_season_pears,
                       diffMethod = "fisherTest", 
                       adjust = "lfdr")

## Adjust for multiple testing using 'lfdr' ... 
## Execute fdrtool() ...

## Step 1... determine cutoff point
## Step 2... estimate parameters of null distribution and eta0
## Step 3... compute p-values and estimate empirical PDF/CDF
## Step 4... compute q-values and local fdr

## Done.

# Differential network plot
plot(diff_season, 
     cexNodes = 0.8, 
     cexLegend = 3,
     cexTitle = 4,
     mar = c(2,2,8,5),
     legendGroupnames = c("group 'no'", "group 'yes'"),
     legendPos = c(0.7,1.6))

在上面显示的差异网络中， edge 颜色代表两组中的关联方向。例如，如果两个 OTU 在组1 中正相关，而在组2 中负相关（如“ 191541”和“ 188236”），则相应的 edge 为青色。

我们还通过构建仅包含差异关联的 OTU 的关联网络来进一步探究。

props_season_pears <- netAnalyze(net_season_pears, 
                                 clustMethod = "cluster_fast_greedy",
                                 weightDeg = TRUE,
                                 normDeg = FALSE)

# Identify the differentially associated OTUs
diffmat_sums <- rowSums(diff_season$diffAdjustMat)
diff_asso_names <- names(diffmat_sums[diffmat_sums > 0])

plot(props_season_pears, 
     nodeFilter = "names",
     nodeFilterPar = diff_asso_names,
     nodeColor = "gray",
     highlightHubs = FALSE,
     sameLayout = TRUE, 
     layoutGroup = "union",
     rmSingles = FALSE, 
     nodeSize = "clr",
     edgeTranspHigh = 20,
     labelScale = FALSE,
     cexNodes = 1.5, 
     cexLabels = 3,
     cexTitle = 3.8,
     groupNames = c("No seasonal allergies", "Seasonal allergies"),
     hubBorderCol  = "gray40")

legend(-0.15,-0.7, title = "estimated correlation:", legend = c("+","-"), 
       col = c("#009900","red"), inset = 0.05, cex = 4, lty = 1, lwd = 4, 
       bty = "n", horiz = TRUE)

我们可以看到，上述 OTUs 191541 与 188236 在左组为强正相关，而在右组则为负相关。

相异度网络

如果使用相异度构建网络，则是将每个样本作为节点。相异度被转换为相似度，用作 edge 权重，以便使具有相似微生物组成的样本在网络图中更接近。

下面，我们将使用适合于组成数据的 Aitchison 距离构建网络。由于 Aitchison 距离基于 clr 变换，因此需要替换掉数据中的零。这里用使用 knn 算法稀疏网络。

net_aitchison <- netConstruct(amgut1.filt,
                              measure = "aitchison",
                              zeroMethod = "multRepl",
                              sparsMethod = "knn", 
                              kNeighbor = 3,
                              verbose = 3)

## Infos about changed arguments:

## Counts normalized to fractions for measure 'aitchison'.

## 127 taxa and 289 samples remaining.

## 
## Zero treatment:

## Execute multRepl() ... Done.
## 
## Normalization:
## Counts normalized by total sum scaling.
## 
## Calculate 'aitchison' dissimilarities ... Done.
## 
## Sparsify dissimilarities via 'knn' ... Done.

对于 cluster 检测，我们使用基于 average linkage 的层次聚类。函数内部，将 k=3 从 stats 包传递到 cutree()，从而将树分割成 3 个clusters。

props_aitchison <- netAnalyze(net_aitchison,
                              clustMethod = "hierarchical",
                              clustPar = list(method = "average", k = 3),
                              hubPar = "eigenvector")

plot(props_aitchison, 
     nodeColor = "cluster", 
     nodeSize = "eigenvector",
     hubTransp = 40,
     edgeTranspLow = 60,
     charToRm = "00000",
     mar = c(1, 3, 3, 5))

# get green color with 50% transparency
green2 <- colToTransp("#009900", 40)

legend(0.4, 1.1,
       cex = 2.2,
       legend = c("high similarity (low Aitchison distance)",
                  "low similarity (high Aitchison distance)"), 
       lty = 1, 
       lwd = c(3, 1),
       col = c("darkgreen", green2),
       bty = "n")