Demonstrating the use of SCNIC
We demonstrate the use of SCNIC with two example datasets. These are 1)
a study that used 16S rRNA sequencing of fecal material to compare
microbiome composition in individuals with and without HIV and in men
who have sex with men (MSM) who were at a high risk of contracting HIV
[43], and 2) a dataset analyzing the microbiome of water samples at
various depths in two of the Great Lakes. We chose these two datasets so
that we could evaluate performance using datasets from both
host-associated and free-living microbiomes. We also used the Great
Lakes dataset to compare module size and modularity between SMD- and
LMM-selected modules.
HIV dataset
The HIV dataset was retrieved from NCBI SRA accession number SRP068240,
and samples from the BCN0 cohort were used for these analyses. Reads
were error corrected, quality trimmed, and primers were removed using
default parameters in BBTools [44]. DADA2 [45] was used to
define amplicon sequence variants (ASVs) with reads trimmed from the
left by 30 base pairs and truncated at 269 base pairs. ASVs were binned
into operational taxonomic units (OTUs) using USEARCH [46] at 99%
identity in QIIME 1 [47]. A phylogenetic tree was built from a single
representative sequence from each OTU with the SEPP protocol [48, 49]
in QIIME 2 [34]. We evaluated the average phylogenetic distance between
OTUs in the same module using the distance method of Biopython
[50, 51]. Taxonomy was assigned
using the Naive Bayes QIIME 2 feature classifier, version
gg-13-8-99-515-806-nb-classifier.qza.
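
As a minimal sketch of the within-module phylogenetic distance described above, the following Python computes the mean pairwise tip-to-tip (patristic) distance among the members of one module. It is an illustration only: the analysis itself used the distance method of Biopython, whereas this sketch swaps in a small hand-written tree structure to stay dependency-free. The tree, its branch lengths, and the names PARENT, patristic_distance, and mean_module_distance are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch length to parent).
# These tips and lengths are made up for illustration only.
PARENT = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
}


def path_to_root(node):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    dist = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(a, b):
    """Sum of branch lengths on the path between tips a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # Among shared ancestors, the most recent one gives the shortest combined path.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def mean_module_distance(members):
    """Average pairwise distance over all pairs of OTUs in one module."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(patristic_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(mean_module_distance(["A", "B", "C"]))
```

A module of closely related OTUs yields a small mean distance, so comparing this value across modules gives a rough view of how phylogenetically cohesive each module is.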
The original study describing these data showed a strong divergence in
gut microbiome composition in MSM compared to non-MSM, independent of
HIV infection status, and more subtle differences associated with HIV
infection when controlling for MSM behavior. The goal of our analysis
was to evaluate whether comparing gut microbiome composition between
HIV-negative MSM and non-MSM using SCNIC modules identifies additional
significant taxa compared to an analysis without modules, and whether
it provides additional insight into which of the taxa that differ with
MSM are also correlated with each other. Co-correlation of microbes may
indicate that they are part of a broader community type, interact with
each other, or share environmental drivers of their prevalence. A
further goal of this analysis was to examine the effect of different
R-value thresholds on the results. Specifically, the SMD method was
used with SparCC R-value thresholds from 0.20 to 1.0 in increments of
0.05.
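
For concreteness, the threshold sweep described above can be enumerated as follows. This is a minimal sketch that only generates the R-value grid (0.20 to 1.0 in steps of 0.05); the function name threshold_grid is hypothetical, and how each threshold is passed to SCNIC is deliberately left out.

```python
def threshold_grid(start=0.20, stop=1.00, step=0.05):
    """Return evenly spaced thresholds from start to stop, inclusive."""
    # Count steps first and round each value, to avoid drift from
    # repeatedly adding a floating-point step.
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


if __name__ == "__main__":
    thresholds = threshold_grid()
    print(len(thresholds), thresholds)
```

Under these settings the grid contains 17 thresholds, so the SMD method would be run 17 times on the same correlation table.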
Great Lakes dataset
The Great Lakes dataset was previously published as part of the Earth
Microbiome Project [52]. This study evaluated patterns of microbial
relative abundance across depths in Lake Michigan (N=16) and Lake
Superior (N=33), with sample depths ranging from 5 to 3654 meters. The
study additionally recorded data on pH and salinity. The Great Lakes
dataset was retrieved from QIITA accession number 1041 [53]. ASVs were
defined using DADA2 with a left trim of 30 and a truncation length of
135. OTUs were subsequently picked on the ASVs using VSEARCH [54] with
a 99% identity threshold, resulting in 3,871 OTUs. These steps were
done with QIIME 2 [34]. SCNIC was applied with the SMD method and
R-value thresholds of 0.2, 0.4, and 0.65.
Comparison of SMD to LMM using the Great Lakes dataset
To identify differences in module structure between SMD and LMM
partitions, we assessed the module size and modularity of 221
separately partitioned networks from the Great Lakes dataset, generated
with varying SCNIC parameters. These parameters were R-value thresholds
ranging from 0.1 to 0.7 and, for LMM, gamma values ranging from 0.15 to
0.9.
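
The two quantities compared here, module size and modularity, can be sketched for a single partitioned network as below. This is an illustrative sketch under stated assumptions, not SCNIC's own code: it uses the standard modularity definition for an unweighted, undirected network, and it assumes that gamma plays the role of the resolution parameter in that definition. The function names and the toy network are hypothetical.

```python
from collections import Counter


def module_sizes(partition):
    """Number of nodes assigned to each module."""
    return Counter(partition.values())


def modularity(edges, partition, gamma=1.0):
    """Modularity Q = sum over modules of [L_c / m - gamma * (d_c / 2m)^2].

    edges: list of (u, v) pairs for an undirected, unweighted network.
    partition: dict mapping node -> module label.
    L_c: edges inside module c; d_c: summed degree of module c; m: total edges.
    """
    m = len(edges)
    within = Counter()   # edges with both endpoints in the same module
    degree = Counter()   # summed node degree per module
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            within[partition[u]] += 1
    return sum(
        within[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree
    )


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by one edge.
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
    partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(module_sizes(partition), modularity(edges, partition))
```

Applying both functions to each partitioned network yields one size distribution and one modularity value per parameter setting, which is the form of comparison described above.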