Use of SCNIC influences the detection of OTUs that differ between MSM and non-MSM
We next evaluated how applying SCNIC, with default SparCC and SMD parameters and varying R-value thresholds, affects downstream statistical analysis results. To investigate differential abundance by MSM status in the HIV dataset, we used ANCOM[28]. After removing taxa that were not present in at least 5% of the samples, the OTU table had 317 samples and 639 OTUs. Without SCNIC, 12 OTUs were significantly different between MSM and non-MSM. We then applied SCNIC at R-values of 0.2, 0.4, and 0.65 and ran ANCOM on the filtered output feature table. Most of the significant features were modules at R-values of 0.2 and 0.4, but not at 0.65 (e.g., 14 of the 15 significant features were modules at R=0.2) (Table 1). This was the case even though the vast majority of OTUs were not part of modules at the 0.4 R-value threshold (Figure 4A). Most of the 12 OTUs that were significant without SCNIC were grouped into modules with each other and with OTUs that were not individually significant without SCNIC. These significant modules contained 74, 26, and 1 new OTU at R-values of 0.2, 0.4, and 0.65, respectively. Using SCNIC also identified 1, 5, and 25 OTUs (at R-values of 0.2, 0.4, and 0.65, respectively) that were individually significant but had not been significant without SCNIC. No individually significant OTU lost significance because it was binned into a module. Together, these results indicate an increase in statistical power from running SCNIC before a test like ANCOM that controls the FDR.
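The two table-level steps described above, prevalence filtering and collapsing module members into a single feature, can be illustrated with a minimal sketch. This is not SCNIC's or ANCOM's code: the function names, the toy table, and the module assignment are hypothetical, and summing member counts is assumed here as the way a module is represented in the output feature table.

```python
from typing import Dict, List

# feature id -> counts per sample (all features share the same sample order)
Table = Dict[str, List[int]]


def prevalence_filter(table: Table, min_fraction: float = 0.05) -> Table:
    """Keep features with a nonzero count in at least min_fraction of samples."""
    kept = {}
    for feature, counts in table.items():
        n_present = sum(1 for c in counts if c > 0)
        if n_present >= min_fraction * len(counts):
            kept[feature] = counts
    return kept


def collapse_modules(table: Table, modules: Dict[str, List[str]]) -> Table:
    """Replace each module's member features with one summed feature.

    Assumption: a module is summarized by summing its members' counts
    per sample. Features outside any module are passed through unchanged.
    """
    in_module = {f for members in modules.values() for f in members}
    collapsed = {f: list(c) for f, c in table.items() if f not in in_module}
    for name, members in modules.items():
        member_rows = [table[m] for m in members]
        collapsed[name] = [sum(column) for column in zip(*member_rows)]
    return collapsed


# Toy data, not from the study: 4 samples, 4 OTUs with placeholder ids.
toy = {
    "otu_a": [5, 0, 3, 2],
    "otu_b": [4, 1, 2, 2],
    "otu_c": [0, 0, 0, 0],  # absent everywhere, removed by the filter
    "otu_d": [0, 7, 0, 1],
}
filtered = prevalence_filter(toy, min_fraction=0.25)
collapsed = collapse_modules(filtered, {"module-0": ["otu_a", "otu_b"]})
```

In this toy case the collapsed table has two features instead of three, which is the kind of reduction in the number of tested features that the differential abundance test then operates on.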
Considering the correlation structure of significant features can help in understanding the broader community context of bacteria that differ with MSM status. Module-0 significantly differed by MSM status at each of the R-values, and in all cases Prevotella was its dominant genus (Figure 5). At an R-value of 0.65, all OTUs in module-0 were assigned to the genus Prevotella (Figure 5C). However, at an R-value of 0.4, module-0 included seven Prevotella OTUs, one Dialister OTU, and one unidentified member of the Paraprevotellaceae family. At an R-value of 0.2, Prevotella accounted for 13 of the 25 OTUs in module-0, and 11 of the 12 pre-SCNIC significant OTUs were found in this module. This suggests that individual OTUs that differ with MSM status may in some cases be part of a consortium of diverse members that collectively display features that may contribute to differences in microbiome function.
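The taxonomic breakdown of a module reported above amounts to a tally of member OTUs by their assigned taxon. A minimal sketch follows, using the R=0.4 composition of module-0 as stated in the text; the OTU identifiers and the `genus_composition` helper are placeholders invented for illustration, not identifiers from the dataset.

```python
from collections import Counter
from typing import Dict, List, Tuple


def genus_composition(members: List[str],
                      taxonomy: Dict[str, str]) -> List[Tuple[str, int]]:
    """Count module members per assigned taxon, most frequent first."""
    counts = Counter(taxonomy.get(otu, "unassigned") for otu in members)
    return counts.most_common()


# Placeholder OTU ids; the counts per taxon follow the text for R=0.4.
taxonomy = {f"otu_{i}": "Prevotella" for i in range(7)}
taxonomy["otu_7"] = "Dialister"
taxonomy["otu_8"] = "unidentified Paraprevotellaceae"

module_0 = list(taxonomy)  # all nine placeholder OTUs belong to module-0
composition = genus_composition(module_0, taxonomy)
dominant_genus, dominant_count = composition[0]
```

Running the same tally on the module memberships produced at each R-value would show how module-0 broadens from a single genus to a mixed consortium as the threshold is lowered.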
To further explore this concept, we investigated the results generated with an R-value of 0.4, because the significant features maintain a strong level of correlation while being phylogenetically diverse. When running ANCOM on this feature table, we found that the individually significant OTUs tended to be joined into modules with other highly co-correlated microbes and that these modules significantly differed with MSM status (Figure 6). Of particular note, the modules and taxa that are significantly related to MSM status do not all correlate with each other. At the R-value of 0.4, module-36 contains two taxa, Erysipelotrichaceae and Clostridium, that are negatively correlated with the other significant taxa and modules (Figure 6). Module-2 contains Eubacterium, Catenibacterium, and Prevotella, which are phylogenetically heterogeneous but mutually co-occurring. A follow-up experiment that leverages the insights SCNIC generates could combine different strains of microbes to assemble a community type and test for functional correlates of disease.