ABSTRACT
Background
Microbiome studies are often limited by a lack of statistical power due
to small sample sizes and a large number of features. This problem is
exacerbated in correlative studies of multi-omic datasets. One way to
increase statistical power is to reduce dimensionality by finding and
summarizing modules of correlated observations. Modules also provide
biological insight, because microbes in a correlated group may have
relationships with one another.
Results
To address these challenges, we developed SCNIC: Sparse Cooccurrence
Network Investigation for Compositional data. SCNIC is open-source
software that can generate correlation networks and detect and summarize
modules of highly correlated features. Modules can be formed using
either the Louvain Modularity Maximization (LMM) algorithm or the Shared
Minimum Distance (SMD) algorithm, which we introduce here and relate to
LMM using simulated data. Applying SCNIC to two published datasets, we
achieved increased statistical power and identified microbes that not
only differed across groups but also correlated strongly with each
other, suggesting shared environmental drivers or cooperative
relationships among them.
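
The general workflow described above (correlate features, keep strong
correlations, group features into modules, summarize each module) can be
sketched in a few lines. This is a toy illustration under stated
assumptions, not SCNIC's implementation: it uses plain Pearson
correlation, which does not account for compositionality; it forms
modules as connected components of the thresholded network rather than
with LMM or SMD; and it assumes a module is summarized by summing its
members. The function names, the threshold `min_r`, and the counts are
all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def find_modules(table, min_r=0.8):
    """Group features linked by a correlation >= min_r.

    Stand-in for module detection: connected components of the
    thresholded correlation network (NOT LMM or SMD).
    """
    parent = {f: f for f in table}

    def root(f):
        # Union-find lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(table, 2):
        if pearson(table[a], table[b]) >= min_r:
            parent[root(a)] = root(b)

    groups = {}
    for f in table:
        groups.setdefault(root(f), []).append(f)
    # Only groups with at least two members count as modules.
    return [sorted(g) for g in groups.values() if len(g) > 1]


def summarize(table, modules):
    """Collapse each module into one feature by summing members per sample
    (an assumed summary); features outside any module pass through."""
    collapsed, in_module = {}, set()
    for i, members in enumerate(modules):
        columns = zip(*(table[m] for m in members))
        collapsed[f"module_{i}"] = [sum(col) for col in columns]
        in_module.update(members)
    for f, vals in table.items():
        if f not in in_module:
            collapsed[f] = list(vals)
    return collapsed


# Made-up counts: keys are features, list positions are samples.
toy = {
    "taxon_A": [10, 20, 30, 40],
    "taxon_B": [12, 22, 29, 41],
    "taxon_C": [40, 10, 35, 5],
}
modules = find_modules(toy, min_r=0.8)
reduced = summarize(toy, modules)
```

In this toy table, three features reduce to two (one module plus one
unmodified feature), so fewer tests are needed downstream, which is the
source of the gain in statistical power.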
Conclusions
SCNIC provides an easy way to generate correlation networks, identify
modules of correlated features and summarize them for downstream
statistical analysis. Although SCNIC was designed with properties of
microbiome data in mind, such as compositionality and sparsity, it can
be applied to a variety of data types, including metabolomics data, and
used to integrate multiple data types. SCNIC allows for the
identification of
functional microbial relationships at scale while increasing statistical
power through feature reduction.