Evaluating the SMD algorithm using simulated data
Because SMD has not previously been applied to microbiome module detection,
we compared it with LMM on simulated data. To evaluate how well SMD detects
modules under different parameter settings, we simulated a wide range of
networks.
The simulated networks had characteristics similar to those of networks
generated from microbiome datasets. They included networks with power law
degree distributions (N = 175 networks), in which α, the exponent of the
power law $y = kX^{-\alpha}$, varied between 1.8 and 2.6, and networks with
regular degree distributions (N = 200 networks), in which p, the
probability that any two nodes are connected, varied from 0.001 to 0.2.
The power law and regular degree distribution networks were
generated with the NetworkX v2.6.3 functions configuration_model and
erdos_renyi_graph, respectively, and every network contained 500 nodes.
The networks with power law degree distributions had
modularity values between 0.2 and 0.9, with higher α corresponding to
higher modularity, and the networks with regular degree distributions
had modularity values between 0.07 and 0.98, with lower p corresponding
to higher modularity. Higher modularity scores indicate more connections
within modules and fewer connections between them.
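A minimal sketch of the network generation step, assuming Python with NetworkX and NumPy. The text does not say how the power law degree sequences were drawn, so sampling degrees from a Zipf distribution with exponent α (NumPy's Generator.zipf), capping them at n − 1, and collapsing the configuration model's parallel edges and self-loops are all assumptions. Scoring modularity on a greedy modularity-maximizing partition (greedy_modularity_communities) is likewise an assumed choice, and the helper names are hypothetical.

```python
import networkx as nx
import numpy as np
from networkx.algorithms import community

N_NODES = 500  # every simulated network had 500 nodes


def power_law_network(alpha, n=N_NODES, seed=0):
    """Configuration-model graph with degrees drawn from P(k) proportional to k**-alpha.

    Assumption (hypothetical sampler): degrees come from a Zipf distribution
    (alpha must be > 1) and are capped at n - 1.
    """
    rng = np.random.default_rng(seed)
    degrees = np.minimum(rng.zipf(alpha, size=n), n - 1)
    if degrees.sum() % 2 == 1:
        # configuration_model requires an even degree sum; bump the smallest degree
        degrees[np.argmin(degrees)] += 1
    multigraph = nx.configuration_model([int(d) for d in degrees], seed=seed)
    graph = nx.Graph(multigraph)  # collapse parallel edges into simple edges
    graph.remove_edges_from(list(nx.selfloop_edges(graph)))  # drop self-loops
    return graph


def random_network(p, n=N_NODES, seed=0):
    """Erdos-Renyi graph in which each pair of nodes is connected with probability p."""
    return nx.erdos_renyi_graph(n, p, seed=seed)


def modularity_of(graph):
    """Modularity of a greedy modularity-maximizing partition (assumed partition choice)."""
    modules = community.greedy_modularity_communities(graph)
    return community.modularity(graph, modules)


if __name__ == "__main__":
    for alpha in (1.8, 2.2, 2.6):
        g = power_law_network(alpha)
        print(f"alpha={alpha}: modularity={modularity_of(g):.2f}")
    for p in (0.001, 0.01, 0.05):
        g = random_network(p)
        print(f"p={p}: modularity={modularity_of(g):.2f}")
```

Sweeping α and p over the ranges given above with different seeds yields a parameter grid of the kind described; the exact grid and seeds used in the study are not specified here.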
We then computed the SMD and LMM partitions of each network (using
γ = 1 for LMM) and measured the homogeneity of the SMD partition relative
to the LMM partition. Because
SMD modules are smaller than LMM modules, we used the homogeneity metric
described by Rosenberg and Hirschberg [42] (implemented in
Scikit-learn v0.24.2) to assess whether nodes grouped together by SMD
also belong to the same LMM module. A score of 1 indicates that every
SMD module is a sub-module of a single LMM module, whereas a score of 0
indicates that SMD module membership carries no information about LMM
module membership, that is, each SMD module mixes the LMM modules in the
same proportions as the network as a whole.
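The homogeneity comparison can be sketched with scikit-learn's homogeneity_score. Homogeneity is defined as $h = 1 - H(C \mid K)/H(C)$, where C is the reference labeling and K is the clustering being scored. This sketch assumes the LMM labels are passed as the reference (labels_true) and the SMD labels as the clustering (labels_pred), the orientation that rewards SMD modules nested inside LMM modules. The helper functions and the six-node partitions are hypothetical toy inputs, not data from the study.

```python
from sklearn.metrics import homogeneity_score


def partition_to_labels(partition, nodes):
    """Map a partition (iterable of node sets) to one integer label per node, in `nodes` order."""
    module_of = {}
    for label, module in enumerate(partition):
        for node in module:
            module_of[node] = label
    return [module_of[node] for node in nodes]


def smd_in_lmm_homogeneity(smd_partition, lmm_partition, nodes):
    """Homogeneity of SMD modules with LMM modules treated as the reference labels."""
    lmm_labels = partition_to_labels(lmm_partition, nodes)
    smd_labels = partition_to_labels(smd_partition, nodes)
    # labels_true = LMM, labels_pred = SMD: the score is 1 when every SMD module lies inside one LMM module
    return homogeneity_score(lmm_labels, smd_labels)


nodes = ["a", "b", "c", "d", "e", "f"]
lmm = [{"a", "b", "c"}, {"d", "e", "f"}]
nested_smd = [{"a", "b"}, {"c"}, {"d", "e"}, {"f"}]  # each SMD module sits inside one LMM module
mixed_smd = [{"a", "d"}, {"b", "e"}, {"c", "f"}]  # each SMD module splits evenly across LMM modules

print(smd_in_lmm_homogeneity(nested_smd, lmm, nodes))  # 1, up to floating-point rounding
print(smd_in_lmm_homogeneity(mixed_smd, lmm, nodes))  # 0: SMD membership says nothing about LMM membership
```

Swapping the two arguments would compute completeness rather than homogeneity of SMD relative to LMM, so the argument order matters.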