2 METHODS
2.1 Integrative Systems Biology Workflow. We developed and
applied a systems biology workflow to study BCG network pharmacology and
prioritize small-molecule BCG mimics and antivirals. Our workflow
(Figure1) incorporates three major components: (1) a module for mining
and prioritizing gene signatures representative of a condition or a
biological state; (2) a network-mining module to identify genetic
perturbations that induce gene expression profiles that are highly
enriched with the genes constituting the condition gene signature; and
(3) a pathway enrichment module to understand the biological processes
involved in the mechanism of action of BCG and highly correlated genetic
perturbagens.
2.2 BCG consensus gene signature. A consensus gene signature
for BCG vaccine was derived from gene expression profiles in peripheral
blood mononuclear cells (PBMCs) in response to a BCG challenge test
reported by Matsumiya et al , GSE58636 dataset on NCBI Gene
Expression Omnibus (GEO). All whole blood samples were collected from
healthy human subjects enrolled in phase 1 trial (clinical trials
registration: NCT01194180). For the purposes of this study we used the
gene expression profiles generated from two human subject groups
included in the above trial: group 1 (BCG naive), and group 2 (BCG
vaccinated; median time since vaccination, 10 years). The consensus gene
signature we prepared to study network pharmacology and query the
connectivity map consisted of the genes that showed significant
differential gene expression in response to a BCG challenge test
(stimulated) in comparison with controls (unstimulated) on days 0 and 14
in both groups 1 and 2.
2.3 Network Building. A systematic search, for nearest neighbor
(NN) genes/proteins of the upregulated and downregulated genes in BCG’s
gene signature, was conducted in Cytoscape version 3.8.0 using the
STRING protein query application. All retrieved protein-protein
interactions (PPIs), including both physical and functional interactions
were retrieved from popular databases such as MINT, HPRD, BIND, DIP,
BioGRID, KEGG, Reactome, EcoCyc, NCI-Nature Pathway Interaction
Database, and Gene Ontology (GO) protein complexes. Network building
tools in Cytoscape version 3.7.2 were used to generate PPI networks for
BCG-CGS.
2.4 Enrichment Analysis. Enrichment analysis was conducted in
Cytoscape and MetaCore to identify pathways and biological processes
associated with BCG-CGS and CMap genetic connections. The significance
of the enrichment was determined by the hypergeometric test. All terms
from the ontology are ranked based on their calculated p-values.
Ontology terms with p-values less than the p-value threshold 0.05 are
defined as statistically significant and therefore relevant to the
studied list of genes. All terms from the ontology are ranked according
to their calculated p-values.
2.5 The Connectivity Map (CMap). The CMap is a chemogenomics
database that catalogs 1.3 Million profiles of transcriptional responses
of human cells to chemical and genetic perturbations. Currently, there
are 27,927 perturbagens (19,911 small molecules, and 7,494 genetic
perturbagens) producing 476,251 expression signatures in 9 human cell
lines: PC3, VCAP, A375, A459, HA1E, HCC515, HT29, MCF7, HEPG2. This
database of cellular signatures has been produced using the L1000
platform; a high-throughput gene expression assay that measures the mRNA
transcript abundance of 978 ”landmark” genes from human cells.
2.6 Causal Reasoning. Causal reasoning analysis identify genes
and proteins of a ‘topological significance’ in order to make decisions
whether these genes/proteins are eligible for targeting in the studied
phenotype. In this study we applied causal reasoning to identify
molecular regulators that most likely directly cause the observed
expression changes in transcriptional profiles in response to BCG. In
this approach, changes in gene expression, both direction and effect of
edges in the network are taken into account. For each node (i.e., gene)
in causal reasoning network, observed changes in expression are matched
with the expected changes inferred from network structure given the
hypothesis that the observed gene expression is decreased or increased
due to its activity. Each node has outgoing activation or inhibition
effects on other objects in the knowledge database, and a key hub with a
predicted increase in activity shows increased expression for those
genes that the hub is known to activate, and it shows decreased
expression for genes it is known to inhibit. Each predicted key hub has
a prediction P-value which is produced as a result of a binomial test
used to assess the probability of making a given number of supportive
data out of all defined differentially expressed genes (DEGs) in
examined data. It is noteworthy that causal reasoning examines both
direct neighbors of differentially expressed genes, and remote (several
steps away) regulators. All causal reasoning predictions were performed
in Key Pathway Advisor from Clarivate Analytics, using the Pollard
method.
2.7 R-package gplots. gplots v3.0.1.2 was used as for plotting
enhanced heatmaps for transcriptional data (e.g., heatmap representing
BCG-CGS in Figure 2). Heat maps were generated using the heatmap.2
function included in this package.