2 METHODS
2.1 Integrative Systems Biology Workflow. We developed and applied a systems biology workflow to study BCG network pharmacology and prioritize small-molecule BCG mimics and antivirals. Our workflow (Figure1) incorporates three major components: (1) a module for mining and prioritizing gene signatures representative of a condition or a biological state; (2) a network-mining module to identify genetic perturbations that induce gene expression profiles that are highly enriched with the genes constituting the condition gene signature; and (3) a pathway enrichment module to understand the biological processes involved in the mechanism of action of BCG and highly correlated genetic perturbagens.
2.2 BCG consensus gene signature. A consensus gene signature for BCG vaccine was derived from gene expression profiles in peripheral blood mononuclear cells (PBMCs) in response to a BCG challenge test reported by Matsumiya et al , GSE58636 dataset on NCBI Gene Expression Omnibus (GEO). All whole blood samples were collected from healthy human subjects enrolled in phase 1 trial (clinical trials registration: NCT01194180). For the purposes of this study we used the gene expression profiles generated from two human subject groups included in the above trial: group 1 (BCG naive), and group 2 (BCG vaccinated; median time since vaccination, 10 years). The consensus gene signature we prepared to study network pharmacology and query the connectivity map consisted of the genes that showed significant differential gene expression in response to a BCG challenge test (stimulated) in comparison with controls (unstimulated) on days 0 and 14 in both groups 1 and 2.
2.3 Network Building. A systematic search, for nearest neighbor (NN) genes/proteins of the upregulated and downregulated genes in BCG’s gene signature, was conducted in Cytoscape version 3.8.0 using the STRING protein query application. All retrieved protein-protein interactions (PPIs), including both physical and functional interactions were retrieved from popular databases such as MINT, HPRD, BIND, DIP, BioGRID, KEGG, Reactome, EcoCyc, NCI-Nature Pathway Interaction Database, and Gene Ontology (GO) protein complexes. Network building tools in Cytoscape version 3.7.2 were used to generate PPI networks for BCG-CGS.
2.4 Enrichment Analysis. Enrichment analysis was conducted in Cytoscape and MetaCore to identify pathways and biological processes associated with BCG-CGS and CMap genetic connections. The significance of the enrichment was determined by the hypergeometric test. All terms from the ontology are ranked based on their calculated p-values. Ontology terms with p-values less than the p-value threshold 0.05 are defined as statistically significant and therefore relevant to the studied list of genes. All terms from the ontology are ranked according to their calculated p-values.
2.5 The Connectivity Map (CMap). The CMap is a chemogenomics database that catalogs 1.3 Million profiles of transcriptional responses of human cells to chemical and genetic perturbations. Currently, there are 27,927 perturbagens (19,911 small molecules, and 7,494 genetic perturbagens) producing 476,251 expression signatures in 9 human cell lines: PC3, VCAP, A375, A459, HA1E, HCC515, HT29, MCF7, HEPG2. This database of cellular signatures has been produced using the L1000 platform; a high-throughput gene expression assay that measures the mRNA transcript abundance of 978 ”landmark” genes from human cells.
2.6 Causal Reasoning. Causal reasoning analysis identify genes and proteins of a ‘topological significance’ in order to make decisions whether these genes/proteins are eligible for targeting in the studied phenotype. In this study we applied causal reasoning to identify molecular regulators that most likely directly cause the observed expression changes in transcriptional profiles in response to BCG. In this approach, changes in gene expression, both direction and effect of edges in the network are taken into account. For each node (i.e., gene) in causal reasoning network, observed changes in expression are matched with the expected changes inferred from network structure given the hypothesis that the observed gene expression is decreased or increased due to its activity. Each node has outgoing activation or inhibition effects on other objects in the knowledge database, and a key hub with a predicted increase in activity shows increased expression for those genes that the hub is known to activate, and it shows decreased expression for genes it is known to inhibit. Each predicted key hub has a prediction P-value which is produced as a result of a binomial test used to assess the probability of making a given number of supportive data out of all defined differentially expressed genes (DEGs) in examined data. It is noteworthy that causal reasoning examines both direct neighbors of differentially expressed genes, and remote (several steps away) regulators. All causal reasoning predictions were performed in Key Pathway Advisor from Clarivate Analytics, using the Pollard method.
2.7 R-package gplots. gplots v3.0.1.2 was used as for plotting enhanced heatmaps for transcriptional data (e.g., heatmap representing BCG-CGS in Figure 2). Heat maps were generated using the heatmap.2 function included in this package.