DsrAB Database Generation and Phylogenetic Analyses
To evaluate the structural properties that confer Dsr function over broad geochemical space, an existing DsrAB database was curated from cultivar and environmental genomes (23). The original database was constructed to include DsrAB from genomes of cultivars, targeted PCR-based surveys, and metagenomic data. As a result, many of the sequences were incomplete, which could confound structural modeling calculations. Sequence alignments and annotations were therefore used to curate the database so that it comprised only full-length DsrA and DsrB sequences. Specifically, sequences demarcated as “partial” and those that were likely obtained via PCR-based methods were removed without further consideration. Next, individual alignments of DsrA and DsrB were performed using Clustal Omega (30), guided with primary sequences of DsrAB from D. vulgaris and A. fulgidus. Sequences that were substantially truncated relative to model DsrAB, including those without start codons, were then removed, resulting in a total of 274 full-length DsrAB sequences. The database will be made available upon request from the authors. Phylogenetic analysis of DsrAB sequences was conducted as previously described (11), and associated metadata were mapped to the DsrAB phylogeny using the environment of sequence origin (from the original database publication) and taxonomic information (either from the original database publication or via BLASTp searches of DsrA subunits against the NCBI nr database). DsrAB sequence homologs were subjected to structural alignment using the PROMALS3D multiple sequence and structure alignment server (31). Structures from D. vulgaris (2V4J; (32)) and A. fulgidus (3MMC; (20)) served as threads.
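The curation filters described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`parse_fasta`, `is_full_length`, `curate`), the 0.9 length fraction, and the use of an N-terminal methionine as a proxy for a start codon are all assumptions made for illustration. The text states only that "partial" sequences and sequences substantially truncated relative to model DsrAB were removed, and gives no numeric cutoff. The sketch also omits the removal of PCR-derived sequences, which depends on annotations not described here.

```python
from typing import Iterator


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_full_length(header: str, seq: str, ref_len: int,
                   min_fraction: float = 0.9) -> bool:
    """Apply the three illustrative filters to one protein record.

    min_fraction is a hypothetical cutoff; the source gives no value.
    """
    # Filter 1: drop records annotated as partial.
    if "partial" in header.lower():
        return False
    # Filter 2 (assumption): a missing N-terminal Met stands in for
    # "no start codon" when working with protein sequences.
    if not seq.upper().startswith("M"):
        return False
    # Filter 3: drop sequences much shorter than the reference subunit.
    return len(seq) >= min_fraction * ref_len


def curate(fasta_text: str, ref_len: int,
           min_fraction: float = 0.9) -> list[tuple[str, str]]:
    """Return only the records that pass all filters."""
    return [
        (header, seq)
        for header, seq in parse_fasta(fasta_text)
        if is_full_length(header, seq, ref_len, min_fraction)
    ]
```

In practice, `ref_len` would be the length of the corresponding reference subunit (for example, DsrA from D. vulgaris), and truncation would more reliably be judged from the alignment columns than from raw length alone.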