DsrAB Database Generation and Phylogenetic Analyses
To evaluate the structural properties that confer Dsr function over
broad geochemical space, an existing DsrAB database was curated from
cultivar and environmental genomes (23). The original database was
constructed to include DsrAB from genomes of cultivars, targeted
PCR-based surveys, and metagenomic data. Thus, many of the sequences
were incomplete, potentially confounding structural modeling
calculations. Therefore, sequence alignments and annotations were used
to restrict the database to full-length DsrA and DsrB
sequences. Specifically, sequences demarcated as “partial” and those
that were likely obtained via PCR-based methods were removed without
further consideration. Next, individual alignments of DsrA and DsrB were
performed using Clustal Omega (30), guided with primary sequences of
DsrAB from D. vulgaris and A. fulgidus. Sequences that
were substantially truncated relative to model DsrAB, including those
without start codons, were then removed, resulting in a total of 274
full-length DsrAB sequences. The database will be made available upon
request from the authors. Phylogenetic analysis of DsrAB sequences was
conducted as previously described (11). Associated metadata were
mapped onto the DsrAB phylogeny: the environment of sequence origin (from
the original database publication) and taxonomic information
(either from the original database publication or from BLASTp searches of
DsrA subunits against the NCBI nr database). DsrAB sequence homologs
were subjected to structural alignment using the PROMALS3D multiple
sequence and structure alignment server (31). Structures from D.
vulgaris (2V4J; (32)) and A. fulgidus (3MMC; (20)) served as
threads.
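
The curation steps described above (removal of entries annotated as
"partial", followed by removal of sequences that are truncated or lack a
start codon) can be illustrated with a minimal Python sketch. It is not
the procedure used to build the database: truncation was assessed here
from Clustal Omega alignments against the model DsrAB sequences, whereas
the sketch substitutes a simple length comparison. The function names,
the 0.9 length fraction, and the use of an N-terminal methionine as a
proxy for a start codon are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_full_length(header, seq, ref_len, min_fraction=0.9):
    """Return True if a protein record passes all three illustrative filters.

    ref_len      -- length of a model sequence (hypothetical parameter)
    min_fraction -- assumed cutoff; the text gives no numeric threshold
    """
    # 1. Drop entries whose annotation marks them as partial.
    if "partial" in header.lower():
        return False
    # 2. Proxy for a missing start codon: protein does not begin with Met.
    if not seq.upper().startswith("M"):
        return False
    # 3. Length stand-in for the alignment-based truncation check.
    return len(seq) >= min_fraction * ref_len


def curate(records, ref_len, min_fraction=0.9):
    """Keep only the records judged full length by is_full_length()."""
    return [(h, s) for h, s in records
            if is_full_length(h, s, ref_len, min_fraction)]
```

In practice, such a filter would be applied separately to the DsrA and
DsrB sets, each with the length of its own model sequence as `ref_len`,
and borderline cases would still need inspection in the alignment.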