2.3 vAMPirus Analysis Repository
To encourage and simplify the dissemination of parameters and non-read
files needed to reproduce vAMPirus analyses, we created the ‘vAMPirus
Analysis Repository’ (zenodo.org/communities/vampirusrepo/). The
vAMPirus Analysis Repository is a Zenodo Community intended as a central
location where investigators can deposit vAMPirus configuration files,
metadata files, databases used for taxonomy assignment or ASV filtering,
and any other files required to reproduce an analysis. Instructions and
recommendations for submission are available in the vAMPirus manual
(shorturl.at/uCO28). Once uploaded, submissions to the vAMPirus Analysis
Repository are given a DOI.
Validating the vAMPirus workflow with published double-stranded DNA (dsDNA) virus datasets
We assessed the functionality and performance of vAMPirus’ analytical
workflow using amplicon sequencing datasets from two previously
published dsDNA virus studies (Table 1). Research questions associated
with each study are used as examples in Figure 1A (Q1: Finke & Suttle
2019; Q2: Frantzen & Holo 2019). For each dataset,
we ran a vAMPirus analysis that reproduced the analysis from the
associated published paper as closely as possible. For example, if a
study generated de novo OTUs based on 97% nucleotide identity,
the vAMPirus equivalent was ncASVs generated at 97% nucleotide identity
with similar data quality control constraints. We then compared the
results of the vAMPirus-based analyses to the findings described in each
source manuscript. In brief, vAMPirus identified the same biological
patterns as those published by Finke & Suttle (2019, Figure 3) and
Frantzen & Holo (2019, Figure 4) from their respective sequence
datasets, and detected additional (previously unreported) virus
diversity (Table 1). For example, Finke and Suttle (2019) reported
increased cyanophage community alpha diversity in samples collected from
sites with higher salinity (>27.5 practical salinity units,
Figure 3-I, II); this pattern was present in the corresponding vAMPirus
results (Figure 3-III, IV, V, VI), which included 86% more cyanophage
pcASVs relative to the number of OTUs reported in Finke and Suttle
(2019; Table 1). Similarly, the patterns of lactococcal phage OTU
richness and relative abundances per sample reported by Frantzen and
Holo (2019; Figure 4-I) were also present in the vAMPirus results (Table
2; Figure 4-II). vAMPirus reported 43% more lactococcal phage ncASVs
relative to the OTUs reported by Frantzen and Holo (2019; Table 1,
Figure 4). In addition, vAMPirus ASV-level analysis (Figure 4-III)
revealed high lactococcal phage nucleotide-level diversity (n=531), yet
aminotyping results (Figure 4-IV) suggest that the mutations underlying
this richness are mostly synonymous: the ASV sequences translated to
only 29 aminotypes. Aminotype phylogrouping (see Section
2.2.2) of these data with TreeCluster highlighted a previously hidden
overlap of lactococcal phage diversity across samples and dairy plants
(Figure 4-VI).
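
The collapse of many nucleotide-level ASVs into few aminotypes can be illustrated with a minimal sketch. It assumes in-frame sequences and the standard genetic code; the function names and the toy sequences are hypothetical and are not taken from vAMPirus or from either dataset.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order so that they line up with the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def group_by_aminotype(asvs):
    """Map each unique translated sequence to the ASV ids that produce it."""
    groups = {}
    for asv_id, seq in asvs.items():
        groups.setdefault(translate(seq), []).append(asv_id)
    return groups


# Hypothetical toy ASVs (illustrative only). ASV1 to ASV3 differ only by
# synonymous substitutions; ASV4 carries one non-synonymous change.
toy_asvs = {
    "ASV1": "ATGGCTAAA",
    "ASV2": "ATGGCCAAA",
    "ASV3": "ATGGCGAAG",
    "ASV4": "ATGGATAAA",
}

aminotypes = group_by_aminotype(toy_asvs)
print(len(toy_asvs), "ASVs ->", len(aminotypes), "aminotypes")
```

In this toy case, four distinct ASVs yield two aminotypes because three of them differ only at synonymous codon positions, which is the same kind of reduction described above for the lactococcal phage data.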
Some variation between results obtained from vAMPirus and previous
publications was expected, as the pipelines used in these comparisons
were not identical. The only striking difference between the original
results (Finke and Suttle 2019; Frantzen and Holo 2019) and those
produced by vAMPirus is the higher number of pcASVs and ncASVs,
respectively, identified by vAMPirus. Taxonomy results generated by
vAMPirus, which used DIAMOND blastx to align sequences against the NCBI
virus RefSeq database, verified that the pcASVs and ncASVs are of
cyanophage and lactococcal phage origin, respectively (Supplemental
Figures S4 and S5). The higher diversity identified by vAMPirus may be
attributable to differences in the reference database used (boutique
versus NCBI-curated), the handling of singletons, and other factors.
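
To make concrete what clustering at a fixed identity threshold means in these comparisons (for example, ncASVs at 97% nucleotide identity), the following is a deliberately simplified sketch of greedy centroid clustering. It assumes pre-aligned, equal-length sequences and processes them in the order given; it does not reflect how vAMPirus performs clustering internally, and all names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns a dict mapping centroid id -> list of member ids.
    """
    clusters = {}
    for seq_id, seq in seqs.items():
        for centroid_id in clusters:
            if identity(seq, seqs[centroid_id]) >= threshold:
                clusters[centroid_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]
    return clusters


def substitute(seq, positions):
    """Return seq with a different base at each listed position (toy helper)."""
    out = list(seq)
    for i in positions:
        out[i] = "C" if out[i] == "A" else "A"
    return "".join(out)


# Hypothetical 100-nt toy sequences (illustrative only).
base = "ACGT" * 25
toy_seqs = {
    "ASV_a": base,
    "ASV_b": substitute(base, [0, 10]),            # 2 mismatches vs ASV_a
    "ASV_c": substitute(base, [1, 5, 9, 13, 17]),  # 5 mismatches vs ASV_a
}

clusters_97 = greedy_cluster(toy_seqs, threshold=0.97)
print(len(toy_seqs), "ASVs ->", len(clusters_97), "clusters at 97% identity")
```

Here ASV_b joins the ASV_a cluster (98% identity), whereas ASV_c (95% identity) seeds its own cluster. Lowering the threshold merges more sequences, which is one reason why cluster counts from pipelines with different settings are not directly comparable.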
Table 1. Breakdown of test datasets used during vAMPirus
development, including the methods and results from the original
(published) analysis, as well as results from vAMPirus analysis.
vAMPirus results were generated using de novo clustering of ASVs
into ‘clustered ASVs’ (cASVs) based on pairwise nucleotide (ncASV) and
protein (pcASV) sequence similarity. dsDNA = double-stranded DNA.