Over the last two decades, there has been a huge increase in our
understanding of microbial diversity, structure and composition, enabled
by high-throughput sequencing (HTS) technologies. Yet it remains unclear how
the number of sequences translates to the number of cells or species
within the community. Additional observational data may be required to
ensure that relative abundance patterns from sequence reads are biologically
meaningful, or presence-absence data may be used instead of abundance.
The goal is to obtain robust abundance data for the whole community
simultaneously from environmental samples. In this issue of Molecular Ecology
Resources, Karlusich et al. (2022) describe a new method for
quantifying phytoplankton cell abundance. Using Tara Oceans
datasets, the authors propose the photosynthetic gene psbO for
reporting accurate relative abundance of the entire phytoplankton
community from metagenomic data. The authors demonstrate improved
correlations with traditional optical methods, including microscopy and
flow cytometry, compared with current molecular identification approaches,
which typically use rRNA marker genes. Furthermore, to facilitate
application of their approach, the authors curated a psbO gene
database for accessible taxonomic queries. This is an important step
towards improving species abundance estimates from molecular data and
eventually reporting absolute species abundance, enhancing our
understanding of community dynamics.
High-throughput sequencing (HTS) technologies for identification of taxa
from environmental samples have significantly improved our understanding
of biodiversity and community assembly processes. However,
quantification of species abundance from sequence reads is not a
straightforward task. This is because biases from DNA extraction, PCR
amplification and sequencing will affect the number of sequence reads
obtained for each taxonomic unit and therefore the representation within
the environmental sample (Bik et al., 2012). In addition, multi-copy
genes, such as the prokaryotic (16S) and eukaryotic (18S) rRNA marker genes,
are often targeted to increase detection sensitivity of target DNA
from environmental samples. However, large variations in copy number within
and between taxa reduce our ability to quantify taxon abundance.
Karlusich et al. (2022) explain that whilst many HTS studies report the
relative abundance of the gene sequences, this may not be an accurate
measure of the relative abundance of the organisms containing those
sequences. Yet, accurate relative abundance measurements are crucial to
our understanding of community composition simply because when one
taxonomic unit increases in relative abundance, another necessarily
decreases (figure 1).
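The effect of copy number variation on relative abundance can be illustrated with a minimal sketch. The taxa, cell counts and copy numbers below are hypothetical values chosen for illustration only, not data from Karlusich et al. (2022), and the sketch assumes that the expected number of marker sequences is simply proportional to cells multiplied by gene copies per cell, ignoring extraction, PCR and sequencing biases.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def expected_gene_counts(cells, copies_per_cell):
    """Expected marker gene copies per taxon: cells x copies per cell."""
    return {taxon: cells[taxon] * copies_per_cell[taxon] for taxon in cells}


# Hypothetical community with equal cell numbers per taxon (illustrative only).
cells = {"taxon_A": 100, "taxon_B": 100, "taxon_C": 100}

# Hypothetical copies per cell for a multi-copy marker and a single-copy marker.
multi_copy = {"taxon_A": 1, "taxon_B": 2, "taxon_C": 7}
single_copy = {"taxon_A": 1, "taxon_B": 1, "taxon_C": 1}

true_ra = relative_abundance(cells)
multi_ra = relative_abundance(expected_gene_counts(cells, multi_copy))
single_ra = relative_abundance(expected_gene_counts(cells, single_copy))

for taxon in cells:
    print(taxon, round(true_ra[taxon], 3), round(multi_ra[taxon], 3),
          round(single_ra[taxon], 3))
```

In this toy community the multi-copy marker inflates the share of the taxon with the most copies and, because proportions must sum to one, deflates the others, whereas the single-copy marker recovers the cell-based proportions.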
Inaccurate assessments of abundance will have serious consequences for
our understanding and management of ecosystems. For example, Karlusich
et al. (2022) highlight the ecological importance of marine
phytoplankton, including their position at the foundation of ocean
ecosystems and their roles in primary productivity and biogeochemical cycles
(Field, Behrenfeld, Randerson, & Falkowski, 1998). Under future global
change, species sorting will potentially alter the composition of
functional groups within marine microbial communities (Di Pane,
Wiltshire, McLean, Boersma, & Meunier, 2022), which in turn feeds back
into the biogeochemical cycles. It is therefore important to know how
these communities will be composed in the future, and the consequences
for the ecosystem services they provide.
Targeted amplicon sequencing (a.k.a. metabarcoding) is now routinely used for the characterization
of complex assemblages of prokaryotic and eukaryotic organisms (Creer et
al., 2016). We can now reliably identify
most of the abundant taxa in complex assemblages (albeit with some
exceptions) and provide “semi-quantitative” data on taxon abundance
from complex mixtures, e.g. the ocean microbiome (Giner et al., 2016), the soil
microbiome (Delgado-Baquerizo et al., 2018) and the air microbiome
(Drautz-Moses et al., 2022). However, it is well documented that
metabarcoding suffers from biases associated with PCR amplification of
target genes (Bik et al., 2012). HTS-based metagenomics (the sequencing
of genomic fragments from many members of the community) is a
non-targeted, PCR-free method and, as costs decline, is an emerging
solution to taxonomic identification without the biases introduced by PCR.
Whilst traditional methods, such as microscopy and flow cytometry, are
better at providing quantitative data and are well validated, they often
lack the ability to scale up to whole communities, especially in systems
or methods that rely on human expertise instead of automation (Makiola
et al., 2020). The goal is to obtain reliable abundance data for each
taxonomic unit from the number of sequence reads obtained from the
environmental sample.
Karlusich et al. (2022) propose a straightforward solution to robustly
measure relative abundance from environmental samples and describe each
step of their selection and validation process. Using datasets from Tara Oceans (a global expedition sampling plankton in the
upper layers of the world ocean; Sunagawa et al., 2020), Karlusich et
al. (2022) target nuclear-encoded single-copy, core, photosynthetic
genes obtained from metagenomes to circumvent the limitations of
targeted gene sequencing (metabarcoding) and multicopy markers. The
authors focused on the psbO gene, which is essential for
photosynthetic activity and does not have non-photosynthetic homologs;
thus, it can be used to measure abundance of the total photosynthetic
group and has the added benefit of covering the whole phytoplankton
community. Both cyanobacteria and eukaryotic phytoplankton
can also be measured by combining two rRNA marker genes (e.g. prokaryotic 16S
and eukaryotic 18S); however, relative abundances derived from different
amplicon libraries cannot be directly compared (Tkacz, Hortala, &
Poole, 2018). Importantly, cross-domain comparisons can be made using
the psbO gene.
Karlusich et al. (2022) found that the psbO gene is a robust
marker for estimating relative abundance of phytoplankton and were able
to examine the biogeography of the entire phytoplankton community
simultaneously. To validate their approach, the authors used Tara Oceans data, including imaging datasets (microscopy and flow cytometry)
and molecular datasets from metabarcoding, metagenomics and
metatranscriptomics. Using the imaging datasets,
they demonstrated the accuracy of their approach and even confirmed the
presence of colony formation and symbiosis in some of the smallest
phytoplankton cells that were found in the largest size-fractionated water
samples. Armed with evidence that the psbO gene accurately provides relative abundance data, the authors compared
their results with the commonly used rRNA marker genes 16S and 18S (rRNA
gene miTags from metagenome data and rRNA gene metabarcoding). Here they
show that the psbO gene outperformed rRNA gene datasets in
reporting accurate relative abundance of phytoplankton. Furthermore, the
authors demonstrate that the psbO gene improves measures of microbial
community diversity, structure, and composition compared with rRNA
genes, and identify biases in metabarcoding datasets. However, they
report that diversity indices such as Shannon diversity (which accounts
for both species richness and evenness) were sufficiently robust to
withstand biases introduced by the rRNA marker methods. Furthermore,
they confirm that neither rRNA gene markers nor psbO could
accurately report biovolume.
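For readers unfamiliar with the Shannon index mentioned above, the following minimal sketch shows how it combines richness and evenness. It uses the standard definition H' = -sum(p_i ln p_i) with natural logarithms, together with Pielou's evenness (H' divided by the log of richness); the function names and the example counts are illustrative assumptions, not the analysis pipeline used by Karlusich et al. (2022).

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), where S is the number of observed taxa."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is undefined for a single taxon; 0.0 by convention here
    return shannon_index(counts) / math.log(richness)


# Hypothetical read counts for two four-taxon communities (illustrative only).
even_community = [25, 25, 25, 25]
skewed_community = [85, 5, 5, 5]

print(shannon_index(even_community), pielou_evenness(even_community))
print(shannon_index(skewed_community), pielou_evenness(skewed_community))
```

Both communities have the same richness, but the skewed community yields a lower index because most reads belong to one taxon.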
This is an exciting tool since we still do not have a clear
understanding of the abundance of phytoplankton groups in the ocean.
The steps described by Karlusich et al. (2022) can also be followed
to identify suitable genes for other study systems. There are
many research avenues where good-quality abundance data would
be enormously impactful: for example, making more accurate assessments
of floral resource use from pollen grains found in honey (Jones et al.,
2021) or on the bodies of pollinators (Lowe, Jones, Brennan, Creer, & de
Vere, 2022), exploring how the abundance of allergenic airborne pollen
correlates with human health (Rowney et al., 2021), and gaining insights
into the relationship between the gut microbiome and human health (Proctor
et al., 2019). However, it is important that new markers are accompanied
by well-populated genetic databases in order to avoid biases during
taxonomic assignment. A measure of absolute abundance is the ultimate
goal, and future investigations using this approach could achieve absolute
abundance estimates through careful sampling design and DNA internal standards
(‘spike-ins’) (Tkacz et al., 2018).
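The arithmetic behind a spike-in standard can be sketched as follows. This is a simplified illustration, not the protocol of Tkacz et al. (2018): it assumes a known number of standard copies is added before DNA extraction, that the standard and the target marker are recovered and sequenced with equal efficiency, and that the marker is single-copy. The function name, parameters and numbers are all hypothetical.

```python
def absolute_abundance(taxon_reads, spike_reads, spike_copies_added, sample_volume_l):
    """Estimate marker copies per litre for each taxon from a spike-in standard.

    Assumes equal recovery of spike-in and target DNA (a simplification).
    """
    if spike_reads <= 0:
        raise ValueError("spike_reads must be positive")
    if sample_volume_l <= 0:
        raise ValueError("sample_volume_l must be positive")
    # Each read is taken to represent this many copies in the original sample.
    copies_per_read = spike_copies_added / spike_reads
    return {
        taxon: reads * copies_per_read / sample_volume_l
        for taxon, reads in taxon_reads.items()
    }


# Hypothetical example: 1e6 spike-in copies added to a 2 L water sample.
reads = {"taxon_A": 500, "taxon_B": 1500}
estimates = absolute_abundance(
    reads, spike_reads=1000, spike_copies_added=1_000_000, sample_volume_l=2.0
)
print(estimates)
```

Unlike relative abundance, these estimates are not constrained to sum to a constant, so an increase in one taxon does not force an apparent decrease in another.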