Over the last two decades, high-throughput sequencing (HTS) technologies have enabled a huge increase in our understanding of microbial diversity, structure and composition. Yet it is unclear how the number of sequences translates to the number of cells or species within the community. Additional observational data may be required to ensure that relative abundance patterns from sequence reads are biologically meaningful, or presence-absence data may be used instead of abundance. The goal is to obtain robust abundance data for the whole community, simultaneously, from environmental samples. In this issue of Molecular Ecology Resources, Karlusich et al. (2022) describe a new method for quantifying phytoplankton cell abundance. Using Tara Oceans datasets, the authors propose the photosynthetic gene psbO for reporting accurate relative abundance of the entire phytoplankton community from metagenomic data. The authors demonstrate improved correlations with traditional optical methods, including microscopy and flow cytometry, improving upon current molecular identification, which typically uses rRNA marker genes. Furthermore, to facilitate application of their approach, the authors curated a psbO gene database for accessible taxonomic queries. This is an important step towards improving species abundance estimates from molecular data and, eventually, towards reporting absolute species abundance, enhancing our understanding of community dynamics.
High-throughput sequencing (HTS) technologies for identification of taxa from environmental samples have significantly improved our understanding of biodiversity and community assembly processes. However, quantification of species abundance from sequence reads is not a straightforward task. This is because biases from DNA extraction, PCR amplification and sequencing affect the number of sequence reads obtained for each taxonomic unit and therefore its apparent representation within the environmental sample (Bik et al., 2012). In addition, multi-copy genes are often targeted to increase the detection sensitivity of target DNA from environmental samples, for example the prokaryote (16S) and eukaryote (18S) rRNA marker genes. However, large variations in copy number within and between taxa reduce our ability to quantify taxon abundance. Karlusich et al. (2022) explain that whilst many HTS studies report the relative abundance of the gene sequences, this may not be an accurate measure of the relative abundance of the organisms containing those sequences. Yet accurate relative abundance measurements are crucial to our understanding of community composition, simply because when one taxonomic unit increases in relative abundance, another necessarily decreases (Figure 1).
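The effect of copy number variation on relative abundance can be illustrated with a small numerical sketch. The taxa, cell counts and copy numbers below are hypothetical values chosen for illustration only; they are not taken from Karlusich et al. (2022), and the sketch ignores extraction, PCR and sequencing biases.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical community: two taxa with the same number of cells
cells = {"taxon_A": 100, "taxon_B": 100}

# Hypothetical rRNA gene copies per cell (assumed values, for illustration)
copies_per_cell = {"taxon_A": 1, "taxon_B": 9}

# Gene copies available for sequencing = cells x copies per cell
gene_copies = {t: cells[t] * copies_per_cell[t] for t in cells}

cell_ra = relative_abundance(cells)        # what we want to know
gene_ra = relative_abundance(gene_copies)  # what a multi-copy marker reports

for taxon in cells:
    print(f"{taxon}: cells {cell_ra[taxon]:.2f}, gene copies {gene_ra[taxon]:.2f}")
```

Under these assumed values, the two taxa are equally abundant as cells, yet the multi-copy marker suggests that one dominates the community. Because proportions must sum to one, the inflated share of one taxon necessarily deflates the share of the other.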
Inaccurate assessments of abundance will have serious consequences for our understanding and management of ecosystems. For example, Karlusich et al. (2022) highlight the ecological importance of marine phytoplankton, including their position at the foundation of ocean ecosystems and their roles in primary productivity and biogeochemical cycles (Field, Behrenfeld, Randerson, & Falkowski, 1998). Under future global change, species sorting will potentially alter the composition of functional groups within marine microbial communities (Di Pane, Wiltshire, McLean, Boersma, & Meunier, 2022), which in turn feeds back into the biogeochemical cycles. It is therefore important to know how these communities will be composed in the future, and the consequences for the ecosystem services they provide. Targeted amplicon sequencing (a.k.a. metabarcoding) is now routinely used for the characterization of complex assemblages of prokaryotic and eukaryotic organisms (Creer et al., 2016). We are now in a position where we can reliably identify most of the abundant taxa in complex assemblages (albeit with some exceptions) and provide “semi-quantitative” data on taxon abundance from complex mixtures (e.g. ocean microbiome (Giner et al., 2016), soil microbiome (Delgado-Baquerizo et al., 2018), air microbiome (Drautz-Moses et al., 2022)). However, it is well documented that metabarcoding suffers from biases associated with PCR amplification of target genes (Bik et al., 2012). HTS-based metagenomics (the sequencing of genomic fragments from many members of the community) is a non-targeted, PCR-free method and, as costs decline, is an emerging solution for taxonomic identification without the biases introduced by PCR.
Whilst traditional methods, such as microscopy and flow cytometry, are better at providing quantitative data and are well validated, they often lack the ability to scale up to whole communities, especially in systems or methods that rely on human expertise instead of automation (Makiola et al., 2020). The goal is to obtain reliable abundance data for each taxonomic unit from the number of sequence reads obtained from the environmental sample.
Karlusich et al. (2022) propose a straightforward solution to robustly measure relative abundance from environmental samples and describe each step of their selection and validation process. Using datasets from Tara Oceans (a global expedition sampling plankton in the upper layers of the world ocean (Sunagawa et al., 2020)), Karlusich et al. (2022) target nuclear-encoded, single-copy, core photosynthetic genes obtained from metagenomes to circumvent the limitations of targeted gene sequencing (metabarcoding) and multi-copy markers. The authors focused on the psbO gene, which is essential for photosynthetic activity and does not have non-photosynthetic homologs; thus it can be used to measure the abundance of the total photosynthetic group and has the added benefit of covering the whole phytoplankton community. Both cyanobacteria and eukaryotic phytoplankton can also be measured by combining two rRNA marker genes (e.g. prokaryotic 16S and eukaryotic 18S); however, relative abundances derived from different amplicon libraries cannot be directly compared (Tkacz, Hortala, & Poole, 2018). Importantly, cross-domain comparisons can be made using the psbO gene.
Karlusich et al. (2022) found that the psbO gene is a robust marker for estimating the relative abundance of phytoplankton and were able to examine the biogeography of the entire phytoplankton community simultaneously. To validate their approach, the authors used Tara Oceans data, including imaging datasets (microscopy and flow cytometry) and molecular datasets from metabarcoding, metagenomics and metatranscriptomics. Using the imaging datasets, they demonstrated the accuracy of their approach and even confirmed the presence of colony formation and symbiosis in some of the smallest phytoplankton cells that were found in the largest size-fractionated water samples. Armed with evidence that the psbO gene accurately provides relative abundance data, the authors compared their results with the commonly used rRNA marker genes 16S and 18S (rRNA gene miTags from metagenome data and rRNA gene metabarcoding). Here they show that the psbO gene outperformed rRNA gene datasets in reporting accurate relative abundance of phytoplankton. Furthermore, the authors demonstrate that the psbO gene improves measures of microbial community diversity, structure and composition compared to rRNA genes, and they identified biases in metabarcoding datasets. However, they report that diversity indices such as Shannon diversity (which accounts for both species richness and evenness) were sufficiently robust to account for biases introduced by the rRNA marker methods. Furthermore, they confirm that neither rRNA gene markers nor psbO could accurately report biovolume.
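For readers less familiar with the index, Shannon diversity is commonly computed as H' = -Σ p_i ln(p_i), where p_i is the proportion of the i-th taxon. The following minimal sketch shows how the index responds to evenness when richness is held constant. The read counts are hypothetical, and the code is a generic illustration of the index rather than the analysis pipeline used by Karlusich et al. (2022).

```python
import math

def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical read counts for two communities with the same richness (4 taxa)
even_community = [25, 25, 25, 25]   # all taxa equally abundant
uneven_community = [97, 1, 1, 1]    # one taxon dominates

h_even = shannon_diversity(even_community)
h_uneven = shannon_diversity(uneven_community)

print(f"even community:   H' = {h_even:.3f}")
print(f"uneven community: H' = {h_uneven:.3f}")
```

For a perfectly even community the index equals the natural logarithm of the number of taxa, and it falls as a single taxon comes to dominate.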
This is an exciting tool, since we still do not have a clear understanding of the abundance of phytoplankton groups in the ocean. The steps described by Karlusich et al. (2022) can likewise be followed to identify suitable genes for other study systems. There are many research avenues where the use of good-quality abundance data would be enormously impactful. Examples include making more accurate assessments of floral resource use from pollen grains found in honey (Jones et al., 2021) or on the bodies of pollinators (Lowe, Jones, Brennan, Creer, & de Vere, 2022), exploring how the abundance of allergenic airborne pollen correlates with human health (Rowney et al., 2021), and gaining insights into the relationship between the gut microbiome and human health (Proctor et al., 2019). However, it is important that new markers are accompanied by well-populated genetic databases in order to avoid biases during taxonomic assignment. A measure of absolute abundance is the ultimate goal, and future investigations using this approach could achieve it through careful sampling design and DNA internal standards (‘spike-ins’) (Tkacz et al., 2018).
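The logic of an internal DNA standard can be sketched as simple arithmetic: if a known number of copies of a synthetic sequence is added to a sample before extraction, the ratio of copies added to spike-in reads recovered gives a scaling factor for the remaining reads. The sketch below is a hedged illustration only. The function name, the read counts and the number of copies added are all hypothetical, and it assumes that the spike-in and the target genes are recovered and sequenced with equal efficiency, which may not hold in practice.

```python
def absolute_abundance(taxon_reads, spike_reads, spike_copies_added):
    """Scale read counts to estimated gene copies using a spike-in standard.

    Assumes (hypothetically) equal recovery of spike-in and target DNA.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads must be positive to compute a scaling factor")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}

# Hypothetical marker-gene read counts from one sample
sample_reads = {"taxon_A": 5000, "taxon_B": 1500}

# Hypothetical spike-in: 10,000 copies added, 500 reads recovered
estimates = absolute_abundance(sample_reads, spike_reads=500, spike_copies_added=10_000)

for taxon, copies in estimates.items():
    print(f"{taxon}: ~{copies:.0f} gene copies in the sample")
```

With a single-copy marker, the estimated gene copies can be read as an approximate cell count for organisms carrying one genome copy per cell; for multi-copy markers an additional, taxon-specific correction would be needed.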