Introduction
Fungi have traditionally been identified based on macro- and
micromorphological features of fruiting body specimens or pure cultures.
The introduction of molecular techniques in the late 1980s
represented a significant leap forward in fungal identification.
In particular, PCR amplification combined with Sanger sequencing of the
nuclear 18S (SSU) and 28S (LSU) ribosomal RNA (rRNA) genes and the nuclear
ribosomal internal transcribed spacer (ITS) region from fungal tissue
(e.g., lichen thalli, lesions in tissue, cultures from environmental
samples and ectomycorrhizal root tips) quickly became popular and
offered unprecedented taxonomic resolution. Common uses included
species- and genus-level identification, analysis of cryptic species,
and phylogenetic assessment of larger fungal clades as well as the
kingdom Fungi at large (Gherbawy & Voitk, 2010). Later, adding a cloning
step for amplicons prior to sequencing made it possible to identify
multiple fungi from more diverse substrates, including soil, plant
roots, and water. However, these studies usually
operated with tens to low hundreds of reads, rarely numbering in the
thousands required to appropriately estimate fungal diversity in soils
(Taylor et al., 2014). Accordingly, sequences and operational taxonomic
units (OTUs) were usually handled manually or using specific programs,
with no need for bioinformatics tools.
The development of high-throughput sequencing (HTS) methods such as 454
pyrosequencing (454 Inc., obsolete), Illumina sequencing (Illumina Inc.,
www.illumina.com) and Ion Torrent (Thermo Fisher Scientific Inc.,
www.thermofisher.com) transformed fungal identification capacity in the
2000s (Jumpponen & Jones, 2009). These so-called next- or
second-generation HTS methods increased the number of reads by 2-6
orders of magnitude and the number of simultaneously processable samples
by 1-2 orders of magnitude. These metabarcoding methods (cf. Taberlet et
al., 2012) enabled exhaustive estimation of fungal diversity from
environmental DNA (eDNA) at the scale of individual samples and
facilitated global-scale comparisons (Tedersoo et al., 2014; Sun et
al., 2021). Yet, these second-generation platforms as well as the more
recent DNA nanoball sequencing (DNBseq; MGI-Tech Inc., www.mgitech.com)
were only able to address short (<550 bases) fragments of the
genetic markers, resulting in the loss of taxonomic resolution and
phylogenetic information as well as difficulties in identifying
technical artefacts compared with longer Sanger reads.
In the 2010s, long-read, third-generation HTS platforms such as PacBio
single-molecule real-time (SMRT) sequencing (Pacific BioSciences Inc.,
www.pacbio.com) and nanopore sequencing (Oxford Nanopore Technologies
Inc., https://nanoporetech.com) were introduced (van Dijk et al., 2019).
Due to low sequencing depth (tens of thousands of reads in total,
resulting in only hundreds rather than thousands of reads per sample)
and high raw error rates (12-20%), these methods could not initially
compete with short-read HTS platforms. However, both technologies made a
great leap forward in 2020 when PacBio Sequel II instruments became
broadly available and new solutions were developed to greatly reduce
error rates in nanopore sequencing (Karst et al., 2021; Tedersoo et al.,
2021a). These long-read technologies and synthetic long reads provide
high-quality sequence data for up to 5 kb amplicons, which enables
bridging variable and conserved fragments of one or more genes in a
single sequencing round as well as resolving alleles and haplotypes
(Callahan et al., 2021; Tedersoo et al., 2021a).
Along with the rapid development of HTS methods, bioinformatics
platforms and analytical resources have evolved to match the
computational needs imposed by large datasets. Metabarcoding approaches
have been extensively reviewed in several recent studies with a focus on
their conceptual foundation (Taberlet et al., 2018), pathogenic
organisms (Piombo et al., 2021; Tedersoo et al., 2019), applications in
mycology (Nilsson et al., 2018), eukaryotes more broadly (Ruppert et
al., 2019) as well as overall experimental planning (Zinger et al.,
2019a), trade-offs among technology generations (Kennedy et al., 2018;
Loit et al., 2019), and analytical pitfalls (Halwachs et al., 2017;
Cristescu & Hebert, 2018). Here we provide a review of available methods
and propose best practices for designing and performing studies using
metabarcoding in fungi. We also compare the performance of several
popular methods developed for bacteria to assess their suitability for
fungi. The vast majority of our recommendations are relevant to
prokaryotes, protists and metazoans alike.