Planning a metabarcoding study
To test scientific hypotheses, researchers should first settle on an
appropriate study design (observational, experimental or combined) and
account for its technical, analytical and financial requirements.
Designs with broad representativeness
(e.g., geographical and ecological scope) and independence of replicates
(i.e., no spatiotemporal autocorrelation) are strongly recommended
(Gotelli & Ellison, 2013; Zinger et al., 2019a). In this respect,
metabarcoding studies do not differ from traditional ecological studies,
in which the number and distribution of study sites must be defined
according to the initial question (Dickie et al., 2018). Metabarcoding
studies also require an adequate number of local biological replicates,
ranging from at least three (for detecting strong differences in
composition) to around 10 (for richness assessment).
Intuitively, more replicates will be required when any expected
ecological differences are relatively small or when the studied location
exhibits strong spatial or environmental heterogeneity.
The size of individual environmental samples should be large enough to
provide sufficient material for DNA extraction and potential
physicochemical analysis (e.g., pH and C/N ratio). It is also important
to consider the amount of material from the perspectives of
pre-treatment and storage. Too much material will be difficult to mix,
dry or freeze, and costly to preserve in a buffer. To ensure statistical independence
of samples within a site, samples should be located at least 5-10 m
apart from each other in a homogeneous environment. This distance
corresponds to the spatial autocorrelation range in soil fungi (Bahram
et al., 2013) and the average size of macrofungal individuals (Douhan et
al., 2011). In aquatic habitats, community composition is likely to be
autocorrelated over even larger distances (Matsuoka et
al., 2019). When assessing diversity patterns along ecological
gradients, transects (e.g., latitudinal, altitudinal and salinity
gradients) should be replicated. Spatial independence should also be
ascertained for plots and treatments. In field and laboratory
experiments, this is best achieved by a randomized block design
(Legendre & Legendre, 2012), although a stratified block design may be
preferable in environments with known heterogeneity. It is advisable to
collect samples in as short a time period as possible to avoid seasonal
and weather effects such as freeze-thaw cycles and rainfall after a long
dry period (for soil and leaves) that may cause rapid turnover of
microbes and degradation of their DNA by molds. All sampling locations
(including positions in controlled experiments) and sampling dates
should be recorded precisely so that spatiotemporal effects can be
controlled for in subsequent statistical analyses (Bahram et al., 2015;
Tedersoo et al., 2020a).
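
The spacing and randomization recommendations above can be checked before fieldwork with a few lines of code. The Python sketch below is illustrative only, not a prescribed workflow. It assumes that sample positions are available as planar coordinates in metres, and it uses the lower bound of the 5-10 m range as its default threshold. The function names (`too_close_pairs`, `randomized_block_layout`), the sample identifiers, the coordinates and the treatment labels are all hypothetical. The first function flags sample pairs that lie closer together than the chosen minimum distance; the second generates a randomized complete block layout in which each treatment occurs once per block in shuffled order.

```python
import math
import random
from itertools import combinations


def too_close_pairs(points, min_dist_m=5.0):
    """Return (id_a, id_b, distance) for sample pairs closer than min_dist_m.

    `points` maps a sample ID to (x, y) planar coordinates in metres
    (an assumption; geographic coordinates would need projecting first).
    """
    flagged = []
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(points.items(), 2):
        dist = math.hypot(xa - xb, ya - yb)  # Euclidean distance
        if dist < min_dist_m:
            flagged.append((id_a, id_b, round(dist, 2)))
    return flagged


def randomized_block_layout(treatments, n_blocks, seed=None):
    """Randomized complete block design.

    Every treatment appears exactly once in each block, and the order
    within each block is shuffled independently.
    """
    rng = random.Random(seed)  # seeded so the layout can be reproduced
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        layout[f"block_{block}"] = order
    return layout


# Hypothetical sample coordinates (metres) within one site.
samples = {
    "S1": (0.0, 0.0),
    "S2": (3.0, 4.0),   # exactly 5 m from S1, so not flagged
    "S3": (12.0, 0.0),
    "S4": (12.0, 3.0),  # only 3 m from S3, so flagged
}
close = too_close_pairs(samples, min_dist_m=5.0)
print(close)

# Hypothetical treatments assigned across four blocks.
layout = randomized_block_layout(["control", "warming", "drought"], 4, seed=1)
for block, order in layout.items():
    print(block, order)
```

Recording the seed together with the sampling coordinates and dates allows the layout to be regenerated later. Pairs flagged by the distance check can be relocated, or their spatial dependence can be accounted for in the analysis.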