Planning a metabarcoding study
To test scientific hypotheses, researchers should first settle on an appropriate study design (observational, experimental or combined), including its technical, analytical and financial requirements. Designs with broad representativeness (e.g., geographical and ecological scope) and independent replicates (i.e., no spatiotemporal autocorrelation) are strongly recommended (Gotelli & Ellison, 2013; Zinger et al., 2019a). In this respect, metabarcoding studies do not differ from traditional ecological studies: the number and distribution of study sites must be defined according to the initial question (Dickie et al., 2018). Metabarcoding studies also require an adequate number of local biological replicates, ranging from at least three (for detecting strong differences in composition) to around 10 (for richness assessment). More replicates will be required when the expected ecological differences are relatively small or when the studied location exhibits strong spatial or environmental heterogeneity.
Individual environmental samples should be large enough to provide sufficient material for DNA extraction and any physicochemical analyses (e.g., pH and C/N ratio). The amount of material should also be considered from the perspectives of pre-treatment and storage: too much material is difficult to mix, dry or freeze, and costly to preserve in a buffer. To ensure statistical independence of samples within a site, samples should be located at least 5-10 m apart in a homogeneous environment. This distance corresponds to the spatial autocorrelation range in soil fungi (Bahram et al., 2013) and the average size of macrofungal individuals (Douhan et al., 2011). In aquatic habitats, communities are likely to be compositionally autocorrelated over even larger distances (Matsuoka et al., 2019). When assessing diversity patterns along ecological gradients (e.g., latitudinal, altitudinal and salinity gradients), transects should be replicated. Spatial independence should also be ensured for plots and treatments. In field and laboratory experiments, this is best achieved by a randomized block design (Legendre & Legendre, 2012), although a stratified block design may be important in environments with known heterogeneity. It is advisable to collect samples within as short a time period as possible to avoid seasonal and weather effects, such as freeze-thaw cycles and rainfall after a long dry period (for soil and leaves), that may cause rapid turnover of microbes and degradation of their DNA by molds. All sampling locations (including positions in controlled experiments) and sampling dates should be recorded precisely so that spatiotemporal effects can be controlled for in subsequent statistical analyses (Bahram et al., 2015; Tedersoo et al., 2020a).
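Two of the recommendations above, minimum spacing between samples and random allocation of treatments within blocks, are easy to check or generate with a short script before fieldwork. The following is a minimal Python sketch and not a prescribed workflow. It assumes that sample positions are available as planar coordinates in metres (e.g., from a local grid or projected GPS coordinates). The function names, sample and block labels, treatment names, and the 5 m threshold (the lower bound of the range given above) are illustrative assumptions.

```python
import math
import random
from itertools import combinations


def too_close_pairs(points, min_dist_m=5.0):
    """Return (id_a, id_b, distance) for every pair of samples closer than min_dist_m.

    `points` maps a sample ID to planar (x, y) coordinates in metres
    (assumed format; adapt to the coordinate system used in the field).
    """
    flagged = []
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(points.items(), 2):
        dist = math.hypot(xa - xb, ya - yb)
        if dist < min_dist_m:
            flagged.append((id_a, id_b, round(dist, 2)))
    return flagged


def randomized_block_layout(blocks, treatments, seed=None):
    """Assign every treatment once per block, in an independently shuffled order.

    This is a randomized complete block layout: each block contains all
    treatments, and the position of each treatment within a block is random.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout


# Hypothetical sample positions (metres) within one site.
samples = {"S1": (0.0, 0.0), "S2": (6.0, 0.0), "S3": (0.0, 8.0), "S4": (1.0, 8.0)}
close = too_close_pairs(samples, min_dist_m=5.0)
print("Pairs closer than 5 m:", close)

# Hypothetical blocks and treatments for a field experiment.
layout = randomized_block_layout(["B1", "B2", "B3"], ["control", "N", "P"], seed=1)
for block, order in layout.items():
    print(block, order)
```

In this example only S3 and S4 are flagged, because they lie 1 m apart; one of them would need to be relocated or dropped. The seed used for the layout can be recorded alongside sampling locations and dates so that the allocation can be reproduced later.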