Bioinformatics
We conducted the bioinformatic analyses using the Obitools metabarcoding
package (Boyer et al., 2016). We aligned the paired-end reads using the
command illuminapairedend. We selected sequences with alignment
quality scores greater than 40, and we demultiplexed the aligned dataset
and removed the primer sequences with ngsfilter. We also filtered
out sequences containing ambiguous bases. We then used obiuniq to
dereplicate the reads (grouping all identical sequences) while keeping
track of their abundances, and we also removed chimeric sequences using
the uchime_denovo algorithm in VSearch (Rognes, Flouri, Nichols,
Quince & Mahé, 2016). We used the step-by-step aggregation clustering
algorithm implemented in Swarm 2.1.13 (Mahé, Rognes, Quince, de Vargas,
& Dunthorn, 2015) to cluster the sequences into Molecular Operational
Taxonomic Units (MOTUs). To make the adult (morphology and DNA barcode
data) and juvenile (metabarcoding data) clustering comparable, we
combined the sequences from both life stages before running the Swarm
clustering algorithm. In the case of adults, we kept only the segment of
the original COI sequences matching the Leray COI fragment. To prevent
the program from discarding adult sequences as singletons, we
artificially increased their initial abundance to 50,000 reads. We set a
distance value of d = 13 for the clustering algorithm, which has been
shown to be the optimal value for discriminating intra- and interspecific
divergences, that is, to approximate MOTUs to species-level clusters, in
a wide range of eukaryotic systems (Wangensteen & Turon, 2017; Kemp et
al., 2019; Siegenthaler et al., 2019; Garcés-Pastor et al., 2019; Antich,
Palacín, Wangensteen & Turon, 2021). The species present as adults
whose sequences were clustered together by Swarm (nine pairs, one triad
and one tetrad) were also treated as single entities in downstream
analyses with juveniles. After removing the singletons, we performed the
taxonomic assignment of the representative sequences of each MOTU
(seeds) using Ecotag (Boyer et al., 2016). We built the local reference
sequence database required by Ecotag, combining our sequences of adult
spiders with sequences retrieved from the BOLD database (Ratnasingham &
Hebert, 2007) and the EMBL repository (Kulikova et al., 2004). Ecotag
(Boyer et al., 2016) uses a phylogenetic assignment protocol, based on
the NCBI taxonomy tree, to assign sequences to the last common ancestor
of the most closely related sequences in the local reference database.
This approach does not require establishing arbitrary identity
thresholds for every taxonomic rank (Bakker et al., 2019).
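
The last-common-ancestor logic described above can be illustrated with a
minimal sketch. This is not Ecotag's implementation: the function name
last_common_ancestor, the representation of a lineage as a root-to-tip list
of names, and the placeholder taxa below are all assumptions made for
illustration only.

```python
from typing import List, Sequence


def last_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the root-to-tip prefix shared by all lineages.

    Each lineage is a list of taxon names ordered from the root
    towards the species (a hypothetical, simplified representation).
    """
    if not lineages:
        return []
    shared: List[str] = []
    # Walk down the ranks in parallel and stop at the first disagreement.
    for names in zip(*lineages):
        if all(name == names[0] for name in names):
            shared.append(names[0])
        else:
            break
    return shared


# Placeholder lineages of the best-matching reference sequences
# (illustrative names, not data from the study).
best_matches = [
    ["Arthropoda", "Arachnida", "Araneae", "Family_X", "Genus_A", "Genus_A sp1"],
    ["Arthropoda", "Arachnida", "Araneae", "Family_X", "Genus_A", "Genus_A sp2"],
]

lca = last_common_ancestor(best_matches)
assigned_taxon = lca[-1]  # deepest taxon shared by all best matches
```

In this toy case the two best matches belong to different species of the
same genus, so the query would be assigned at genus level and no fixed
identity threshold per rank is needed.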
We filtered out putative contaminants from the resulting database by
retaining only the MOTUs assigned to the order Araneae. After the
taxonomic assignment made by Ecotag, we manually checked whether there were
better, more recent matches in BOLD or NCBI, and we updated the
identification of those MOTUs for which better matches were found. We
discarded as contaminants 16 MOTUs with low numbers of reads that
corresponded to a checklist of non-Iberian species that had been
analysed in other studies conducted in the same lab. We used the LULU
algorithm (Frøslev et al., 2017) to remove the MOTUs corresponding to
pseudogenes. We also built a COI tree using the seed sequence of every
MOTU and the COI sequence of the adult specimens to help allocate
unassigned MOTUs to specific families, genera or species. We inferred
the tree by Maximum Likelihood using IQ-TREE v.1.6 (Nguyen, Schmidt, von
Haeseler, & Minh, 2015). We partitioned positions by codon and assigned
an unlinked GTR model to each partition, and we assessed branch support
by means of 1,000 ultrafast bootstrap approximation replicates (Minh,
Nguyen, & von Haeseler, 2013; Hoang, Chernomor, von Haeseler, Minh &
Vinh, 2018). Analyses were run remotely at the CIPRES Science Gateway
(Miller, Pfeiffer, & Schwartz, 2010). We summed the reads of all the
replicates of each plot. We discarded all MOTUs with fewer than five
total reads. In addition, for a MOTU to be counted as present in a plot, we
required at least five reads in the plot and detection of the MOTU in at
least two of the replicates of the plot.
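
The read-count filters described above can be sketched as follows. This is
a minimal illustration under stated assumptions, not the code used in the
study: the table layout (MOTU -> plot -> reads per replicate), the function
names, and the toy counts are hypothetical, and "detection" in a replicate
is assumed to mean at least one read.

```python
from typing import Dict, List

MIN_TOTAL_READS = 5   # MOTUs with fewer total reads are discarded
MIN_PLOT_READS = 5    # minimum summed reads for presence in a plot
MIN_REPLICATES = 2    # minimum replicates in which the MOTU is detected


def motu_present_in_plot(replicate_reads: List[int]) -> bool:
    """Apply the plot-level presence rule to one MOTU in one plot."""
    plot_reads = sum(replicate_reads)  # replicates are summed per plot
    # Assumption: a replicate counts as a detection if it has >= 1 read.
    detections = sum(1 for reads in replicate_reads if reads > 0)
    return plot_reads >= MIN_PLOT_READS and detections >= MIN_REPLICATES


def filter_motus(table: Dict[str, Dict[str, List[int]]]) -> Dict[str, List[str]]:
    """Return, for each retained MOTU, the plots where it counts as present."""
    kept: Dict[str, List[str]] = {}
    for motu, plots in table.items():
        total_reads = sum(sum(reads) for reads in plots.values())
        if total_reads < MIN_TOTAL_READS:
            continue  # discarded: fewer than five total reads
        kept[motu] = [plot for plot, reads in plots.items()
                      if motu_present_in_plot(reads)]
    return kept


# Toy read counts (hypothetical, three replicates per plot).
toy_table = {
    "MOTU_1": {"plot_A": [3, 4, 0], "plot_B": [6, 0, 0]},
    "MOTU_2": {"plot_A": [1, 1, 0], "plot_B": [0, 1, 0]},
}
presence = filter_motus(toy_table)
```

Here MOTU_1 is retained and counted as present only in plot_A (seven reads
across two replicates); in plot_B it has enough reads but appears in a
single replicate. MOTU_2 is discarded because it has only three reads in
total.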