Bioinformatics
We conducted the bioinformatic analyses using the Obitools metabarcoding package (Boyer et al., 2016). We aligned the paired-end reads using the command illuminapairedend. We retained sequences with alignment quality scores greater than 40, then demultiplexed the aligned dataset and removed the primer sequences with ngsfilter. We also filtered out sequences containing ambiguous bases. We then used Obiuniq to dereplicate the reads (grouping all identical sequences) while keeping track of their abundances, and removed chimeric sequences using the uchime_denovo algorithm in VSearch (Rognes, Flouri, Nichols, Quince, & Mahé, 2016). We used the step-by-step aggregation clustering algorithm implemented in Swarm 2.1.13 (Mahé, Rognes, Quince, de Vargas, & Dunthorn, 2015) to cluster the sequences into Molecular Operational Taxonomic Units (MOTUs). To make the clustering of adults (morphology and DNA barcode data) and juveniles (metabarcoding data) comparable, we combined the sequences from both life stages before running the Swarm clustering algorithm. For adults, we kept only the segment of the original COI sequences matching the Leray COI fragment. To prevent the program from discarding adult sequences as singletons, we artificially increased their initial abundance to 50,000 reads. We set a distance value of d = 13 for the clustering algorithm, which has been shown to be the optimal value for discriminating intra- and interspecific divergences, that is, for approximating MOTUs to species-level clusters, in a wide range of eukaryotic systems (Wangensteen & Turon, 2017; Kemp et al., 2019; Siegenthaler et al., 2019; Garcés-Pastor et al., 2019; Antich, Palacín, Wangensteen, & Turon, 2021). Species present as adults whose sequences were clustered together by Swarm (nine pairs, one triad and one tetrad) were also treated as single entities in downstream analyses with juveniles.
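The dereplication step and the abundance adjustment applied to adult sequences can be illustrated with a minimal Python sketch. It is not the Obitools or Swarm implementation: the function names, the toy sequences and the choice to overwrite the count of an adult sequence with the fixed value are assumptions made for illustration only. The value of 50,000 reads is the one stated above.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGT")
ADULT_ABUNDANCE = 50_000  # initial abundance given to adult sequences (from the text)


def dereplicate(reads):
    """Group identical reads and count them, dropping reads with ambiguous bases."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        if set(seq) <= UNAMBIGUOUS:  # skip reads containing N, R, Y, ...
            counts[seq] += 1
    return counts


def add_adult_sequences(counts, adult_seqs):
    """Merge adult barcodes into the table with an inflated abundance.

    Assumption: an adult sequence already present among the juvenile reads
    simply has its count raised to ADULT_ABUNDANCE.
    """
    merged = Counter(counts)
    for seq in adult_seqs:
        seq = seq.upper()
        merged[seq] = max(merged[seq], ADULT_ABUNDANCE)
    return merged


# Toy data (hypothetical, far shorter than the real COI fragment).
juvenile_reads = ["ACGT", "ACGT", "ACGA", "ACNT", "acgt"]
adult_barcodes = ["ACGA", "TTGA"]

table = add_adult_sequences(dereplicate(juvenile_reads), adult_barcodes)
for seq, abundance in sorted(table.items()):
    print(seq, abundance)
```

In this sketch the read containing an ambiguous base (ACNT) never enters the abundance table, and both adult sequences carry an abundance high enough that a later singleton filter cannot remove them.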
After removing the singletons, we performed the taxonomic assignment of the representative sequences of each MOTU (seeds) using Ecotag (Boyer et al., 2016). We built the local reference sequence database required by Ecotag by combining our sequences of adult spiders with sequences retrieved from the BOLD database (Ratnasingham & Hebert, 2007) and the EMBL repository (Kulikova et al., 2004). Ecotag (Boyer et al., 2016) uses a phylogenetic assignment protocol, based on the NCBI taxonomy tree, to assign each sequence to the last common ancestor of the most closely related sequences in the local reference database. This approach does not require establishing arbitrary identity thresholds for every taxonomic rank (Bakker et al., 2019).
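The last-common-ancestor idea behind this kind of assignment can be sketched in a few lines of Python. This is only an illustration of the principle, not Ecotag's actual procedure: the function name and the representation of each reference lineage as a root-to-tip list of names are assumptions, and the selection of the most closely related reference sequences is taken as already done.

```python
def last_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a list of taxon names ordered from root to tip.
    Returns None when no lineages are given or nothing is shared.
    """
    if not lineages:
        return None
    shared = None
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared = names_at_rank[0]
        else:
            break  # lineages diverge below this rank
    return shared


# Hypothetical sets of best-matching reference sequences for two seeds.
pardosa_hits = [
    ["Araneae", "Lycosidae", "Pardosa", "Pardosa amentata"],
    ["Araneae", "Lycosidae", "Pardosa", "Pardosa lugubris"],
]
mixed_hits = pardosa_hits + [
    ["Araneae", "Linyphiidae", "Erigone", "Erigone atra"],
]

print(last_common_ancestor(pardosa_hits))  # assigned at genus level
print(last_common_ancestor(mixed_hits))    # assigned at order level
```

The example shows why no per-rank identity threshold is needed: the rank of the assignment follows from how far the lineages of the closest references agree.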
We filtered out putative contaminants of the resulting database by retaining only the MOTUs assigned to the order Araneae. After the taxonomic assignment made by Ecotag, we manually checked whether there were better, more recent matches in BOLD or NCBI, and we updated the identification of those MOTUs for which better matches were found. We discarded as contaminants 16 MOTUs with low numbers of reads that corresponded to a checklist of non-Iberian species that had been analysed in other studies conducted in the same lab. We used the LULU algorithm (Frøslev et al., 2017) to remove the MOTUs corresponding to pseudogenes. We also built a COI tree using the seed sequence of every MOTU and the COI sequences of the adult specimens to help allocate unassigned MOTUs to specific families, genera or species. We inferred the tree by Maximum Likelihood using IQ-TREE v.1.6 (Nguyen, Schmidt, von Haeseler, & Minh, 2015). We partitioned positions by codon and assigned an unlinked GTR model to each partition, and we assessed branch support by means of 1,000 ultrafast bootstrap approximation replicates (Minh, Nguyen, & von Haeseler, 2013; Hoang, Chernomor, von Haeseler, Minh, & Vinh, 2018). Analyses were run remotely at the CIPRES Science Gateway (Miller, Pfeiffer, & Schwartz, 2010). We summed the reads of all replicates of each plot and discarded all MOTUs with fewer than five total reads. In addition, for a MOTU to be counted as present in a plot, we required at least five reads in that plot and detection of the MOTU in at least two of its replicates.
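The read-count thresholds described above can be expressed as a short Python sketch. The data layout (a nested dictionary of reads per MOTU, plot and replicate), the function name and the example values are hypothetical. Two interpretations are assumed: "total reads" is taken as the sum over all plots, and a MOTU is considered detected in a replicate when it has at least one read there.

```python
MIN_TOTAL_READS = 5   # MOTUs below this total are discarded
MIN_PLOT_READS = 5    # reads needed in a plot (replicates summed)
MIN_REPLICATES = 2    # replicates of the plot in which the MOTU must occur


def filter_motus(reads):
    """Map each retained MOTU to the set of plots where it counts as present.

    reads: {motu: {plot: [reads in replicate 1, reads in replicate 2, ...]}}
    """
    presence = {}
    for motu, plots in reads.items():
        total = sum(sum(replicates) for replicates in plots.values())
        if total < MIN_TOTAL_READS:
            continue  # discard low-abundance MOTU entirely
        present_in = set()
        for plot, replicates in plots.items():
            plot_reads = sum(replicates)                 # replicates added up
            detected = sum(1 for r in replicates if r > 0)
            if plot_reads >= MIN_PLOT_READS and detected >= MIN_REPLICATES:
                present_in.add(plot)
        presence[motu] = present_in
    return presence


# Hypothetical read table with three replicates per plot.
example = {
    "MOTU_1": {"plot_A": [3, 4, 0], "plot_B": [6, 0, 0]},
    "MOTU_2": {"plot_A": [1, 1, 0], "plot_B": [1, 0, 0]},
    "MOTU_3": {"plot_A": [2, 2, 0], "plot_B": [0, 0, 0]},
}

print(filter_motus(example))
```

Here MOTU_1 is kept and counted as present only in plot_A, because its six reads in plot_B all come from a single replicate, while MOTU_2 and MOTU_3 are dropped for having fewer than five reads in total.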