Bioinformatic processing
A custom bioinformatics pipeline based on the OBITools (v. 1.2.12,
Boyer et al., 2016) and VSEARCH (v. 2.9.1, Rognes et al., 2016)
software suites and the unoise3 algorithm (Edgar, 2016) was developed to
process reads (available at: anonymized github-link). Forward and
reverse reads were paired with the illuminapairedend function. The reads
were then filtered to retain only aligned reads of high quality
(score > 40.00), assigned to samples, and trimmed based on
the sequences of the primers and attached oligonucleotide tags
(ngsfilter). Reads were further selected for sequence non-ambiguity and
read lengths between 80 and 120 bp. To enable faster processing, the data
were split by sample and distributed over several CPUs, which in parallel
performed dereplication (obiuniq), sorting (vsearch --sortbysize),
denoising with removal of rare sequences (vsearch --cluster_unoise,
--minsize 4, --unoise_alpha 8) and chimera removal (vsearch
--uchime3_denovo). The resulting sequence variants are hereafter referred
to as zero-radius Operational Taxonomic Units (zOTUs). After the most
computationally intensive steps, all sample subfiles were concatenated,
and zOTUs were reassigned to samples (obiuniq). Finally, taxonomy was
assigned against the Protist Ribosomal Reference database (PR2, v. 4.14.0, Guillou et
al., 2013) using blastn (BLAST+, v. 2.8.1, Altschul et al., 1990;
Camacho et al., 2009).
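
The read selection, dereplication and rare-sequence removal described above can be illustrated with a minimal Python sketch. It is not the pipeline itself, which relies on OBITools and VSEARCH. The function names are hypothetical, the 80 to 120 bp bounds are assumed to be inclusive, and the unoise3 error model and chimera detection are deliberately not reproduced; only the abundance cut-off corresponding to --minsize 4 is shown.

```python
from collections import Counter

# Thresholds taken from the text; bounds are ASSUMED to be inclusive.
MIN_LEN, MAX_LEN = 80, 120
MIN_SIZE = 4  # mirrors the --minsize 4 setting


def is_unambiguous(seq):
    """Return True if the sequence contains only A, C, G and T."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")


def filter_reads(reads):
    """Keep unambiguous reads whose length lies in [MIN_LEN, MAX_LEN]."""
    return [r.upper() for r in reads
            if is_unambiguous(r) and MIN_LEN <= len(r) <= MAX_LEN]


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs,
    sorted by decreasing count (ties broken alphabetically)."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def drop_rare(uniques, min_size=MIN_SIZE):
    """Discard unique sequences observed fewer than min_size times."""
    return [(seq, n) for seq, n in uniques if n >= min_size]


# Hypothetical reads for a single sample (illustrative data only).
reads = (["ACGT" * 25] * 5     # 100 bp, abundant: retained
         + ["ACGA" * 25] * 2   # 100 bp, rare: removed by the size cut-off
         + ["ACGN" * 25] * 6   # contains N: removed as ambiguous
         + ["ACGT" * 10] * 9)  # 40 bp: removed by the length filter

kept = filter_reads(reads)
uniques = dereplicate(kept)
abundant = drop_rare(uniques)
```

Because each step operates on one sample's reads independently, a function chain of this kind can be applied to per-sample files in parallel, which is the motivation for splitting the data by sample before the computationally heavy steps.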