2.2.1. Barcode/OTU datasets
The following three metabarcode datasets were released in OBA.
The Tara Oceans 18S-V9 rRNA metabarcode dataset consists of eight
size-fractionated communities obtained from two depths in the photic
zone (subsurface=SRF, Deep-Chlorophyll Maximum=DCM), one from the
mesopelagic zone (MES) and one from the marine epipelagic mixed layer
(MIX). Size fractions corresponded to filter-collected pico- and
nanoplankton (0.8-5 μm), and plankton net tows for the nano-, micro-,
and mesoplankton (5-20 μm, 20-180 μm and 180-2,000 μm, respectively)
(http://taraoceans.sb-roscoff.fr/EukDiv/; de Vargas et al., 2015). This
dataset was built by sequencing plankton metabarcodes and assembling
1,685,214,722 raw reads from 1,046 samples, including those of the Tara
Oceans Polar Circle expedition (https://figshare.com/s/cfbf869ca84310fda6bb;
Ibarbalz et al., 2019). Metabarcodes were clustered into 474,303
biologically meaningful OTUs using the ‘Swarm’ approach (Mahé et al.,
2014). For the taxonomic assignment of metabarcodes, the Protist
Ribosomal Reference (PR²) database was used (Guillou et al., 2013).
The Tara Oceans 16S/18S rRNA miTags dataset
consists of two size-fractionated communities (0.22 to 1.6 μm and 0.22
to 3 μm) that were obtained from two depths in the photic zone
(subsurface=SRF, Deep-Chlorophyll Maximum=DCM), as well as one depth in
the mesopelagic zone (MES) and one in the marine epipelagic mixed layer
(MIX). The metagenomic reads corresponding to both size fractions
(enriched in prokaryotes and giant viruses), described by Salazar et
al. (2019), are available at
https://www.ocean-microbiome.org and https://zenodo.org/record/3473199.
For each prokaryote-enriched sample (N=180), 19,037,038 merged raw
Illumina reads (miTags) that contained signatures of the 16S/18S rRNA
gene were extracted (Logares et al., 2014). These fragments were mapped
to a set of 16S/18S reference sequences that were downloaded from the
SILVA database (Release 128: SSU Ref NR 99;
https://www.arb-silva.de/fileadmin/silva_databases/release_128/Exports/SILVA_128_SSURef_Nr99_tax_silva.fasta.gz).
A total of 23,987 miTag sequences were annotated.
Abundance tables were built by counting the number of miTags assigned to
each taxon in each sample, together with the number of unassigned miTags
(https://www.ebi.ac.uk/biostudies/files/S-BSST297/u/OM-RGC_v2_taxonomic_profiles.tar.gz).
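The counting step described above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the released profiles: the function name, the "unassigned" label, and the toy sample and taxon identifiers are all hypothetical, and the input is assumed to be a list of (sample, taxon) pairs with None marking a miTag that matched no reference sequence.

```python
from collections import Counter, defaultdict

# Hypothetical label for miTags without a taxonomic assignment.
UNASSIGNED = "unassigned"


def build_abundance_table(assignments):
    """Count miTags per taxon and per sample.

    `assignments` is an iterable of (sample_id, taxon) pairs; a taxon of
    None means the miTag could not be assigned (an assumption of this sketch).
    Returns a dict mapping taxon -> {sample_id: count}.
    """
    table = defaultdict(Counter)
    for sample_id, taxon in assignments:
        key = taxon if taxon is not None else UNASSIGNED
        table[key][sample_id] += 1
    return {taxon: dict(counts) for taxon, counts in table.items()}


# Toy input with made-up sample and taxon identifiers.
toy_assignments = [
    ("sample_1", "Taxon_A"),
    ("sample_1", "Taxon_A"),
    ("sample_1", None),
    ("sample_2", "Taxon_B"),
    ("sample_2", None),
]
table = build_abundance_table(toy_assignments)
```

Keeping unassigned miTags as their own row means that each sample's column still sums to the total number of miTags extracted for that sample.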
The 16S-V4V5 metabarcode dataset from the Malaspina-2010 expedition was
built from 60 samples of bathypelagic (BAT: 1000-4000 m) and
abyssopelagic (ABY: 4000-6000 m) waters (Salazar et al., 2015)
(https://github.com/GuillemSalazar/MolEcol_2015). This metabarcode
dataset, based on 1,789,427 raw reads, contained 3,902 OTU sequences
for two plankton size
fractions (0.2 to 0.8 μm and 0.8 to 20 μm). The taxonomic assignment was
performed using the SILVA database (release 115;
https://www.arb-silva.de/fileadmin/silva_databases/release_115/Exports/SSURef_NR99_115_tax_silva.fasta.tgz).
Abundance tables contained the number of reads for the OTUs of
particle-attached (PA) and free-living (FL) prokaryotes detected in 30
globally distributed sampling stations
(https://github.com/GuillemSalazar/MolEcol_2015/blob/master/OTUtable_Salazar_etal_2015_Molecol_norarefac.txt).
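As an illustration of how such a read-count table might be summarised, the following Python sketch totals the reads per sample. The layout assumed here (tab-separated, one OTU per row, one sample per column, with FL and PA given as column suffixes) is an assumption made for the example and may not match the released file; the toy values and identifiers are invented.

```python
import csv
import io

# Toy table in an ASSUMED layout: first column is the OTU identifier,
# remaining columns are samples (hypothetical names with FL/PA suffixes).
TOY_TABLE = (
    "OTU\tst1_FL\tst1_PA\n"
    "OTU_1\t10\t0\n"
    "OTU_2\t5\t5\n"
)


def reads_per_sample(handle):
    """Sum read counts over all OTUs for each sample column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {sample: 0 for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            totals[sample] += int(value)
    return totals


totals = reads_per_sample(io.StringIO(TOY_TABLE))
```

To apply the same function to a local copy of the table, the `io.StringIO` object can be replaced by an open file handle, provided the file follows the assumed layout.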