Bioinformatics
Unless otherwise stated, default parameter settings were used for
all software. Sequenced reads were trimmed of adaptors with
Trimmomatic (v0.38; Bolger, Lohse, & Usadel, 2014) in paired-end
mode (PE), with a minimum read length of 120 bp. Further
trimming was deemed unnecessary after inspecting read quality with
FastQC (v0.11.7;
https://www.bioinformatics.babraham.ac.uk/projects/fastqc). Trimmed
reads were mapped to the reference genome of each species (threespine:
Peichel, Sullivan, Liachko, & White, 2017; tubesnout: Li &
Yeaman, n.d.; ninespine: Nelson & Cresko, 2018) with BWA-MEM
(v0.7.12; Li & Durbin, 2009). PCR duplicates were flagged using
Picard MarkDuplicates (v2.18.7; http://broadinstitute.github.io/picard).
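The minimum-length step applied during trimming can be illustrated with a
minimal Python sketch. It is not Trimmomatic's code: it assumes adaptors
have already been clipped, represents each read only by its sequence, and
uses a hypothetical function name, `split_by_min_length`. In paired-end
mode a pair stays together only when both mates pass the threshold, a lone
surviving mate goes to an unpaired output, and a read of exactly 120 bp is
kept because only reads below the minimum are dropped.

```python
from typing import Iterable, List, Tuple

MIN_LEN = 120  # minimum read length (bp) reported in the methods

# (forward read, reverse read); hypothetical representation, adaptors assumed clipped
Pair = Tuple[str, str]


def split_by_min_length(
    pairs: Iterable[Pair], min_len: int = MIN_LEN
) -> Tuple[List[Pair], List[str], List[str]]:
    """Route read pairs to paired or unpaired outputs by length.

    A read shorter than min_len is dropped; a read of exactly min_len is kept.
    """
    paired: List[Pair] = []
    unpaired_fwd: List[str] = []
    unpaired_rev: List[str] = []
    for fwd, rev in pairs:
        keep_fwd = len(fwd) >= min_len
        keep_rev = len(rev) >= min_len
        if keep_fwd and keep_rev:
            paired.append((fwd, rev))   # both mates survive: stay paired
        elif keep_fwd:
            unpaired_fwd.append(fwd)    # only the forward mate survives
        elif keep_rev:
            unpaired_rev.append(rev)    # only the reverse mate survives
        # if neither mate survives, the pair is discarded
    return paired, unpaired_fwd, unpaired_rev


if __name__ == "__main__":
    reads = [("A" * 150, "C" * 130), ("A" * 100, "C" * 125), ("A" * 90, "C" * 80)]
    paired, fwd_only, rev_only = split_by_min_length(reads)
    print(len(paired), len(fwd_only), len(rev_only))  # 1 0 1
```

Keeping orphaned mates in separate outputs, rather than discarding the
whole pair, preserves usable single-end reads for later steps if desired.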
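The idea behind duplicate flagging can be pictured with the simplified
Python sketch below. It approximates what Picard MarkDuplicates does and is
not its implementation: read pairs whose two ends share the same
chromosome, unclipped 5' positions and strands are treated as copies of one
fragment, the copy with the highest summed base quality is kept, and the
others receive the SAM duplicate bit (0x400). Library identity, optical
duplicates, pairs spanning two chromosomes and unpaired reads are ignored,
and `ReadPair` and `mark_duplicates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DUP_FLAG = 0x400  # SAM flag bit for "PCR or optical duplicate"


@dataclass
class ReadPair:
    """Hypothetical minimal view of a mapped pair (both mates on one chromosome)."""
    name: str
    chrom: str
    pos1: int            # unclipped 5' coordinate of mate 1
    strand1: str         # '+' or '-'
    pos2: int            # unclipped 5' coordinate of mate 2
    strand2: str
    base_qual_sum: int   # summed base qualities over both mates
    flag: int = 0


def mark_duplicates(pairs: List[ReadPair]) -> List[ReadPair]:
    """Flag all but the best-scoring pair in each group of identical fragments."""
    groups: Dict[Tuple, List[ReadPair]] = {}
    for p in pairs:
        # Sort the two ends so the key does not depend on which mate is read 1.
        ends = tuple(sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)]))
        groups.setdefault((p.chrom, ends), []).append(p)
    for members in groups.values():
        best = max(members, key=lambda rp: rp.base_qual_sum)
        for p in members:
            if p is not best:
                p.flag |= DUP_FLAG  # flagged only; the read is not removed
    return pairs


if __name__ == "__main__":
    demo = [
        ReadPair("r1", "chr1", 100, "+", 300, "-", 900),
        ReadPair("r2", "chr1", 300, "-", 100, "+", 1200),
        ReadPair("r3", "chr1", 150, "+", 350, "-", 800),
    ]
    for p in mark_duplicates(demo):
        print(p.name, bool(p.flag & DUP_FLAG))  # r1 True, r2 False, r3 False
```

Flagging rather than deleting keeps the alignment file complete, leaving it
to downstream tools to decide whether duplicates are skipped.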
Before running MarkDuplicates, reads were sorted and read-group
information was added with Picard AddOrReplaceReadGroups. Reads were
realigned around indels with GATK3 IndelRealigner (v3.8-1-0;
McKenna et al., 2010) to adjust quality scores at sites surrounding
indels. Before indel realignment, the ninespine read files, which had
been sequenced on separate lanes, were merged into a single file per
population using samtools merge (v1.9; Li, 2011; Li et al., 2009).
After indel realignment, samtools mpileup was used to combine reads
from all populations
within a species. Any reads flagged as duplicates were ignored by
samtools. VarScan (v2.3.9; Koboldt et al., 2012) was used to call SNPs
for each species. The ploidy of each sample was set to twice the
number of individuals in the pool (2N). Filters removed multiallelic
SNPs and sites with low coverage (cov < 50), low quality (qual < 20),
a low minor allele frequency (maf < 0.01), or fewer than two reads
supporting the minor allele (min-read-count < 2). The coverage
threshold was intended to ensure that each individual in a sample was
represented by at least one read, assuming DNA pooling was balanced.
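Because duplicates were flagged rather than removed, they are excluded at
the pileup step: by default samtools mpileup skips reads carrying any of
the UNMAP, SECONDARY, QCFAIL or DUP flags (the default of its `--ff`
option). The Python sketch below illustrates that behaviour at a single
site across several populations. Each read is reduced to a hypothetical
`(flag, base)` tuple, and `site_counts` is an illustrative name, not part
of samtools.

```python
from collections import Counter
from typing import Dict, List, Tuple

# SAM flag bits that samtools mpileup skips by default
# (default of --ff / --excl-flags: UNMAP, SECONDARY, QCFAIL, DUP)
FLAG_UNMAP = 0x4
FLAG_SECONDARY = 0x100
FLAG_QCFAIL = 0x200
FLAG_DUP = 0x400
EXCLUDE_MASK = FLAG_UNMAP | FLAG_SECONDARY | FLAG_QCFAIL | FLAG_DUP

Read = Tuple[int, str]  # hypothetical (SAM flag, base observed at the site)


def site_counts(reads_by_pop: Dict[str, List[Read]]) -> Dict[str, Counter]:
    """Base counts per population at one site, ignoring excluded reads."""
    return {
        pop: Counter(base for flag, base in reads if (flag & EXCLUDE_MASK) == 0)
        for pop, reads in reads_by_pop.items()
    }


if __name__ == "__main__":
    reads = {
        "popA": [(0, "A"), (FLAG_DUP, "A"), (0x10, "G")],  # 0x10 = reverse strand, kept
        "popB": [(FLAG_QCFAIL, "A"), (0, "G")],
    }
    for pop, counts in site_counts(reads).items():
        print(pop, dict(counts))
```

Keeping one column of counts per population mirrors how a multi-sample
pileup reports each input file separately at every position.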
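The SNP filters can be written as a single per-site test. The Python
sketch below applies the reported thresholds to the allele read counts of
one pooled sample. Several details are assumptions rather than VarScan
behaviour: only sites with exactly two observed alleles are treated as
biallelic SNPs, minor allele frequency is taken as the minor-allele read
count divided by total coverage, and the quality threshold is applied to a
site-level mean base quality. `passes_filters` and its arguments are
hypothetical names, not VarScan options.

```python
from typing import Mapping

MIN_COV = 50          # cov < 50 removed
MIN_QUAL = 20         # qual < 20 removed (assumed: mean base quality at the site)
MIN_MAF = 0.01        # maf < 0.01 removed
MIN_MINOR_READS = 2   # min-read-count < 2 removed


def passes_filters(base_counts: Mapping[str, int], mean_base_qual: float) -> bool:
    """Return True if a site in one pooled sample survives all filters."""
    observed = {b: n for b, n in base_counts.items() if n > 0}
    if len(observed) != 2:  # monomorphic or multiallelic: not a biallelic SNP
        return False
    cov = sum(observed.values())
    if cov < MIN_COV or mean_base_qual < MIN_QUAL:
        return False
    minor = min(observed.values())  # read count of the less frequent allele
    return minor >= MIN_MINOR_READS and minor / cov >= MIN_MAF


if __name__ == "__main__":
    print(passes_filters({"A": 60, "G": 5}, 30.0))   # True
    print(passes_filters({"A": 300, "G": 2}, 30.0))  # False: maf below 0.01
```

At a coverage of 200 reads or fewer, requiring two minor-allele reads
already implies a frequency of at least 0.01, so the maf threshold only
becomes the stricter of the two filters above 200 reads.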