Identifying signatures of local adaptation (within species)
Genes showing signatures of differentiation across the latitudinal
gradient were identified for each species using a top-candidate approach
(Yeaman et al., 2016). Initially, FST outliers
were identified as any SNPs with scores above the 0.999 quantile
(i.e. the top 0.1% of scores). Then, the number of FST
outliers within each gene was compared to the expected number that
could have arisen by chance, which was estimated from a binomial
distribution with a probability of success of 0.001 (i.e. the
probability of being an outlier). Any gene that had more observed
FST outliers than the 0.999 quantile of this binomial distribution
(calculated using qbinom in R) was considered a top candidate for local
adaptation.
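The binomial test described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study, and it assumes that the number of trials for each gene is the number of SNPs in that gene. The function names and the two input dictionaries are hypothetical.

```python
from math import comb


def binom_quantile(p, n, prob):
    """Smallest k with P(X <= k) >= p for X ~ Binomial(n, prob).

    Intended to mirror what R's qbinom(p, n, prob) returns.
    """
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * prob**k * (1 - prob) ** (n - k)
        if cumulative >= p:
            return k
    return n


def top_candidates(snp_counts, outlier_counts, prob=0.001, q=0.999):
    """Return genes with more outliers than the q quantile of the null.

    snp_counts: gene -> number of SNPs in the gene (assumed number of trials).
    outlier_counts: gene -> number of FST outlier SNPs in the gene.
    """
    return [
        gene
        for gene, n_snps in snp_counts.items()
        if outlier_counts.get(gene, 0) > binom_quantile(q, n_snps, prob)
    ]
```

Under these assumptions, a gene with 50 SNPs would need at least three outliers to be flagged, because the 0.999 quantile of a binomial distribution with 50 trials and a success probability of 0.001 is 2.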
Determining orthologs: comparing patterns between species pairs
To assess patterns consistent with convergent evolution between species
pairs, candidate genes were matched to orthologs in the other species.
Orthologs were identified between threespines and tubesnouts using a
table compiled by Li et al. (in review) using OMA (v2.3.0; Altenhoff et
al., 2018; Glover, Altenhoff, & Dessimoz, 2019). As the two stickleback
species are more closely related and share higher sequence identity, a
gapped-alignment program (GMAP; v2017-06-20; Wu & Watanabe, 2005) was
used to identify orthologs between threespine and ninespine sticklebacks.
For this, any alignments with a mapping quality below 80 or a percentage
identity below 90% were filtered out. Additionally, any genes
with multiple matches (1:many and many:many orthologs) or overlapping
positions within a species were removed.
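The filtering of GMAP alignments to one-to-one orthologs can be sketched as follows. This is an illustrative sketch only: the record keys (`query`, `target`, `mapq`, `identity`) and the function name are hypothetical and do not correspond to GMAP output fields, and the removal of genes with overlapping positions within a species is omitted.

```python
from collections import Counter


def one_to_one_orthologs(alignments, min_mapq=80, min_identity=90.0):
    """Keep gene pairs that pass both thresholds and match exactly once.

    alignments: list of dicts with hypothetical keys 'query', 'target',
    'mapq' and 'identity' (percentage identity).
    """
    # Drop alignments with mapping quality < 80 or identity < 90%.
    kept = [
        a
        for a in alignments
        if a["mapq"] >= min_mapq and a["identity"] >= min_identity
    ]
    pairs = {(a["query"], a["target"]) for a in kept}

    # Count how many distinct partners each gene has in the other species.
    query_hits = Counter(q for q, _ in pairs)
    target_hits = Counter(t for _, t in pairs)

    # Remove 1:many and many:many relationships.
    return sorted(
        (q, t)
        for q, t in pairs
        if query_hits[q] == 1 and target_hits[t] == 1
    )
```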
To compare population divergence among species, the average
FST score was calculated per gene. A similar approach
could not be used to compare \(\overline{H}_{E}\) because
larger windows were required to obtain sufficiently precise estimates,
and multiple genes could be present within a single window. Instead, the
score for the whole window was applied to each gene within it; if a gene's
location spanned two windows, it was assigned the score of the
window where most of that gene was located. This approach produces some
pseudoreplication in the data as a given gene will be present in several
neighbouring windows, but this should have only a minor effect, causing
an overestimation of the significance of any true correlation. Given
that we found less correlation in these metrics than previous studies
(see Discussion), this should be a conservative approach.
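The assignment of window scores to genes can be sketched as follows. This is a minimal illustration, assuming half-open coordinates on a single chromosome; the function names and the `(start, end, score)` tuple layout are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if none)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def assign_window_score(gene_start, gene_end, windows):
    """Return the score of the window containing most of the gene.

    windows: list of (start, end, score) tuples on the same chromosome.
    Returns None if the gene overlaps no window. If two windows overlap
    the gene equally, the first one in the list is used.
    """
    best = max(
        windows,
        key=lambda w: overlap(gene_start, gene_end, w[0], w[1]),
        default=None,
    )
    if best is None or overlap(gene_start, gene_end, best[0], best[1]) == 0:
        return None
    return best[2]
```

For example, with windows `(0, 100)` and `(100, 200)`, a gene spanning positions 80 to 150 has 20 bp in the first window and 50 bp in the second, so it takes the score of the second window.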