Identifying signatures of local adaptation (within species)
Genes showing signatures of differentiation across the latitudinal gradient were identified for each species using a top-candidate approach (Yeaman et al., 2016). First, FST outliers were identified as SNPs with scores above the 99.9th percentile (i.e. the top 0.1% of SNPs). Then, the number of FST outliers within each gene was compared to the number expected by chance, which was estimated from a binomial distribution with a probability of success of 0.001 (i.e. the probability of a SNP being an outlier). Any gene with more observed FST outliers than the 0.999 quantile of this binomial distribution (calculated using qbinom in R) was considered a top candidate for local adaptation.
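The top-candidate test can be illustrated with a minimal Python sketch. It is not the code used in this study, which relied on qbinom in R. The sketch assumes that the number of binomial trials for a gene is the number of SNPs scored in that gene; the function names and the example gene labels are hypothetical.

```python
from math import comb


def binom_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p).

    Intended to mirror what R's qbinom(q, n, p) returns.
    """
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cdf >= q:
            return k
    return n


def top_candidates(snps_per_gene, outliers_per_gene, p_outlier=0.001, q=0.999):
    """Return genes with more outlier SNPs than the binomial q quantile.

    Assumption: the number of trials is the number of SNPs in the gene.
    """
    return [
        gene
        for gene, n_snps in snps_per_gene.items()
        if outliers_per_gene.get(gene, 0) > binom_quantile(q, n_snps, p_outlier)
    ]


# Hypothetical example: two genes with 50 SNPs each.
snps = {"geneA": 50, "geneB": 50}
outliers = {"geneA": 3, "geneB": 2}
print(top_candidates(snps, outliers))
```

Under these assumptions, a gene with 50 SNPs needs at least three outlier SNPs to qualify, because the 0.999 quantile of Binomial(50, 0.001) is 2.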
Determining orthologs: comparing patterns between species pairs
To assess patterns consistent with convergent evolution between species pairs, candidate genes were matched to orthologs in the other species. Orthologs were identified between threespines and tubesnouts using a table compiled by Li et al. (in review) using OMA (v2.3.0; Altenhoff et al., 2018; Glover, Altenhoff, & Dessimoz, 2019). As the two stickleback species are more closely related and share higher sequence identity, a gapped-alignment program (GMAP; v2017-06-20; Wu & Watanabe, 2005) was used to identify orthologs between threespine and ninespine. For this, any alignments with a mapping quality < 80 or a percentage identity < 90% were filtered out. Additionally, any genes with multiple matches (1:many and many:many orthologs) or with overlapping positions within a species were removed.
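The alignment filtering described above can be sketched as follows. This is a simplified illustration and not the pipeline used in the study. It assumes each alignment has already been parsed into a record with hypothetical fields `query`, `target`, `quality` and `identity`, and that multiple matches are counted after the quality and identity filters are applied. The removal of genes with overlapping positions is not shown.

```python
from collections import Counter


def one_to_one_orthologs(alignments, min_quality=80, min_identity=90.0):
    """Keep alignments passing both thresholds, then keep only 1:1 gene pairs."""
    # Drop alignments with quality < 80 or percentage identity < 90.
    kept = [
        a
        for a in alignments
        if a["quality"] >= min_quality and a["identity"] >= min_identity
    ]
    # Count how often each gene appears on either side of the remaining pairs.
    query_counts = Counter(a["query"] for a in kept)
    target_counts = Counter(a["target"] for a in kept)
    # A pair is 1:1 only if both genes occur exactly once.
    return {
        a["query"]: a["target"]
        for a in kept
        if query_counts[a["query"]] == 1 and target_counts[a["target"]] == 1
    }


# Hypothetical records: g1 is 1:1, g2 has two matches, g3 fails the identity filter.
example = [
    {"query": "g1", "target": "h1", "quality": 95, "identity": 97.0},
    {"query": "g2", "target": "h2", "quality": 90, "identity": 95.0},
    {"query": "g2", "target": "h3", "quality": 88, "identity": 93.0},
    {"query": "g3", "target": "h4", "quality": 99, "identity": 85.0},
]
print(one_to_one_orthologs(example))
```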
To compare population divergence among species, the average FST score was calculated per gene. A similar approach could not be used to compare \(\overline{H}_{E}\) because larger windows were required to obtain sufficiently precise estimates, and multiple genes could be present within a single window. Instead, the score for the whole window was applied to each gene; if a gene spanned two windows, it was assigned the score of the window containing most of that gene. This approach produces some pseudoreplication in the data as a given gene will be present in several neighbouring windows, but this should have only a minor effect, causing an overestimation of the significance of any true correlation. Given that we found less correlation in these metrics than previous studies (see Discussion), this should be a conservative approach.
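The rule for assigning window scores to genes can be sketched as below. The sketch assumes non-overlapping windows given as half-open intervals `(start, end, score)`; the function name and coordinates are hypothetical and do not come from the study's own code.

```python
def assign_window_score(gene_start, gene_end, windows):
    """Return the score of the window that overlaps the gene the most.

    windows: iterable of (start, end, score) with half-open coordinates.
    Returns None if the gene overlaps no window.
    """
    best_score, best_overlap = None, 0
    for w_start, w_end, score in windows:
        # Length of the intersection between the gene and this window.
        overlap = min(gene_end, w_end) - max(gene_start, w_start)
        if overlap > best_overlap:
            best_score, best_overlap = score, overlap
    return best_score


# Hypothetical windows of 100 bp with made-up scores.
windows = [(0, 100, 0.2), (100, 200, 0.3)]
# A gene at 80-150 has 20 bp in the first window and 50 bp in the second.
print(assign_window_score(80, 150, windows))
```

If a gene is split exactly evenly between two windows, this sketch keeps the first window encountered. The text does not specify how such ties were handled.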