2.6 Comparison of results from molecular tests of selection
We compared genome-wide targets of positive selection across species using two approaches. First, we identified genes within any of the identified sweep regions by intersecting the outlier sweep locations with gene annotations for each species' genome (Yang et al., 2021), using bedtools intersect v2.27.1 (Quinlan & Hall, 2010). The protein sequence of every gene whose span, from start to stop, overlapped any sweep region interval was then retrieved for each species. We then assessed the overlap among the sets of proteins identified for the three species using OrthoVenn2 (Xu et al., 2019), which clusters proteins based on sequence similarity. This approach identifies targets of selection that might be shared across the three species without requiring that the same gene region be targeted, because the clustering allows different members of a given gene family to be targeted independently. To assess whether any species had a higher proportion of immune genes among the putative targets of positive selection, we also included the 96 candidate immune genes previously identified from G. calmariensis in the OrthoVenn2 analysis. Second, we repeated the analysis above but extended each candidate gene region to include 5 kb on either side of the gene body (i.e. 5 kb before the start and 5 kb after the stop), in order to detect putative signatures of positive selection associated with the regulatory regions of a given gene. We chose this flanking size (10 kb in total per gene) as the region targeted for regulatory evolution based on information from other insect groups (Ghavi-Helm et al., 2014; Lewis & Reed, 2019).
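The two overlap criteria described above (gene body only, and gene body plus 5 kb on each side) can be illustrated with a minimal Python sketch. This is not the bedtools implementation used in the analysis; the function name genes_in_sweeps, the toy coordinates, and the assumption of 0-based, half-open BED-style intervals are ours and serve only to show the logic.

```python
from collections import defaultdict


def genes_in_sweeps(genes, sweeps, flank=0):
    """Return names of genes whose (optionally flanked) interval overlaps a sweep.

    genes:  iterable of (chrom, start, end, name)
    sweeps: iterable of (chrom, start, end)
    Coordinates are assumed to be 0-based and half-open, as in BED files.
    """
    # Group sweep intervals by chromosome/scaffold for quick lookup.
    sweeps_by_chrom = defaultdict(list)
    for chrom, start, end in sweeps:
        sweeps_by_chrom[chrom].append((start, end))

    hits = set()
    for chrom, start, end, name in genes:
        # Extend the gene body by `flank` bp on each side, clamped at 0.
        lo = max(0, start - flank)
        hi = end + flank
        for s_start, s_end in sweeps_by_chrom.get(chrom, []):
            # Two half-open intervals overlap if each starts before the other ends.
            if lo < s_end and s_start < hi:
                hits.add(name)
                break
    return hits


# Hypothetical toy data, for illustration only.
genes = [
    ("chr1", 10_000, 12_000, "geneA"),  # lies inside the sweep
    ("chr1", 53_000, 55_000, "geneB"),  # starts 3 kb past the sweep end
    ("chr2", 1_000, 2_000, "geneC"),    # no sweep on this chromosome
]
sweeps = [("chr1", 9_000, 50_000)]

gene_body_hits = genes_in_sweeps(genes, sweeps)            # first approach
flanked_hits = genes_in_sweeps(genes, sweeps, flank=5_000)  # second approach
```

In this toy example, geneB is recovered only under the second approach, because its gene body lies outside the sweep interval while its 5 kb upstream flank does not.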