Material and Methods

Sampled colonies

We sampled 265 Western honey bee colonies from two different subspecies, namely A. m. mellifera (MEL) and A. m. carnica (CAR) (Figure 1). Conserved MEL colonies were sampled in Switzerland (CS_CH) and France (CS_FR). The majority of the colonies belonged to MEL from the selection programme (SL_CH) in Switzerland, which simultaneously represents five different patrilines (P1-P5). The sample size, geographic origin and location of the five different patrilines and conserved MEL colonies are summarised in Table 1 and Supplementary Figure 1. It should be noted that P1 is located in close proximity to the conservation area (CS_CH) and that P4 and P5 have a common maternal origin. The 49 sampled CAR colonies originated from Switzerland (22), Sweden (3), Norway (3) and the United States of America (21). For each colony, approximately 500 workers were collected with a standardized sampling method to include all existing patrilines among workers in the colony.

DNA extraction and pool sequencing

DNA extraction and pool sequencing of the sampled colonies are described in detail by Guichard et al. (2021). Briefly, the approximately 500 workers per colony were shredded in a DNA extraction solution. Pair-end sequencing was performed on an IlluminaTM HiSeq 3000 or a NovaSeqTM 6000 platform. To significantly increase computing time, the pool sequence analysis was restricted to an informative marker panel including 7,023,977 genome-wide SNPs. Raw reads from pool sequencing of the 265 colonies were aligned to the honey bee reference genome Amel_HAV3.1, Genebank assembly accession GCA_003254395.2 (Wallberg et al., 2019). After the alignment, the resulting BAM files were converted into pileup files using the samtools mpileup utility (Li et al., 2009). Files produced by mpileup were interpreted by the PoPoolation2 utility mpileup2sync (Kofler, Pandey, & Schlötterer, 2011) for the Sanger Fastq format, with a minimum quality of 20. Finally, sync files were converted to a depth file containing a sequencing depth value for each SNP and count files summarising reference and alternative allele counts for each SNP.

Quality filtering and dosage data conversion

Based on the aforementioned count files, we removed 99,555 SNPs with multiple alternative alleles and 207,904 SNPs with an excessively high and low sequencing depth. After this quality control, we calculated the frequencies of the reference and the alternative alleles for 6,716,518 SNPs and additionally excluded 771,835 homozygous loci. The remaining 5,944,683 SNPs were summarised in PLINK dosage and map files. To detect ROH segments of the colonies, dosage data were converted to hard-called genotypes using the command – import dosage with ahard-call threshold of 0.4, as implemented in PLINK 2.0 (Chang et al., 2015). Genotypes set to missing during the conversion were imputed with the software BEAGLE 5.2 (Browning, Zhou, & Browning, 2018). For the population structure analyses, hard-called SNP genotypes were further pruned for minor allelic frequency above 5%, which resulted in 1,505,596 genome-wide SNPs.

High-resolution population network

To ascertain the high-resolution population structure of honey bees, we performed a population network visualization. The different components involved in the so-called NetView approach are described in detail by Neuditschko et al. (2012) and Steining et al. (2016). Briefly, we computed genetic distances by subtracting pairwise relationships identical-by-state (IBS) from 1 and applied the algorithm in its default setting (number of k nearest neighbours k -NN = 10). To illustrate the genetic relatedness between neighbouring honey bee colonies, we associated the thickness of edges (connecting lines) with the proportion of the genetic distance, with thicker edges corresponding to lower genetic distances. To identify highly inbred honey bee colonies, we scaled the node size of each colony based on the individual FROH. The node colour was associated according to the sampled subpopulations and the individual level of admixture at the optimal number of clusters.

Admixture

Colony admixture levels and genetic distances (FST) between the subspecies were determined using the program Admixture 1.23 (Alexander, Novembre, & Lange, 2009). We ran Admixture for 100 iterations increasing K from 2 to 10. Convergence between independent runs at the same K was monitored by comparing the resulting log-likelihood scores (LLs) following 100 iterations, and was inferred from stabilized LLs with less than 1 LL unit of variation between runs. Cross validation (CV) error estimation for each K was performed to determine the optimal number of clusters. Admixture results increasing K from 2 to 7 were visualized with the program Distruct 1.1 (Rosenberg, 2004) and integrated in the high-resolution population network, as described above.

Runs of homozygosity

Runs of homozygosity segments were determined with an overlapping window approach implemented in PLINK v.1.9 (Chang et al., 2015) including the aforementioned 5,944,683 genome-wide SNPs. The following settings were applied: a minimum SNP density of one SNP per 80 kb, a maximum gap length of 100 kb, and a minimum length of homozygous segment of 200 kb, while two heterozygotes were permitted in each segment. The total number of ROH (NROH), the total length of ROH segments (SROH) and the average length of ROH (LROH) were summarised for the two subspecies (CAR and MEL) and the respective subpopulations. The genomic-based inbreeding coefficients (FROH) were calculated by dividing SROH by the length of the autosomal genome (LAUTO), which was set to 220.76 Mb (Wallberg et al., 2019). Furthermore, we compared FROH of 74 SL_CH colonies with pedigree-based inbreeding coefficients (FPED) using a linear regression model as implemented in the statistical computing software R (R Core Team, 2013), whereas FPED were calculated following the method described by Brascamp et al. (2014).

Homozygosity islands and gene functions

Homozygosity islands of the two subspecies were determined based on overlapping homozygous regions present in more than 50% of the colonies with the R package detectRUNS (Biscarini, Cozzi, Gaspa, & Marras, 2019). The length and distribution of the homozygosity islands on the chromosomes were visualised using the R package RIdeogram (Hao et al., 2020). We used the NCBI genome data viewer (https://www.ncbi.nlm.nih.gov/genome/gdv/), and the reference genome assembly Amel_HAv3.1 (Wallberg et al., 2019) to identify genes located in homozygosity islands. For each subspecies, we regrouped the characterised genes by the function term based on the annotation chart from the open source Database for Annotation, Visualization, and Integrated Discovery v.6.8 package (https://david.ncifcrf.gov), using the Apis mellifera annotation file, with a minimum of two genes per functional group and a Bonferroni-corrected significance threshold of p<0.05. Furthermore, we specified the known functions of the identified genes by conducting a literature review.