Materials and Methods
Specimen collection. To evaluate marker amplification and sequencing capacity, we sampled bees from populations of O. lignaria from California (El Dorado, Fresno, and Placer Counties) and Idaho (Franklin County) sourced from commercial suppliers who trap bees on private lands. However, the sex of these specimens was not recorded. Thus, material derived from these specimens were only used for method development as they are inappropriate for population genetic analysis due to constraints of haplodiploidy on our analytical methods.
Additional collections of O. lignaria were later achieved by placing wooden blocks (n = 5) with paper straws inserted into drilled tunnels in two canyons in the Intermountain West of North America: Birch Creek, Utah (41.852556, -111.775259, 1620 m) and Cub River, Idaho (42.138868, -111.667406, 1678 m). These two trapping sites were ~30 km apart and both located in canyons in the Bear River Mountains (Fig. 1). The nesting blocks were deployed in April 2018, retrieved in May 2018, and brought to the U.S. Department of Agriculture - Agricultural Research Service - Pollinating Insect Biology, Management, Systematics Research Unit (USDA-ARS-PIBMSRU). The straws were removed so that contents of nest cells inside could be determined from x-radiological images. In October 2018, cells containing cocoon diapausing adults were excised from nests and stored at - 80°C until needed for species identification, sex determination, and molecular testing. The pharates were easily sexed based on dimorphic characteristics from dissected cocoons (males have an abundance of setae on their clypeus and frons whereas females lack this trait) and their position in the straw nest. In general, the first laid cells in a nest develop into female bees (distal end of straw), whereas the later laid cells develop into male bees (proximate end of straw). All O. lignaria individuals captured in the nesting blocks were used for population genetic analysis.
Microsatellite mining and assessment. To identify microsatellite loci, we surveyed the publicly available annotated genome assembly of O. lignaria (USDA_OLig_1.0), available on NCBI (https://www.ncbi.nlm.nih.gov/assembly/GCF_012274295.1/). Using Krait v1.3.3, we queried the genome assembly for microsatellite markers across 147 scaffolds and developed primers to be tested using PCR amplification (Du et al., 2018). In this study, we identified perfect, imperfect (i.e. contain indels or substitutions), and compound tandem repeats in silico . Microsatellites were filtered using custom scripts in the R programming language (R Core Team, 2020). Microsatellites that were di-, tri-, and tetra-nucleotide repeats were targeted. Products that were 100 – 500 nucleotides (nts) long were selected. Primer pairs that annealed to multiple sites across the genome were excluded. Loci that have primer pairs not exhibiting between 40 and 60% GC content were excluded. Only primers that have an annealing temperature near 55°C (54.5 - 55.4°C) were included. A more detailed narrative of the conditions used to filter microsatellites and design primers is provided in the Supplementary Material 1. Of the loci that fit our search criteria, we randomly selected 22 loci for further testing with molecular methods. We used NCBI Genome Data Viewer to examine the placement of the novel loci across intergenic (i.e.exon) and intragenic (i.e. intron) space in the annotatedO. lignaria genome assembly, as well as proximity and placement near or within a gene, respectively.
Genomic DNA was extracted from the legs of 124 adult specimens with a Quick-DNA Miniprep Plus Kit (Zymo Research, Irvine, CA, U.S.A.) following the manufacturer’s protocol, except that the Proteinase K incubation step with the following modifications: Proteinase K digestion incubated 12-16 hours, DNA elution buffer was warmed to 60 °C prior to addition, and elution buffer was added in two steps to increase DNA yield. The 22 novel loci were amplified with extracted DNA from a haphazardly selected sample of eight specimens, PCR amplified, and confirmed on 1.2% agarose gel alongside a 100 bp ladder (GeneRuler DNA Ladder, ThermoFisher Scientific). PCR reactions were performed using the following methods: 1-2 μl extracted DNA, 1x Promega (Madison, WI) reaction buffer, 0.6 mM dNTP mixture, 0.1-0.4 μM primer, 0.001 mg BSA, 0.4 units Taq polymerase (Promega, Madison WI), and 1.4 mM MgCl2. The PCR conditions for multiplex reactions were: one 4 min cycle at 95 °C, 30 cycles of 95 °C for 30 sec, and annealing at 55 °C for 75 sec, then 72 °C for 45 sec. The cycles were followed by a final extension period of 15 min at 72 °C.
Once a product was confirmed at the expected length in the PCR reactions, the 5’ ends of each forward primer were fluorescently labeled as either 6-FAM, VIC, NED, or PET and combined in a four-marker multiplex reaction. Each multiplex reaction represented one of the dye-labeled conditions, which were then separated on an ABI PRISM™ 3730 DNA Analyzer along with a dye-labeled size standard (GeneScan™ 500 LIZ™ dye size standard, Applied BiosystemsTM, ThermoFisher Scientific) at the Utah State University Center for Integrated Biosystems (Table 1). Allele sizes were scored using Geneious Prime 2021.0.1 (Kearse et al., 2012). Twenty percent of the specimens underwent additional PCR amplification to determine genotype error rates. We tested for differences in genotype error rates across the different microsatellite types with a Wilcoxon Rank-sum test.
Sibship identity and population genetic analysis. We first identified sibship relationships at the Birch Creek and Cub River field sites by assigning individuals to families using full-pedigree likelihood methods implemented in Colony v2.0 Linux software (Jones & Wang, 2010). We set the mistyping error rates of each locus based on the genotype error rates for the 20% of specimens re-genotyped (Table 1). We set the sex-determination system to ‘haplodiploid’ and mating system I to “Polygamous” for both males and females based on research by Bosch and Kemp (2001). Colony analyses were conducted in Ceres, a high-performance computing cluster maintained by the Agricultural Research Service SCINet (https://scinet.usda.gov/
To determine population genetic diversity within and across the twoO. lignaria populations, we first identified whether the loci were in Hardy-Weinberg equilibrium (HWE) and that the loci were in random association (i.e., linkage disequilibrium [LD]). Deviations from HWE and LD resulted in the removal of the loci for population genetic analysis. HWE and LD tests were conducted with the Genepop algorithm (Raymond & Rousset, 1995). To account for multiple comparisons inherent to HWE and LD analyses, sequential Bonferroni corrections were applied to the HWE and LD P values to minimize type I errors across population by marker combinations. Following the HWE and LD analyses, we calculated observed and Nei’s expected heterozygosity (Hoand Nei’s He , respectively), Simpson’s Index of diversity, allelic richness, number of alleles, and private alleles for the Birch Creek and Cub River study populations.Ho is the observed number of loci that are heterozygous for a sampled individual whereas Nei’sHe is the expected number of loci that are heterozygous for a sampled individual. Simpson’s Index of diversity (D) measures the probability that individuals randomly selected from a sample will belong to the same allele class. In our study, we present the index 1-D which represents the probability that two individuals randomly selected from a population will belong to a different allele class. Thus, the greater the 1-D index, the greater the diversity of alleles (Magurran, 2003). Allelic richness is calculated as the number of alleles at a locus divided by the number of specimens without data missing at the target locus. Number of alleles is the number of alleles at a locus, and private alleles is the number of alleles that were only detected in one population. We tested for differences in these genetic diversity indices between Birch Creek and Cub River with a two-sample t -test or Wilcoxon rank-sum test. The latter non-parametric test was used if the genetic diversity indices demonstrated unequal variances between the two populations.
To characterize population genetic structure, we conducted an analysis of molecular variance (AMOVA). This approach mirrors the analysis of variance approach based on Wright’s fixation indices (Weir & Cockerham, 1984). We conducted a hierarchical analysis of molecular variance where variance was partitioned to covariance components based on variation within sample, variation between samples, and variation between populations. These covariance components were calculated as Φ fixation indices (Grünwald & Hoheisel, 2006). We further tested for the significance of Φ through comparison of 999 nonparametric permutations of the data (Kamvar et al. 2015) . We conducted additional independent tests for structure between the two populations by calculating the following indices:FST , FIS ,GST , and Jost’s D. Definitions of these different fixation indices are articulated in Meirmans and Hedrick (2011). In general, higher values of FST ,GST , and Jost’s D implicate higher fixation of alleles at a locus in a population, whereas lower values of these indices suggest lower fixation of alleles at a locus in a population.FIS is a measure of inbreeding, where high values of FIS are evidence for inbreeding. In these tests of population structure, we calculated theFIS based on a global comparison between populations also with 999 permutations of the data (Archer et al., 2017).
Osmia subspecies, species, and clade microsatellite marker amplification. We aimed to determine the capacity for the novel microsatellite markers to amplify in taxa other than Osmia lignaria . We explored marker amplification for the apicata ,bicornis , emarginata , and ribifloris clades following Branstetter et al. (2021). We included three species/subspecies of apicata , 12 species/subspecies of bicornis (including O. lignaria ), seven species/subspecies of emarginata , and two species/subspecies of ribifloris in our assessment. DNA was sampled from a specimen library described in Branstetter et al. (2021).
Except for the Colony analysis conducted on SCINet, all population genetic analyses were conducted with the base, strataG (Archer, Adams, & Schneiders, 2017), poppr (Kamvar, Brooks, & Grünwald, 2015), and adegent (Jombart, 2008) libraries in R version 4.0.3 (R Core Team 2020). Data and R scripts of the described analysis are available athttps://github.com/jbkoch/OligMsatStudy.