Materials and Methods
Specimen collection. To evaluate marker amplification and
sequencing capacity, we sampled bees from populations of O.
lignaria from California (El Dorado, Fresno, and Placer Counties) and
Idaho (Franklin County) sourced from commercial suppliers who trap bees
on private lands. However, the sex of these specimens was not recorded.
Thus, material derived from these specimens were only used for method
development as they are inappropriate for population genetic analysis
due to constraints of haplodiploidy on our analytical methods.
Additional collections of O. lignaria were later achieved by
placing wooden blocks (n = 5) with paper straws inserted into
drilled tunnels in two canyons in the Intermountain West of North
America: Birch Creek, Utah (41.852556, -111.775259, 1620 m) and Cub
River, Idaho (42.138868, -111.667406, 1678 m). These two trapping sites
were ~30 km apart and both located in canyons in the
Bear River Mountains (Fig. 1). The nesting blocks were deployed in April
2018, retrieved in May 2018, and brought to the U.S. Department of
Agriculture - Agricultural Research Service - Pollinating Insect
Biology, Management, Systematics Research Unit (USDA-ARS-PIBMSRU). The
straws were removed so that contents of nest cells inside could be
determined from x-radiological images. In October 2018, cells containing
cocoon diapausing adults were excised from nests and stored at - 80°C
until needed for species identification, sex determination, and
molecular testing. The pharates were easily sexed based on dimorphic
characteristics from dissected cocoons (males have an abundance of setae
on their clypeus and frons whereas females lack this trait) and their
position in the straw nest. In general, the first laid cells in a nest
develop into female bees (distal end of straw), whereas the later laid
cells develop into male bees (proximate end of straw). All O.
lignaria individuals captured in the nesting blocks were used for
population genetic analysis.
Microsatellite mining and assessment. To identify
microsatellite loci, we surveyed the publicly available annotated genome
assembly of O. lignaria (USDA_OLig_1.0), available on NCBI
(https://www.ncbi.nlm.nih.gov/assembly/GCF_012274295.1/). Using Krait
v1.3.3, we queried the genome assembly for microsatellite markers across
147 scaffolds and developed primers to be tested using PCR amplification
(Du et al., 2018). In this
study, we identified perfect, imperfect (i.e. contain indels or
substitutions), and compound tandem repeats in silico .
Microsatellites were filtered using custom scripts in the R programming
language (R Core Team,
2020). Microsatellites that were di-, tri-, and tetra-nucleotide
repeats were targeted. Products that were 100 – 500 nucleotides (nts)
long were selected. Primer pairs that annealed to multiple sites across
the genome were excluded. Loci that have primer pairs not exhibiting
between 40 and 60% GC content were excluded. Only primers that have an
annealing temperature near 55°C (54.5 - 55.4°C) were included. A more
detailed narrative of the conditions used to filter microsatellites and
design primers is provided in the Supplementary Material 1. Of the loci
that fit our search criteria, we randomly selected 22 loci for further
testing with molecular methods. We used NCBI Genome Data Viewer to
examine the placement of the novel loci across intergenic (i.e.exon) and intragenic (i.e. intron) space in the annotatedO. lignaria genome assembly, as well as proximity and placement
near or within a gene, respectively.
Genomic DNA was extracted from the legs of 124 adult specimens with a
Quick-DNA Miniprep Plus Kit (Zymo Research, Irvine, CA, U.S.A.)
following the manufacturer’s protocol, except that the Proteinase K
incubation step with the following modifications: Proteinase K digestion
incubated 12-16 hours, DNA elution buffer was warmed to 60 °C prior to
addition, and elution buffer was added in two steps to increase DNA
yield. The 22 novel loci were amplified with extracted DNA from a
haphazardly selected sample of eight specimens, PCR amplified, and
confirmed on 1.2% agarose gel alongside a 100 bp ladder (GeneRuler DNA
Ladder, ThermoFisher Scientific). PCR reactions were performed using the
following methods: 1-2 μl extracted DNA, 1x Promega (Madison, WI)
reaction buffer, 0.6 mM dNTP mixture, 0.1-0.4 μM primer, 0.001 mg BSA,
0.4 units Taq polymerase (Promega, Madison WI), and 1.4 mM
MgCl2. The PCR conditions for multiplex reactions were:
one 4 min cycle at 95 °C, 30 cycles of 95 °C for 30 sec, and annealing
at 55 °C for 75 sec, then 72 °C for 45 sec. The cycles were followed by
a final extension period of 15 min at 72 °C.
Once a product was confirmed at the expected length in the PCR
reactions, the 5’ ends of each forward primer were fluorescently labeled
as either 6-FAM, VIC, NED, or PET and combined in a four-marker
multiplex reaction. Each multiplex reaction represented one of the
dye-labeled conditions, which were then separated on an ABI PRISM™ 3730
DNA Analyzer along with a dye-labeled size standard (GeneScan™ 500 LIZ™
dye size standard, Applied BiosystemsTM, ThermoFisher
Scientific) at the Utah State University Center for Integrated
Biosystems (Table 1). Allele sizes were scored using Geneious Prime
2021.0.1 (Kearse et al.,
2012). Twenty percent of the specimens underwent additional PCR
amplification to determine genotype error rates. We tested for
differences in genotype error rates across the different microsatellite
types with a Wilcoxon Rank-sum test.
Sibship identity and population genetic analysis. We first
identified sibship relationships at the Birch Creek and Cub River field
sites by assigning individuals to families using full-pedigree
likelihood methods implemented in Colony v2.0 Linux software
(Jones & Wang, 2010). We
set the mistyping error rates of each locus based on the genotype error
rates for the 20% of specimens re-genotyped (Table 1). We set the
sex-determination system to ‘haplodiploid’ and mating system I to
“Polygamous” for both males and females based on research by Bosch and
Kemp (2001).
Colony analyses were conducted in Ceres, a high-performance computing
cluster maintained by the Agricultural Research Service SCINet
(https://scinet.usda.gov/
To determine population genetic diversity within and across the twoO. lignaria populations, we first identified whether the loci
were in Hardy-Weinberg equilibrium (HWE) and that the loci were in
random association (i.e., linkage disequilibrium [LD]).
Deviations from HWE and LD resulted in the removal of the loci for
population genetic analysis. HWE and LD tests were conducted with the
Genepop algorithm (Raymond
& Rousset, 1995). To account for multiple comparisons inherent to HWE
and LD analyses, sequential Bonferroni corrections were applied to the
HWE and LD P values to minimize type I errors across population
by marker combinations. Following the HWE and LD analyses, we calculated
observed and Nei’s expected heterozygosity (Hoand Nei’s He , respectively), Simpson’s Index of
diversity, allelic richness, number of alleles, and private alleles for
the Birch Creek and Cub River study populations.Ho is the observed number of loci that are
heterozygous for a sampled individual whereas Nei’sHe is the expected number of loci that are
heterozygous for a sampled individual. Simpson’s Index of diversity (D)
measures the probability that individuals randomly selected from a
sample will belong to the same allele class. In our study, we present
the index 1-D which represents the probability that two individuals
randomly selected from a population will belong to a different allele
class. Thus, the greater the 1-D index, the greater the diversity of
alleles (Magurran, 2003).
Allelic richness is calculated as the number of alleles at a locus
divided by the number of specimens without data missing at the target
locus. Number of alleles is the number of alleles at a locus, and
private alleles is the number of alleles that were only detected in one
population. We tested for differences in these genetic diversity indices
between Birch Creek and Cub River with a two-sample t -test or
Wilcoxon rank-sum test. The latter non-parametric test was used if the
genetic diversity indices demonstrated unequal variances between the two
populations.
To characterize population genetic structure, we conducted an analysis
of molecular variance (AMOVA). This approach mirrors the analysis of
variance approach based on Wright’s fixation indices
(Weir & Cockerham, 1984).
We conducted a hierarchical analysis of molecular variance where
variance was partitioned to covariance components based on variation
within sample, variation between samples, and variation between
populations. These covariance components were calculated as Φ fixation
indices (Grünwald &
Hoheisel, 2006). We further tested for the significance of Φ through
comparison of 999 nonparametric permutations of the data (Kamvar et al.
2015) . We conducted additional independent tests for structure
between the two populations by calculating the following indices:FST , FIS ,GST , and Jost’s D. Definitions of these different
fixation indices are articulated in Meirmans and Hedrick
(2011). In
general, higher values of FST ,GST , and Jost’s D implicate higher fixation of
alleles at a locus in a population, whereas lower values of these
indices suggest lower fixation of alleles at a locus in a population.FIS is a measure of inbreeding, where high values
of FIS are evidence for inbreeding. In these
tests of population structure, we calculated theFIS based on a global comparison between
populations also with 999 permutations of the data (Archer et al.,
2017).
Osmia subspecies, species, and clade microsatellite
marker amplification. We aimed to determine the capacity for the novel
microsatellite markers to amplify in taxa other than Osmia
lignaria . We explored marker amplification for the apicata ,bicornis , emarginata , and ribifloris clades
following Branstetter et al.
(2021). We
included three species/subspecies of apicata , 12
species/subspecies of bicornis (including O. lignaria ),
seven species/subspecies of emarginata , and two
species/subspecies of ribifloris in our assessment. DNA was
sampled from a specimen library described in Branstetter et al. (2021).
Except for the Colony analysis conducted on SCINet, all population
genetic analyses were conducted with the base, strataG
(Archer, Adams, &
Schneiders, 2017), poppr
(Kamvar, Brooks, &
Grünwald, 2015), and adegent
(Jombart, 2008) libraries
in R version 4.0.3 (R Core Team 2020). Data and R scripts of the
described analysis are available athttps://github.com/jbkoch/OligMsatStudy.