DISCUSSION
Non-invasive samples (e.g. feces, hair snags) are viable sources of DNA
for genetic monitoring and may help mitigate some of the limitations of
traditional monitoring methods, while also providing opportunities to
integrate Traditional Ecological Knowledge and western science
perspectives. Here, we developed a final GT-seq assay of 324 SNPs
optimized for genotyping of polar bears based on DNA from non-invasively
collected fecal samples. We demonstrated 1) successful GT-seq genotyping
of DNA from field-collected polar bear feces, and 2) the practicality of
GT-seq for distinguishing individuals, assessing relatedness, and
expanding our understanding of genetic population structure and
diversity for Canadian polar bear subpopulations. Here, we determined
that all existing unrelated bears could be distinguished with as few as
34 SNPs (Figure S2). Given the current global polar bear population
estimate of 23, 315 (Hamilton & Derocher, 2019), our panel of 322
autosomal and 2 sex-linked SNPs is more than sufficient to reliably
discern bears among relationship categories (e.g. siblings, unrelated)
with high certainty. This is supported by the marked separation among
density plots for full siblings, half siblings, parent-offspring pairs
or unrelated individuals (Figure 1). Given this ability to differentiate
individuals based on both tissue and fecal samples, our GT-seq assay may
be particularly useful in genetic mark-recapture studies making use of
non-invasive scat surveys.
We have also shown that our GT-seq panel provides sufficient statistical
power to assess polar bear population structure and diversity. Diversity
metrics based on our GT-seq dataset resemble those of previous studies
based on SNPs, mtDNA, or microsatellites (e.g. Jensen et al., 2020;
Malenfant, Davis, Cullingham, & Coltman, 2016), although our estimates
of population diversity may have been slightly higher than expected due
to our selection of high diversity SNPs during panel selection. Levels
of genetic diversity did not vary markedly among subpopulations,
corroborating previous suggestions that these existing subpopulations do
not reflect the subtle patterns of polar bear dispersal and genetic
structure in the Canadian Arctic (Jensen et al., 2020; Malenfant et al.,
2016). Although many subpopulations have only subtle differences in
genetic diversity, high assignment rates for a few of the subpopulations
(e.g. GB, SH, MC) may have implications for conservation and management
planning. In our original selection of SNPs for our GT-seq panel, we
focused on selecting high diversity SNPs to maximize resolution of
individuals and population genetic structure. However, other
characteristics of a species may also influence the number of markers
required to resolve differences among individuals and subpopulations,
including its mating system and linkage patterns within its genome. For
example, mating systems and small populations with high relatedness may
require a larger panel of loci for individual identification. Final
panel selection also depends on intended application. Here, we aimed to
examine baseline polar bear population structure and thus, we selected
putatively neutral SNPs with high diversity by filtering for minor
allele frequency and departure from Hardy-Weinberg Equilibrium. In the
future, we may wish to develop a panel to assess adaptation in real time
by including SNPs that are flagged as potentially being under selection,
or perhaps a panel to distinguish between bear species (e.g. grizzly and
polar bears) through selection of species diagnostic SNPs, similar to
application of GT-seq to invasive brown (Rattus norvegicus ) and
black rats (R. rattus ; Sjodin, Irvine, & Russello, 2020). Thus,
the GT-seq assay can be modified and widely adapted to other study
systems to address a range of genetic questions.
GT-seq uses a reduced panel of SNPs that allows genotyping of samples
with degraded DNA. It thus enables us to combine GT-seq data with data
from other methods (e.g. ddRADseq), as long as there are loci in common.
However, differences in genotyping calling methods between GT-seq and
RADseq datasets may contribute to genotype discordance among datasets
(Schmidt et al., 2020). To address this possibility, we called genotypes
from our GT-seq data using both the original calling pipeline for GT-seq
(Campbell et al., 2015) and a BCFTOOLS workflow previously used to call
genotypes from ddRADseq data (Jensen et al., 2020). Genotypic
discordance between the two calling workflows was low (1.1%) but still
suggested some variation between calling methods for GT-seq and ddRADseq
data. In addition, levels of missing data were moderate for all
workflows (21.3-25.4%). Based on these results, we acknowledge that
workflow choice is primarily a matter of preference with discordance
among workflows being negligible. Regardless, we recommend being
consistent and using the same workflow while working with both GT-seq
and ddRADseq data.
Genotyping success estimated using the BCF-6 workflow was high across
all sample types, with FF samples in particular being genotyped
successfully (>50% data) for 85.7% of individuals with
qPCR-detectable polar bear DNA, with an average of 14.9% missing data.
We estimated genotyping error for field scat to be around 10% based on
MS:CF comparisons and FF sample replicates not screened before
sequencing, and much lower for other sample types (e.g. 0.17% for
biopsy). Genotyping success using GT-seq was expected to be relatively
low for field feces compared to tissue samples, as there is an unknown
length of environmental exposure and each sample contains a unique mix
of DNA from pathogens, gut microflora and microfauna, diet species, and
the host itself. High genotyping error rates may introduce significant
bias into estimates of population parameters, and greater uncertainty
into individual identification and relatedness estimates (Bonin et al.,
2004; Gagneux, Boesch, & Woodruff, 1997; Kleinman-Ruiz et al., 2017;
von Thaden et al., 2017). To address this limitation and minimize the
use of resources on fecal samples that are unlikely to have sufficient
polar bear DNA, we evaluated a qPCR screening assay (Hayward et al.,
2020). We found that the qPCR screening assay substantially increased
the percentage of field fecal samples successfully genotyped by GT-seq
(30.6 to 85.7%). In one case, duplicate subsamples retrieved from the
same fecal sample demonstrated a large difference in genotyping success
(1 failure, 1 success), suggesting that samples can be heterogenous in
amount of host DNA and samples for which genotyping is critical may be
repeatedly subsampled to achieve success. Thus, we recommend using a
similar sample quality screening step for any future research making use
of GT-seq and non-invasive samples.
By combining our new GT-seq data with our ddRADseq data (Jensen et al.,
2020), we were able to expand our sample sizes for certain
subpopulations (e.g. GB, MC). This allowed us to reassess Canadian polar
bear population structure using a larger dataset (642 vs. 358
individuals from Jensen et al., 2020) and make use of samples that were
unusable using ddRADseq methodology (i.e. fecal, degraded biopsy and
muscle samples), which requires high quality DNA. These combined data,
including genotypes from non-invasively collected feces, allowed us to
resolve clear genetic differentiation within the Canadian Arctic. Sample
clustering patterns and geographic distributions were largely consistent
with genetic groups called the “Hudson Complex,” “Arctic
Archipelago,” and “Polar Basin” described previously both in studies
using microsatellites and SNPs (Jensen et al., 2020; Malenfant, Coltman,
& Davis, 2015; Malenfant et al., 2016; Paetkau et al., 1999).
Boundaries between our clusters were not discrete, with the existence of
admixed individuals (n = 90) and individuals that belonged to one
genetic cluster being found within the geographic range of another. Due
to our small sample size for Norwegian Bay subpopulation (n = 1), we
were unable to confirm its status as its own genetic cluster, as has
been proposed (Malenfant et al., 2015; Malenfant et al., 2016; Paetkau
et al., 1999). However, a new genetic cluster coincident with the
M’Clintock Channel subpopulation emerged in both our STRUCTURE and DAPC
analyses (Figures 2 & 3), a pattern not evident from analysis of
ddRADseq data alone (Jensen et al., 2020). To rule out technical reasons
for this new cluster, we confirmed that our data do not cluster based on
sample type (e.g. FF vs. MS vs. BP) or data type (e.g. GT-seq vs.
ddRADseq; Figure S5). Rather than M’Clintock Channel existing as a clear
genetic cluster comparable to “Hudson Complex,” “Arctic
Archipelago,” and “Polar Basin”, it seems more likely that it is the
result of more subtle genetic differentiation within the Arctic
Archipelago or possibly a by-product of the timespan over which our
samples were collected.
Based on mitochondrial DNA and microsatellite data, the Gulf of Boothia
and M’Clintock Channel subpopulations have been recommended as
significantly differentiated conservation units with both historical and
contemporary barriers to gene flow (Campagna et al., 2013). Similarly,
analyses of SNP data have suggested 6 genetic clusters, including
Western and Eastern clusters within both the Arctic Archipelago and
Polar Basin major clusters (Malenfant et al., 2016). Preprint SNP data
(Malenfant et al., in press) also imply that the Arctic Archipelago
cluster may comprise 3 genetically differentiated groups: a Western
Archipelago cluster that includes Viscount Melville Sound and M’Clintock
Channel, a Gulf of Boothia cluster, and an Eastern Archipelago cluster
including Lancaster Sound, Kane Basin, and Baffin Bay. Thus, it may be
that our M’Clintock Channel subcluster is part of a larger Western
Arctic Archipelago cluster. Our small sample size for Viscount Melville
Sound may have precluded clear resolution of a Western Archipelago
cluster, as there exists an admixture zone between a M’Clintock
Channel-Viscount Melville Sound subcluster and the Polar Basin
(Malenfant et al., in press).
Our dataset contains samples collected between 1998 and 2018 (a 20-year
timespan) which presents a potential temporal confound that is not
uncommon to studies investigating polar bear population structure (e.g.
Malenfant et al., 2016; Paetkau et al., 1999; Paetkau, Calvert,
Stirling, & Strobeck, 1995). Considering that the mean sample
collection date for M’Clintock Channel was 1999 and the rapidness of
environmental changes in the Canadian Arctic, it is possible that these
data do not represent the contemporary genetic structuring of the
Canadian polar bear population. Regardless, we were able to discern
individual bears and to resolve genetic patterns comparable to other key
studies of polar bears (e.g. Malenfant et al., 2016; Paetkau et al.,
1995, 1999), implying that our panel does adequately capture these known
patterns. Importantly, we are able to do so with much greater
cost-efficiency and based on genetic data collected from non-invasive
and degraded samples. Thus, GT-seq presents a real opportunity to
efficiently and iteratively assess the genetic structure of Canada’s
polar bear populations – and a flexible and cost-effective way to
update our understanding of subpopulation structure as continued
sampling is performed.
We originally intended to include at least 30 individuals from each
Canadian subpopulation in our ddRADseq dataset (Jensen et al., 2020)
used to develop our GT-seq SNP panel, but some subpopulations had fewer
than 30 samples of suitable quality. Ascertainment bias may thus be of
concern as the GT-seq panel did not represent the full array of
subpopulations, nor did it include data from other Arctic nations. Thus,
application to the global polar bear population may require some panel
modifications. Establishing baseline global polar bear population
structure will be especially important for examining trends in dispersal
patterns, diversity and structure, and behaviour in the context of a
rapidly changing climate. High-quality genotyping of non-invasive
samples enabled by GT-seq will facilitate monitoring of admixture,
dispersal, and temporal trends in structure.
The need for monitoring using genetic methods that can yield results
quickly in the midst of quick environmental change, as well as the need
to integrate Indigenous ways of knowing and western science, is not
unique to polar bear monitoring. Here, we respond to a long-expressed
desire by northern communities for a means to non-invasively monitor
polar bears. We demonstrate that we can use GT-seq to differentiate
individuals with a high degree of certainty and comprehensively discern
population structure at levels comparable to other methods of genetic
monitoring (e.g. Malenfant et al., 2016; Paetkau et al., 1995, 1999),
but with much greater efficiency (Campbell et al., 2015) and based on
the inclusion of non-invasive and degraded samples. The GT-seq assay
that we developed here can provide a foundation for new community-based
programs that aim to improve temporal monitoring of the global polar
bear population and directly inform conservation efforts and government
policy. Importantly, envisioned community-based monitoring will
incorporate Traditional Ecological Knowledge and the perspectives of
Indigenous communities throughout the planning and monitoring processes,
while also providing both social and economic benefits to northern
communities. This GT-seq protocol can also serve as the foundation for a
comprehensive toolkit to assess important aspects of polar bear and
ecosystem health (e.g. contaminants, parasite load, diet). With GT-seq
at the core of the toolkit, a suite of data would be provided to
communities and territorial governments that can be incorporated into a
community-driven framework that will ultimately have societal and
management implications. As this framework and our GT-seq protocol can
be adapted to other species and for other research questions, the
monitoring methodology we propose here may be adapted and applied as a
model for inclusive wildlife monitoring worldwide.