DISCUSSION
In this study, we designed and developed four R functions that
automate tasks commonly needed in conservation genomic analyses: (1) filter.sex.linked to identify and remove sex-linked loci, (2) infer.sex to infer the genetic sex of individuals using
sex-linked loci, (3) filter.excess.het to remove loci with
abnormally high heterozygosity, and (4) gl2colony to produce
input files for parentage analysis software. Use of these functions on
genomic data for two bird species revealed that standard filters, such
as low read depth and call rate, are inefficient at removing sex-linked
loci, removing fewer than half of Z-linked loci and only 29-63% of
gametologs. In the two studied species, the failure to comprehensively
remove sex-linked loci led to one or more of: (i) overestimation of
population FIS by up to 9% and of the number of private alleles by up to
8%, (ii) incorrect inference of sex differences in individual
heterozygosity, (iii) capture of genomic differences between the sexes
instead of population structure, and (iv) inference of
~11% fewer parent-offspring relationships in parentage
analyses. We also found that our functions were capable of identifying
all sex-linked loci using as few as 15 known males and 15 known females,
through a preliminary run of filter.sex.linked, followed by
running infer.sex and then re-running filter.sex.linked.
Appropriate filtering is a challenging part of population genomic
analyses. It is widely acknowledged that filtering can significantly
affect the inferences drawn from different analyses, ranging from
‘simple’ standard measures like heterozygosity, all the way to genotype-environment association (GEA) analyses
(e.g., Fu 2014; Linck & Battey 2019; Graham et al. 2020; William et al.
2022; Ahrens et al. 2021). Given this awareness, there is surprisingly
little mention of best practices for filtering out sex-linked loci from
SNP datasets in population genomics research (but see Benestan et al.
2017 and Trenkel et al. 2020). Unless they use per-marker FST or dartR's gl.report.sexlinked
function to explicitly identify sex-linked markers, studies rarely
address them, and seem to rely mainly on read-depth and missing-data
filters to remove sex-linked loci from large SNP datasets. We have
demonstrated that this untargeted approach fails to remove
~19-29% of all sex-linked loci. Filtering sex-linked
markers based only on assumed synteny with chromosome locations in a
heterospecific reference genome can also fail to account
for neo-sex chromosomes in evolutionary studies (Morales et al. 2018).
Recent discoveries of neo-sex chromosome systems in Sylvioidea (Sigeman
et al. 2020; Sigeman et al. 2022), Australian robins (Gan et al. 2019),
insects (Wang et al. 2022) and other systems highlight the dangers of
assuming synteny with reference genomes of other species when detecting
sex-linked loci. Thus, we propose that using our filter.sex.linked function to remove sex-linked loci before applying SNP quality
filters constitutes best practice, ensuring that downstream
filters are in fact evaluating the quality of autosomal loci.
We showed that the failure to remove sex-linked loci meant that a
considerable proportion (7.8% and 5.7%) of the SNPs in the final
datasets were not autosomal, which led to incorrect estimates
of population diversity. Interestingly, the bias that sex-linked loci
introduced into genetic diversity estimates varied unpredictably among
populations, and was
not influenced by the within-population sex ratio (Figure 5). This is
likely because many factors are involved in addition to sample
sex bias, such as the proportions of different types of sex-linked loci,
their different allelic frequencies in the populations, the total number
of sex-linked versus autosomal loci, the sex-chromosome-to-autosome
diversity ratio, and the level of recombination between sex chromosomes.
This highlights the necessity of searching for and carefully filtering
out sex-linked loci, because it would be hard to control for their
presence in other ways (e.g., by introducing sample sex ratio in
statistical models).
Despite the relatively small impact of the presence of sex-linked loci
on population Ho, there was a significant impact on individual Ho that was large enough to erroneously indicate that
YTH females were 5% less heterozygous than males (Table 5). This
spurious significant difference could have mistakenly suggested that
females are philopatric (which is not true in cassidix; Smales
2004) or that they experience less inbreeding depression for survival
(the reverse is true in cassidix; Harrisson et al. 2019). If
these hypotheses were not known in advance to be incorrect, they might
have been accepted or at least further investigated; thus, poor
filtering of sex-linked loci can lead to incorrect ecological and
evolutionary inferences and wasted resources.
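The arithmetic behind such a spurious sex difference can be sketched in a few lines. The Python snippet below is only an illustration, not part of our functions: it assumes a ZW system in which females are hemizygous for Z-linked loci (and so are never scored as heterozygous at them), it ignores gametologs, and the function name and all numerical values are hypothetical rather than taken from our datasets.

```python
def expected_individual_ho(autosomal_ho, z_linked_ho, z_fraction, sex):
    """Expected individual heterozygosity over a mixed set of loci.

    autosomal_ho: mean heterozygosity at autosomal loci (same in both sexes)
    z_linked_ho:  mean heterozygosity at Z-linked loci in males (ZZ)
    z_fraction:   fraction of retained loci that are Z-linked
    sex:          "M" (ZZ) or "F" (ZW); females are assumed hemizygous,
                  so their Z-linked heterozygosity is taken to be zero
    """
    if sex == "M":
        z_ho = z_linked_ho
    elif sex == "F":
        z_ho = 0.0
    else:
        raise ValueError("sex must be 'M' or 'F'")
    return (1 - z_fraction) * autosomal_ho + z_fraction * z_ho


# Hypothetical values chosen for illustration only.
male_ho = expected_individual_ho(0.30, 0.30, 0.05, "M")
female_ho = expected_individual_ho(0.30, 0.30, 0.05, "F")

# Relative deficit in female heterozygosity caused purely by the
# Z-linked loci left in the dataset.
relative_deficit = 1 - female_ho / male_ho
print(f"female Ho is {relative_deficit:.1%} lower than male Ho")
```

Under these assumptions, the apparent female deficit scales with the fraction of Z-linked loci that survive filtering, which is why a few percent of retained sex-linked loci can be enough to produce a statistically detectable difference.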
Our results also illustrated how the presence of sex-linked SNPs can
obscure population structure. The first PC on EYR data showed population
structure due to geographically separated groups. The second PC,
however, simply captured the genetic differences between sexes when
sex-linked markers were not removed, obscuring the fact that in reality,
the second largest source of genetic variation comes from within the
Muckleford population (Figure 6). This masking of population structure
has also been observed in the Discriminant Analysis of Principal
Components (DAPC) of two species of lobsters due to the presence of a
few sex-linked loci (Benestan et al. 2017). Had it not been checked
against sex, the split of PC2 into two clusters could have been interpreted as, for
instance, the presence of two cryptic sympatric species. Researchers
studying populations with little genetic variation should be
particularly careful, because this effect is expected to be more
pronounced for populations with low genetic differentiation.
Importantly, we found that failing to remove sex-linked loci led to
~11% fewer correct parentage assignments (Table 6).
Such a substantial loss of correct assignments could have repercussions
for the management of endangered species. For example, releases of
captive-bred individuals and translocations/introductions usually
avoid placing close relatives in the same group, in order
to maximize genetic diversity and discourage inbreeding (e.g., cassidix, Harrisson et al. 2016; Frankham et al. 2017). Removing
sex-linked loci will be even more crucial in the absence of a set of
known parentages with which to calibrate parentage analyses, as is likely
to apply to many species of conservation concern, such as (i) those whose
breeding season cannot be monitored because it occurs in inaccessible
locations or because of lack of resources, (ii) polygamous and
cooperative-breeding species, and (iii) those with external fertilisation,
such as amphibian and fish species (Nakamura 2009). Accounting for
sex-linked loci is also likely to have the largest impact on species
with large sex chromosomes (including neo-sex chromosomes, which have
been discovered in many taxa including EYR) because sex-linked loci will
represent a large proportion of the potential genomic markers for
parentage analysis (Sigeman et al. 2022; Beukeboom & Perrin 2014; Gan
et al. 2019).
The functions we propose were created with the needs of conservation
genomicists and wildlife managers in mind. Sexing individuals is
especially important for species without sexual dimorphism, or for
sexually dimorphic species whose young are indistinguishable by sex. With
the combination of the functions filter.sex.linked and infer.sex, we offer a formal statistical framework that
systematically identifies and uses sex-linked loci to make sex
assignments with as few as 15 known-sex individuals of each sex. Unlike
current practices, infer.sex was designed to use the
complementary information contained in all types of sex-linked loci
available, which makes the sex-assignments more robust. The use of all
types of sex-linked loci will be advantageous for low-density marker
datasets because it uses information that would otherwise be neglected,
and it facilitates development of SNP panels that include sex-specific
loci (Blåhed et al. 2018; Willis et al. 2020). It also allows for
error-checking and confirming congruence between genetic and phenotypic
sex of individuals, which may assist in detecting cases of environmental
sex-reversal (Stelkens & Wedekind 2010). The separation of sex-linked
loci can be used to validate the assembly of W and Y chromosomes, and to
study sex-specific processes (e.g., natural selection, philopatry).
Furthermore, sexing individuals from the SNP data avoids the cost in time,
genetic material and resources of other sexing methods (e.g., PCR
amplification of CHD1-Z and
CHD1-W genes; Fridolfsson & Ellegren 1999).
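To make the idea of combining complementary types of sex-linked loci concrete, the following Python sketch assigns sex in a ZW system from two signals: heterozygosity at Z-linked loci (expected to be near zero in hemizygous females) and call rate at W-linked loci (expected to be called only in females). This is a deliberately crude illustration and not the statistical framework implemented in infer.sex; the function name, the genotype coding (0/1/2 with None for missing) and both thresholds are assumptions made for the example.

```python
def infer_sex_zw(z_genotypes, w_genotypes, max_female_z_het=0.05,
                 min_female_w_call=0.5):
    """Assign 'F', 'M' or None (ambiguous) to one individual.

    z_genotypes: genotypes at Z-linked loci (0/2 homozygous, 1 heterozygous,
                 None missing)
    w_genotypes: genotypes at W-linked loci (None when the locus was not called)
    The two thresholds are arbitrary placeholders, not calibrated values.
    """
    votes = []

    # Signal 1: ZW females should show (almost) no Z-linked heterozygotes.
    z_called = [g for g in z_genotypes if g is not None]
    if z_called:
        z_het = sum(g == 1 for g in z_called) / len(z_called)
        votes.append("F" if z_het <= max_female_z_het else "M")

    # Signal 2: W-linked loci should be called in females and missing in males.
    if w_genotypes:
        w_call = sum(g is not None for g in w_genotypes) / len(w_genotypes)
        votes.append("F" if w_call >= min_female_w_call else "M")

    # Report a sex only when every available signal agrees.
    if not votes or len(set(votes)) > 1:
        return None
    return votes[0]


female_like = infer_sex_zw([0, 2, 0, 2, None], [0, 0, 2])
male_like = infer_sex_zw([0, 1, 2, 1], [None, None, None])
conflicting = infer_sex_zw([0, 1, 1, 2], [0, 2, 0])
print(female_like, male_like, conflicting)
```

Returning no assignment when the signals disagree mirrors the error-checking role described above: disagreement between loci types is a prompt to inspect the sample rather than to force a call.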
The function filter.excess.het provides a statistically-backed
method to identify artefactual multilocus SNPs that show abnormally high
heterozygosity. The function circumvents the problem of choosing an
arbitrary heterozygosity threshold by instead identifying loci whose
heterozygosity is ≥ 0.5 and that also
show a significant excess of heterozygotes beyond sampling error. This has
the advantage of taking into account random sampling and genotyping
errors that affect loci differently. This approach is available
in VCFtools but not yet in dartR, snpR or SNPfiltR (Hohenlohe et al. 2011; Danecek et al. 2011; Mijangos et
al. 2022; Hemstrom & Jones 2022; DeRaad 2022). Nonetheless, we would
like to emphasize that this is not a Hardy-Weinberg equilibrium
filter (which requires critical thinking to be correctly applied and
interpreted; Waples 2015), and should be used only when looking to
obtain neutral autosomal loci (as opposed to looking for signatures of selection).
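The logic of pairing a heterozygosity floor with a significance test can be sketched as follows. This Python example is a stand-in for illustration and may differ from the test used inside filter.excess.het: it assumes biallelic loci coded 0/1/2 (None for missing), uses a one-sided exact binomial test of the heterozygote count against 0.5 (the maximum expected heterozygosity of a biallelic locus under random mating), and applies no multiple-testing correction. The function names and the alpha value are hypothetical.

```python
from math import comb


def excess_het_pvalue(n_het, n_called, max_expected=0.5):
    """Upper-tail probability P(X >= n_het) for X ~ Binomial(n_called, max_expected)."""
    return sum(
        comb(n_called, k) * max_expected**k * (1 - max_expected) ** (n_called - k)
        for k in range(n_het, n_called + 1)
    )


def flag_excess_het(genotypes, alpha=0.05):
    """Return True if a locus shows significantly excessive heterozygosity.

    A locus is flagged only if (1) its observed heterozygosity is >= 0.5 and
    (2) the excess over 0.5 is unlikely to be due to sampling error alone.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    n_het = sum(g == 1 for g in called)
    if n_het / len(called) < 0.5:
        return False
    return excess_het_pvalue(n_het, len(called)) < alpha


all_het = flag_excess_het([1] * 20)                   # every individual heterozygous
borderline = flag_excess_het([1] * 11 + [0] * 9)      # 55% heterozygous
ordinary = flag_excess_het([1] * 4 + [0] * 10 + [2] * 6)
print(all_het, borderline, ordinary)
```

The borderline locus exceeds a fixed 0.5 cut-off but is not flagged, because with 20 individuals an observed heterozygosity of 0.55 is well within sampling error; this is the behaviour that distinguishes a tested threshold from an arbitrary one.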
In conclusion, we demonstrated how incomplete removal of sex-linked loci
can bias conservation genomic inferences. We argue that comprehensively
removing sex-linked loci should be best practice when handling genomic
data, and we offer convenient, easy-to-use resources to automate this and
other bioinformatic steps. The functions presented here can be
integrated into bioinformatic pipelines and widely used R packages such as dartR, sambaR, SNPfiltR and snpR. By developing functions that can be easily adopted by
conservation biologists and incorporated in wildlife management
workflows, this study will contribute to a better understanding of the
processes occurring in threatened species, such as inbreeding,
inbreeding depression and population structure.