ABSTRACT
Identifying sex-linked markers in genomic datasets is important, because
analysing them can reveal sex-specific biology, and their presence in
supposedly neutral autosomal datasets can result in incorrect estimates
of genetic diversity, population structure and parentage. But detecting
sex-linked loci can be challenging, and available scripts neglect some
categories of sex-linked variation. Here, we present new R functions to
(1) identify and separate sex-linked loci in ZW and XY sex
determination systems and (2) infer the genetic sex of individuals based
on these loci. We also present two additional functions to (3) remove
loci with artefactually high heterozygosity, and (4) produce input files
for parentage analysis. We test these functions on genomic data for two
sexually monomorphic bird species, including one with a neo-sex
chromosome system, by comparing biological inferences made before and
after removing sex-linked loci using our function. We found that
standard filters, such as those for read depth and call rate, failed to remove
up to 28.7% of sex-linked loci. This led to (i) overestimation of
population F_IS by up to 9% and of the number of private alleles by up
to 8%; (ii) false inference of significant sex differences in
heterozygosity; (iii) obscuring of genetic population structure; and
(iv) inference of ~11% fewer correct parentages. We discuss how failure
to remove sex-linked markers can lead
to incorrect biological inferences (e.g., sex-biased dispersal and
cryptic population structure) and misleading management recommendations.
For reduced-representation datasets with at least 15 known-sex
individuals of each sex, our functions offer a convenient, easy-to-use
way to avoid these errors and to sex the remaining individuals.
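The general idea behind separating sex-linked loci using known-sex individuals, and then sexing the rest, can be illustrated with a small sketch. This is not the R code of the functions described above. It is a simplified Python illustration built on stated assumptions: genotypes are coded 0/1/2 with None for missing; the names `classify_locus` and `infer_sex`, the category labels, and every threshold are hypothetical. A more robust approach would compare the sexes statistically instead of applying fixed cut-offs.

```python
from statistics import mean
from typing import Dict, Optional, Sequence

# Hypothetical encoding: 0 or 2 = homozygous, 1 = heterozygous, None = missing.
Genotype = Optional[int]


def call_rate(gts: Sequence[Genotype]) -> float:
    """Fraction of genotypes that are not missing."""
    return sum(g is not None for g in gts) / len(gts) if gts else 0.0


def het_rate(gts: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotypes that are heterozygous."""
    called = [g for g in gts if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else 0.0


def classify_locus(hetero: Sequence[Genotype], homo: Sequence[Genotype],
                   min_call: float = 0.8, max_absent_call: float = 0.2,
                   high_het: float = 0.9, low_het: float = 0.1) -> str:
    """Label one locus from the genotypes of known-sex individuals.

    hetero: heterogametic individuals (XY males or ZW females).
    homo:   homogametic individuals (XX females or ZZ males).
    All thresholds are illustrative assumptions.
    """
    cr_he, cr_ho = call_rate(hetero), call_rate(homo)
    # Y- or W-linked: sequenced in the heterogametic sex, absent in the other.
    if cr_he >= min_call and cr_ho <= max_absent_call:
        return "Y/W-linked"
    if cr_he < min_call or cr_ho < min_call:
        return "undecided (low call rate)"
    h_he, h_ho = het_rate(hetero), het_rate(homo)
    # Gametolog: X and Y (or Z and W) carry different alleles, so nearly every
    # heterogametic individual is heterozygous and homogametic ones are not.
    if h_he >= high_het and h_ho <= low_het:
        return "gametolog"
    # X- or Z-linked: heterogametic individuals are hemizygous and always look
    # homozygous, while homogametic individuals can be heterozygous.
    if h_he == 0 and h_ho > 0:
        return "X/Z-linked"
    return "autosomal"


def infer_sex(ind: Dict[str, Genotype], classes: Dict[str, str],
              system: str = "XY") -> str:
    """Sex one individual from its genotypes at loci labelled by classify_locus.

    Caveat: missing Y/W-linked genotypes count as evidence for the
    homogametic sex, so a low-quality sample can be mis-sexed.
    """
    yw = [ind.get(loc) for loc, c in classes.items() if c == "Y/W-linked"]
    gam = [ind.get(loc) for loc, c in classes.items() if c == "gametolog"]
    evidence = []
    if yw:
        evidence.append(call_rate(yw))  # ~1 if heterogametic, ~0 otherwise
    if any(g is not None for g in gam):
        evidence.append(het_rate(gam))  # ~1 if heterogametic, ~0 otherwise
    if not evidence:
        return "unknown"
    if system == "XY":
        hetero_sex, homo_sex = "male", "female"
    else:  # ZW system
        hetero_sex, homo_sex = "female", "male"
    return hetero_sex if mean(evidence) >= 0.5 else homo_sex


# Toy data: four known males and four known females (XY system).
males = {"locY": [0, 2, 0, 0], "locG": [1, 1, 1, 1],
         "locX": [0, 2, 0, 2], "locA": [1, 0, 2, 1]}
females = {"locY": [None] * 4, "locG": [0, 0, 0, 0],
           "locX": [1, 0, 1, 2], "locA": [0, 1, 1, 2]}
classes = {loc: classify_locus(males[loc], females[loc]) for loc in males}
print(classes)
# {'locY': 'Y/W-linked', 'locG': 'gametolog', 'locX': 'X/Z-linked', 'locA': 'autosomal'}
print(infer_sex({"locY": 0, "locG": 1, "locA": 2}, classes))     # male
print(infer_sex({"locY": None, "locG": 0, "locA": 1}, classes))  # female
```

With only four individuals per sex, a rule such as "no heterozygous heterogametic individuals" can easily be satisfied by chance at an autosomal locus with a rare allele. This shows why a reasonable number of known-sex individuals of each sex is needed before loci can be classified reliably.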