RESULTS
Identification and removal of sex-linked loci
The function filter.sex.linked identified and removed 3,807
sex-linked loci in EYR (10.7% of the total 35,663 loci tested; Table
3). Of these, 69.3% were identified based on differential call rate
between the sexes (i.e., W-linked and sex-biased; Figure 3a, b) and
30.7% based on differential heterozygosity between the sexes (i.e.,
Z-linked and gametologs; Figure 3c, d). For YTH, the function identified
3,414 sex-linked loci (4.6% of the total 74,470 loci tested; Table 3)
of which 65% were identified by call rate, and 35% by heterozygosity
(Figure S1).
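The two signals named above, per-sex call rate and per-sex heterozygosity, can be illustrated with a minimal Python sketch. This is not the implementation of filter.sex.linked: the function name per_sex_summary, the genotype coding (0/1/2 alternate-allele counts, None for a missing call) and the toy loci are all assumptions made for illustration, and no thresholds or statistical tests are implied. A ZW system is assumed, in which females carry one Z and one W.

```python
from typing import Dict, Optional, Sequence


def per_sex_summary(
    genotypes: Sequence[Optional[int]], sexes: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Return call rate and observed heterozygosity per sex for one locus.

    genotypes: 0/1/2 alternate-allele counts, None for a missing call
               (hypothetical coding, chosen for this sketch).
    sexes:     'F' or 'M' for each individual, in the same order.
    """
    summary = {}
    for sex in ("F", "M"):
        calls = [g for g, s in zip(genotypes, sexes) if s == sex]
        scored = [g for g in calls if g is not None]
        call_rate = len(scored) / len(calls) if calls else float("nan")
        het = sum(g == 1 for g in scored) / len(scored) if scored else float("nan")
        summary[sex] = {"call_rate": call_rate, "het": het}
    return summary


# Toy loci (invented data) for three females followed by three males.
sexes = ["F", "F", "F", "M", "M", "M"]
w_like = [0, 0, 2, None, None, None]   # scored only in females
z_like = [0, 2, 0, 1, 1, 0]            # females never heterozygous
gametolog_like = [1, 1, 1, 0, 0, 0]    # females always heterozygous

for name, locus in [("W-like", w_like), ("Z-like", z_like),
                    ("gametolog-like", gametolog_like)]:
    print(name, per_sex_summary(locus, sexes))
```

In this toy data the W-like locus differs between the sexes in call rate only, whereas the Z-like and gametolog-like loci are fully called in both sexes and differ in heterozygosity, which mirrors the two groupings reported in Figure 3.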
Comparison of ‘before’ and ‘after’ datasets revealed that, when the
function filter.sex.linked was not used, 28.7% (n = 1,093) and
19.0% (n = 650) of the sex-linked loci remained in the final SNP
datasets of EYR and YTH, respectively. Standard locus-filters had
variable efficiency in removing different types of sex-linked loci
(Figure 4): together, the read-depth and locus missing-data filters
removed all W-linked loci, and 90% and 99% of sex-biased
loci from EYR and YTH datasets, respectively. However, they were unable
to remove 75% and 57% of Z-linked loci (EYR: n = 620 were not removed;
YTH: n = 652), and 71% and 37% of gametologs (EYR: n = 241; YTH: n =
21). Other filtering steps, such as filtering on individual missing data
and applying a minor allele count (MAC) filter, removed few additional
sex-linked loci (Figure 4). As a result, 7.8% and 5.7% of the SNPs in
the final datasets were sex-linked in EYR and YTH, respectively.
Impact of removing sex-linked loci on population genetic
diversity, individual heterozygosity, genetic structure and parentage
analyses
Population genetic diversity. In general, removal of sex-linked
loci produced a decrease in estimates of population genetic diversity
(Figures S2 and S3). However, the magnitude of this change differed among
measures of genetic diversity and, importantly, both its magnitude and
direction varied across populations (Figure 5): the largest impact was
on FIS, which ranged from a 9.3% decrease to a 2% increase, and private
alleles (PA), which ranged from an 8% decrease to a 0.5% increase.
Expected heterozygosity (He) decreased by 0.7% to 2.4%. The direction and
magnitude of the change did not correspond to the F:M ratios of samples
(EYR: Crusoe = 0.87, Muckleford = 0.93, Timor = 0.79, Wombat = 0.39;
YTH: Cassidix = 0.94, Gippslandicus = 0.55, Melanops = 1.0, Meltoni =
0.1).
Individual observed heterozygosity (Ho). The removal of
sex-linked loci produced a statistically significant change in
individual Ho whose magnitude and direction varied between sexes and
species (Table 4). For EYR, the decrease in female and male Ho was
significant but small (F: 0.2% decrease, Cohen’s D = 0.35; M: 0.3%
decrease, Cohen’s D = 0.23). For YTH, the change was an order of
magnitude larger and went in opposite directions between the sexes:
female Ho increased 3.8% (p-value < 0.001, Cohen’s D = -8.7)
and male Ho decreased 2.9% (p-value < 0.001, Cohen’s D =
1.9). Because of these opposite effects, the significant (but misleading)
difference between male and female Ho observed before removal
(p-value < 0.001) disappeared after sex-linked loci were removed
from the YTH dataset (p-value = 0.1; Table 5). There
were no significant differences in Ho between the sexes in EYR before or
after removing sex-linked loci.
Genetic structure. Before the removal of sex-linked loci, PC1
explained 2.4% of the genetic variation in EYR, and divided the
individuals into two groups (Crusoe-Timor and Muckleford-Wombat; Figure
6a). PC2, on the other hand, explained 1.6% of variation and captured
genetic structure due to the presence of sex-linked loci: it divided the
individuals into males and females (Figure 6b). This division between
males and females disappeared from PC2 after removing sex-linked loci
(Figure 6c, d). For YTH, none of PC1, PC2, PC3 or PC4 showed sex-related
genetic structure, before or after using filter.sex.linked (Figure S4).
Accuracy of parentage analyses. For EYR, before removing
sex-linked loci, an average of 3.83 runs out of five identified the
correct parent. After removing sex-linked loci, the average increased
significantly to 4.26 (p-value = 0.003; Table 6). We also found a
significant association between the removal of sex-linked loci and the
number of correct final parentage assignments (χ2 =
4.8, df = 1, p-value = 0.03): before removing sex-linked loci, 91 out of
119 (76.5%) final assignments were correct, compared to 104 (87.4%)
correct final assignments after removing sex-linked loci. For YTH
(cassidix), we found that removing sex-linked loci did not
significantly change the average number of runs that correctly
identified parents, which started with the high average of 4.9 runs
(Table 6).
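The association test for EYR can be checked from the counts given above (91 versus 104 correct final assignments out of 119). The following Python sketch assumes an uncorrected Pearson chi-square test on the 2 x 2 table; the text does not state whether a continuity correction was applied, so that choice is an assumption, as is the helper name pearson_chi2_2x2.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: before / after removing sex-linked loci; columns: correct / incorrect.
chi2, p_value = pearson_chi2_2x2(91, 119 - 91, 104, 119 - 104)
print(round(chi2, 1), round(p_value, 2))  # 4.8 0.03
```

Under that assumption the sketch reproduces the reported statistic and p-value.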
Minimum number of known-sex individuals for the filter.sex.linked function
For EYR, 24 known-sex individuals (12 females and 12 males) were the
minimum with which it was still possible to identify sex-linked loci:
filter.sex.linked identified 267 loci, which represented 7% of
the total sex-linked loci (Figure 7a, Table S1). For YTH, 30 known-sex
individuals (15 females and 15 males) were the minimum:
filter.sex.linked identified 61 loci, which represented 1.8% of
the total sex-linked loci in the full dataset (Figure 7b, Table S1).
With fewer known-sex individuals the function was unable to identify any
sex-linked loci.
For EYR, the filter.sex.linked function identified, on average, only
7.2% (range = 6.6-7.9%) of all sex-linked loci for the five subsets of
24 known-sex individuals (91.5% of all W-linked loci, 0% of all
sex-biased, 0.1% of all Z-linked and 32.6% of all gametologs). For
YTH, the filter.sex.linked function identified only 1.9% (range =
1.8-2.0%) of all sex-linked loci for the five subsets of 30 known-sex
individuals (99.3% of all W-linked loci, 0.1% of all sex-biased, 0%
of all Z-linked and 8.6% of all gametologs). These retrieved sex-linked
loci allowed infer.sex to correctly identify the sexes of all
individuals which it assigned as ‘M’ or ‘F’ (cf. marked as ‘*M’ or ‘*F’;
587 EYR and 519 YTH; the same individuals for the five sets).
Re-running filter.sex.linked with the new 587 EYR and 519 YTH sex
assignments identified 100% of all sex-linked loci for both EYR and YTH
(3,807 and 3,414 sex-linked loci, respectively). It is likely that
filter.sex.linked was able to identify sex-linked loci with fewer
known-sex EYR individuals than YTH individuals because EYR has larger
sex chromosomes (i.e., it has neo-sex chromosomes in which a portion of
chromosome 1A fused to the Z chromosome while the other portion
fused to the W chromosome; Gan et al. 2019). We recommend the use of at
least 15 males and 15 females to allow the identification of all
sex-linked loci, although a larger number might be needed for species
with shorter, less differentiated or less variable sex chromosomes.
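For readers who wish to repeat this kind of sensitivity check, the Python sketch below draws several sex-balanced subsets of known-sex individuals. It is only an illustration: the text does not describe how the five subsets were drawn, so random sampling without replacement within each sex is an assumption, and the function name balanced_subsets, the sample identifiers and the pool sizes are hypothetical.

```python
import random
from typing import List, Sequence


def balanced_subsets(
    females: Sequence[str],
    males: Sequence[str],
    n_per_sex: int,
    n_subsets: int,
    seed: int = 0,
) -> List[List[str]]:
    """Draw n_subsets subsets, each with n_per_sex females and n_per_sex males.

    Sampling is without replacement within a subset; different subsets
    are drawn independently and may share individuals.
    """
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    subsets = []
    for _ in range(n_subsets):
        subset = rng.sample(list(females), n_per_sex) + rng.sample(list(males), n_per_sex)
        subsets.append(subset)
    return subsets


# Hypothetical pools of known-sex sample identifiers.
females = [f"F{i:03d}" for i in range(100)]
males = [f"M{i:03d}" for i in range(100)]

# Five subsets of 24 individuals (12 females and 12 males), as for EYR.
subsets = balanced_subsets(females, males, n_per_sex=12, n_subsets=5)
for subset in subsets:
    print(len(subset), sorted(subset)[:3])
```

Each subset could then be supplied as the only known-sex individuals when identifying sex-linked loci, and the number of loci recovered compared with the full-dataset totals.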