1.1 Function filter.sex.linked
Purpose: Detecting and filtering out sex-linked loci.
Input: One genlight object with at least 30 individuals of known sex (15 of each sex; see Results section 3), and a user-specified parameter declaring the sex-determination system of the species (‘zw’ or ‘xy’). Known sex is provided in ‘ind.metrics’ with a column named ‘sex’ and individuals assigned ‘F’ (females) or ‘M’ (males). Individuals with unknown sex (i.e., assigned anything other than ‘F’ or ‘M’) are ignored by the function.
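The input requirements can be checked before running the function. The following is a minimal sketch in base R, not part of the function itself: the data frame `ind_metrics` is a hypothetical stand-in for the individual metadata, and the threshold of 15 individuals per sex is taken from the description above.

```r
# Hypothetical individual metadata: 16 females, 15 males, and two
# individuals of unknown sex (one coded "U", one missing).
ind_metrics <- data.frame(
  id  = paste0("ind", 1:33),
  sex = c(rep("F", 16), rep("M", 15), "U", NA),
  stringsAsFactors = FALSE
)

# Only individuals coded exactly "F" or "M" count as known sex;
# anything else (including NA) is ignored.
known <- ind_metrics$sex %in% c("F", "M")

# Number of known-sex individuals per sex, with both levels always present.
n_per_sex <- table(factor(ind_metrics$sex[known], levels = c("F", "M")))

# TRUE if each sex reaches the assumed minimum of 15 individuals.
enough <- all(n_per_sex >= 15)
```

Here `known` flags 31 of the 33 individuals, and `enough` is `TRUE` because both sexes reach the minimum.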
How it works: The rationale behind this function is that the scoring rate and heterozygosity of autosomal loci should not differ between the sexes, but they do differ for sex-linked loci. Based on this, the function works in two phases:
Phase I. Use locus call rate to identify W-linked/Y-linked loci and other loci with sex-biased call rates. The function counts, for each locus, the number of known females and the number of known males with NA (i.e., missing data) and with a called genotype (i.e., ‘0’, ‘1’ or ‘2’). These four counts are used to build a 2 × 2 contingency table per locus on which a Fisher’s exact test is performed in order to test for the independence of call rate and sex (α = 0.05). The logic is that autosomal loci should present roughly the same call rate for males and females (Figure 2a, diagonal cloud in gray), and therefore, a locus in which one sex has significantly more missing data than the other is likely to be sex-linked. The p-values of all loci are adjusted for False Discovery Rate with R function p.adjust (Benjamini & Hochberg, 1995). Of the loci with adjusted p < 0.05, those whose male call rate is ≤ 0.1 are assigned as W-linked (because males lack a W chromosome; Figure 2a, in yellow), or as Y-linked if female call rate is ≤ 0.1 (because females lack a Y chromosome). Remaining loci with adjusted p < 0.05 are identified as ‘sex-biased’ (Figure 2a, in blue).
Phase II. Use locus heterozygosity to identify Z-linked/X-linked loci and gametologs. The function counts, for each locus, the number of known females and the number of known males that are heterozygous (i.e., ‘1’), and homozygous (i.e., ‘0’ or ‘2’). In the same way as for Phase I, these four counts are used to build a 2 × 2 contingency table per locus and to perform a Fisher’s exact test to test for the independence of heterozygosity and sex (α = 0.05). Under the logic that autosomal loci should present no difference in proportion of heterozygous individuals between sexes (Figure 2b, diagonal cloud in dark gray), a locus in which one sex has significantly more heterozygous individuals than the other is likely to be sex-linked. P-values are adjusted for False Discovery Rate with R function p.adjust (Benjamini & Hochberg, 1995). Of the loci with adjusted p < 0.05, those whose proportion of heterozygous males is greater than the proportion of heterozygous females are identified as Z-linked (because females have only one Z chromosome, and should be mainly scored as homozygous; Figure 2b, in orange). On the other hand, loci whose proportion of heterozygous females is larger than the proportion of heterozygous males are identified as gametologs (because males have two Z chromosomes, and thus should present only the Z-associated allele and be scored as homozygous; Figure 2b, in green). The same logic, with reversed expectations for sexes, is applied to XY-sex determination system (X-linked: proportion of heterozygous females > proportion of heterozygous males; gametologs: proportion of heterozygous males > proportion of heterozygous females).
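The per-locus machinery shared by the two phases can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual implementation: the genotype matrix `geno`, the vector `sex`, the locus names, and the helper `sex_test()` are hypothetical, and the toy data are constructed to mimic idealised loci in a ZW system.

```r
# Toy data (hypothetical): rows = individuals, columns = loci,
# genotypes coded 0/1/2 with NA for missing calls.
sex  <- rep(c("F", "M"), each = 15)
is_f <- sex == "F"
geno <- cbind(
  loc_auto = rep(c(0, 1, 2), times = 10),  # same pattern in both sexes
  loc_w    = ifelse(is_f, 0, NA),          # never called in males
  loc_z    = ifelse(is_f, 0, 1),           # females never heterozygous
  loc_gam  = ifelse(is_f, 1, 0)            # only females heterozygous
)

# Fisher's exact test on a 2 x 2 table (sex x outcome) for every locus,
# followed by Benjamini-Hochberg adjustment across loci.
sex_test <- function(yes_f, no_f, yes_m, no_m) {
  p <- mapply(function(a, b, c, d) {
    fisher.test(matrix(c(a, b, c, d), nrow = 2))$p.value
  }, yes_f, no_f, yes_m, no_m)
  p.adjust(p, method = "BH")
}

# Phase I: call rate versus sex.
called <- !is.na(geno)
p_call <- sex_test(colSums(called[is_f, ]),  colSums(!called[is_f, ]),
                   colSums(called[!is_f, ]), colSums(!called[!is_f, ]))
call_m <- colMeans(called[!is_f, ])
w_linked   <- p_call < 0.05 & call_m <= 0.1
sex_biased <- p_call < 0.05 & !w_linked

# Phase II: heterozygosity versus sex (missing genotypes are not counted).
het <- geno == 1
hom <- geno == 0 | geno == 2
p_het <- sex_test(colSums(het[is_f, ],  na.rm = TRUE),
                  colSums(hom[is_f, ],  na.rm = TRUE),
                  colSums(het[!is_f, ], na.rm = TRUE),
                  colSums(hom[!is_f, ], na.rm = TRUE))
het_f <- colMeans(het[is_f, ],  na.rm = TRUE)
het_m <- colMeans(het[!is_f, ], na.rm = TRUE)
z_linked  <- p_het < 0.05 & het_m > het_f
gametolog <- p_het < 0.05 & het_f > het_m
```

With these toy data, `names(which(w_linked))`, `names(which(z_linked))` and `names(which(gametolog))` pick out `loc_w`, `loc_z` and `loc_gam` respectively, while `loc_auto` is flagged in neither phase. For an XY system, the sketch would swap the roles of the sexes in the call-rate threshold and in the two heterozygosity comparisons.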
Loci that are not assigned to any category of sex-linkage are inferred to be autosomal. The function finishes by splitting the loci by category, returning each category as its own genlight object.
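The final step, labelling unflagged loci as autosomal and splitting the data by category, can be sketched as below. This is a simplified illustration only: it uses a plain genotype matrix rather than a genlight object, and `geno`, `flags`, and the category names are hypothetical.

```r
# Hypothetical toy genotype matrix: 2 individuals x 4 loci.
geno <- matrix(c(0, 1,  2, 0,
                 0, NA, 1, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ind1", "ind2"), paste0("loc", 1:4)))

# Hypothetical per-locus flags, one row per locus, one column per category.
flags <- cbind(
  w_linked   = c(FALSE, TRUE,  FALSE, FALSE),
  sex_biased = c(FALSE, FALSE, FALSE, FALSE),
  z_linked   = c(FALSE, FALSE, TRUE,  FALSE),
  gametolog  = c(FALSE, FALSE, FALSE, FALSE)
)
rownames(flags) <- colnames(geno)

# A locus takes the first category that flagged it; loci flagged by
# no category are inferred to be autosomal.
category <- apply(flags, 1, function(r) {
  if (any(r)) names(which(r))[1] else "autosomal"
})

# One genotype matrix per category (drop = FALSE keeps single-locus
# subsets as matrices).
subsets <- lapply(split(names(category), category), function(loci) {
  geno[, loci, drop = FALSE]
})
```

Here `subsets$autosomal` holds `loc1` and `loc4`, `subsets$w_linked` holds `loc2`, and `subsets$z_linked` holds `loc3`; categories with no loci simply do not appear in the list.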