Identifying sex-linked loci using whole genome resequencing data for Macquarie perch
Whole genome resequencing data (WGS) was obtained for 100 known-sex Macquarie perch from Dartmouth and Yarra (25 individuals of each sex per population, all of which were also DArT-sequenced). DNA was extracted and an Illumina Novaseq library was constructed for each individual, as explained above. Libraries were sequenced at Deakin Genome Centre to at least 10X depth (>7 Gb of data per library; list of samples and individual read-depth coverage in Supplementary Material S1).
First, we searched the Macquarie perch genome for loci with sex-specific alleles (ii-iv above) using four in-silico pools: Dartmouth females, Dartmouth males, Yarra females, and Yarra males (25 individuals of each sex per population). Each pool combined 25 million paired-end reads randomly subsampled from each individual’s WGS data. Reads from the four pools were mapped onto the newly-assembled Macquarie perch reference genome (Table 1) using bwa-mem v0.7.17 (Li & Durbin, 2009) followed by filtering for mapping quality of >20 using samtools (Li et al., 2009) (samtools view -q 20). A Cochran-Mantel-Haenszel (CMH) test implemented in PoPoolation2(Kofler, Pandey, & Schlötterer, 2011) was used to detect significant and consistent allele-frequency differences (based on read counts) between sexes in two biological replicates. After examining the genomic spread of loci with sex-specific alleles detected under p<1e-5 significance threshold, we tested whether loci highly significantly (p<1e-20) differentiated between sexes clustered in the same genomic region.
One Macquarie perch scaffold contained a region enriched with loci highly differentiated between sexes, and using individual WGS genotypes we tested it for presence of XY-gametologs that could act as sex-determining loci. WGS genotypes were obtained by aligning the trimmed paired-end reads for each individual to the reference genome using bwa-mem v0.7.17-r1188, followed by variant-calling on the merged BAM files from all individuals, using strelka v2.9.10 (Kim et al., 2018). Individual alignments for the putative sexing region in males were visualized in Tablet (Milne et al., 2012) to investigate whether the mate-pair architecture of the sequenced libraries resolved female- and male-specific haplotypes (similar approach in Kijas et al., 2018). We then investigated read-depth coverage for each base of the sexing region, to test for sex-difference in ploidy. Read depth was collected for each individual using the CollectAllelicCounts function of GATK v4.1.0 (DePristo et al., 2011) from individual alignment files, normalized by that individual’s genome-wide average read depth (the number of all mapped reads multiplied by 151 bp (read length) and divided by genome size), and averaged across the four sex-by-population samples. We also calculated read depth for reference and alternative SNP alleles of the sexing region separately.