Calling workflow comparisons
Our final GT-seq dataset, called using the published GT-seq pipeline
(Campbell et al., 2015), included 325 autosomal SNPs and 2 sex-linked
markers. Minimum-depth filtering (depth 6 or 10) removed 3 autosomal loci
from each BCFTOOLS workflow dataset, leaving each with 322 autosomal loci
and 2 sex-linked markers. We removed the same 3
loci from our GT-seq pipeline dataset to enable direct comparison of
genotypes and missing data by locus across the calling methods. Across
all 457 samples, missing data averaged 25.4% for the GT-seq calling
pipeline, compared with 23.9% and 21.3% for the BCF-10 and BCF-6 calling
workflows (minimum depth 10 and 6), respectively. Genotype mismatch with
the GT-seq workflow averaged 1.1% for both BCFTOOLS calling workflows.
Based on these results and the potential for easy comparison
with existing ddRADseq data, we chose to use the dataset generated from
the BCF-6 calling workflow to assess genotyping error and analyze
population structure.
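
The two summary statistics compared above (proportion of missing data and genotype mismatch between calling workflows) can be illustrated with a minimal sketch. This is not the code used in the study. It assumes genotypes are stored as sample-by-locus tables of two-letter strings with None for a missing call, that mismatch is computed only over cells called in both datasets, and that rates are pooled over all cells rather than averaged per sample or per locus. The function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence

# Assumed encoding: a genotype is a two-letter string such as "AG";
# None marks a missing call.
Genotype = Optional[str]
Table = Sequence[Sequence[Genotype]]  # rows = samples, columns = loci


def missing_rate(calls: Table) -> float:
    """Fraction of sample-by-locus cells with no genotype call."""
    total = sum(len(row) for row in calls)
    missing = sum(g is None for row in calls for g in row)
    return missing / total if total else 0.0


def mismatch_rate(a: Table, b: Table) -> float:
    """Fraction of discordant genotypes among cells called in both tables.

    Assumes both tables list the same samples and loci in the same order.
    """
    compared = 0
    mismatched = 0
    for row_a, row_b in zip(a, b):
        for ga, gb in zip(row_a, row_b):
            if ga is None or gb is None:
                continue  # skip cells missing in either dataset
            compared += 1
            # Compare as unordered allele pairs, so "AG" matches "GA".
            if sorted(ga) != sorted(gb):
                mismatched += 1
    return mismatched / compared if compared else 0.0


# Toy data (2 samples x 3 loci), invented purely for illustration.
gtseq_calls = [["AA", "AG", None], ["GG", None, "CT"]]
bcf6_calls = [["AA", "GA", "CC"], ["GG", "AG", "CC"]]

print(f"GT-seq missing: {missing_rate(gtseq_calls):.3f}")
print(f"BCF-6 missing:  {missing_rate(bcf6_calls):.3f}")
print(f"Mismatch:       {mismatch_rate(gtseq_calls, bcf6_calls):.3f}")
```

Excluding cells that are missing in either dataset keeps the mismatch rate from being confounded with differences in call rate between workflows. Other denominators are possible and would give different values.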