Shared and private SNPs across datasets
Virtually all the SNPs in the P1+P2 dataset for each genus are shared
with either the corresponding P1 or P2 dataset (Figure 4). However, the
much larger P1 and P2 datasets are composed primarily (60-90%) of
private SNPs that are not present in the P1+P2 dataset. To test the
hypothesis that these private SNPs are enriched for homeo-SNPs, we
compared the SAM files produced by aligning unique reads to the P1, P2,
and P1+P2 genomes, and associated each SNP with its underlying reads in
the SAM files by position. A multi-mapping index was defined for
each SNP as the proportion of underlying unique reads that mapped to
both parental genomes. Since homeo-SNPs arise from multi-mapped reads,
we hypothesized that the reads underlying private SNPs would display a
higher incidence of multi-mapping. Indeed, the mean multi-mapping index
was approximately 2-fold higher for private SNPs than for shared SNPs in
both parents of both genera (Figure S1), suggesting that private SNPs
are enriched for homeo-SNPs.
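
The multi-mapping index described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the toy read identifiers, and the keying of SNPs by (contig, position) are all hypothetical. It assumes the read names aligned to each parental genome have already been extracted from the SAM files, and that a SNP from a single-parent dataset can be matched to the P1+P2 dataset by identical coordinates.

```python
from statistics import mean


def multi_mapping_index(snp_reads, p1_mapped, p2_mapped):
    """Proportion of a SNP's underlying reads that mapped to both parents."""
    reads = set(snp_reads)
    if not reads:
        return 0.0
    # A read counts as multi-mapped if it aligned to P1 and to P2.
    both = reads & p1_mapped & p2_mapped
    return len(both) / len(reads)


def mean_index_by_class(snp_to_reads, combined_snps, p1_mapped, p2_mapped):
    """Mean index for shared vs. private SNPs of one single-parent dataset.

    A SNP is 'shared' if its key is also in the P1+P2 dataset
    (combined_snps); otherwise it is 'private'.
    """
    groups = {"shared": [], "private": []}
    for snp, reads in snp_to_reads.items():
        label = "shared" if snp in combined_snps else "private"
        groups[label].append(multi_mapping_index(reads, p1_mapped, p2_mapped))
    # Classes with no SNPs are reported as None rather than 0.
    return {k: (mean(v) if v else None) for k, v in groups.items()}


# Toy data (hypothetical): read names aligned to each parental genome.
p1_mapped = {"r1", "r2", "r3", "r4"}
p2_mapped = {"r3", "r4", "r5"}

# SNPs called against P1, keyed by (contig, position), with underlying reads.
p1_snps = {
    ("ctg1", 100): ["r1", "r2", "r3", "r4"],  # half the reads multi-map
    ("ctg1", 250): ["r1", "r2"],              # no reads multi-map
    ("ctg2", 40): ["r3", "r4"],               # all reads multi-map
}
combined_snps = {("ctg1", 250)}  # SNPs also present in the P1+P2 dataset

result = mean_index_by_class(p1_snps, combined_snps, p1_mapped, p2_mapped)
print(result)
```

In this toy example the private SNPs have a mean index of 0.75 and the single shared SNP has an index of 0.0; the values are illustrative only and bear no relation to the results in Figure S1.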