Effective sample size and population assignment
To assess our ability to accurately assign individuals of unknown origin
to breeding populations, we first determined the accuracy of assignment
of the known breeding origin individuals using a leave-one-out approach
implemented in WGSassign (DeSaix et al., in review). Leave-one-out
avoids assignment bias by iteratively removing an individual from its
given source population, re-estimating the allele frequency of the
source population, and then calculating the likelihood of the
individual’s assignment to each population. Another source of bias in
assignment tests is variation in the precision of allele frequency
estimation, which arises from populations having different numbers of
samples and/or having differences in sequencing depth of their
individuals. To mitigate this bias, we tested two other approaches for
source population sampling design: 1) we reduced the number of samples
per breeding population to be the same as the population with the fewest
samples (size-standardized breeding populations; SSBPs) and 2) we
followed the guidelines in DeSaix et al. (in review) to
standardize the effective sample sizes of the breeding populations
(effective-size-standardized breeding populations; ESSBPs).

Effective sample size is a metric based on Fisher information: it is the
number of individuals with known genotypes that would yield the same
variance in estimated allele frequency as the sampled low-coverage
individuals (DeSaix et al., in review).
The purpose of ESSBPs is to equalize the effective sample size among
populations by removing individuals from the populations with the
highest effective sample sizes, thereby making the precision of allele
frequency estimation similar among the different populations. We used
WGSassign to calculate each breeding population’s effective sample size
for the SSBPs and ESSBPs and performed leave-one-out assignment. We also
performed standard assignment with all breeding individuals not in the
standardized sets. We then compared assignment accuracy across all three
source population sampling designs, using leave-one-out accuracy for the
full data set and the combined leave-one-out and standard assignment
accuracy for the SSBPs and ESSBPs.
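The leave-one-out step described above can be illustrated with a minimal sketch. It is not the WGSassign implementation: it assumes hard-called genotypes (alternate-allele counts of 0, 1, or 2) rather than genotype likelihoods from low-coverage data, assumes Hardy-Weinberg proportions, and assumes every population keeps at least one individual after the focal individual is removed. The function name and the `eps` clipping parameter are hypothetical.

```python
import numpy as np

def loo_log_likelihoods(genotypes, pops, i, eps=1e-6):
    """Log-likelihood of individual i under each population (hypothetical helper).

    genotypes: (n_individuals, n_loci) array of alternate-allele counts (0, 1, 2).
    pops: array of population labels, one per individual.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    pops = np.asarray(pops)
    g = genotypes[i]
    result = {}
    for pop in np.unique(pops):
        members = pops == pop
        # Remove the focal individual; this only changes the estimate
        # for its own source population.
        members[i] = False
        # Allele frequency re-estimated from the remaining members.
        p = genotypes[members].mean(axis=0) / 2.0
        p = np.clip(p, eps, 1.0 - eps)  # avoid log(0) at fixed loci
        # Binomial genotype probability under Hardy-Weinberg proportions.
        het_term = np.where(g == 1, np.log(2.0), 0.0)
        loglik = np.sum(het_term + g * np.log(p) + (2.0 - g) * np.log(1.0 - p))
        result[str(pop)] = float(loglik)
    return result
```

Calling the function once per known-origin individual and recording whether the highest-likelihood population matches the true source gives a leave-one-out accuracy estimate.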
Posterior probabilities of assignment were determined by dividing an
individual’s maximum assignment likelihood by the sum of its likelihoods
across all populations. A posterior probability cut-off of 0.8 was used
to determine whether an individual was confidently assigned to a
population.
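
As an illustration of the posterior probability calculation, the sketch below normalizes per-population likelihoods and applies the 0.8 cut-off. It makes several assumptions not stated in the text: likelihoods are supplied on the log scale, the maximum log-likelihood is subtracted before exponentiating for numerical stability, all populations are weighted equally, and a posterior exactly equal to the cut-off counts as confident. The function name `assign` is hypothetical.

```python
import numpy as np

def assign(log_likelihoods, threshold=0.8):
    """Return (best population, its posterior, confidently assigned?).

    log_likelihoods: dict mapping population label -> log-likelihood.
    """
    labels = list(log_likelihoods)
    ll = np.array([log_likelihoods[k] for k in labels], dtype=float)
    # Subtracting the maximum leaves the ratios unchanged and avoids underflow.
    weights = np.exp(ll - ll.max())
    posterior = weights / weights.sum()
    best = int(np.argmax(posterior))
    # Assumption: ">=" is used here; the text does not say whether the
    # cut-off is inclusive.
    confident = bool(posterior[best] >= threshold)
    return labels[best], float(posterior[best]), confident

# Example with two hypothetical populations.
population, probability, confident = assign({"pop1": -100.0, "pop2": -105.0})
```

Working on the log scale matters in practice because likelihoods summed over many loci are far too small to exponentiate directly.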