Introduction
Population genetics is a robust, cost- and time-efficient framework to predict, understand and infer the ecology and evolution of species (Ewens 2004, Ellegren & Galtier 2016). This paradigm, at the center of the theory of biological evolution, has stood the test of time in predicting and tracking the ancestral relatedness between individuals at the scale of studied populations (Wakeley 2005). Using changes in genetic variation over time and space, population genetic models allow evolutionary forces in populations to be quantified and interpreted as hypothesized biological and environmental influences on lineages (Ellegren & Galtier 2016). Among all the biological features driving evolution, reproductive mode is one of the most significant evolutionary forces affecting the dynamics of genetic diversity and its structure among populations, as it determines how the hereditary DNA signal is transmitted over generations (Duminil et al. 2007). In return, analysing the genetic diversity within populations allows their reproductive modes to be inferred, providing valuable knowledge to predict and understand their ecological and biological evolution. It also helps to better target ecological scenarios and to infer other evolutionary forces more robustly (Fehrer 2010, Yu et al. 2016, Stoeckel et al. 2021). However, to date and despite nearly a century of research, population genetic models and tools have mostly been developed for sexual, diploid species (Orive & Krueger-Hadfield 2021, Dufresne et al. 2014).
Eukaryotes with more than two sets of homologous chromosomes (autopolyploids) or duplicated genomic segments are very common among ferns, flowering plants and fungi (Barker et al. 2015, Albertin & Marullo 2012, Wood et al. 2009). Polyploidy seems less frequent in animals, albeit significant in a handful of clades such as fishes, cnidarians, amphibians and reptiles (Gregory & Mable 2005, Mable et al. 2011, Boots et al. 2023). It also occurs in some species for only some chromosomes (aneuploidy), as commonly observed in partially clonal parasitic protozoa (Tibayrenc & Ayala 2013, Rougeron et al. 2015).
Polyploidization influences genetic and phenotypic diversity, including potential ecological adaptations and radiations, with a long-term dynamic running from whole-genome duplication to re-diploidization (Baduel et al. 2018, Wu et al. 2019). Interestingly, polyploidy strongly co-occurs with reproductive modes involving partial clonality, both in natural and experimental populations (Herben et al. 2017; Van Drunen & Husband 2019). It also seems to be an influential factor complementing the more classical Baker’s hypothesis of an advantage of uniparental reproductive modes, including selfing and clonality, when peripatric populations establish in new areas (Pandit et al. 2011, Barrett 2018, Rutland et al. 2021). Studying the reciprocal influences between reproductive modes and the ecology and evolution of populations from their genetic diversity is now usual in diploid populations, favoured by a wide range of adapted tools such as GenClone (Arnaud-Haond & Belkhir 2007), RMES (David et al. 2007) and RClone (Bailleul et al. 2015). However, it is less common in polyploid populations. The lack of adapted and easily accessible analysis solutions has led previous studies to treat such datasets as haplotypes or to analyse them as diploid.
Indeed, population genetic studies of polyploid organisms were long limited by two main difficulties (Dufresne et al. 2014, Jighly et al. 2018). First, obtaining robust genotypes in such populations has long been a true challenge because allele dosage within individuals is difficult to resolve. For example, it was methodologically impractical to distinguish between AABB, ABBB and AAAB individuals at a tetraploid genetic marker with two alleles, A and B, without making assumptions that are difficult to verify (Dufresne et al. 2014, Bourke et al. 2019). Allele dosage difficulties intensify with increasing ploidy and number of possible alleles at the considered genetic marker, because the number of allele combinations, and therefore the number of possible genotypes, increases. However, recent advances in genotyping methods that exploit deep sequencing with low error rates, combined with individual and marker tags, have unlocked the possibility of genotyping polyploid individuals with confident allele dosage, even in species with large sets of chromosomes (Delord et al. 2018). These methods benefit both from advances in the sequencing process itself, which decrease sequencing errors, and from the development of upstream molecular processing of genetic samples to tag and target very specific genomic regions. This processing increases the sequencing depth of the genotyped markers and allows reproducible replicates. It is now possible to obtain, at limited cost, from more than 20 to hundreds of replicated sequences per SNP or microsatellite allele within each individual in a pool of individuals using genotyping-by-sequencing methods. For example, the Hiplex genotyping method allows genotyping ~500 individuals at 100 SNPs in one sequencing run (e.g., MiSeq 2x150 Heflin), with a sequencing depth of ~50 sequences per allele in tetraploids and ~33 sequences per allele in hexaploids, resulting in genotype assignments with a confidence above 99% (Delord et al. 2018, Besnard et al. 2023).
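To make concrete how fast the number of possible genotypes grows with ploidy and allele number, the short Python sketch below counts and enumerates them. It assumes that a genotype at one marker is fully described by its allele dosage, i.e. an unordered multiset of allele copies, so that the count is the binomial coefficient C(k + p - 1, p) for ploidy p and k alleles. The function names `count_genotypes` and `list_genotypes` are hypothetical and chosen for illustration only; they do not come from any of the programs cited here.

```python
from itertools import combinations_with_replacement
from math import comb


def count_genotypes(ploidy, n_alleles):
    """Number of distinct allele-dosage genotypes at one marker.

    Assumes a genotype is an unordered multiset of `ploidy` allele copies
    drawn from `n_alleles` alleles: C(n_alleles + ploidy - 1, ploidy).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


def list_genotypes(ploidy, alleles):
    """Enumerate the allele-dosage genotypes as strings such as 'AABB'."""
    return ["".join(g)
            for g in combinations_with_replacement(sorted(alleles), ploidy)]


# Tetraploid marker with two alleles, A and B (the example in the text).
print(list_genotypes(4, "AB"))   # ['AAAA', 'AAAB', 'AABB', 'ABBB', 'BBBB']

# Growth with ploidy for a marker with four alleles.
for ploidy in (2, 4, 6):
    print(ploidy, count_genotypes(ploidy, 4))   # 10, 35 and 84 genotypes
```

With four alleles, a diploid marker has 10 possible genotypes whereas a hexaploid marker has 84, which illustrates why dosage must be resolved with deep and replicated sequencing.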
Second, we have also long lacked adapted models and analysis methods to compute population genetic indices and quantify evolutionary forces in polyploid populations (Dufresne et al. 2014), especially considering that partially clonal and selfed populations can show repeated genotypes (i.e., the same multi-locus genotype found in different samples, Arnaud-Haond et al. 2007) or patterns of high probabilities of identity between genotypes (David et al. 2007; Jullien et al. 2019). Due to the challenges introduced by data formats and to the difficulty of generalizing the mathematical formulae of population genetic indices (Ewens 2004), common population genetics software, such as Genalex (Peakall & Smouse 2012) and GenClone (Arnaud-Haond & Belkhir 2007), is not designed to work with partially clonal populations with more than two allelic copies per gene (Excoffier & Heckel 2006). A handful of libraries and programs have emerged in recent years, such as the command-line Spagedi (Hardy & Vekemans 2002), the more recent, more user-friendly and multiplatform Polygene (Huang et al. 2020), or Genodive (Meirmans & Tienderen 2004), a program restricted to the MacOS X operating system. However, none of these programs computes all the population genetic indices used to understand and interpret all reproductive modes, including selfing and clonality in populations, such as indices based on genotypic diversity. Polygene, for example, cannot handle the repeated genotypes that are commonly observed in partially clonal populations. Polysat (Clark & Jasieniuk 2011) cannot currently deal with data with confident allele dosage, which is becoming standard with massive sequencing and tagging methods. Some R libraries like Poppr (Kamvar et al. 2014), RClone and Polysat, and command-line solutions like Spagedi, may help to analyse genotypes of polyploid populations with different modes of reproduction, but they require an exhaustive exploration of their documentation and some training in scripting languages. During practical courses, they require a preliminary introduction to scripting or to the reasons for choosing some options over others, which complicates the teaching of population genetics for polyploid species by diluting the topic in technical considerations.
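As an illustration of what handling repeated genotypes with confident allele dosage involves, the following Python sketch groups samples into multi-locus genotypes and reports a simple genotypic diversity index. It is a minimal sketch under stated assumptions, not the method of any program cited above: genotypes are assumed to be error-free, each locus is given as a string of allele copies (e.g. 'AABB'), and the function names `multilocus_genotype` and `genotypic_summary` are hypothetical. The index used here is the commonly reported genotypic richness R = (G - 1)/(N - 1), with G distinct genotypes among N samples.

```python
from collections import Counter


def multilocus_genotype(sample):
    """Canonical multi-locus genotype of one sample.

    `sample` is a list of loci, each a string of allele copies. Copies are
    sorted within each locus so that 'ABAB' and 'AABB' (same dosage) match.
    """
    return tuple(tuple(sorted(locus)) for locus in sample)


def genotypic_summary(samples):
    """Return (G, R, counts) for a list of samples.

    G is the number of distinct multi-locus genotypes, R = (G - 1)/(N - 1)
    the genotypic richness, and counts the number of samples per genotype.
    """
    counts = Counter(multilocus_genotype(s) for s in samples)
    n, g = len(samples), len(counts)
    richness = (g - 1) / (n - 1) if n > 1 else float("nan")
    return g, richness, counts


# Four hypothetical tetraploid samples genotyped at two loci.
samples = [
    ["AABB", "CCCD"],
    ["ABAB", "CDCC"],   # same dosage as the first sample: a repeated genotype
    ["AAAB", "CCCD"],   # differs from the first sample only by allele dosage
    ["AABB", "CCDD"],
]
g, richness, counts = genotypic_summary(samples)
print(g, round(richness, 3))   # 3 distinct genotypes, R = 0.667
```

The third sample shows why dosage matters: scored only for allele presence, it would be indistinguishable from the first two and the number of repeated genotypes would be overestimated.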