Phylogenomic studies in L. cidri
To initiate the study of the phylogeny and genomic variation in L. cidri, we first estimated the ploidy of the isolates. FACS
analysis revealed that all L. cidri isolates were haploid (Fig.
S2). Subsequently, we sequenced the complete genomes of 55 strains (30
from South America and 25 from Australia) and incorporated
previously published data for the reference strain L. cidri CBS2950 (isolated from cider in France) and the L. fermentati strain CBS707, which we used as an outgroup. On average, across the 30
South American genomes, we obtained 2,670 SNPs per strain relative to
the reference genome (on average, one SNP every ~3.8 kb between two
strains). In contrast, across the 25 Australian genomes, we obtained on
average 36 SNPs per strain relative to the reference genome (one SNP
every ~282 kb), indicating that the Australian strains are far less
divergent from the reference than the South American strains. In
parallel, we found different numbers of insertions and deletions
(INDELs) relative to the reference genome depending on the strain,
ranging from 79 in the Australian strains to 124 in the South American
strains (bioinformatic summary statistics are shown in Table S6).
Interestingly, this relatively high number of INDELs appears to be
specific to the reference strain rather than a general trend between any
two strains (Table S6).
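The SNP spacing values above are simple ratios of genome length to SNP count. As a back-of-the-envelope check (a sketch, not the authors' pipeline), the snippet below assumes a genome size of roughly 10.15 Mb, which is the value the reported figures imply (2,670 × 3.8 kb ≈ 36 × 282 kb ≈ 10.15 Mb); the actual assembly length and the helper name `mean_snp_spacing_kb` are assumptions for illustration.

```python
# Back-of-the-envelope SNP spacing check.
# ASSUMPTION: genome size of ~10.15 Mb, back-calculated from the reported
# figures (2,670 SNPs x ~3.8 kb ~= 36 SNPs x ~282 kb ~= 10.15 Mb); the real
# reference assembly length may differ slightly.
GENOME_SIZE_BP = 10_150_000


def mean_snp_spacing_kb(n_snps: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Average distance between SNPs, in kilobases (hypothetical helper)."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return genome_size_bp / n_snps / 1_000


# Mean SNP counts per strain relative to the reference, taken from the text.
for group, n in [("South America", 2_670), ("Australia", 36)]:
    print(f"{group}: one SNP every ~{mean_snp_spacing_kb(n):.1f} kb")
```

Because spacing scales as 1/N, the ~74-fold difference in SNP counts between the two groups translates directly into the ~74-fold difference in spacing.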
The maximum-likelihood phylogeny revealed a topology with two
well-supported main clades separating South American and Australian
strains (Fig. 1b). Australian strains clustered into a single clade
(hereafter referred to as Aus), together with the European reference
strain L. cidri CBS2950. This clustering of Australian and European
strains, together with the low number of polymorphisms found between
them (Table S7), suggests a recent migration event between the two
regions. In contrast, the South American strains showed a more complex
clade structure, with substantial differences in branch lengths and
several subclades with phylogeographic structure (Fig. 1b). The South
American clade (hereafter referred to as SoAm) harbored 15,706 unique
SNPs compared to the reference strain (Table S7). Interestingly, the two
isolates from Altos de Lircay National Park (AL), the northernmost
locality (Fig. 1a), cluster together and show the greatest genetic
divergence within the SoAm group (Fig. 1b). Overall, these results
suggest greater genetic diversity in SoAm than in Aus lineages, with the
AL branch showing the highest genetic divergence along the latitudinal
gradient (Fig. 1b).
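Counting clade-specific ("unique") SNPs, such as those reported for SoAm, can be framed as set operations over per-strain variant calls. The sketch below is an illustration under stated assumptions, not the method used here: it assumes each strain's SNPs relative to the reference are already available as sets of (chromosome, position, alternate allele) tuples, the strain names and calls are invented toy data, and `clade_unique_snps` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set, Tuple

# A SNP relative to the reference: (chromosome, 1-based position, alt allele).
Snp = Tuple[str, int, str]


def clade_unique_snps(calls: Dict[str, Set[Snp]], clade: Iterable[str]) -> Set[Snp]:
    """Return SNPs carried by at least one clade member and by no strain outside it.

    Hypothetical helper; this is one possible definition of 'unique' SNPs.
    """
    members = set(clade)
    missing = members - calls.keys()
    if missing:
        raise KeyError(f"no calls for strains: {sorted(missing)}")
    # Union of variants seen in any clade member.
    in_clade: Set[Snp] = set().union(*(calls[s] for s in members))
    # Union of variants seen in any strain outside the clade.
    outside: Set[Snp] = set().union(*(v for s, v in calls.items() if s not in members))
    return in_clade - outside


# Toy data, invented for illustration only.
calls = {
    "SoAm_1": {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")},
    "SoAm_2": {("chr1", 100, "A"), ("chr3", 12, "C")},
    "Aus_1": {("chr2", 40, "G")},
}
unique = clade_unique_snps(calls, ["SoAm_1", "SoAm_2"])
print(len(unique), sorted(unique))
```

Requiring a SNP to be present in every clade member (an intersection rather than a union) would give a stricter count; which definition underlies the figure in Table S7 is not specified in this section.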