Sanger sequencing data analysis
Sanger sequence data were evaluated, edited, and aligned using Geneious Prime 2021.2.2 (Biomatters Limited, Auckland, New Zealand) and the 4Peaks sequence viewer (Nucleobytes B.V., Aalsmeer, the Netherlands). We compared sequence data to the respective orthologs available in GenBank using a BLAST search (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Treponema sequence data were analysed for positive selection following the tools and algorithm described by Maděránková et al. (2019). Briefly, positively selected sites were determined from sequence alignments using (i) a codon-based site model implemented in the EasyCodeML package (Gao et al., 2019) and/or (ii) a mixed effects model of evolution (MEME) with a hypothesis-testing approach via the Datamonkey web server (Murrell et al., 2012; Weaver et al., 2018). For the CodeML analysis, phylogenetic trees were constructed using RAxML-NG (Kozlov et al., 2019). Phylogenetic trees and networks were constructed with IQ-TREE 2.0.7 (Minh et al., 2020) and MrBayes 3.2.7 (Ronquist et al., 2012), and minimum spanning trees were inferred using the MSTree V2 algorithm within GrapeTree (Zhou et al., 2018).
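For readers unfamiliar with codon-based site models, such models are commonly evaluated with a likelihood ratio test (LRT) between a null model without positive selection and an alternative that allows it. The following is a minimal sketch of that test, not the procedure used here: the function name and the log-likelihood values are hypothetical, and it assumes a comparison with two degrees of freedom (as is typical for nested pairs such as M1a vs. M2a or M7 vs. M8), for which the chi-square survival function has the closed form exp(-x/2).

```python
from math import exp


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    lnl_null, lnl_alt: log-likelihoods of the null and alternative models
    (hypothetical inputs; in practice they would be read from CodeML output).
    Returns (test statistic, p-value).
    """
    # The statistic is twice the log-likelihood difference; clamp at zero
    # in case numerical noise makes the alternative fit slightly worse.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For a chi-square distribution with 2 degrees of freedom,
    # the survival function is exactly exp(-x / 2).
    p_value = exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    # Illustrative values only, not results from this study.
    stat, p = lrt_two_df(lnl_null=-1000.0, lnl_alt=-995.0)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value would favour the model that allows positive selection; identifying individual sites requires a further step that this sketch does not cover.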
Maximum-likelihood trees in IQ-TREE were constructed with 1,000 ultrafast bootstrap replicates (Hoang et al., 2018) and the best-fit model selected by IQ-TREE’s ModelFinder (Kalyaanamoorthy et al., 2017) according to the Bayesian Information Criterion (BIC). Tree reconstructions based on Bayesian inference in MrBayes were run for 1,000,000 generations, sampling every 100 generations, with a burn-in of 25%. To check for convergence of all parameters and adequacy of the burn-in, we examined the uncorrected potential scale reduction factor (PSRF) (Gelman and Rubin, 1992) as calculated by MrBayes. We used T. pallidum subsp. endemicum strain Iraq B (GenBank CP032303.1) as an outgroup to root the tree.
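To illustrate the convergence check, the sketch below discards a 25% burn-in from each chain and computes a simplified PSRF from the within-chain and between-chain variances. It is an assumption-laden illustration rather than MrBayes's own code: the function names are hypothetical, the input is taken to be a list of equal-length chains of sampled values for one parameter, and the exact formula MrBayes uses may differ in detail.

```python
from math import sqrt
from statistics import mean, variance


def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of the samples (25% by default)."""
    n_drop = int(len(samples) * fraction)
    return samples[n_drop:]


def psrf(chains):
    """Simplified potential scale reduction factor for one parameter.

    chains: list of equal-length lists of sampled values, one per run,
    with burn-in already removed.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two chains with two samples each")
    if any(len(chain) != n for chain in chains):
        raise ValueError("all chains must have the same length")

    # W: mean of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(chain) for chain in chains])
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


if __name__ == "__main__":
    # Toy chains for illustration only, not data from this study.
    run1 = discard_burn_in([0.9, 1.4, 1.0, 1.1, 0.9, 1.0, 1.2, 1.0])
    run2 = discard_burn_in([1.6, 0.7, 1.1, 1.0, 1.0, 0.9, 1.1, 1.2])
    print(f"PSRF = {psrf([run1, run2]):.3f}")
```

Values close to 1.0 are generally taken to indicate that independent runs are sampling the same distribution, whereas clearly larger values suggest that the runs have not converged or that the burn-in is too short.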