3.1.1 Analysis of outbreak strains characterized by WGS
First, a peak matrix was constructed by applying the Threshold method to
the MALDI-TOF MS spectra of the strains previously analyzed by WGS
(n=35). The isolates were initially classified according to PFGE
results: strains were grouped as P1 (outbreak), and all other
pulsotypes were considered unrelated strains. Cross-validation of this
approach (k=10) correctly classified 97.1% of the isolates with the
PLS-DA, RF and NCA-KNN algorithms and 88.6% with SVM (Table S2).
In addition, the Biomarker selection method identified three potential
biomarkers at 5169, 6915 and 7236 m/z. The resulting peak matrix correctly
classified all strains (100%) by internal k-fold validation (k=10) in
all prediction models tested (PLS-DA, SVM, RF and NCA-KNN). The
unsupervised algorithms also clearly separated the two main categories
(“outbreak” and “control” strains), which formed two well-defined
clusters in both the PCA plot and the HCA dendrogram (Figure 2).
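
As an illustration of the internal validation described above, the following minimal sketch shows how 35 isolates can be partitioned into 10 folds so that each isolate is held out exactly once. It is not the pipeline used in this study: the function name `kfold_indices` is hypothetical, and the use of contiguous, unshuffled and unstratified folds is an assumption, since the text does not specify how the folds were drawn.

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    Hypothetical helper: folds are neither shuffled nor stratified,
    which is an assumption and may differ from the actual analysis.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# n=35 isolates and k=10 folds, as stated in the text.
folds = list(kfold_indices(35, 10))
fold_sizes = [len(test) for _, test in folds]
held_out = sorted(i for _, test in folds for i in test)

print(fold_sizes)
print(held_out == list(range(35)))
```

Because every isolate appears in exactly one held-out fold, a classification rate pooled over the folds can be read as a fraction of the 35 isolates.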
In a second step, MALDI-TOF MS spectra were further compared according
to WGS clustering, which divided the strains clustered by PFGE in
pulsotype 1 (P1) into three outbreak groups: Group 1, considered the
main outbreak strains, and Groups 2 and 3 (separated by <125 SNPs from
Group 1). Control strains differed by >5,000 SNPs (Figure 1). This step
aimed to differentiate the main outbreak defined by WGS (Group 1) from
the rest of the strains (“Controls”, “Group 2” and “Group 3”). For this
purpose, a
peak matrix was created by applying the Threshold method and used as
input data for the PLS-DA, SVM, RF and NCA-KNN algorithms. Correct
classification rates were 97.1% with SVM (optimized hyperparameter C:
0.01), 91.4% with PLS-DA, 88.5% with NCA-KNN (optimized number of
neighbors: 3) and 85.7% with RF (optimized number of estimators: 100)
(Table S3; Figure S2). Group 2 strains (n=2)
appeared closer to the outbreak strains than Group 3 and control strains
did (Figure 2C), as they are also closer to the Group 1 strains in
number of SNPs (50 SNPs).
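
To make the reported classification rates easier to interpret, the short sketch below converts each percentage into the number of correctly classified isolates it would imply. It assumes that each rate was pooled over all 35 isolates; this is an assumption, as the text does not state how the cross-validation results were aggregated. The names `reported` and `implied_count` are introduced here for illustration only.

```python
N_ISOLATES = 35  # strains analyzed by WGS, as stated in the text

# Percentages reported for the Group 1 vs. remaining strains comparison.
reported = {"SVM": 97.1, "PLS-DA": 91.4, "NCA-KNN": 88.5, "RF": 85.7}


def implied_count(percent, n=N_ISOLATES):
    """Nearest whole number of isolates matching a reported percentage.

    Assumes the percentage is a fraction of all n isolates.
    """
    return round(percent / 100 * n)


implied = {model: implied_count(p) for model, p in reported.items()}
recomputed = {m: round(100 * c / N_ISOLATES, 1) for m, c in implied.items()}

for model, count in implied.items():
    print(f"{model}: {count}/{N_ISOLATES} correct = {recomputed[model]}%")
```

Under this assumption, the rates correspond to 34, 32, 31 and 30 correctly classified isolates out of 35. The fraction 31/35 is 88.57%, so the 88.5% and 88.6% values given in this section would both correspond to 31 isolates and differ only in rounding.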