2.4 Self-Simulated Learning Artificial Intelligence (SSLAI) Analysis
A classifier-comparison algorithm for evaluating machine learning (ML) classification methods was developed in Python 3 using the scikit-learn library [50]. Ten ML classification methods were
employed: K-Neighbors Classifier, Support Vector Machine (SVM),
Nu-Support Vector Classification (NuSVC), Decision Tree Classifier,
Random Forest Classifier, AdaBoost Classifier, Gradient Boosting
Classifier, Gaussian Naive Bayes, Linear Discriminant Analysis, and
Quadratic Discriminant Analysis. A basic description of each ML method
used is included in Table 1.
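To make the setup concrete, the following sketch (not the authors' published code) instantiates the ten scikit-learn classifiers listed above with default hyperparameters; `probability=True` is set on the SVM variants so that Log Loss can be computed later from predicted probabilities.

```python
# Minimal sketch: the ten scikit-learn classifiers compared in this work.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

classifiers = {
    "K-Neighbors": KNeighborsClassifier(),
    "SVM": SVC(probability=True),        # probability=True enables Log Loss
    "NuSVC": NuSVC(probability=True),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Gaussian NB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}
```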
The algorithm selected the ML method that predicted the data most efficiently, namely the one with the highest accuracy and the lowest Log Loss (defined below). To accomplish this, the algorithm was supplied with the simulated training data and the real test data; it then trained each of the ten methods, calculated their accuracy and Log Loss, and compared the results.
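A minimal sketch of this selection step, assuming placeholder arrays `X_train`/`y_train` (simulated spectra) and `X_test`/`y_test` (measured mixtures) in place of the authors' data:

```python
# Fit each method on simulated training data, score it on real test data,
# and keep the classifier with the highest accuracy and lowest Log Loss.
from sklearn.metrics import accuracy_score, log_loss

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    ll = log_loss(y_test, clf.predict_proba(X_test))
    results[name] = (acc, ll)

# Rank by accuracy (descending), breaking ties with Log Loss (ascending).
best = max(results, key=lambda n: (results[n][0], -results[n][1]))
best_clf = classifiers[best]
```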
The scikit-learn package [50] provides open-source implementations of efficient ML techniques for the Python programming language; these are accessible to non-experts in ML and are applied across many scientific disciplines. The calibration model was developed with simulated training data: linear combinations of each neat-HE spectrum with each neat-soil spectrum, with the two intensities compensated by their percentage contributions. External evaluation of the predictions was carried out with measured HE/soil mixtures; these data were not used in the calibration model. A schematic representation of the SSLAI analysis is shown in Fig. 1.
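As an illustration only, the following sketch builds such simulated training spectra; the array names, the label scheme (one class per HE), and the 10–90 % weight grid are assumptions, not the authors' exact settings.

```python
# Sketch: synthetic spectra as percentage-weighted linear combinations of
# one neat-HE spectrum and one neat-soil spectrum (assumed construction).
import numpy as np

fractions = np.arange(0.1, 1.0, 0.1)      # assumed HE intensity fractions
X_sim, y_sim = [], []
for label, he in he_spectra.items():      # each neat-HE spectrum (assumed dict)
    for soil in soil_spectra:             # each neat-soil spectrum
        for f in fractions:
            X_sim.append(f * he + (1 - f) * soil)  # intensity compensation
            y_sim.append(label)
X_sim, y_sim = np.asarray(X_sim), np.asarray(y_sim)
```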
The parameters used to evaluate the performance of the classification model were recall, Log Loss, precision, F1-score, weighted average, support, and accuracy. In binary classification, the recall of the positive class is also known as “sensitivity,” and the recall of the negative class as “specificity.”
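In scikit-learn these metrics are available directly; for instance, `classification_report` tabulates precision, recall, F1-score, support, and the weighted averages (a sketch, with `best_clf` standing in for the selected model):

```python
# Per-class precision, recall, F1-score, and support, plus weighted
# averages, for the selected classifier's predictions on the test data.
from sklearn.metrics import classification_report

print(classification_report(y_test, best_clf.predict(X_test)))
```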
The Log Loss function is used in (multinomial) logistic regression and its extensions, such as neural networks, and is defined as the negative log-likelihood of the true labels given a probabilistic classifier’s predictions. The Log Loss is only defined for two or more labels. For a single sample with true label $y_t \in \{0,1\}$ and estimated probability $y_p$ that $y_t = 1$, the Log Loss is given by Eq. (1):

$$-\log P(y_t \mid y_p) = -\bigl(y_t \log(y_p) + (1 - y_t)\log(1 - y_p)\bigr) \tag{1}$$
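A quick numerical check of Eq. (1): computing the average negative log-likelihood by hand for two binary samples reproduces `sklearn.metrics.log_loss`.

```python
# Hand-computed Eq. (1), averaged over samples, vs. scikit-learn's log_loss.
import numpy as np
from sklearn.metrics import log_loss

y_true = [1, 0]        # true labels y_t
y_prob = [0.8, 0.3]    # estimated probabilities that y_t = 1
manual = -np.mean([yt * np.log(yp) + (1 - yt) * np.log(1 - yp)
                   for yt, yp in zip(y_true, y_prob)])
assert np.isclose(manual, log_loss(y_true, y_prob))   # both ≈ 0.290
```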