3.1.4 Validation and evaluation
Model evaluation is first carried out on the training dataset to estimate how well the model will generalize to previously unseen data [87]. For instance, k-fold cross-validation is a standard method for assessing an estimator's performance by repeatedly splitting the training dataset into training and validation subsets. Concretely, the training dataset is partitioned at random into k equal parts, called folds. The model is then trained k times; in each round, one fold is held out as the validation set and the remaining k-1 folds are used for training [88]. The model's accuracy is computed on each held-out fold. Running k-fold cross-validation in this way shows how sensitive the model is to the particular split of the training data. Based on these accuracy estimates, the model's hyperparameters can be tuned to improve its generalization performance on unseen data [89, 90]. The final evaluation is performed on the test data, and the predictive ability of the model is quantified by different evaluation metrics such as accuracy, sensitivity, specificity, precision, and recall [91, 92]. Figure 4 shows how these evaluation methods help to improve the model's performance.
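A minimal sketch of this workflow is given below, assuming a scikit-learn style API; the synthetic dataset, the logistic-regression classifier, and the choice of k = 5 are illustrative assumptions rather than choices prescribed in the text.

```python
# Hypothetical sketch: k-fold cross-validation on the training set,
# followed by a final evaluation on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Synthetic data standing in for a real training/test corpus (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)

# k-fold cross-validation on the training data only: each of the k folds
# serves once as the validation fold while the remaining k-1 folds are
# used for fitting. The spread of the fold scores indicates how sensitive
# the model is to the particular training split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print("CV accuracy per fold:", cv_scores)
print("Mean CV accuracy:    ", cv_scores.mean())

# Final evaluation on the held-out test set with several metrics.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Test accuracy: ", accuracy_score(y_test, y_pred))
print("Test precision:", precision_score(y_test, y_pred))
print("Test recall:   ", recall_score(y_test, y_pred))
```

In practice, the cross-validation scores would guide hyperparameter tuning (for example via a grid search over candidate settings), and only the final, tuned model would be scored once on the test set.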