3.1.2 Preprocessing
Preprocessing must be tailored to the defined problem and to the characteristics of the dataset. In this step, statistical methods are used to transform the input data into a form suitable for the estimator in the modeling stage [81]. Data transformation is one of the most common preprocessing operations, and various tools and techniques can be applied depending on the size, complexity, and structure of the dataset. For instance, feature scaling rescales the values of each column: min-max normalization maps them to a specified range (usually between zero and one), whereas standardization rescales them to zero mean and unit variance [82]. Dimensionality reduction is another useful preprocessing method, in which data are mapped from a high-dimensional space to a lower-dimensional one while preserving key properties of the original dataset. Principal component analysis (PCA) is the most widely used dimensionality reduction approach; it increases data interpretability by constructing new uncorrelated variables that successively maximize variance [83]. Another prominent preprocessing operation is feature selection, which can strongly affect the performance of the ML model. Feature selection manually or automatically retains only the subset of features most relevant to the target variable [84].
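The preprocessing operations described above (column scaling, PCA, and feature selection) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the function names (`min_max_scale`, `standardize`, `pca`, `select_k_best_by_correlation`) are hypothetical, PCA is computed from the singular value decomposition (SVD) of the centered data, and feature selection uses one simple filter criterion (absolute Pearson correlation with the target) chosen only for illustration. The data are synthetic.

```python
import numpy as np

def min_max_scale(X, lo=0.0, hi=1.0):
    """Rescale each column of X linearly into the range [lo, hi]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns: avoid division by zero
    return lo + (X - col_min) / col_range * (hi - lo)

def standardize(X):
    """Rescale each column of X to zero mean and unit variance (z-scores)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (X - X.mean(axis=0)) / std

def pca(X, n_components):
    """Project X onto its leading principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value, i.e. by decreasing variance along each direction.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = S**2 / (X.shape[0] - 1)
    scores = Xc @ Vt[:n_components].T  # new, mutually uncorrelated variables
    return scores, variance[:n_components] / variance.sum()

def select_k_best_by_correlation(X, y, k):
    """Hypothetical filter-style selector: keep the k columns whose absolute
    Pearson correlation with y is largest. Returns sorted column indices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    corr = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(corr)[::-1][:k])

# Usage on synthetic data: only columns 1 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 1] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)

X01 = min_max_scale(X)              # every column now spans [0, 1]
Z = standardize(X)                  # every column has mean 0, std 1
scores, ratio = pca(Z, n_components=2)
keep = select_k_best_by_correlation(X, y, k=2)

print(X01.min(axis=0), X01.max(axis=0))
print(scores.shape, ratio)
print(keep)
```

The signs of principal components are arbitrary, so scores may flip sign between implementations. In practice, library classes such as scikit-learn's `MinMaxScaler`, `StandardScaler`, `PCA`, and `SelectKBest` provide these operations; whichever is used, scaling statistics and selection criteria should be fitted on the training data only, to avoid leaking information from the test set.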