3.1.2 Preprocessing
Preprocessing must be tailored to the problem defined on the dataset.
In this step, statistical methods are used to bring the input data into
a form suitable for the estimator in the modeling stage [81]. Data
transformation is one of
the most common preprocessing operations. Various tools and techniques
can be applied for data transformation depending on the size,
complexity, and structure of the dataset. For instance, data
standardization (more precisely, min-max normalization) is a
transformation method in which the values of each column are rescaled
to a specified range (usually between zero and one) [82].
Dimensionality reduction is another beneficial
preprocessing method in which data is mapped from a high-dimensional
space to a low-dimensional one while preserving key properties of the
original dataset. Principal component analysis (PCA) is the most widely
used dimensionality reduction approach; it increases data
interpretability by constructing new uncorrelated variables that
successively maximize variance [83]. Another prominent preprocessing
operation is
feature selection, which strongly affects the performance of the ML
model. Feature selection manually or automatically retains only the
subset of features that is most relevant to the target variable of the
problem [84].
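
The three operations described above can be illustrated with a minimal sketch in plain Python. It is not the pipeline used in this work: the function names, the toy data, the use of absolute Pearson correlation as a relevance score, and the power-iteration estimate of the first principal component are all assumptions chosen to keep the example short and self-contained.

```python
import math

def min_max_scale(column):
    """Rescale one column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_features(columns, target, k):
    """Indices of the k columns with the largest |correlation| to the target.

    Assumption: correlation is only one possible relevance score.
    """
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def first_principal_component(columns, iterations=100):
    """Estimate the leading PCA direction by power iteration on the covariance matrix."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    d = len(centered)
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(d)] for i in range(d)]
    vec = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # degenerate case: keep the current vector
            break
        vec = [v / norm for v in nxt]
    return vec

# Hypothetical toy data: each inner list is one feature column.
columns = [
    [1.0, 2.0, 3.0, 4.0],  # feature 0
    [2.0, 4.0, 6.0, 8.0],  # feature 1 (twice feature 0)
    [5.0, 1.0, 4.0, 2.0],  # feature 2 (weakly related to the target)
]
target = [1.1, 1.9, 3.2, 3.9]

scaled = [min_max_scale(col) for col in columns]
selected = select_features(columns, target, k=2)
direction = first_principal_component([columns[i] for i in selected])
```

In practice, library implementations of scaling, PCA, and feature selection would normally be preferred; the sketch only makes the arithmetic behind each operation explicit.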