1 | INTRODUCTION
The convolutional neural network (CNN) has become the pioneering method for
applying neural networks to computer vision tasks. The advancement from the
basic CNN, LeNet-5 (5 layers), to deeper and more complex models such as
AlexNet (8 layers), VGG (11-19 layers), ResNet (152 layers), and GoogLeNet
(22 layers) has achieved superior performance in real-life applications. Such
models are called deep CNNs (DCNNs), combinations of a deep learning
structure and a CNN. Even though a DCNN is built from well-defined
mathematical operations, its deep, complex design and nonlinear activation
functions mean that fundamental insight into its behavior remains a black
box.
The fundamental constituents of a DCNN are filters, activation functions,
and classifiers. The structure is divided into two major sections: feature
extraction and classification. Filters, often referred to as weights, carry
out the actual learning, and it is critical to understand the exact
end-to-end processing within them. In every layer of a DCNN, different
filters perform convolution operations on that layer's input to extract
features. Activation functions control the dynamics of information flow from
one layer to the next. A standard machine-learning classifier then
categorizes the classes based on the features learned through convolution.
More precisely, the primary design attributes of the filters are unknown, as
are the principles behind the choice of training algorithm and activation
function. The interrelation of filter hyper-parameters, training algorithms,
and activation functions lacks a concrete theoretical foundation for DCNNs.
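As a concrete illustration of the feature-extraction step described above, the following minimal NumPy sketch convolves a small input with a single hand-crafted filter and passes the result through a ReLU activation. The input, kernel, and function names are illustrative assumptions, not artifacts of any particular DCNN:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as implemented in CNNs)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Slide the filter over the input and take the weighted sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinear activation gating information flow to the next layer."""
    return np.maximum(x, 0.0)

# Hypothetical 5x5 input and a 3x3 vertical-edge filter.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
features = relu(conv2d(image, kernel))
print(features.shape)  # (3, 3)
```

In a real DCNN these filter weights are learned by gradient descent rather than hand-crafted, and many filters per layer are stacked to produce a bank of feature maps that the final classifier consumes.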