FIGURE 1 LeNet-5 architecture.
A filter, also known as a weight or kernel, transforms the input space into
another space (the output). Regardless of the architecture type and
objective function, filters are the core element of the whole
"learning" process. The size of the filter is often treated as a
hyperparameter. In an MLP, the filter (weight) matrix evidently depends on
the number of neurons in the current and previous layers. A CNN typically
takes images as inputs, and the algorithm breaks them into smaller
two-dimensional (2D) regions called receptive fields (overlapping or
non-overlapping). Due to the nature of the convolution operation, the
receptive fields and filter(s) must have the same size and shape (2D),
particularly in supervised learning. The logic follows the fundamental
algebraic relation
\(\text{Outcome}(Y) = \text{Weight}(W) \times \text{Input}(X) + \text{Bias}(b)\) (1)
For a nonlinear neural network, an activation function is applied to the
outputs.
\(\text{Outcome}(Y) = \text{Activation}\left[\text{Weight}(W) \times \text{Input}(X) + \text{Bias}(b)\right]\) (2)
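For concreteness, the following minimal sketch (plain NumPy, with an illustrative helper name `apply_filter` that is not part of any specific library) implements Eq. (2) for a single 2D filter: every receptive field has the same size and shape as the filter, a weighted sum plus bias is computed, and a ReLU activation is applied.

```python
import numpy as np

def apply_filter(image, weight, bias, activation=lambda z: np.maximum(z, 0)):
    """Slide a 2D filter over an image and evaluate Eq. (2) at every receptive field.

    image      : 2D input array of shape (H, W)
    weight     : 2D filter of shape (k, k); it must match the receptive field size
    bias       : scalar bias term b
    activation : nonlinearity applied to each output (ReLU by default)
    """
    k = weight.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            receptive_field = image[i:i + k, j:j + k]  # same size/shape as the filter
            output[i, j] = activation(np.sum(weight * receptive_field) + bias)  # Eq. (2)
    return output

# Example: a 6x6 input and a 3x3 filter with overlapping receptive fields (stride 1)
rng = np.random.default_rng(0)
feature_map = apply_filter(rng.normal(size=(6, 6)), rng.normal(size=(3, 3)), bias=0.1)
print(feature_map.shape)  # (4, 4)
```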
The purpose of the activation function is to control the flow of
information depending on its behavior. Nonlinear activation functions
introduce non-linearity and are variously argued to be biologically
plausible or implausible. A sufficiently deep model, up to a certain
level, is believed necessary to learn enough features, depending on the
complexity of the data. The bias term is also noted to play a crucial
role in determining whether a neuron fires. In theory, multiple cascaded
linear layers can be replaced by a single linear layer, although this
logic is debated for computer vision architectures; the most widely used
structures are therefore cascaded layers interleaved with nonlinear
activation functions, as the short sketch below illustrates.
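As a quick check of this collapse argument, the sketch below (plain NumPy with arbitrary illustrative values) shows that two cascaded linear layers reduce exactly to one linear layer, and that the equivalence breaks once a nonlinearity is inserted between them.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)                                 # input vector
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # first linear layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # second linear layer

# Two cascaded linear layers (no activation) ...
y_two_layers = W2 @ (W1 @ x + b1) + b2

# ... equal a single linear layer with W = W2 W1 and b = W2 b1 + b2
y_single_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(y_two_layers, y_single_layer))  # True

# Inserting a nonlinearity (e.g. ReLU) between the layers breaks this equivalence,
# which is why depth only adds representational power with nonlinear activations.
y_nonlinear = W2 @ np.maximum(W1 @ x + b1, 0) + b2
print(np.allclose(y_nonlinear, y_single_layer))   # False (in general)
```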
Filters have several key parameters (see the sketch after this list):
- Filter size and shape
- Number of filters per layer
- Learning algorithm
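To make the first two parameters concrete, the short sketch below (NumPy, with a purely hypothetical layer configuration) shows how the filter size/shape and the number of filters per layer together determine the shape and parameter count of a convolutional layer's weight tensor; the learning algorithm then decides how these values are updated during training.

```python
import numpy as np

# Hypothetical layer configuration, for illustration only
in_channels = 3        # e.g. an RGB input
num_filters = 16       # number of filters in this layer (output channels)
filter_size = (3, 3)   # filter size and shape (height, width)

# The layer's weight tensor is fully determined by these choices:
# one (in_channels x kH x kW) filter per output channel, plus one bias each.
weights = np.empty((num_filters, in_channels, *filter_size))
biases = np.empty(num_filters)

print(weights.size + biases.size)  # 16 * 3 * 3 * 3 + 16 = 448 learnable parameters
```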
Research on these critical parameters focuses on the learning technique,
classifying approaches into supervised, semi-supervised, self-supervised,
and unsupervised categories. Moreover, NN models in pattern recognition
started from the multi-layer perceptron (MLP), also known as the fully
connected NN. Later, convolutional NNs took over the computer vision
field with promising results.
The contributions of this study can be summarized as follows.
Although many studies have been conducted on different architectures,
focusing on specific applications, learning types, or arbitrary
groupings, filter initialization and design are among the least discussed
and studied topics. Filter initialization impacts the algorithm's
convergence more than the actual learning algorithm does. The current
study covers the most promising computer vision architectures with
respect to arguments and discussion on filter design. It aims to build an
understanding of the filters' parameters, the filter sizes and the number
of filters in different layers of DCNN architectures, the training
algorithm, the training sample size, the number of layers, and many other
factors, drawing on recent research in supervised and unsupervised
learning approaches for DCNNs.
The rest of the paper is organized as follows. Section 2 focuses on
filter initialization, its importance, and various techniques. The main
part of the paper consists of Sections 3 and 4, which comprise the
arguments for specific selections of filter sizes and numbers of filters
throughout the network in prominent supervised and unsupervised methods,
respectively. Section 5 summarizes the paper's findings, and the
conclusion is presented in Section 6.