FIGURE 1 LeNet-5 Architecture. Source: Adapted from
A filter, also known as a weight matrix or kernel, transforms the input space into another space or output. Regardless of architecture type and objective function, filters are the core element of the whole "learning" process. The size of the filter is usually treated as a hyperparameter. In an MLP, the shape of the weight matrix follows directly from the numbers of neurons in the current and previous layers. A CNN typically takes images as input, and the algorithm decomposes them into smaller 2-dimensional (2D) regions called receptive fields (overlapping or non-overlapping). Due to the nature of the convolution operation, each receptive field and the filter must share the same size and shape (2D), particularly in supervised learning. The underlying logic is the same fundamental algebraic relation
\(\text{Outcome}(Y)=\text{Weight}(W)\times \text{Input}(X)+\text{Bias}(b)\)  (1)
For a nonlinear neural network, an activation function is applied to the output:
\(\text{Outcome}(Y)=\text{Activation}\left[\text{Weight}(W)\times \text{Input}(X)+\text{Bias}(b)\right]\)  (2)
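The minimal NumPy sketch below illustrates Equations (1) and (2) for a single layer; the array shapes, the ReLU choice of activation, and the random values are assumptions made purely for illustration, not part of the formulation above.

```python
import numpy as np

# Illustrative shapes: 4 input features, 3 output neurons (assumed for this sketch).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # Weight (filter) matrix
X = rng.standard_normal((4,))     # Input vector
b = np.zeros(3)                   # Bias term

# Equation (1): linear transform of the input
Y_linear = W @ X + b

# Equation (2): nonlinear activation applied over the output (ReLU assumed here)
Y = np.maximum(0, Y_linear)
```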
The purpose of the activation function is to control the flow of information according to its behavior. Nonlinear activation functions introduce non-linearity and are variously argued to be biologically plausible or implausible. A model of sufficient depth is generally believed necessary to learn enough features, depending on the complexity of the data. In theory, multiple cascaded linear layers can be collapsed into a single linear layer, although this argument is disputed for computer vision architectures. The most widely used structures are cascaded layers with nonlinear activation functions. The bias term also plays a crucial role in firing neurons. Filters have several key parameters (a brief code sketch follows the list):
  1. Filter size and shape
  2. Number of filters per layer
  3. Learning algorithm
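As a companion to the list above, the sketch below shows how the first two parameters, filter size and number of filters, determine the output of a convolutional layer. The image size, stride of 1, valid (padding-free) convolution, and ReLU activation are assumptions for illustration, and the learning algorithm (parameter 3) is omitted since no training is performed.

```python
import numpy as np

# Assumed toy setting: a single-channel 6x6 "image", 3x3 filters, 2 filters in the layer.
rng = np.random.default_rng(1)
image = rng.standard_normal((6, 6))
num_filters, filter_size = 2, 3            # key parameters 1 and 2 from the list above
filters = rng.standard_normal((num_filters, filter_size, filter_size))
bias = np.zeros(num_filters)

out_size = image.shape[0] - filter_size + 1  # valid convolution (no padding), stride 1
feature_maps = np.zeros((num_filters, out_size, out_size))

for k in range(num_filters):
    for i in range(out_size):
        for j in range(out_size):
            # The receptive field and the filter share the same 2D size and shape
            receptive_field = image[i:i + filter_size, j:j + filter_size]
            feature_maps[k, i, j] = np.sum(receptive_field * filters[k]) + bias[k]

# Nonlinear activation (ReLU assumed), as in Equation (2)
feature_maps = np.maximum(0, feature_maps)
```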
Research on these critical parameters focuses on the learning technique, classifying them into supervised, semi-supervised, self-supervised, and unsupervised categories. Historically, NN models for pattern recognition started from the multi-layer perceptron (MLP), also known as the fully connected NN; later, convolutional NNs took over the computer vision field with promising results.
The contributions of this study can be summarized as follows.
Although many studies have examined different architectures, focusing on specific applications, learning types, or arbitrary groupings, filter initialization and filter design are often the least discussed and studied aspects. Filter initialization can impact the algorithm's convergence more than the learning algorithm itself. The current study covers the most promising computer vision architectures with respect to the arguments and discussion behind their filter design. It aims to build an understanding of filter parameters, filter sizes and numbers of filters across the layers of DCNN architectures, training algorithms, training sample sizes, depth, and many other factors, drawing on recent research in supervised and unsupervised learning approaches for DCNNs.
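Since filter initialization is revisited in Section 2, the short sketch below illustrates two common variance-scaling schemes (Xavier/Glorot and He initialization). The formulas are the standard ones from the literature; the layer sizes are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
fan_in, fan_out = 9 * 64, 128   # assumed example: 3x3 filters, 64 input and 128 output channels

# Xavier/Glorot initialization: variance scaled by both fan-in and fan-out
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_out, fan_in))

# He initialization: variance scaled by fan-in, suited to ReLU activations
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

# A poorly chosen scale (e.g., std = 1.0) tends to produce vanishing or exploding
# activations as depth grows, which is why the initialization scheme can matter
# more for convergence than the choice of learning algorithm.
```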
The rest of the paper is organized as follows. Section 2 focuses on filter initialization, its importance, and various techniques. The main part of the paper, Sections 3 and 4, comprises the arguments for the specific choices of filter sizes and numbers of filters throughout the network in prominent supervised and unsupervised methods, respectively. Section 5 summarizes the paper's findings, and the conclusion is presented in Section 6.