FIGURE 12 DSOM (left) and E-DSOM (right). SOMs act as filters.
When a traditional SOM is used as a filter and convolved over the input image, it was observed that the SOM tries to fit the whole dataset, which may lead to poor performance. To overcome this issue, a Hebbian learning-based masking layer was multiplied with the input patches before convolution with the SOM maps (filters). The two-layered architecture used SOM maps of size 10x10x1 and 16x16x1 for the first and second layers, respectively. In the three-layered architecture, the layer-wise SOM map sizes were 12x12x3, 14x14x3, and 16x16x3. A three-layer MLP classifier was applied on top of the trained SOM maps. It was also observed that more than one filter is required to learn enough distinct features, and that multiple smaller maps perform better than fewer large maps. Valued-SOM (VSOM) was proposed as an improved version of DSOM, introducing a Lethe term for each output neuron. This mechanism adds a supervision step that self-labels the clusters created by DSOM; however, the filter design was borrowed from the original DSOM. Kosmas et al. proposed the Dendritic-S method, which uses SOMs as feature-extraction filters followed by a hit matrix for labeling. Cosine similarity was applied in place of Euclidean distance when determining the BMU, improving accuracy by nearly 20% in the experiments.
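To make the filtering step concrete, the sketch below shows one way a SOM codebook can be convolved over an image: each patch is multiplied by a masking vector (standing in for the Hebbian mask described above) and matched to its best-matching unit (BMU) by Euclidean distance or cosine similarity. This is a minimal illustration, not the authors' implementation; all sizes, the mask values, and the image data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

patch = 5                                    # assumed patch size
som = rng.random((10 * 10, patch * patch))   # 10x10 SOM map as a flattened codebook
mask = rng.random(patch * patch)             # Hebbian-learned mask (placeholder values)
image = rng.random((28, 28))

def bmu_euclidean(x, codebook):
    # Winning node = smallest Euclidean distance to the (masked) patch.
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def bmu_cosine(x, codebook):
    # Winning node = largest cosine similarity to the (masked) patch.
    sims = codebook @ x / (np.linalg.norm(codebook, axis=1) * np.linalg.norm(x) + 1e-12)
    return int(np.argmax(sims))

# Slide the SOM "filter" over the image; each output entry is the winning node index.
out = np.zeros((28 - patch + 1, 28 - patch + 1), dtype=int)
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        x = image[i:i + patch, j:j + patch].ravel() * mask   # mask applied before matching
        out[i, j] = bmu_cosine(x, som)
```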
SOM maps are commonly 1D or 2D, but a 3D SOM map filter was introduced in a model named deep convolutional self-organizing map (DCSOM). A total of 256 nodes were arranged in various dimensions ranging from 1D to 6D. The 4D map (4x4x4x4) was noted as the optimum size, balancing performance and complexity; map dimensions higher than four resulted in overfitting. Among input patch sizes from 3x3 to 15x15, 5x5 was found to be optimal. The other focus of the research was the neighborhood radius of the SOM map and the batch learning technique. The two-layered convolutional model was followed by a block-wise histogram for feature representation. A computationally efficient method, Unsupervised-DSOM (UDSOM), was proposed in which the SOM maps are passed through ReLU activation after the learning phase. The aim is to remove neurons that never get activated (i.e., never become the BMU), which leads to fewer connections. In the four-layered model, the map sizes were chosen as 10x10, 8x8, 6x6, and 4x4, respectively; the smaller filters performed better for higher-level features. A faster version of UDSOM, G-UDSOM, locates the BMU for patches over different maps in parallel. However, UDSOM was kept as the backbone architecture, and the filter (map) sizes were not modified.
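The pruning idea behind UDSOM can be sketched as counting BMU hits and discarding neurons that were never winners, which is what passing the activations through ReLU effectively achieves. The snippet below is a simplified illustration under assumed map and patch sizes with placeholder data, not the published UDSOM code.

```python
import numpy as np

rng = np.random.default_rng(1)
som = rng.random((10 * 10, 25))            # 10x10 map, 5x5 input patches (assumed sizes)
hits = np.zeros(som.shape[0], dtype=int)   # per-neuron BMU hit counter

patches = rng.random((1000, 25))           # placeholder training patches
for x in patches:
    hits[np.argmin(np.linalg.norm(som - x, axis=1))] += 1

# "ReLU over the map": keep only neurons that were activated (became BMU) at least once.
pruned_som = som[hits > 0]
print(f"kept {pruned_som.shape[0]} of {som.shape[0]} neurons")
```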
4.2.3 | Sub-space learning (SSL)
The application of sub-space learning to object identification has gained much attention recently. The core of these proposed architectures is principal component analysis (PCA), an unsupervised learning technique mainly used for dimensionality reduction.
The subspace approximation and kernel augmentation (Saak) method is an early SSL-based algorithm. Saak is a one-pass feed-forward method proposed as a solution to the limitations of the older RECOS (REctified-COrrelations on a Sphere) method, in which backpropagation and the nonlinear ReLU activation cause approximation and rectification losses, respectively. The structural insight of the Saak method is shown in Figure 13. Filters in Saak are generated using subspace approximation with second-order statistics and the orthonormal eigenvectors of the covariance matrix. The filters are based on the truncated Karhunen-Loeve transform (KLT), or PCA, and are the unit eigenvectors of the data covariance matrix. Such filters are generated automatically from the dataset, and their size can be varied. Saak uses non-overlapping convolutional operations with filters and patches of size 2x2. ReLU is a widely used activation function in which negative inputs are truncated to zero, resulting in rectification loss. In Saak, every kernel is augmented with its negated counterpart; when the original kernels and their augmented parts pass through ReLU, the positive side survives, which can be either the original or the augmented kernel. This mechanism therefore incurs no rectification loss. Saak offers a new methodology for better interpretability of deep networks, but it has limitations owing to the increased computation incurred by kernel augmentation.
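The kernel-augmentation argument can be made concrete with a small sketch: PCA/KLT filters are computed from flattened 2x2 patches, each kernel is paired with its negation, and ReLU is applied; the signed projection remains recoverable, so nothing is lost to rectification. This is an illustrative sketch assuming plain PCA and synthetic patch data, not the reference Saak implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
patches = rng.random((5000, 4)) - 0.5      # flattened 2x2 patches (placeholder data)

# PCA/KLT filters: unit eigenvectors of the patch covariance matrix.
cov = np.cov(patches, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
kernels = eigvecs[:, ::-1].T               # rows = filters, sorted by captured energy

# Augment each kernel with its negated counterpart before ReLU.
augmented = np.vstack([kernels, -kernels])
responses = np.maximum(patches @ augmented.T, 0)   # ReLU

# Because both k and -k are present, the signed projection is recoverable,
# so no rectification loss occurs.
signed = responses[:, :4] - responses[:, 4:]
assert np.allclose(signed, patches @ kernels.T)
```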
Kuo et al. proposed an improved version of Saak named Saab (Subspace approximation with adjusted Bias). Saab, a variant of PCA, has convolutional layers followed by an MLP. Typically, the bias term varies across layers; in Saab, however, it was set to a constant offsetting the most negative value of the input vector, making the nonlinear activation function redundant. The convolutional filters were obtained from the covariance matrix of bias-removed spatial-spectral cuboids and were chosen to be of size 5x5. The convolutional filters generated by PCA capture a large amount of energy, but the captured energy decreases for higher-index components. The higher the cross-entropy, the lower the discriminant power. After the convolutional layers, an MLP was used for labeling, and its parameters were calculated using linear least-squares regression (LSR). This was claimed as a novel approach to self-labeling.
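The bias-adjustment idea can be sketched as follows: PCA filters are learned from DC-removed patches, and a single constant bias is added so that every response is non-negative, at which point ReLU becomes a no-op. The sketch below uses synthetic 5x5 patches and a simplified DC/AC split; it illustrates the principle rather than reproducing the authors' Saab implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
patches = rng.random((5000, 25))           # flattened 5x5 patches (placeholder data)

dc = patches.mean(axis=1, keepdims=True)   # per-patch DC component (assumed handling)
ac = patches - dc                          # bias-removed part used for the filters

cov = np.cov(ac, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
kernels = eigvecs[:, ::-1].T               # AC filters sorted by captured energy

responses = ac @ kernels.T
bias = max(0.0, -responses.min())          # constant large enough to offset the most negative response
shifted = responses + bias

# All shifted responses are non-negative, so ReLU(shifted) == shifted (activation is redundant).
assert (shifted >= 0).all()
```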