A relatively recent study proposed a SOM-based multi-layer architecture named convolutional SOM (CSOM), in which convolutional layers are placed between SOM layers. The novelty lay in SOM-based feature learning: the SOM map was used as a filter, followed by a convolution layer that applied the learned filter. Figure 11 shows a snapshot of filters generated by the SOM. The structure was tested with two types of pooling layer: traditional max-pooling and SOM-based pooling. The latter performed better on the feature maps produced by the learned SOM maps (filters). The winner neurons were chosen using Euclidean distance. The SOM map size was set to 8×8 for input images of size 256×256, although no specific reason was given for this choice of map size.
FIGURE 11 Filters learned by SOM in the convolutional SOM (CSOM) method
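To make the filter-learning step concrete, the following is a minimal sketch of the idea, assuming a standard patch-based SOM with a Gaussian neighborhood and a decaying learning schedule (none of which are detailed in the paper): each unit of a small SOM is trained on image patches, and its weight vector is then slid over the image as a convolution filter, with winners picked by Euclidean distance as in the paper. The map size, patch size, and image size are scaled-down placeholders, not the paper's 8×8-map/256×256-image setup.

```python
# Sketch of the CSOM filter-learning idea: train a SOM on image patches,
# then use the learned SOM units as convolution filters. All sizes and the
# learning schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
patch, map_h, map_w = 5, 8, 8                 # 5x5 patches, 8x8 SOM map
weights = rng.normal(size=(map_h * map_w, patch * patch))  # one filter per unit

def train_som(patches, epochs=10, lr0=0.5, sigma0=3.0):
    coords = np.array([(i, j) for i in range(map_h) for j in range(map_w)])
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)           # assumed decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5
        for x in patches:
            # BMU chosen by Euclidean distance, as stated in the paper
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))[:, None]   # neighborhood kernel
            weights[:] += lr * h * (x - weights)

def convolve_with_som_filters(img):
    # Slide each learned SOM unit over the image as a convolution filter,
    # producing one feature map per unit.
    H, W = img.shape
    out = np.empty((map_h * map_w, H - patch + 1, W - patch + 1))
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            window = img[i:i + patch, j:j + patch].ravel()
            out[:, i, j] = weights @ window
    return out

img = rng.random((32, 32))                    # stand-in for a 256x256 input
patches = np.array([img[i:i + patch, j:j + patch].ravel()
                    for i in range(0, 28, patch) for j in range(0, 28, patch)])
train_som(patches)
feature_maps = convolve_with_som_filters(img)  # shape: (64, 28, 28)
```

A pooling layer (max-pooling or the paper's SOM-based pooling) would then operate on these feature maps.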
Alexander and Ayanava proposed a biologically plausible architecture named Resilient Self-Organizing Tissue (ReST) that can be executed as a typical CNN. A continuous energy function for the SOM was the core of the study: it was noted that traditional SOM offers no principled measure of its convergence state or of the quality of its model parameter values, whereas an energy function provides a simple quality measure. With a continuous energy function, stochastic gradient descent (SGD) can be extended to SOM learning in deep learning frameworks. Unlike in traditional SOM, the learning rate was kept constant over time. The map size K×K was treated as a hyperparameter with K ∈ {10, 15, 20, 30, 50}, and 10×10 was chosen across varying batch sizes N ∈ {1, 5, 10, 20, 50, 100}; a larger map or batch size would significantly increase the training time.
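As a concrete illustration of this idea, the sketch below minimizes a standard SOM energy of the form E = Σ_j h(j, j*) ‖x − w_j‖² (with j* the BMU) by plain SGD in PyTorch. The Gaussian neighborhood, fixed radius, input dimension, and stand-in data are assumptions for illustration rather than the ReST implementation; the 10×10 map and the constant learning rate follow the study's reported choices.

```python
# Minimal sketch of SGD on a continuous SOM energy function, in the spirit
# of ReST. Energy form, neighborhood kernel, and data are assumed.
import torch

K = 10                                        # 10x10 map, as chosen in the study
D = 784                                       # e.g., flattened 28x28 inputs (assumed)
W = torch.randn(K * K, D, requires_grad=True)
coords = torch.tensor([(i, j) for i in range(K) for j in range(K)],
                      dtype=torch.float32)
opt = torch.optim.SGD([W], lr=0.1)            # constant learning rate, per the paper
sigma = 1.5                                   # assumed fixed neighborhood radius

def energy(batch):
    # Squared distances between every sample and every map unit: (N, K*K)
    d2 = torch.cdist(batch, W) ** 2
    bmu = d2.argmin(dim=1)                    # winner indices (non-differentiable)
    # Gaussian neighborhood around each sample's BMU on the 2-D map grid
    grid_d2 = ((coords[None, :, :] - coords[bmu][:, None, :]) ** 2).sum(-1)
    h = torch.exp(-grid_d2 / (2 * sigma ** 2))
    return (h * d2).sum() / batch.shape[0]

for step in range(100):
    batch = torch.randn(20, D)                # stand-in data, batch size N = 20
    opt.zero_grad()
    loss = energy(batch)                      # scalar quality measure of the map
    loss.backward()                           # gradients flow through d2, not bmu
    opt.step()
```

Because the energy is an ordinary scalar loss, the SOM layer can be dropped into a standard deep learning training loop, which is what allows ReST to be executed like a typical CNN.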
In an approach similar to CSOM, two further architectures were proposed: Deep SOM (DSOM) and extended DSOM (E-DSOM). The block diagrams of both architectures are shown in Figure 12. In DSOM, each activation space yields a winner on a SOM map (filter) during the SOM phase. The next layer is the sampling phase, in which the feature map is formed: each node in the feature map is the BMU of one activation space and is stored at the position corresponding to that activation space. E-DSOM runs multiple DSOM architectures in parallel and combines their feature maps at the end. In the two-layered architecture, the output layer yields a single SOM map, which is used as input to a classification method. The map size varied from 4 to 24 for the first layer and 14 to 16 for the second layer on the MNIST, GSAD, and SP-HAR datasets. E-DSOM outperformed DSOM by up to 15% in classification accuracy with a time saving of up to 19%; the downside is that the parallel architecture requires more computational power.
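The sketch below illustrates the SOM and sampling phases described above. The division of the input into activation spaces, the map size, and the untrained stand-in SOM weights are all assumptions for illustration; E-DSOM is shown simply as parallel DSOM pipelines whose feature maps are concatenated at the end.

```python
# Minimal sketch of the DSOM SOM-plus-sampling phases and the parallel
# E-DSOM combination. Region size, map size, and weights are assumed.
import numpy as np

rng = np.random.default_rng(1)
region, map_units, dim = 4, 36, 16            # 4x4 activation spaces, 6x6 SOM
                                              # dim = region * region

def dsom_layer(img, som):
    # SOM phase: each activation space (image region) is matched against the
    # map; sampling phase: its BMU's weight vector is stored at the position
    # corresponding to that activation space, forming the feature map.
    H, W = img.shape
    rows, cols = H // region, W // region
    feature_map = np.empty((rows, cols, dim))
    for r in range(rows):
        for c in range(cols):
            space = img[r*region:(r+1)*region, c*region:(c+1)*region].ravel()
            bmu = np.argmin(np.linalg.norm(som - space, axis=1))  # Euclidean winner
            feature_map[r, c] = som[bmu]
    return feature_map

def e_dsom(img, n_parallel=3):
    # E-DSOM: several independently trained DSOM pipelines run in parallel
    # and their feature maps are combined (concatenated along channels here).
    soms = [rng.normal(size=(map_units, dim)) for _ in range(n_parallel)]
    return np.concatenate([dsom_layer(img, s) for s in soms], axis=-1)

out = e_dsom(rng.random((28, 28)))            # e.g., an MNIST-sized input
print(out.shape)                              # (7, 7, 48)
```

In the full two-layered architecture, such a feature map would feed a second SOM layer whose single output map is then passed to the classifier.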