Visualization using Grad-CAM
Deep learning based on convolutional neural networks (CNNs) has achieved high accuracy in many areas of image recognition, such as image classification, object detection, and image segmentation. This high performance comes from processing large amounts of data with deep neural networks. However, the multilayer, nonlinear structure of deep learning models makes them difficult to interpret. This lack of interpretability is a major disadvantage of deep learning, and the technique is therefore sometimes considered a “black box” method. In fields such as clinical medicine, the lack of model interpretability is a barrier to the practical application of deep learning (Petch and Nelson 2022).

Several approaches have recently been developed to address this challenge. For example, Class Activation Mapping (CAM) provides a heatmap visualization of the regions that influenced a model’s predictions, which is valuable for human interpretation of the results (Zhou et al. 2016). However, CAM is not applicable when the network lacks a Global Average Pooling (GAP) layer, so it depends on the network architecture. Gradient-weighted Class Activation Mapping (Grad-CAM), a generalization of CAM that is not constrained by model architecture, addresses this limitation (Selvaraju et al. 2017). The algorithm uses the class-specific gradient information flowing into the final convolutional layer of a CNN to visualize the regions most important for the prediction. This study used Grad-CAM to provide visual evidence for the classification of native and hybrid species.
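A minimal sketch of this procedure is given below, assuming a PyTorch implementation with a torchvision ResNet-50 backbone; the framework, backbone, and layer names are illustrative assumptions rather than the exact setup of this study. The class score is backpropagated to the final convolutional layer, the gradients are global-average-pooled into per-channel weights, and a ReLU of the weighted sum of the feature maps yields the heatmap.

```python
# Minimal Grad-CAM sketch (assumption: PyTorch with a torchvision ResNet-50;
# the study does not specify its framework or backbone).
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_layer, class_idx=None):
    """Return a heatmap of the regions most influential for one class score."""
    activations, gradients = [], []

    # Hooks capture the feature maps of the chosen convolutional layer and
    # the gradients of the class score with respect to those maps.
    h1 = target_layer.register_forward_hook(
        lambda m, inp, out: activations.append(out))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gin, gout: gradients.append(gout[0]))

    model.eval()
    scores = model(image)                        # shape (1, num_classes)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()  # explain the predicted class
    model.zero_grad()
    scores[0, class_idx].backward()              # class-specific gradients

    h1.remove()
    h2.remove()

    # Global-average-pool the gradients to one weight per feature map, then
    # take the weighted sum and keep only positive evidence (ReLU).
    weights = gradients[0].mean(dim=(2, 3), keepdim=True)  # (1, K, 1, 1)
    cam = F.relu((weights * activations[0]).sum(dim=1))    # (1, H, W)

    # Upsample to the input resolution and normalize to [0, 1] for overlay.
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[2:],
                        mode="bilinear", align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam, class_idx

if __name__ == "__main__":
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    dummy = torch.randn(1, 3, 224, 224)          # stand-in for a preprocessed image
    heatmap, cls = grad_cam(model, dummy, target_layer=model.layer4[-1])
    print(heatmap.shape, cls)
```

In practice, the normalized heatmap is overlaid on the original image so that the regions driving the classification of native versus hybrid specimens can be inspected visually.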