GNN
A GNN is a connectionist model that captures the dependencies within a graph via message passing between nodes along their connecting edges13,14,15; GNNs can be roughly divided into graph convolutional networks16,17 and graph attention networks18,19. In 2019, Yao et al. proposed the text graph convolutional network (TextGCN)4, which applied GNNs to the text classification task for the first time. TextGCN first constructs a symmetric adjacency matrix from the constructed text graph, then fuses the representation of each node with those of its neighbors through convolution operations, and finally feeds the node representations to a softmax layer for classification. However, TextGCN assigns the same weight to every neighbor, which is inconsistent with each node's actual contribution to the final classification. To solve this problem, Veličković et al. proposed the graph attention network (GAT)18, which uses masked self-attention to assign a different weight to each node according to the features of its adjacent nodes, thereby addressing this shortcoming of graph convolutional networks for text classification. Nevertheless, these earlier works consider only the text itself when constructing the graph and ignore heterogeneous information such as text labels. In 2020, Xin et al. established a GNN based on label fusion20. This method incorporates label information by adding “text-tag-text” paths while constructing the graph, through which supervisory information can be propagated more directly across the graph. Chang et al. designed a local aggregation function21, a shareable non-linear operation for aggregating local inputs with disordered arrangement and unequal dimensions over non-Euclidean domains; it can fit non-linear functions without activation functions and can be trained easily with standard back-propagation. In 2021, Wang et al. proposed a new GNN-based short text classification method that better utilizes the interactions between nodes of the same type and captures the similarity between short texts22. This method first models the short text dataset as a hierarchical heterogeneous graph, then dynamically learns a short document graph that makes label propagation between similar short documents more effective.
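To make the propagation and weighting steps described above concrete, the following is a minimal illustrative sketch rather than any of the cited authors' released code: `normalize_adjacency` and `gcn_layer` apply a GCN-style update X' = ReLU(ÂXW) of the kind used by TextGCN-like models, and `gat_attention_weights` computes masked self-attention coefficients in the spirit of GAT so that different neighbors receive different weights. The toy adjacency matrix, feature dimensions, and all function names are assumptions made for the example.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize an adjacency matrix with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}, as in GCN-style propagation."""
    adj_self = adj + np.eye(adj.shape[0])
    deg = adj_self.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ adj_self @ d_inv_sqrt

def gcn_layer(a_hat, x, weight, activation=True):
    """One graph-convolution step: fuse each node with its neighbors."""
    h = a_hat @ x @ weight
    return np.maximum(h, 0.0) if activation else h

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gat_attention_weights(adj, h, w, a_vec, alpha=0.2):
    """Masked self-attention in the spirit of GAT: score each edge from the
    transformed features of its endpoints, then softmax over each node's
    neighborhood so that different neighbors get different weights."""
    z = h @ w
    n = z.shape[0]
    scores = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(n):
            if adj[i, j] > 0 or i == j:          # mask: neighbors and self only
                e = np.concatenate([z[i], z[j]]) @ a_vec
                scores[i, j] = np.where(e > 0, e, alpha * e)   # LeakyReLU
    return softmax(scores)                        # row-wise attention coefficients

# Toy word/document graph with 5 nodes (hypothetical data).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 0, 1, 0],
                [1, 0, 0, 1, 1],
                [0, 1, 1, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)
features = rng.normal(size=(5, 8))     # node features
w1 = rng.normal(size=(8, 16))          # first-layer weights
w2 = rng.normal(size=(16, 3))          # second layer -> 3 classes

a_hat = normalize_adjacency(adj)
hidden = gcn_layer(a_hat, features, w1)                 # neighborhood fusion
logits = gcn_layer(a_hat, hidden, w2, activation=False)
probs = softmax(logits)                                 # per-node class probabilities
att = gat_attention_weights(adj, features, w1, rng.normal(size=(32,)))
print(probs.shape, att.shape)  # (5, 3) (5, 5)
```

Here every neighbor of a node contributes equally in `gcn_layer`, whereas the coefficients returned by `gat_attention_weights` differ per edge, which is precisely the limitation of uniform weighting that GAT was designed to remove.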
With the emergence of large-scale pre-trained models in recent years, Devlin et al. proposed the pre-trained model BERT (Bidirectional Encoder Representations from Transformers), based on the self-attention mechanism6. At each encoder layer, BERT refines the representation of the input, applying multiple attention operations over different parts of the sequence to obtain a text representation that carries contextual information. Liu et al. improved on this basis7, removing the next-sentence prediction task and training on more diverse data, achieving better results. Some recent studies combine GCN and BERT. Jeong et al. proposed a citation graph model for the paper recommendation task23, which combines the outputs of GCN and BERT so that the interaction between local and global information benefits the downstream prediction task. Lu et al. established a BERT model based on graph embedding24, which concatenates word embeddings with node representations and lets local and global information interact through BERT to obtain the final text representation.
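As a rough illustration of how such GCN-BERT combinations can be wired together, the sketch below concatenates BERT's [CLS] vector (global, contextual information) with a precomputed graph node embedding (local, structural information) and classifies the fused vector. It is a simplified sketch under assumed names and dimensions, not the exact architecture of either cited work; the Hugging Face transformers API is used for the BERT encoder, and `graph_emb` is a stand-in for document-node embeddings that a GCN would produce.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class BertGraphClassifier(nn.Module):
    """Hypothetical fusion model: concatenate BERT's [CLS] vector with a
    precomputed graph node embedding and classify the combined representation."""
    def __init__(self, bert_name="bert-base-uncased", graph_dim=16, num_classes=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + graph_dim, num_classes)

    def forward(self, input_ids, attention_mask, graph_emb):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = out.last_hidden_state[:, 0]            # [CLS] token representation
        fused = torch.cat([cls_vec, graph_emb], dim=-1)  # local + global interaction
        return self.classifier(fused)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["graph neural networks for text", "short text classification"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
graph_emb = torch.randn(len(texts), 16)   # stand-in for GCN document-node embeddings
model = BertGraphClassifier()
logits = model(batch["input_ids"], batch["attention_mask"], graph_emb)
print(logits.shape)  # torch.Size([2, 3])
```

Concatenation is only one way to couple the two sources of information; interpolating the class predictions of the graph branch and the BERT branch is another common design choice.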