As seen from Table 4, both BERTGCN and PEGCN perform much better than the single TextGCN or BERT models, with an improvement of 2–4% on each of the five datasets. Compared with BERTGCN, PEGCN is 0.36% better on 20NG, 0.12% and 0.05% better on R8 and R52, and 0.96% and 3.59% better on Ohsumed and MR, respectively. Both PEGCN and BERTGCN combine a GCN with a pre-trained model, yet PEGCN achieves higher classification accuracy. Since BERTGCN also uses BERT input representations, the input representation cannot be the main source of the difference. Rather, BERTGCN still propagates over the raw adjacency matrix, while this study uses a processed edge matrix that captures richer edge features and strengthens the node representations; at the same time, it reduces the occurrence of isolated points and improves storage efficiency. PEGCN's final classification results are therefore superior to BERTGCN's, and the improvements made to TextGCN here contribute directly to classification accuracy and to the competitiveness of the proposed model among similar models. Table 4 also shows that the combined TextGCN–BERT models are generally more accurate than either BERT or TextGCN alone, indicating that combining TextGCN with a large-scale pre-trained model offers a significant advantage in classification accuracy.
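As a concrete illustration of this difference, the following is a minimal PyTorch sketch of a symmetrically normalized GCN layer. The names (normalize_adj, gcn_layer) are illustrative, and the PMI/TF-IDF weighting mentioned in the comments follows TextGCN's graph construction rather than PEGCN's exact edge processing; the point is only that replacing a binary adjacency matrix with a real-valued edge matrix changes how strongly each neighbor contributes during aggregation.

```python
import torch

def normalize_adj(edge_matrix: torch.Tensor) -> torch.Tensor:
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a = edge_matrix + torch.eye(edge_matrix.size(0), device=edge_matrix.device)
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0  # guard against isolated nodes
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

def gcn_layer(h: torch.Tensor, edge_matrix: torch.Tensor,
              w: torch.Tensor) -> torch.Tensor:
    """One propagation step H' = ReLU(A_hat H W).

    With a binary adjacency, every neighbor contributes equally; with a
    weighted edge matrix (e.g. PMI for word-word edges and TF-IDF for
    word-document edges, as in TextGCN), the edge features themselves
    modulate the aggregation.
    """
    return torch.relu(normalize_adj(edge_matrix) @ h @ w)
```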
To further demonstrate the reliability of PEGCN, the proposed method is also compared with other classical methods (Table 5). Among them, CNN is the convolutional neural network proposed by Kim11 in 2014; LSTM is the long short-term memory network10; and Bi-LSTM is the bidirectional long short-term memory network12. PTE is a network model based on word embeddings proposed by Tang et al.28, which learns word embeddings on a heterogeneous text network with words, documents, and labels as nodes, and then averages the word embeddings into document embeddings for text classification. FastText is a simple and efficient classification method proposed by Joulin et al.29, which treats the average of the word or N-gram embeddings as the document representation and passes it to a linear classifier. LEAM is an attention model based on label embedding proposed by Wang et al.30, which embeds words and labels in the same space for text classification. SGC is a simplified graph convolutional network proposed by Wu et al.31; SSGC is a spectral graph network proposed by Zhu et al.32, which uses a Markov diffusion kernel to derive the GCN, combining the advantages of spatial and spectral methods.
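The two graph baselines are compact enough to sketch directly. Below is a minimal PyTorch rendering of their published feature-propagation rules (the function names are ours): SGC drops the nonlinearities of a GCN and collapses K propagation steps over the normalized adjacency S into one preprocessing pass, while SSGC averages the propagated features over all K steps and mixes back a fraction alpha of the raw input, which is where the Markov diffusion kernel view enters. In both cases a linear softmax classifier is then applied to the resulting features.

```python
import torch

def sgc_features(x: torch.Tensor, s: torch.Tensor, k: int) -> torch.Tensor:
    """SGC (Wu et al.): X' = S^K X, i.e. K hops of purely linear
    propagation over the normalized adjacency S."""
    for _ in range(k):
        x = s @ x
    return x

def ssgc_features(x: torch.Tensor, s: torch.Tensor, k: int,
                  alpha: float = 0.05) -> torch.Tensor:
    """SSGC (Zhu et al.): X' = (1/K) * sum_{i=1..K} ((1-alpha) S^i X + alpha X),
    averaging all propagation depths and retaining a share of the raw input."""
    out = torch.zeros_like(x)
    h = x
    for _ in range(k):
        h = s @ h
        out = out + (1.0 - alpha) * h + alpha * x
    return out / k
```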
Table 5. Comparison of classification accuracy (%) of different models.