Combining BERT and GCN
Here, BERT is trained in a separate branch of the network as an auxiliary classifier\(^{6}\). Combining BERT with GCN allows the network to exploit the advantages of a large-scale pre-trained model, yielding faster convergence and better performance. In the specific implementation, the auxiliary classifier is constructed by feeding the document embeddings \(X\) directly into a softmax layer:

\(Z_{BERT} = \mathrm{softmax}(WX)\)

where \(W\) is a learnable weight matrix mapping the embeddings to class logits.
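As a concrete illustration, the auxiliary classifier can be sketched as a single linear projection followed by a softmax. This is a minimal NumPy sketch, not the authors' implementation; the shapes, the bias term, and the function names are assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def bert_aux_classifier(X, W, b):
    """Auxiliary classifier: map document embeddings X straight to
    class probabilities, Z_BERT = softmax(X W + b)."""
    return softmax(X @ W + b)

# Toy shapes (assumed): 4 documents, 768-dim BERT embeddings, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 768))
W = rng.normal(size=(768, 3)) * 0.01
b = np.zeros(3)
Z_bert = bert_aux_classifier(X, W, b)  # each row is a probability distribution
```

Because the classifier head sits directly on the BERT embeddings, gradients from this auxiliary loss flow straight into the encoder during training.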
Finally, linear interpolation is used to combine the predictions of BERT and GCN:

\(Z = \lambda Z_{GCN} + (1-\lambda)Z_{BERT}\)

where \(\lambda \in [0,1]\) controls the trade-off between the two branches.
The interpolation improves performance for the following reason: \(Z_{BERT}\) acts directly on the GCN input, ensuring that the input to the GCN is tuned and optimized toward the training objective. This helps the multi-layer GCN overcome inherent shortcomings such as vanishing gradients and over-smoothing\(^{27}\), resulting in improved performance.
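The interpolation step above can be sketched as follows. This is a minimal NumPy illustration under assumed inputs; the function name and the example value of \(\lambda\) are not from the text.

```python
import numpy as np

def interpolate_predictions(Z_gcn, Z_bert, lam):
    """Linear interpolation Z = lam * Z_GCN + (1 - lam) * Z_BERT.
    lam = 1 uses only the GCN branch; lam = 0 only the BERT branch."""
    assert 0.0 <= lam <= 1.0
    return lam * Z_gcn + (1.0 - lam) * Z_bert

# Two 3-class probability distributions for one document (toy values).
Z_gcn = np.array([[0.6, 0.3, 0.1]])
Z_bert = np.array([[0.2, 0.5, 0.3]])
Z = interpolate_predictions(Z_gcn, Z_bert, lam=0.5)  # -> [[0.4, 0.4, 0.2]]
```

Since \(Z\) is a convex combination of two probability distributions, each row of the result still sums to one, so it can be used directly as the final prediction.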