Position encoding
Previous network models that used One-Hot vectors as GCN inputs could not capture the relative positional information between words. In contrast, the PEGCN model uses the sum of Token Embedding and Position Embedding as the input word embeddings, which are taken from the distributed representations of BERT6. The input representation of BERT is the sum of Token Embedding, Segment Embedding, and Position Embedding. Segment Embedding was introduced in BERT primarily for the next sentence prediction task. Since the classification tasks in this study all involve single sentences, the Segment Embedding used to distinguish a preceding sentence from a following one is redundant here. Therefore, only the sum of Token Embedding and Position Embedding is used to represent the network input in this study. Specific details are illustrated in Figure 2. The Token Embedding layer converts each word into a fixed-size vector that encodes the semantic meaning of the text. In this study, the length and dimension of the word vectors both follow the BERT paper: each word is converted into a 768-dimensional vector representation. Assuming a sentence length of 128, the sentence is represented as a (128, 768) matrix after the Token Embedding layer.
Figure 2. Input representation of PEGCN.
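As an illustration, the following PyTorch sketch shows the Token Embedding step for a single sentence of 128 token ids. The vocabulary size of 30522 (the BERT-base vocabulary) and the module and variable names are assumptions made only for this example.

```python
import torch
import torch.nn as nn

vocab_size, seq_len, hidden_dim = 30522, 128, 768

token_embedding = nn.Embedding(vocab_size, hidden_dim)

# One sentence as 128 token ids (random ids stand in for real tokenizer output).
token_ids = torch.randint(0, vocab_size, (seq_len,))

# Each id is mapped to a 768-dimensional vector -> a (128, 768) matrix.
token_vectors = token_embedding(token_ids)
print(token_vectors.shape)  # torch.Size([128, 768])
```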
The network learns a vector representation at each position of the Position Embedding, and this vector encodes sequence-order information. The network infers the relative positions of words in a sentence from the offsets between these vectors. The Position Embedding layer is essentially a (128, 768) lookup table, with the first row (viewed as a vector) representing the first position in the sequence, the second row representing the second position, and so on. Each row of this table is randomly initialized and is updated as the network is trained. During training the network also takes the batch size batch_size into account, so the Token Embedding and the Position Embedding are each represented as a tensor of shape (batch_size, 128, 768). Adding the two yields the final input representation. The word vectors obtained in this manner are used as the input representation of the PEGCN document nodes in this study.
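A minimal sketch of the learned position table and the element-wise sum described above is given below, assuming a hypothetical batch of random token ids; the batch size of 32 and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_dim, vocab_size = 32, 128, 768, 30522

token_embedding = nn.Embedding(vocab_size, hidden_dim)
# A (128, 768) table: row i is the learned vector for position i,
# randomly initialized and updated during training.
position_embedding = nn.Embedding(seq_len, hidden_dim)

token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch_size, seq_len)

# Both terms have shape (batch_size, 128, 768); their sum is the input representation.
input_repr = token_embedding(token_ids) + position_embedding(position_ids)
print(input_repr.shape)  # torch.Size([32, 128, 768])
```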
The node embedding X is represented as a matrix of dimensions (ndoc + nword) × d, where ndoc denotes the number of document nodes, nword the number of word nodes, and d the dimensionality of the node embedding.
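The shape of X can be sketched as follows. Stacking mean-pooled document representations with randomly initialized word-node embeddings is an assumption used only to show the (ndoc + nword) × d layout, not the paper's exact construction.

```python
import torch

n_doc, n_word, d = 4, 10, 768

# Document nodes: one d-dimensional vector per document, here obtained by
# mean-pooling each document's (128, 768) input representation.
doc_inputs = torch.randn(n_doc, 128, d)
doc_nodes = doc_inputs.mean(dim=1)             # (n_doc, d)

# Word nodes: one d-dimensional embedding per word node in the graph.
word_nodes = torch.randn(n_word, d)            # (n_word, d)

# Stack document and word nodes into the (n_doc + n_word) x d matrix X.
X = torch.cat([doc_nodes, word_nodes], dim=0)
print(X.shape)  # torch.Size([14, 768])
```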