Fig. 3 Structure of global regression module.
Feature maps are obtained by the channel conversion module. Some of
these maps have high spatial resolution but little semantic information,
while others carry more semantic information at low spatial resolution.
Fusing the feature maps yields rich hand feature information and also
recovers detailed information, such as that of fingertips and masked
edges. To fuse the feature information, feature maps of different
dimensions are first subjected to dimensionality reduction, so that
their channels are unified to the same dimension,
, (6)
, (7)
where $V_k$ is the feature map obtained by dimensionality reduction,
$U_k$ is the feature map obtained by upsampling, $R_1$ is the
convolution operation with a 1 × 1 convolution kernel, the activation is
the ReLU function, and $B$ is the upsampling operation of bilinear
interpolation, which calculates each point in the new image from its
four adjacent points in the original image as
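\[ f(x, y_1) = \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21}), \tag{8} \]
\[ f(x, y_2) = \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22}), \tag{9} \]
\[ f(x, y) = \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2). \tag{10} \]
Equations (8) and (9) are linear interpolation operations in the
x-direction, and equation (10) is a linear interpolation operation in
the y-direction. $Q_{11}$, $Q_{12}$, $Q_{21}$, and $Q_{22}$ are points
in the original image with coordinates $(x_1, y_1)$, $(x_1, y_2)$,
$(x_2, y_1)$, and $(x_2, y_2)$, respectively, and $(x, y)$ is the point
to be calculated in the new image. The resulting feature maps are added
to fuse feature information of different spatial resolutions. The
calculation method is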
. (11)
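The fusion steps described above (1 × 1 convolution with ReLU, bilinear upsampling, and addition) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `conv1x1_relu`, `bilinear_upsample`, and `fuse` are hypothetical, and the omitted bias, the upsampling factor of 2, the aligned-corner sampling grid, and the choice of which two maps are added are all assumptions made only for illustration.

```python
import numpy as np


def conv1x1_relu(feat, weight):
    """1x1 convolution followed by ReLU (bias omitted, an assumption).

    feat:   (C_in, H, W) feature map
    weight: (C_out, C_in) kernel of the 1x1 convolution
    """
    out = np.einsum("oc,chw->ohw", weight, feat)  # mix channels per pixel
    return np.maximum(out, 0.0)


def bilinear_upsample(feat, scale=2):
    """Bilinear upsampling of a (C, H, W) map on an aligned-corner grid."""
    c, h, w = feat.shape
    # Positions of the new pixels expressed in source coordinates.
    ys = np.linspace(0.0, h - 1, h * scale)
    xs = np.linspace(0.0, w - 1, w * scale)
    # Indices of the four adjacent source points around each position.
    y1 = np.clip(np.floor(ys).astype(int), 0, max(h - 2, 0))
    x1 = np.clip(np.floor(xs).astype(int), 0, max(w - 2, 0))
    y2 = np.minimum(y1 + 1, h - 1)
    x2 = np.minimum(x1 + 1, w - 1)
    # Neighbours are one pixel apart, so the weights reduce to offsets.
    ty = (ys - y1)[None, :, None]
    tx = (xs - x1)[None, None, :]
    # Two linear interpolations along x (on rows y1 and y2) ...
    row1 = feat[:, y1][:, :, x1] * (1 - tx) + feat[:, y1][:, :, x2] * tx
    row2 = feat[:, y2][:, :, x1] * (1 - tx) + feat[:, y2][:, :, x2] * tx
    # ... followed by one linear interpolation along y.
    return row1 * (1 - ty) + row2 * ty


def fuse(low_res, high_res, w_low, w_high):
    """Unify channels, upsample the low-resolution map, and add the two."""
    v_low = conv1x1_relu(low_res, w_low)
    v_high = conv1x1_relu(high_res, w_high)
    u_low = bilinear_upsample(v_low, scale=2)
    return u_low + v_high


rng = np.random.default_rng(0)
low = rng.standard_normal((8, 4, 4))    # more channels, low resolution
high = rng.standard_normal((4, 8, 8))   # fewer channels, high resolution
fused = fuse(low, high,
             rng.standard_normal((2, 8)), rng.standard_normal((2, 4)))
print(fused.shape)
```

Both inputs are projected to the same number of channels before the addition, which is what makes the element-wise sum well defined once the spatial sizes match.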
D. Local Optimization Module
To reduce errors generated by the global regression module, a local
optimization module addresses the inaccuracy of predicting the joint
position under occlusion. This module extracts deeper information from
the feature maps obtained by the global regression module.