To evaluate the predictive performance of joint location, we performed 10-fold cross-validation experiments for each joint in the finger, wrist (Fig. 3a), and toe (Fig. 3b). Because image sizes vary, we used a normalized distance to measure the discrepancy between predictions and ground-truth labels (see details in Methods). Briefly, the coordinates of each point within an image are rescaled from pixel values to continuous values between 0 and 1 by dividing by the height or width of the image, so that the results are uniform and comparable across images of different sizes. The distributions of normalized distances are shown as boxplots in Fig. 3c-e. We selected a normalized distance of 0.02 as the cutoff (horizontal red dashed lines) for measuring predictive accuracy; examples of normalized distances of 0.01 and 0.02 are shown in Fig. 3f-g. In general, finger joints are the easiest to locate, with more than 98.0% of test joints falling within the 0.02 normalized distance (Fig. 3c). In contrast, wrist joints are harder to distinguish owing to their close proximity to one another (Fig. 3d). The accuracy of locating toe joints is similar to that of finger joints (Fig. 3e). Overall, the convolutional neural network model locates joints with high accuracy.
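A minimal sketch of how such a normalized-distance accuracy could be computed is given below, assuming the normalized distance is the Euclidean distance between coordinates rescaled by image width and height; the exact definition is given in the Methods, and the function names here (`normalized_distance`, `accuracy_at_cutoff`) are illustrative, not the authors' implementation.

```python
import numpy as np

def normalized_distance(pred_xy, true_xy, img_w, img_h):
    """Rescale (x, y) pixel coordinates to [0, 1] by image width/height,
    then return the Euclidean distance between prediction and ground truth.
    (Euclidean form assumed for illustration; see Methods for the exact definition.)"""
    scale = np.array([img_w, img_h], dtype=float)
    pred = np.asarray(pred_xy, dtype=float) / scale
    true = np.asarray(true_xy, dtype=float) / scale
    return float(np.linalg.norm(pred - true))

def accuracy_at_cutoff(pred_coords, true_coords, img_sizes, cutoff=0.02):
    """Fraction of test joints whose normalized distance is within `cutoff`
    (e.g. 0.02, the threshold used in Fig. 3c-e)."""
    dists = [
        normalized_distance(p, t, w, h)
        for p, t, (w, h) in zip(pred_coords, true_coords, img_sizes)
    ]
    return float(np.mean(np.array(dists) <= cutoff))

# Hypothetical usage on a few test joints:
preds = [(120, 340), (85, 410)]
truths = [(118, 338), (90, 400)]
sizes = [(512, 768), (512, 768)]
print(accuracy_at_cutoff(preds, truths, sizes, cutoff=0.02))
```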