Based on the joint locations obtained in the previous step, we cropped each full image into patches centered on the joints to be scored. These patches served as the input to a deep learning model that predicts the damage score of each joint. We designed a novel neural network architecture for patch-based damage prediction that simultaneously outputs the damage score and a segmentation of the joint space region (Fig. 4). The architecture consists of two parts. The first part comprises an encoder and a decoder, which extract features at multiple scales and resolutions. Its output is a segmentation mask, which in turn serves as input to the second part of the network. The rationale is that, guided by the segmentation mask, the network can more readily learn where to “look” and focus primarily on the regions of interest when determining the damage level. The second part contains a regressor that produces a single output value representing the damage score.
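
To make the data flow of the two-part design concrete, the following PyTorch sketch shows one possible realization. It is an illustration under stated assumptions, not the exact architecture of Fig. 4: the class name `JointDamageNet`, the layer widths, the two-level encoder-decoder, the single-channel 64×64 patch size, and the choice to pass the mask to the regressor by concatenating it with the patch are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU; padding=1 keeps the spatial size.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class JointDamageNet(nn.Module):
    """Hypothetical two-part network: segmentation first, then score regression."""

    def __init__(self, base=8):
        super().__init__()
        # Part 1: small encoder-decoder (assumed layout with two resolutions).
        self.enc1 = conv_block(1, base)
        self.enc2 = conv_block(base, base * 2)
        self.dec1 = conv_block(base * 2 + base, base)
        self.seg_head = nn.Conv2d(base, 1, kernel_size=1)
        # Part 2: regressor on the patch concatenated with the mask (assumption).
        self.reg_features = conv_block(2, base)
        self.reg_head = nn.Linear(base, 1)

    def forward(self, patch):
        # Encoder: full-resolution features, then half-resolution features.
        f1 = self.enc1(patch)
        f2 = self.enc2(F.max_pool2d(f1, 2))
        # Decoder: upsample and merge with the full-resolution skip features.
        up = F.interpolate(f2, size=f1.shape[-2:], mode="nearest")
        d1 = self.dec1(torch.cat([up, f1], dim=1))
        mask = torch.sigmoid(self.seg_head(d1))  # per-pixel values in [0, 1]
        # Regressor: the mask indicates where the joint space region lies.
        r = self.reg_features(torch.cat([patch, mask], dim=1))
        # Global average pooling, then one linear unit per patch.
        score = self.reg_head(r.mean(dim=(2, 3))).squeeze(1)
        return mask, score


torch.manual_seed(0)
model = JointDamageNet()
patches = torch.rand(4, 1, 64, 64)  # four single-channel patches (made-up size)
mask, score = model(patches)
print(mask.shape, score.shape)
```

Each forward pass returns both outputs, so a segmentation loss on `mask` and a regression loss on `score` could be combined during training; the loss formulation is not specified in this paragraph, so that combination is left open here.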