Benchmarking the predictive performance of neural network models with or without segmentations of bone erosion. We compared two types of neural network models, trained with or without the semantic segmentation of joint space regions. We benchmarked their performance in predicting joint space narrowing using a, Pearson’s correlation coefficient and b, root mean square error (RMSE), and their performance in predicting bone erosion using c, Pearson’s correlation coefficient and d, RMSE. Each comparison was based on 10-fold cross-validation experiments, and statistical significance was determined with paired Wilcoxon signed-rank tests.
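The evaluation protocol in this legend (per-fold Pearson's correlation and RMSE, then a paired Wilcoxon signed-rank test across folds) can be sketched in plain Python as below. This is a minimal illustration, not the authors' code. It assumes that each metric is computed once per cross-validation fold, which gives 10 paired values per model, and that the test is two-sided. The function names are hypothetical. The exact p-value is obtained by enumerating all 2^n sign assignments, which is feasible for n = 10 folds.

```python
import math
from itertools import product


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def wilcoxon_signed_rank_exact(a, b):
    """Two-sided exact paired Wilcoxon signed-rank test.

    a, b: per-fold metric values of two models (hypothetical inputs, paired by fold).
    Zero differences are dropped; tied |differences| receive average ranks.
    Returns (statistic, p_value), where statistic = min(W+, W-).
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    stat = min(w_plus, total - w_plus)
    # Under H0 each rank is equally likely to carry either sign:
    # count the sign assignments at least as extreme as the observed one.
    extreme = 0
    for signs in product((False, True), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= stat + 1e-12:
            extreme += 1
    return stat, extreme / 2 ** n
```

For example, with 10 folds and one model scoring better than the other in every fold, the statistic is 0 and the exact two-sided p-value is 2/1024 ≈ 0.002. Exact enumeration is used here instead of a normal approximation because 10 paired observations is too few for the approximation to be reliable.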