Mandora predictions approach the clinical upper limit estimated by human scorers. Joint damage was scored independently by two human scorers, and the Pearson’s correlation and root mean square error (RMSE) between their scores were calculated to estimate the clinical upper limit. We benchmarked Mandora’s predictions of joint space narrowing using a, Pearson’s correlation and b, RMSE. We also benchmarked its predictions of bone erosion using c, Pearson’s correlation and d, RMSE. Each comparison used 10-fold cross-validation, and paired Wilcoxon signed-rank tests were used to determine statistical significance.
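
The two metrics and the significance test named in this caption can be sketched in a few lines. The block below is a minimal illustration using only the Python standard library, not the authors' implementation: the function names (`pearson_r`, `rmse`, `wilcoxon_signed_rank`) are hypothetical, and the exact two-sided form of the Wilcoxon test is an assumption, since the caption does not say which variant was used.

```python
import math
from itertools import product


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def wilcoxon_signed_rank(a, b):
    """Exact two-sided paired Wilcoxon signed-rank test (assumed variant).

    Returns (statistic, p_value), where statistic = min(W+, W-).
    Zero differences are dropped; tied absolute differences share the
    average of their ranks. All 2^n sign assignments are enumerated, so
    this is only practical for small n, such as 10 cross-validation folds.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))

    # Assign 1-based ranks to |differences|, averaging over ties.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2; count assignments at least as extreme as observed.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= stat:
            extreme += 1
    return stat, extreme / 2 ** n
```

In a setup like the one described, `pearson_r` and `rmse` would be computed once per fold for each method, and `wilcoxon_signed_rank` would then be applied to the two resulting lists of ten per-fold values. With ten folds, the smallest two-sided p-value this exact test can return is 2/1024, which is reached when one method is better in every fold.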