Main findings
Stage 1 performed reasonably well at predicting surgical complexity level A, with high sensitivity and NPV but only moderate specificity and PPV. The intermediate stages 2 and 3 performed poorly at predicting their corresponding surgical complexity levels. Stage 4 had a poor PPV for predicting surgical complexity level D. The pre-determined staging thresholds discriminated well between complexity levels A/B/C and D (stage 4), but had low specificity for A versus B/C/D and A/B/C versus D (stages 1, 2 and 3).
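Each comparison above collapses the four complexity levels into a binary outcome (for example, level A versus B/C/D) and then reads sensitivity, specificity, PPV and NPV off the resulting 2×2 table. The sketch below illustrates that calculation under stated assumptions. The names `Counts`, `dichotomize` and `diagnostic_metrics` are hypothetical, and the counts are invented for illustration only. They are not study data or the authors' analysis code. They show how a threshold can have high sensitivity and NPV while specificity and PPV remain moderate, as reported for stage 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 table for one dichotomized comparison (hypothetical structure)."""
    tp: int  # staged positive, observed level positive
    fp: int  # staged positive, observed level negative
    fn: int  # staged negative, observed level positive
    tn: int  # staged negative, observed level negative


def dichotomize(pairs, positive_stages, positive_levels):
    """Collapse (predicted stage, observed complexity level) pairs into a 2x2 table."""
    tp = fp = fn = tn = 0
    for stage, level in pairs:
        predicted = stage in positive_stages
        observed = level in positive_levels
        if predicted and observed:
            tp += 1
        elif predicted:
            fp += 1
        elif observed:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def diagnostic_metrics(c):
    """Standard accuracy measures for a binary classification."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts (NOT study data): level A versus B/C/D at a stage 1 threshold.
example = Counts(tp=45, fp=20, fn=5, tn=30)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, sensitivity is 0.90 and NPV is 0.86, but specificity is only 0.60 and PPV is 0.69. Because PPV and NPV also depend on how common each complexity level is in the cohort, a stage can have a poor PPV (as for stage 4 and level D) even when its other measures look acceptable.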