Analysis
For each observer, we calculated the diagnostic performance of each AAGL
stage in predicting the corresponding level of surgical complexity, i.e.
AAGL stage 1 to predict level A, AAGL stage 2 to predict level B, AAGL
stage 3 to predict level C and AAGL stage 4 to predict level D. Data were analysed
to determine the kappa and weighted kappa scores, accuracy, sensitivity,
specificity, positive predictive value, negative predictive value,
positive likelihood ratio, and negative likelihood ratio, with 95%
confidence intervals. The AAGL system uses a cumulative point score
schema, and stage is determined by score thresholds. The paper by Abrão
et al. describes the use of logistic regression to determine the point
score thresholds defining stages 1–4 that would most accurately predict
skill levels A–D (2). Stage 1 was determined to be 0 to 8 points, stage
2 was 9 to 15 points, stage 3 was 16 to 21 points and stage 4 was above
21 points. We tested our dataset in the same manner: the area under the
receiver operating characteristic curve (AUROC) was used to determine
the overall performance of A vs B/C/D (for a threshold of 8), A/B vs C/D
(for a threshold of 15) and A/B/C vs D (for a threshold of 21), for each
observer.
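
The dichotomised comparisons described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the function names (stage_from_score, diagnostic_metrics, auroc) and the example scores and levels are hypothetical, and confidence intervals and kappa statistics are omitted. Only the thresholds of 8, 15 and 21 points come from the text.

```python
from bisect import bisect_left

# Inclusive upper bounds of stages 1-3 as quoted in the text (8, 15, 21 points).
STAGE_UPPER_BOUNDS = [8, 15, 21]


def stage_from_score(score):
    """Map a cumulative AAGL point score to stage 1-4."""
    return bisect_left(STAGE_UPPER_BOUNDS, score) + 1


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(predicted, actual):
    """2x2 metrics from paired booleans (test positive, truly positive)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(not p and a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": _ratio(sens, 1 - spec),
        "lr_neg": _ratio(1 - sens, spec),
    }


def auroc(scores, positives):
    """Probability that a random positive case outscores a random
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, positives) if y]
    neg = [s for s, y in zip(scores, positives) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical data for one observer: point scores and true complexity levels.
scores = [3, 8, 10, 14, 17, 20, 23, 30]
levels = ["A", "A", "B", "B", "C", "B", "D", "D"]

# A/B vs C/D at a threshold of 15 points.
predicted_cd = [s > 15 for s in scores]
actual_cd = [lvl in ("C", "D") for lvl in levels]
print(diagnostic_metrics(predicted_cd, actual_cd))
print(auroc(scores, actual_cd))
```

The same two calls can be repeated with thresholds of 8 and 21 points (and the matching grouping of levels) to obtain the A vs B/C/D and A/B/C vs D comparisons.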
Continuous data were summarised by mean and standard deviation, median
and interquartile range (25th to 75th percentile), and minimum to
maximum. Categorical
data were summarised by counts and proportions expressed as percentages.
Ordinal data were described by cross-tabulation and summarised as
described for continuous data.
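
As an illustration of these summaries, the sketch below uses only the Python standard library. The function names and example values are hypothetical, and the quartile method ("inclusive" interpolation) is an assumption, since the text does not state which percentile definition was used.

```python
import statistics
from collections import Counter


def summarise_continuous(values):
    """Mean, SD, median, IQR (25th to 75th percentile) and range."""
    # Assumption: inclusive interpolation; other quantile definitions exist.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


def summarise_categorical(values):
    """Count and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {key: (n, 100 * n / total) for key, n in counts.items()}


# Hypothetical example values.
print(summarise_continuous([2, 4, 4, 5, 7, 9, 11]))
print(summarise_categorical(["A", "B", "B", "D"]))
```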