Statistical analysis
Patient data were divided into training (384 patients) and validation
(95 patients) datasets at a ratio of 8:2 using simple random sampling.
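The 8:2 split can be sketched as follows. This is only an illustration in Python, not the SAS code used in the study: the function name, the fixed seed, and the rule of rounding the training size up are all assumptions, although rounding up does reproduce the reported 384 and 95 patients for a total of 479.

```python
import math
import random

def split_train_validation(patient_ids, seed=0):
    """Simple random 8:2 split (hypothetical helper, not the study's code)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)       # simple random sampling without replacement
    n_train = math.ceil(len(ids) * 8 / 10)  # assumption: training size is rounded up
    return ids[:n_train], ids[n_train:]

# 479 placeholder patient identifiers (384 + 95 as reported)
train_ids, validation_ids = split_train_validation(range(479))
print(len(train_ids), len(validation_ids))  # 384 95
```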
Normally distributed continuous variables (expressed as means and
standard deviations) were compared using two-sample t-tests.
Non-normally distributed variables (expressed as medians and
interquartile range [IQR]) were compared using the Mann–Whitney U
test. Categorical variables (expressed as frequency and percentage) were
analyzed using χ² or Fisher’s exact tests. All
non-binary variables were dichotomized using univariable logistic
regression, with the optimal cut-off point for each variable estimated by
receiver operating characteristic (ROC) curve analysis as the value that
maximized the Youden index. Stepwise multivariate analysis was performed
to identify variables that were independently associated with severe
postoperative ALI. Odds ratios (ORs) were presented with corresponding
95% confidence intervals (CIs), and P < 0.05 was considered
statistically significant. Variables with
P < 0.05 were included in a multivariate logistic
regression, as follows:
\begin{equation}
\mathrm{logit}\,p = \ln\left[\frac{p}{1 - p}\right] = B_0 X_0 + B_1 X_1 + \cdots + B_k X_k, \nonumber
\end{equation}
where p denotes the probability of postoperative severe ALI, and each B value is
the coefficient of the corresponding independent risk factor in the final
model. The observed and predicted incidences of postoperative
severe ALI were compared. The model was calibrated using the validation
dataset and assessed using the area under the ROC curve. The goodness of
fit of the final model was tested using the Hosmer–Lemeshow test, and
tenfold cross-validation was conducted by randomly dividing the dataset
into 10 equally sized samples, refitting the model to each of the 10
sets comprising 90% of the data, calculating the area under the ROC
curve for the unused 10% in each case, and averaging the 10 areas under the ROC
curves. Variables with missing rates of
>20% were excluded, and the remaining missing data were not imputed.
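The tenfold cross-validation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's SAS implementation: the data are synthetic, the function names are hypothetical, and the logistic model is fitted with plain gradient ascent as a stand-in for a maximum-likelihood routine. The area under the ROC curve is computed as the proportion of positive-negative pairs ranked correctly, with ties counted as one half.

```python
import math
import random

def predict(b, row):
    """Predicted probability p from logit p = B0 + B1*X1 + ... + Bk*Xk."""
    z = b[0] + sum(bj * x for bj, x in zip(b[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=300, lr=0.5):
    """Stand-in fitting routine (plain gradient ascent on the log-likelihood)."""
    b = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(b)
        for row, target in zip(X, y):
            err = target - predict(b, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        b = [bj + lr * g / len(X) for bj, g in zip(b, grad)]
    return b

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both outcome classes in the held-out fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, folds=10, seed=0):
    """Refit on 90% of the data, score the unused 10%, and average the fold AUCs."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    parts = [idx[i::folds] for i in range(folds)]  # randomly formed, near-equal samples
    fold_aucs = []
    for held_out in parts:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict(b, X[i]) for i in held_out]
        fold_aucs.append(auc(scores, [y[i] for i in held_out]))
    return sum(fold_aucs) / folds

# Synthetic data for illustration only: two dichotomized predictors, 200 rows.
rng = random.Random(1)
X = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(200)]
y = [int(rng.random() < predict([-1.25, 1.5, 1.0], row)) for row in X]
mean_auc = cross_validated_auc(X, y)
print(round(mean_auc, 3))
```

Because the folds are formed by unstratified random division, a held-out fold could in principle contain only one outcome class; the sketch raises an error in that case rather than silently skipping the fold.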
Statistical analyses were performed using SAS software (version 9.4;
SAS Institute Inc., Cary, NC, USA).
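The cut-off selection by maximum Youden index (J = sensitivity + specificity - 1) described above can be illustrated with a short Python sketch. The function name and the example values are hypothetical, and the sketch assumes that higher values of the variable indicate higher risk, so that a patient is classified as positive when the value is at or above the cut-off.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing J for the rule 'value >= cutoff means positive'."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, outcomes) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, outcomes) if y == 0 and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:                   # ties keep the lowest cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Made-up example: one continuous variable and a binary outcome
values = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, j = youden_cutoff(values, outcomes)
dichotomized = [int(v >= cutoff) for v in values]
print(cutoff, j)      # 4 0.75
print(dichotomized)   # [0, 0, 0, 1, 1, 1, 1, 1]
```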