Statistical analysis
Patients were divided into training (n = 384) and validation (n = 95) datasets at a ratio of 8:2 using simple random sampling. Normally distributed continuous variables (expressed as means and standard deviations) were compared using two-sample t-tests. Non-normally distributed variables (expressed as medians and interquartile ranges [IQRs]) were compared using the Mann–Whitney U test. Categorical variables (expressed as frequencies and percentages) were analyzed using χ² or Fisher’s exact tests. All non-binary variables were dichotomized: each was assessed by univariable logistic regression, and its optimal cut-off point was estimated by receiver operating characteristic (ROC) curve analysis as the value that maximized the Youden index (sensitivity + specificity − 1). Stepwise multivariable analysis was performed to identify variables that were independently associated with severe postoperative ALI. Odds ratios (ORs) were presented with corresponding 95% confidence intervals (CIs), and P < 0.05 was considered statistically significant. Variables with P < 0.05 were included in a multivariable logistic regression model, as follows:
\begin{equation} \mathrm{logit}(p) = \ln\left[\frac{p}{1-p}\right] = B_0 X_0 + B_1 X_1 + \cdots + B_k X_k, \nonumber \end{equation}
where p denotes the probability of severe postoperative ALI, and each B is the coefficient of the corresponding independent risk factor X in the final model. The observed and predicted incidences of severe postoperative ALI were compared. The model was calibrated using the validation dataset, and its discrimination was assessed using the area under the ROC curve (AUC). The goodness of fit of the final model was tested using the Hosmer–Lemeshow test. Tenfold cross-validation was conducted by randomly dividing the dataset into 10 equally sized samples, refitting the model to each of the 10 sets comprising 90% of the data, calculating the AUC for the held-out 10% in each case, and averaging the 10 AUCs. Variables with missing rates >20% were excluded, and the remaining missing data were not imputed. Statistical analyses were performed using SAS software (version 9.4; SAS Institute Inc., Cary, NC, USA).
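For readers who wish to reproduce the logic of two steps described above (selecting a cut-off by the maximum Youden index, and averaging the area under the ROC curve over tenfold cross-validation), a minimal sketch in Python is given below. It is not the SAS code used in this study. The function names (`youden_cutoff`, `fit_logistic`, `auc`, `cross_validated_auc`), the Newton-Raphson fitting routine, and all data are illustrative assumptions, and the data are simulated rather than taken from the patient cohort.

```python
import numpy as np

def youden_cutoff(values, outcome):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1.
    Assumes higher values mean higher risk (positive if value >= cutoff)."""
    values = np.asarray(values, dtype=float)
    outcome = np.asarray(outcome, dtype=bool)
    best_cutoff, best_j = None, -np.inf
    for c in np.unique(values):  # each observed value is a candidate cut-off
        predicted = values >= c
        sensitivity = predicted[outcome].mean()
        specificity = (~predicted[~outcome]).mean()
        j = sensitivity + specificity - 1.0
        if j > best_j:  # ties keep the lowest cut-off
            best_cutoff, best_j = float(c), float(j)
    return best_cutoff, best_j

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Logistic regression by Newton-Raphson; X must contain an intercept column.
    The tiny ridge term only guards against a singular Hessian."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def auc(y, score):
    """Area under the ROC curve as P(score_pos > score_neg), ties counted as 1/2."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def cross_validated_auc(X, y, k=10, seed=0):
    """Refit on k-1 folds, score the held-out fold, and average the k AUCs.
    Assumes every held-out fold contains both outcome classes."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    aucs = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        beta = fit_logistic(X[train], y[train])
        aucs.append(auc(y[held_out], X[held_out] @ beta))
    return float(np.mean(aucs))

# Toy, made-up measurements and outcomes (not study data)
cutoff, j = youden_cutoff([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 1, 0, 1, 1])

# Simulated cohort with two dichotomized risk factors (arbitrary coefficients)
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.integers(0, 2, n)])
true_beta = np.array([-1.0, 1.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
cv_auc = cross_validated_auc(X, y)
print(cutoff, j, round(cv_auc, 3))
```

In the toy example the selected cut-off is 4, with a Youden index of 0.75. The held-out score is the linear predictor rather than the fitted probability; because the logistic function is monotone, both give the same AUC.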