Prediction models:
a) ML tool (XGBoost): Based on the ‘gain’ importance metric, FVC(%), neutrophil(%), and FVC25-75(%) were the top three predictors, in that order (Table 3). The MAPE for the model was 1.81%, indicating excellent performance.
b) Linear mixed-effects regression analyses: Hydroxyurea, FVC(%), neutrophil(%), and FVC25-75(%) were statistically significant predictors of adjusted DLCO, with FVC(%), neutrophil(%), and FVC25-75(%) as the top three (Table 2). The remaining predictors analyzed, including FEV1/FVC, R5(%), and TLC(%), were not statistically significant. The regression model reproduced the same rank order of the six predictors as the XGBoost model (Table 3). The MAPE between measured DLCO and eDLCO for the mixed model was 9.1%, compared with 1.81% for XGBoost, suggesting that XGBoost had superior prediction performance (Figure 1).
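As an illustration of the error metric used to compare the two models, the following is a minimal sketch of MAPE under its standard definition (the mean of |measured - estimated| / |measured|, expressed as a percentage). The exact formula used in the study is not stated here, so this definition is an assumption, and the function name `mape` and all numeric values are hypothetical.

```python
def mape(measured, estimated):
    """Mean absolute percentage error, in percent.

    Assumes the standard definition; the study's exact formula is not given.
    """
    if len(measured) != len(estimated) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    total = sum(abs(m - e) / abs(m) for m, e in zip(measured, estimated))
    return 100.0 * total / len(measured)


# Hypothetical DLCO (% predicted) values, for illustration only.
measured_dlco = [80.0, 100.0, 90.0, 50.0]
estimated_dlco = [84.0, 95.0, 90.0, 55.0]

example_mape = mape(measured_dlco, estimated_dlco)
print(f"MAPE = {example_mape:.2f}%")
```

A lower value means the estimates lie closer, in relative terms, to the measured values, which is the sense in which 1.81% is read as better than 9.1%.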
Measured and estimated DLCO vs. outcome measures: Measured DLCO was significantly associated with the number of lifetime ACS/VOC events and with TRJV (Table 4), but not with nocturnal hypoxemia (p=0.13). After adjusting for age and sex, each 1% decrease in DLCO was associated with 0.075 more lifetime ACS/VOC events (95% CI: -0.120 to -0.030) and 0.009 m/s higher TRJV (95% CI: -0.017 to -0.001). eDLCO, obtained from our predictive models, was also significantly associated with ACS/VOC events and TRJV (Table 4): after adjusting for age and sex, each 1% decrease in eDLCO was associated with 0.084-0.102 more lifetime ACS/VOC events (CI: -0.134 to -0.033 for the XGBoost model, and CI: -0.170 to -0.034 for the regression model) and with 0.009-0.014 m/s higher TRJV (CI: -0.017 to -0.001 for the XGBoost model, and CI: -0.025 to -0.003 for the regression model). Overall, results for modeled eDLCO were very close to those obtained with measured DLCO.
Validation of the prediction model: We tested the strength of the prediction model using the LOOP method. Estimated DLCO (mean ± SD) was 87.9 ± 17.18, compared with a measured DLCO of 87.79 ± 10.87, with good forecasting (MAPE of 17.3%) and a significant correlation between estimated and measured values (r=0.40, p<0.001*) (Figure 2).
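The validation idea can be sketched as follows, assuming that "LOOP" refers to a leave-one-out procedure (the text does not spell this out). Each observation is held out in turn, a model is fitted on the remaining observations, and the held-out value is predicted. For brevity the sketch swaps in a one-predictor least-squares line rather than the XGBoost or mixed-effects models used in the study. All function names and data values are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each y from a line fitted on all other observations."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical predictor and outcome values, for illustration only.
fvc_pct = [70.0, 80.0, 90.0, 100.0, 110.0]
dlco_pct = [75.0, 82.0, 88.0, 97.0, 101.0]

held_out = leave_one_out_predictions(fvc_pct, dlco_pct)
r = pearson_r(dlco_pct, held_out)
print(f"correlation between measured and held-out estimates: r = {r:.2f}")
```

Because every estimate comes from a model that never saw the corresponding observation, the error and correlation from such a procedure are typically less favorable than the in-sample figures.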