Statistical analysis
Descriptive data are presented as the mean ± SD or as a frequency.
Categorical variables were compared using the χ² test or Fisher’s exact
test, as appropriate. Continuous variables were compared using the
Mann-Whitney U test because their distributions were not normal.
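The univariate comparisons described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the SPSS procedure used in the study. The function names and the rule "use Fisher's exact test when any expected cell count is below 5" are assumptions made for the example; the text says only "as appropriate".

```python
import math

def min_expected_count(a, b, c, d):
    """Smallest expected cell count of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction); returns (stat, p)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 df

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher p-value: sum over tables no more likely than observed."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / total

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))

def categorical_p(a, b, c, d, threshold=5):
    """Assumed rule: Fisher when any expected count is below `threshold`."""
    if min_expected_count(a, b, c, d) < threshold:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi2", chi2_2x2(a, b, c, d)[1]

def mann_whitney_u(x, y):
    """U statistic for sample x; ties count 1/2. The p-value is omitted here."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
```

For example, `categorical_p(3, 1, 1, 3)` falls back to Fisher's exact test because every expected count is 2, whereas `categorical_p(10, 20, 20, 10)` uses the χ² test.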
Baseline variables that were considered clinically relevant, or candidate
variables with a p-value < 0.1 in the univariate analysis,
were included in the multivariate binary logistic regression analysis.
The variables included in the multivariate analysis were carefully chosen
in light of the number of events available, to keep the final model
parsimonious. In addition, these variables were entered into a linear
regression for collinearity analysis before the multivariate regression
analysis. Variables with a tolerance < 0.1 or a variance
inflation factor (VIF) > 10 were excluded from the
multivariate binary logistic regression analysis. The goodness of fit
of the regression model was assessed using the Hosmer-Lemeshow
test and the omnibus test.
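The collinearity screen can be illustrated with a short sketch. It relies on the standard identity that the VIF of a predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, and that tolerance is 1/VIF, so the two cut-offs (tolerance < 0.1, VIF > 10) are equivalent. The function names and the dictionary input format are assumptions for illustration; the study itself used SPSS.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """columns: dict of name -> list of values. Returns dict of name -> VIF."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

def screen(columns, max_vif=10.0):
    """Keep predictors whose VIF is at most max_vif (tolerance >= 1/max_vif)."""
    return [name for name, v in vif(columns).items() if v <= max_vif]
```

With two predictors whose correlation is 0.8, each has a VIF of 1 / (1 − 0.8²) ≈ 2.78 and both pass the screen.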
Forward likelihood-ratio (LR) selection was used to select the final
model for the nomogram, using a threshold of p < 0.05. At
this stage, factors that lacked clinical significance were excluded from
the model. The receiver operating characteristic (ROC) curve was used to
assess the discriminative power of the nomogram based on the cut-off
value and the area under the curve (AUC). It is generally accepted that
an AUC of 1.0 indicates perfect accuracy, an AUC of 0.7–0.8 indicates
satisfactory discrimination, an AUC > 0.8 represents
good discrimination, and an AUC of 0.5 indicates no discriminative
ability beyond chance [14]. A calibration curve was plotted to
evaluate the agreement between the observed outcomes and the predicted
probability of PTB. A curve lying close to the 45-degree diagonal
indicates good agreement between predicted and observed values. The
nomogram was validated internally by bootstrapping (1000 repetitions),
which provides relatively unbiased estimates. Bootstrapping is a
resampling technique that repeatedly draws samples, with replacement,
from the original dataset. The
nomogram was calibrated using the Hosmer-Lemeshow test of the logistic
regression model mentioned above. All statistical tests were 2-tailed,
and p values < 0.05 were considered statistically significant. R
Studio V.3.4.1 was used to establish the nomogram and ROC curve. Other
analyses were performed using SPSS V.23.0.
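To make the discrimination and resampling steps concrete, the sketch below computes the AUC as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, and then bootstraps it by drawing patients with replacement. It is an illustration under stated assumptions, not the R procedure used in the study: the function names, the fixed seed, and the percentile interval are choices made for the example, and the predictions are held fixed rather than refitting the model in each resample, as a full bootstrap validation would.

```python
import random

def auc(y_true, scores):
    """AUC: chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(y_true, scores, n_boot=1000, seed=0):
    """Percentile 95% interval for the AUC from n_boot resamples."""
    rng = random.Random(seed)
    indices = range(len(y_true))
    stats = []
    while len(stats) < n_boot:
        sample = rng.choices(indices, k=len(y_true))  # draw with replacement
        ys = [y_true[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples missing one outcome class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

For instance, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, which falls in the "satisfactory discrimination" band described above.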