Statistical analysis
Descriptive data are presented as the mean ± SD or as frequencies. Categorical variables were compared using the χ² test or Fisher’s exact test, as appropriate. Continuous variables were compared using the Mann-Whitney U test because the continuous variables included in this study were not normally distributed.
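For readers unfamiliar with the Mann-Whitney U test, the following is a minimal Python sketch of the calculation and not the SPSS procedure used in this study. It assumes a two-sided test with the normal approximation and a tie correction, without continuity correction; statistical packages may instead report an exact p value for small samples. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation.

    Assumes the pooled sample is not entirely tied (otherwise the
    standard deviation below is zero and the test is undefined).
    """
    n1, n2 = len(x), len(y)
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    values = [v for v, _ in pooled]

    # Assign midranks to tied values and accumulate the tie term sum(t^3 - t).
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # U for the first group from its rank sum; report the smaller U.
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)

    # Normal approximation with tie-corrected standard deviation.
    n = n1 + n2
    mu = n1 * n2 / 2.0
    sigma = sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma  # z <= 0 because u is the smaller U
    p = min(1.0, 2.0 * NormalDist().cdf(z))
    return u, p


# Hypothetical example: a continuous variable in two outcome groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, round(p_value, 4))
```

With samples this small the normal approximation is only rough, which is one reason an exact test is often preferred in practice.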
Baseline variables that were considered clinically relevant, or candidate variables with a p value <0.1 in the univariate analysis, were entered into the multivariate binary logistic regression analysis. The variables entered were strictly selected, and their number was limited according to the number of events available, to keep the final model parsimonious. Before multivariate regression, these variables were also checked for collinearity using linear regression. Variables with a tolerance <0.1 or a variance inflation factor (VIF) >10 were excluded from the multivariate binary logistic regression analysis. The goodness of fit of the regression model was assessed using the Hosmer-Lemeshow test and the Omnibus test.
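The collinearity criterion above can be illustrated with a short Python sketch. It is not the SPSS output used in this study. Each predictor is regressed on the remaining predictors by ordinary least squares, tolerance is computed as 1 − R², and VIF as 1/tolerance. The function names, variable names and data are hypothetical, and the sketch assumes that no predictor is constant or an exact linear combination of the others.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, xs):
    """R^2 of an ordinary least-squares fit of y on the predictors xs, with intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = _solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def collinearity(predictors):
    """Return {name: (tolerance, vif)} for a dict of equally long numeric columns."""
    out = {}
    for name, col in predictors.items():
        others = [v for k, v in predictors.items() if k != name]
        tol = 1.0 - r_squared(col, others)
        out[name] = (tol, 1.0 / tol if tol > 0 else float("inf"))
    return out


# Hypothetical data: age_months is almost a rescaled copy of age.
example = {
    "age": [25, 31, 28, 35, 22, 40, 30, 27],
    "bmi": [22.0, 27.5, 21.0, 24.0, 26.0, 23.5, 29.0, 20.5],
    "age_months": [301, 371, 337, 419, 265, 481, 359, 325],
}
report = collinearity(example)
for name, (tol, vif) in report.items():
    flag = "exclude" if tol < 0.1 or vif > 10 else "keep"
    print(name, round(tol, 4), round(vif, 2), flag)
```

In this constructed example both age and age_months are flagged, because each is almost fully explained by the other. In practice only one of such a pair would normally be dropped.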
Forward likelihood ratio (LR) selection with a threshold of p<0.05 was used to select the final model for the nomogram. At this stage, factors that lacked clinical significance were excluded from the model. The receiver operating characteristic (ROC) curve was used to assess the discriminative power of the nomogram, based on the cut-off value and the area under the curve (AUC). It is generally accepted that an AUC of 1.0 indicates perfect discrimination, an AUC of 0.7–0.8 satisfactory discrimination, an AUC >0.8 good discrimination, and an AUC of 0.5 no discriminative ability [14]. A calibration curve was plotted to evaluate the agreement between the observed outcomes and the predicted probabilities of PTB; the closer the curve lies to the 45-degree diagonal, the better the calibration. The nomogram was validated internally by bootstrapping (1000 repetitions), which yields relatively unbiased estimates. Bootstrapping is a resampling approach in which samples are drawn randomly, with replacement, from the original dataset. Calibration of the nomogram was also assessed with the Hosmer-Lemeshow test of the logistic regression model described above. All statistical tests were 2-tailed, and p values <0.05 were considered statistically significant. RStudio V.3.4.1 was used to establish the nomogram and ROC curve. Other analyses were performed using SPSS V.23.0.
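To make the AUC and the bootstrap resampling concrete, a minimal Python sketch follows. It is an illustration only and not the R code used in this study. It computes the AUC as the proportion of event/non-event pairs in which the event case has the higher predicted probability, with ties counted as one half, and then resamples cases with replacement to obtain a percentile interval. Published nomogram validations often use an optimism-corrected bootstrap instead, which this sketch does not implement. The function names, seed and data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Return (mean AUC, (2.5th, 97.5th percentile)) over n_boot resamples."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Draw n cases at random, with replacement, from the original data.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(0.025 * n_boot)]
    upper = stats[int(0.975 * n_boot) - 1]
    return sum(stats) / n_boot, (lower, upper)


# Hypothetical outcomes (1 = event) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.40, 0.45, 0.60, 0.65, 0.70, 0.20, 0.90]

apparent = auc(y_true, y_prob)
mean_auc, (ci_low, ci_high) = bootstrap_auc(y_true, y_prob, n_boot=1000, seed=0)
print(round(apparent, 2), round(mean_auc, 2), round(ci_low, 2), round(ci_high, 2))
```

The apparent AUC for these ten hypothetical cases is 0.84, since 21 of the 25 event/non-event pairs are ordered correctly.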