4. Discussion
Fitting Allometric Regression Models
Above-ground biomass was strongly correlated with DBH, making DBH the
most influential variable affecting tree biomass. Height was the second
most important factor, showing a strong correlation with biomass,
whereas wood density was only poorly correlated with biomass. Thus,
provided there is no multicollinearity between DBH and H, the
combination of these two variables yields the best regression model.
Conducting a correlation test between the response and the explanatory
variables is an important step in selecting appropriate explanatory
variables and developing the best-fit regression model (Maraseni et al., 2005).
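The correlation screening described above can be sketched as follows. This is a minimal illustration in Python/NumPy (an assumption; the original analysis toolchain is not stated), and the numbers are hypothetical stand-ins for the study's field measurements:

```python
import numpy as np

# Hypothetical sample values for illustration only; these are NOT the
# study's actual measurements.
dbh = np.array([12.0, 18.5, 25.3, 31.0, 40.2, 52.7])           # DBH, cm
height = np.array([8.0, 11.2, 14.5, 17.0, 20.3, 24.1])         # height, m
biomass = np.array([45.0, 120.0, 290.0, 510.0, 980.0, 1900.0])  # AGB, kg

# Pearson correlation of each candidate predictor with biomass
r_dbh = np.corrcoef(dbh, biomass)[0, 1]
r_h = np.corrcoef(height, biomass)[0, 1]
print(f"r(DBH, AGB) = {r_dbh:.3f}")
print(f"r(H,   AGB) = {r_h:.3f}")
```

Predictors with correlation coefficients close to 1 (or −1) are the natural candidates for the allometric model.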
Model fitting for Pouteria adolfi-friederici involved choosing
the best model among the three statistically significant candidate
models. The third, complete model (Model six) was selected over the
first two nested models because the p-values of the coefficients of the
complete model are significant (p < 0.05). In the case of nested
models, a statistical test can be used to test one model against the
other. The null hypothesis of this test is that θ = 0, i.e. the
additional terms are not significant, which can also be expressed as:
the nested model is better than the complete model. If the p-value of
this test falls below the significance level (typically 5 %), the null
hypothesis is rejected, i.e. the complete model is best. Conversely, if
the p-value is above the significance threshold, the nested model is
considered the best (Picard et al., 2012).
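The nested-versus-complete comparison can be illustrated with a partial F-test, whose statistic compares the drop in residual sum of squares against the residual variance of the complete model. The sketch below uses synthetic data (all values are hypothetical; the study's actual candidate models are not reproduced here):

```python
import numpy as np

# Synthetic stand data for illustration only
rng = np.random.default_rng(0)
n = 50
dbh = rng.uniform(10, 60, n)
h = 5 + 0.4 * dbh + rng.normal(0, 2, n)
agb = 0.05 * dbh**2 + 2.0 * h + rng.normal(0, 10, n)

def fit_rss(X, y):
    """Residual sum of squares of an OLS fit (intercept column added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return resid @ resid, X1.shape[1]

rss_nested, p_nested = fit_rss(dbh[:, None], agb)                # AGB ~ DBH
rss_full, p_full = fit_rss(np.column_stack([dbh, h]), agb)       # AGB ~ DBH + H

# Partial F statistic for H0: the extra coefficient(s) equal zero
F = ((rss_nested - rss_full) / (p_full - p_nested)) / (rss_full / (n - p_full))
print(f"partial F = {F:.2f}")
```

A large F (p-value below the 5 % threshold) rejects the null hypothesis and favours the complete model, matching the decision rule quoted from Picard et al. (2012).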
Since the best-fit model selected for P. adolfi-friederici involves two
predictor variables, the absence of multicollinearity was also checked
using the Variance Inflation Factor (VIF). The VIF obtained was 1.8 for
each predictor variable, which assures that multicollinearity is not a
problem in this model. A strong correlation between independent
variables causes strong multicollinearity, under which the true effects
of the estimated regression coefficients are obscured. The general rule
of thumb is that VIFs exceeding 4 warrant further investigation and
VIFs exceeding 10 are signs of serious multicollinearity requiring
correction, while VIFs below 4 indicate the absence of a
multicollinearity problem (Belsley et al., 1980).
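With only two predictors, the VIF reduces to 1/(1 − r²), where r is the correlation between the predictors. A minimal sketch on synthetic DBH-height data (hypothetical values, not the study's measurements):

```python
import numpy as np

# Synthetic predictors for illustration: height correlated with DBH,
# as is typical in real stands, plus noise
rng = np.random.default_rng(1)
n = 80
dbh = rng.uniform(10, 60, n)
h = 4 + 0.3 * dbh + rng.normal(0, 3, n)

# For a two-predictor model, VIF = 1 / (1 - r^2) for both predictors
r = np.corrcoef(dbh, h)[0, 1]
vif = 1.0 / (1.0 - r**2)
print(f"r(DBH, H) = {r:.3f}, VIF = {vif:.2f}")
```

A VIF below the rule-of-thumb threshold of 4, as in the study's value of 1.8, indicates that multicollinearity is not a concern.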
The adjusted R-squared of the model is 0.9511, with an overall p-value
< 0.001; an adjusted R-squared this close to 1 indicates that
95.11% of the variability in above-ground biomass is explained by the
model. The ANOVA test conducted also shows that the overall allometric
equation is statistically significant (F = 283.042, p < 0.001).
Model Validation
The best-fit regression models developed were validated against a
series of assumptions, the most important being that the residuals are
independent, that the residuals follow a normal distribution, and that
the residual variance is constant (homoscedasticity). Violation of
these assumptions may result in biased parameter estimates and Type I
errors (Quinn & Keough, 2002; Picard et al., 2012).
The independence assumption was tested with the Durbin-Watson
statistic, which takes values between 0 and 4. The residuals are
considered uncorrelated (independent) if the Durbin-Watson statistic
lies between 1.5 and 2.5 (Field, 2009). The Durbin-Watson statistic of
the selected model is less than 2.5, indicating that its residuals are
uncorrelated; the independence assumption is therefore met in this
study. The independence assumption should be investigated before any
interpretation of a multiple regression analysis, as its violation can
have critical implications (Stevens, 2009). Even a slight violation of
the independence assumption should be taken seriously, as it can
greatly increase the risk of a Type I error, making the risk of falsely
rejecting the null hypothesis several times greater than the error
level assumed for the test (Stevens, 2009).
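The Durbin-Watson statistic is computed directly from the residual series as the sum of squared successive differences divided by the residual sum of squares. A minimal sketch with simulated independent residuals (hypothetical data; independent residuals should yield a value near 2):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared successive differences / residual sum of squares."""
    diff = np.diff(resid)
    return np.sum(diff**2) / np.sum(resid**2)

# Simulated independent residuals for illustration
rng = np.random.default_rng(3)
resid = rng.normal(0, 1, 200)
dw = durbin_watson(resid)
print(f"Durbin-Watson = {dw:.2f}")
```

Values well below 2 point to positive serial correlation and values well above 2 to negative serial correlation; the 1.5-2.5 band cited from Field (2009) is a common rule of thumb.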
The assumptions of normality and homoscedasticity were tested by
thoroughly inspecting a quantile-quantile (Q-Q) plot and a plot of the
residuals against the fitted values. The residual errors plotted
against their fitted values for the best-fit model show that the
residuals are randomly scattered around the horizontal line
representing a residual error of zero; that is, no distinct trend in
the distribution of points is observed, which confirms the
homoscedasticity assumption of the model. The standard Q-Q plot of the
best-fit models also suggests that the residual errors are normally
distributed. Screening for normality is an important early step in
multiple regression, since normally distributed residuals are assumed
(Stevens, 2009; Tabachnick & Fidell, 2006). Non-normal distributions
that are positively or negatively skewed, have large kurtosis, or
contain extreme outliers can distort the significance levels obtained
from the analysis and bias the standard errors (Osborne & Waters,
2002).
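Sample skewness and excess kurtosis offer a quick numerical complement to the visual Q-Q plot check; for normally distributed residuals both should be near zero. A sketch on simulated residuals (hypothetical data, not the study's residuals):

```python
import numpy as np

def skewness(x):
    """Third standardized moment; 0 for a symmetric distribution."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

# Simulated residuals standing in for the fitted model's residuals
rng = np.random.default_rng(2)
residuals = rng.normal(0.0, 1.0, 500)

s, k = skewness(residuals), excess_kurtosis(residuals)
print(f"skewness = {s:.3f}, excess kurtosis = {k:.3f}")
```

Markedly positive or negative skewness, or large kurtosis, would flag exactly the distortions of significance levels described by Osborne & Waters (2002).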
The assumption of homoscedasticity states that the variance of the
errors is equal and constant across all levels of the variables
(Osborne & Waters, 2002; Stevens, 2009). Homoscedasticity is related
to the assumption of normality because when the assumption of normality
is met, the relationship between the variables is homoscedastic
(Tabachnick & Fidell, 2006). Heteroscedasticity occurs when the
variance of the errors differs at different values of the independent
variables (Osborne & Waters, 2002). Slight heteroscedasticity has
little effect on significance tests; when it is marked, however, it can
seriously distort the findings and weaken the analysis, increasing the
possibility of a Type I error for small sample sizes (Osborne &
Waters, 2002).
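A simple numerical complement to the residuals-versus-fitted plot is a Goldfeld-Quandt-style variance-ratio check: order the observations by fitted value and compare the residual variance in the lower and upper halves. This is only an illustration, not a test reported in the study, and the data are simulated:

```python
import numpy as np

# Simulated fitted values and homoscedastic residuals for illustration
rng = np.random.default_rng(5)
fitted = rng.uniform(0, 100, 100)
resid = rng.normal(0, 5, 100)

# Compare residual variance in the low- and high-fitted halves;
# a ratio near 1 is consistent with constant variance.
order = np.argsort(fitted)
lo, hi = resid[order[:50]], resid[order[50:]]
ratio = hi.var(ddof=1) / lo.var(ddof=1)
print(f"variance ratio = {ratio:.2f}")
```

A ratio far from 1 would suggest the error variance changes with the fitted values, i.e. heteroscedasticity.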
The scale-location plot for all models shows the square root of the
standardized residuals (roughly, the square root of the relative error)
as a function of the fitted values. Again, no obvious trend appears in
this plot, which further confirms the absence of heteroscedasticity.
Finally, the plot in the lower right (Figure 3) shows each point's
leverage, a measure of its importance in determining the regression
result. Superimposed on the plot are contour lines of Cook's distance,
another measure of each observation's influence on the regression.
Smaller distances mean that removing the observation has little effect
on the regression results; distances larger than 1 are suspicious and
suggest a possible outlier or a poor model. In this study, the selected
model exhibits Cook's distances of less than 1, which confirms the
absence of influential outliers.
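Leverage and Cook's distance can both be computed from the hat matrix H = X(XᵀX)⁻¹Xᵀ: leverage is its diagonal, and Cook's distance combines each squared residual with its leverage. A sketch on synthetic data (hypothetical observations, not the study's):

```python
import numpy as np

# Synthetic design and response for illustration only
rng = np.random.default_rng(4)
n = 40
x1 = rng.uniform(10, 60, n)
x2 = rng.uniform(8, 25, n)
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]

# Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
lev = np.diag(H)

# Cook's distance for each observation
s2 = resid @ resid / (n - p)
cooks = (resid**2 / (p * s2)) * (lev / (1.0 - lev)**2)
print(f"max leverage = {lev.max():.3f}, max Cook's D = {cooks.max():.3f}")
```

Observations with Cook's distance above 1 would be flagged for inspection, mirroring the rule applied to the selected model above.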