Analysis Method
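The Random Forest (RF) method [22] was adopted to determine the relative importance of each variable to the cluster, flux, and segregation characteristics. Scaling of the variables was not needed, because the decision trees underlying RF split on thresholds of individual variables and are therefore insensitive to the scale of each variable.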
The self-organizing map (SOM) was also adopted to ascertain the influence of the variables, or the lack thereof. SOM reduces multi-dimensional data to two-dimensional representations, which eases the interpretation of large datasets [23,24]. Before SOM was carried out, the datasets were scaled to the same bounds of -0.5 to 0.5 [25]:
\(x_{\text{scaled}}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}-0.5\) (1)
where \(x\), \(x_{\min}\), and \(x_{\max}\) are, respectively, the value of the data point considered, the minimum value of the variable, and the maximum value of the variable.
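The scaling in Eq. (1) can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the code used in this work; the function name `scale_to_half_range` and the sample values are hypothetical, and raising an error for a constant-valued variable (for which Eq. (1) is undefined) is an assumption of the sketch.

```python
def scale_to_half_range(values):
    """Min-max scale one variable to the bounds [-0.5, 0.5], following Eq. (1)."""
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Eq. (1) divides by (x_max - x_min); how to treat a constant
        # variable is not stated in the text, so this sketch simply refuses.
        raise ValueError("variable is constant; Eq. (1) is undefined")
    return [(x - x_min) / span - 0.5 for x in values]


# Hypothetical values of a single variable (not data from this study).
example = [2.0, 4.0, 6.0, 10.0]
scaled = scale_to_half_range(example)
print(scaled)  # [-0.5, -0.25, 0.0, 0.5]
```

The scaling is applied to each variable separately, so that the minimum of every variable maps to -0.5 and the maximum maps to 0.5.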
The two-sample t-test was used to evaluate whether the datasets from the fast and the turbulent beds differ significantly, without assuming that the populations have equal variances (i.e., Welch's t-test). The test statistic was calculated as:
\(t=\frac{\bar{x}-\bar{y}}{\sqrt{\frac{s_{x}^{2}}{n_{x}}+\frac{s_{y}^{2}}{n_{y}}}}\) (2)
where \(\bar{x}\) and \(\bar{y}\) are the sample means, \(s_{x}\) and \(s_{y}\) are the sample standard deviations, and \(n_{x}\) and \(n_{y}\) are the sample sizes. The null hypothesis that the two datasets come from populations with equal means was tested at the 95% confidence level (i.e., a significance level of 0.05).
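As an illustration of Eq. (2), the sketch below computes the test statistic for two samples. It is written in Python for illustration only and is not the code used in this work; the function name `welch_t` and the sample values are hypothetical. The degrees of freedom are obtained with the Welch-Satterthwaite approximation, which is not stated in the text but is the usual companion to the unequal-variance test.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, dof) for the two-sample t-test without equal variances."""
    n_x, n_y = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 denominator), i.e. s^2.
    vx = variance(x) / n_x
    vy = variance(y) / n_y
    # Test statistic of Eq. (2).
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom (an assumption of this sketch).
    dof = (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))
    return t, dof


# Hypothetical samples (not data from this study).
fast = [1.0, 2.0, 3.0, 4.0, 5.0]
turbulent = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(fast, turbulent)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")  # t = -1.897, dof = 5.88
```

The null hypothesis is rejected when the magnitude of t exceeds the critical value of the Student's t distribution with `dof` degrees of freedom at the chosen significance level. In Python, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same unequal-variance test and also returns the p-value.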
All analyses were performed with Matlab R2016b.