Where IndVal is the indicator value of species i in site cluster j. Aij is a measure of specificity: Nindividualsij is the mean number of individuals of species i across sites of group j, while Nindividualsi. is the sum of the mean numbers of individuals of species i over all groups. Bij is a measure of fidelity: Nsitesij is the number of sites in cluster j where species i is present, while Nsites.j is the total number of sites in that cluster. Bij is maximum when species i is present in all sites of cluster j. Species with an IndVal greater than 50% were regarded as indicator species.
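As an illustration of the definitions above, the following minimal Python sketch computes specificity, fidelity, and the resulting indicator value from a site-by-species abundance table. It assumes the standard product form IndVal = A × B × 100; the function name `indval` and the small demonstration data are hypothetical and are not taken from the study, which used the labdsv package in R.

```python
import numpy as np

def indval(abundance, groups):
    """Return a (species x groups) matrix of indicator values in percent.

    abundance: (n_sites, n_species) array of counts.
    groups:    length n_sites sequence of cluster labels.
    Assumes IndVal = A * B * 100 (specificity times fidelity).
    """
    abundance = np.asarray(abundance, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # Mean abundance of each species within each group: (n_groups, n_species)
    mean_ab = np.array([abundance[groups == g].mean(axis=0) for g in labels])
    # Fidelity B: fraction of sites in each group where the species is present
    fidelity = np.array([(abundance[groups == g] > 0).mean(axis=0) for g in labels])
    # Specificity A: group mean divided by the sum of group means
    total = mean_ab.sum(axis=0)
    specificity = np.divide(mean_ab, total,
                            out=np.zeros_like(mean_ab), where=total > 0)
    return (specificity * fidelity * 100).T, labels

# Hypothetical demonstration data: 4 sites, 2 species, 2 clusters
demo_abundance = [[10, 0], [8, 1], [0, 5], [0, 0]]
demo_groups = ["a", "a", "b", "b"]
iv, labels = indval(demo_abundance, demo_groups)
# Species meeting the "greater than 50%" criterion in at least one cluster
is_indicator = (iv > 50).any(axis=1)
print(iv, is_indicator)
```

In this toy table the first species occurs only in cluster "a" and at every site of that cluster, so both specificity and fidelity equal 1 and it would qualify as an indicator species.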
Five macroinvertebrate community indices, namely species richness, abundance, biomass, Shannon’s diversity, and Pielou’s evenness, were also determined in this study.
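For reference, Shannon’s diversity and Pielou’s evenness can be sketched as below. This is an illustrative Python version, assuming natural logarithms and the usual definitions H' = -Σ p ln p and J' = H' / ln S; the function names and the example counts are hypothetical, and the study itself computed these indices in Primer.

```python
import numpy as np

def shannon_diversity(counts):
    """Shannon's H' = -sum(p * ln p) over taxa with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S), where S is the species richness."""
    counts = np.asarray(counts, dtype=float)
    richness = int((counts > 0).sum())
    if richness < 2:
        return float("nan")  # evenness is undefined for fewer than two taxa
    return shannon_diversity(counts) / float(np.log(richness))

# Hypothetical counts for one sample (five taxa, one of them absent)
sample = [12, 7, 0, 3, 1]
richness = int(np.count_nonzero(sample))
abundance = int(np.sum(sample))
print(richness, abundance, shannon_diversity(sample), pielou_evenness(sample))
```

A perfectly even sample gives J' = 1, and J' decreases as a few taxa come to dominate the sample.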
Statistical analysis
Prior to analysis, the macroinvertebrate abundance data were log-transformed for SOM and Hellinger-transformed for redundancy analysis (RDA). With the exception of pH, all environmental variables were log-transformed to satisfy the assumptions of normality and homogeneity of variance before PCA and RDA. To analyze whether the classification of the macroinvertebrate community was affected by environmental variables, principal component analysis (PCA) was conducted to examine the variation of environmental variables in each group, and the correlations between environmental variables were evaluated. The Kruskal-Wallis test was performed to determine the important variables affecting the classification of the macroinvertebrate community. The relationship between environmental variables and macroinvertebrate species composition was evaluated through RDA using the rda function in the vegan package (Oksanen et al., 2013) of the R statistical software, version 3.6.3 (Team, 2019). Variance inflation factors (VIF) were used to test for multicollinearity among environmental variables. Stepwise forward selection (Monte Carlo test with 999 permutations) was used to determine the environmental variables significantly correlated with the macroinvertebrate species. The statistical significance of the species-environment correlations for the ordination axes was also determined based on 999 Monte Carlo permutation tests, and the eigenvalues of the first two axes were used to measure their importance (Ter Braak & Verdonschot, 1995). Spearman correlation analysis was used to evaluate the response of the community indices to environmental variables. The two-tailed Student’s t test was used to test for significance (P < 0.05), and P-values were adjusted for multiple comparisons (Benjamini & Hochberg, 1995). SOM was conducted using the ANN Toolbox in Matlab R2010b (The MathWorks Inc., Natick, MA, USA). Shannon’s diversity and Pielou’s evenness were calculated in Primer (ver. E-v5) (Clarke & Gorley, 2001). K-means clustering analysis was performed in the vegan package (Oksanen et al., 2013) of the R statistical software, version 3.6.3 (Team, 2019). IndVal, PCA, ANOVA, and Spearman correlation analyses were performed in the labdsv package (Roberts & Roberts, 2016), ade4 package (Dray & Siberchicot, 2020), agricolae package (de Mendiburu & de Mendiburu, 2019), and psych package (Revelle & Revelle, 2015) of the R statistical software, respectively.
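Two of the preprocessing steps named above lend themselves to a short illustration: the Hellinger transformation applied to abundances before RDA, and the Benjamini-Hochberg adjustment of P-values. The Python sketch below is only a hedged approximation of what the corresponding R routines do; the function names `hellinger` and `bh_adjust` and the example inputs are hypothetical and not part of the study.

```python
import numpy as np

def hellinger(abundance):
    """Hellinger transformation: square root of each site's relative abundances."""
    abundance = np.asarray(abundance, dtype=float)
    row_sums = abundance.sum(axis=1, keepdims=True)
    # Sites with no individuals stay at zero instead of producing NaN
    rel = np.divide(abundance, row_sums,
                    out=np.zeros_like(abundance), where=row_sums > 0)
    return np.sqrt(rel)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (false discovery rate control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest P-value by n / k
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Hypothetical inputs: 2 sites x 2 taxa, and four raw P-values
transformed = hellinger([[1, 3], [0, 0]])
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(transformed)
print(adjusted)
```

After the Hellinger transformation, the squared values of every non-empty site sum to 1, which is why Euclidean-based methods such as RDA are less dominated by the most abundant taxa.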
Results
A total of 44 macroinvertebrate taxa were collected and identified from all the sampling points during the study period, comprising 23 aquatic insects, 10 gastropods, 4 bivalves, 4 oligochaetes, 2 leeches, and 1 crustacean (see Appendix S1 in Table S2). The simple structure index (SSI) showed that clustering quality was highest when the neurons in the SOM output layer were divided into five groups (Fig. 2a). The SOM revealed both spatial and seasonal variation in the classification of the macroinvertebrate community (Fig. 2b). Most of the autumn sampling points were grouped in Group I in the top-left area of the map. Group V, located in the bottom-right area of the map, mainly included the upper reaches of Lianhuan Lake sampled in spring and summer. All seasonal sampling points of Habuta Lake, which was farther away from the other lakes, were grouped in Group II in the bottom-left area of the map. Most seasonal sampling points in the south-central part of Lianhuan Lake were grouped in Group III in the top area of the map. Most seasonal sampling points in the eastern part of Lianhuan Lake, which was closer to Durbote County, were grouped in Group IV in the top-right area of the map.
Variations of the environmental variables, including WT, COND, pH, DO, TP, NH4-N, NO3-N, and CODMn, in water in all seasons across Lianhuan Lake are summarized in Fig. 4. All the environmental variables presented differed significantly among the groups. Group I, which principally encompasses samples taken in autumn in the top-left area of Lianhuan Lake, was characterized by low values of WT, NH4-N, and CODMn and high values of DO, COND, TP, and NO3-N. Group II, which includes sampling points located in Habuta Lake in the bottom-left area of Lianhuan Lake, was characterized by high WT, pH, and CODMn values. High pH values and relatively low CODMn characterized the sampling points of Group III, located in the top area (south) of the map. Group IV, which encompasses samples taken in spring and autumn in the bottom-right area of the map and included the upper reaches of Lianhuan Lake, was characterized by high values of WT, pH, and NH4-N and low values of DO and NO3-N. Sampling sites located closer to Durbote County, grouped in Group V, were characterized by high WT, DO, and NH4-N.
According to the IndVal ≥50% criterion, a total of 29 macroinvertebrate species were found to be useful as indicator species for different groups (Table 1). However, 13 species with indicator values lower than 50% (31.89-48.98%) were also considered to be significant and important for particular groups. Both the identity and the number of indicator species varied significantly among the five groups. Group II had the most diverse indicator species, including one crustacean, three annelids, four molluscs, and five aquatic insects. Group III had only two indicator species, both of which belong to the Chironomidae. The indicator species of Groups I and IV were dominated by Chironomidae and Mollusca. The indicator species of Group V were mainly Mollusca. It is worth noting that many indicator species, such as Anatopynia sp., G. pervia, and G. albus, were distributed in two or more groups (see Appendix S1 in Table S2), implying that the differences in indicator species between groups mainly resulted from differences in the abundance of the taxa in each group.
The PCA using 13 environmental variables explained 44.9% of the data variability in the first two axes (axis 1 = 23.8% of the total variance, with an eigenvalue of 3.10; axis 2 = 21.1% of the total variance, with an eigenvalue of 2.74). On axis 1, the most important positively correlated variables were NO2-N, TP, pH, SS, Chla, and DO, while TN and WD were negatively correlated. Axis 1 fundamentally distinguished two groups: Group I and Group V. On axis 2, the most important positively correlated environmental variables were TP, WT, NH4-N, and CODMn. Total phosphorus (TP), NO3-N, and COND were negatively correlated with axis 2. The Kruskal-Wallis test results indicated that pH, TP, NO3-N, WT, DO, COND, CODMn, and NH4-N had a significant effect on the classification of the macroinvertebrate community (Fig. 4).
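The reported percentages can be checked against the eigenvalues with a short calculation. The sketch below assumes the PCA was run on standardized variables (a correlation matrix), so that the total variance equals the number of variables, 13; this assumption is not stated in the text, and the variable names are illustrative only.

```python
# Assumption: PCA on 13 standardized variables, so total variance = 13
n_variables = 13
eigenvalues = {"axis 1": 3.10, "axis 2": 2.74}

# Share of total variance carried by each axis, in percent
explained = {axis: 100 * ev / n_variables for axis, ev in eigenvalues.items()}
cumulative = sum(explained.values())

for axis, pct in explained.items():
    print(f"{axis}: {pct:.2f}% of total variance")
print(f"first two axes together: {cumulative:.2f}%")
```

Under this assumption the two shares come to roughly 23.8% and 21.1%, or about 44.9% together, which agrees with the values reported above.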
The RDA ordination of the macroinvertebrate composition with respect to environmental variables is presented in Fig. 5. Forward selection and screening of the environmental variables using the ordistep function from the vegan package yielded four variables that were significant to the model: WT, pH, DO, and Chla (Fig. 5). These four variables accounted for 77% of the total variance in the macroinvertebrate species composition. The first RDA axis, which explained 45.3% of the total variability, was positively correlated with WT, while the second axis, which explained 32.4% of the variability, was positively correlated with pH. Among the strongest species-environment associations, molluscs such as G. albus, R. pereger, and S. glabra were significantly positively correlated with WT and negatively correlated with pH and Chla. Annelids, such as B. sowerkyi and Herpobdella sp., were significantly positively correlated with Chla and DO and negatively correlated with pH. Aquatic insects, such as Chaoborus sp., Ephemera sp., and Anatopynia sp., were significantly positively correlated with Chla and DO and negatively correlated with WT.
According to the Spearman correlation analysis (Table 2), the macroinvertebrate community indices were significantly correlated with environmental variables. Macroinvertebrate abundance was the index most affected by environmental variables and was significantly negatively correlated with DO (R = -0.40, P = 0.01), NO2-N (R = -0.33, P = 0.04), and Chla (R = -0.32, P = 0.04). Both species richness and Shannon’s diversity were significantly negatively correlated with TP (R = -0.35, P = 0.04 and R = -0.34, P = 0.04, respectively). The biomass of macroinvertebrates was significantly negatively correlated with pH (R = -0.39, P = 0.01).