2.1 Statistical methods
The occurrence of the fish and lamprey species in relation to the
environmental variables was modelled using binary logistic regression
(BLR) analysis. In the preprocessing phase, highly (>0.7)
multicollinear predictors (latitude and precipitation) were removed from
the BLR analysis. The final number of environmental variables
(predictors) accepted for BLR analyses was therefore nine (Table 1). To
avoid pseudoreplication, only one randomly selected electrofishing
sample per site was included (N=487). Rare species, present in less than
3% of the sites, were excluded from the analysis, resulting in 13
species for the modelling (Table 2). The statistical significance of
each predictor was assessed by a chi-square test, with p-value
<0.05 indicating a significant impact. To assess the fit of
the models to our data, Nagelkerke (pseudo) R2 was
calculated for each model. Also Hosmer-Lemeshow goodness-of-fit test
(Hosmer & Lemeshow, 1989) was used with p-value >0.05
indicating an acceptable model fit. Accuracy of the BLR model was
calculated as the percentage (%) of the studied sites where the
presence or absence of a fish species was predicted correctly. BLR
analyses were conducted by IBM SPSS Statistics 26.
The interactions between 13 species occurrences and 9 environmental
variables were further studied using a self-organizing map (SOM,
Kohonen, 1982, 2001). In contrast to BLR, all species were processed in
a single model. In general, SOM is an unsupervised dimensionality
reduction method that visualizes high-dimensional data in a
low-dimensional map. In ecology, SOM has been extensively implemented
for information extraction, visualization, and clustering of community
data (Chon, 2011). Compared to some conventional statistical methods
(e.g. PCA, NMDS) used for community ordination, SOM has performed well,
for example, by allowing the visualization of interspecific association
even if it differs in different parts of the data space (Giraudel &
Lek, 2001). In addition, the network tolerates noise (Vesanto, Himberg,
Siponen, & Simula, 1998) by allowing outlying samples to affect only
one map unit and its neighborhood. The other areas of the map are not
affected by these data (Kaski, 1997). In this study, unsupervised SOM
was used to patternize 22 predictors (13 species + 9 environmental
variables) and 487 samples with a two-dimensional map which were then
grouped, i.e. clustered. This two-stage procedure, first using SOM to
produce the prototypes that are then clustered in the second stage, has
been found to perform well compared with direct clustering of the data
(Vesanto & Alhoniemi, 2000). The two dimensions of SOM were clustered
using the k-means algorithm (Kohonen, 2014). The Davies Bouldin validity
index (Davies & Bouldin 1979), which measures between- and
intra-cluster distances, was used as a performance criterion. In the
parameter optimization, SOM net sizes (number of nodes in x and y
dimensions) and the number of clusters in parameter k were altered,
using a grid search until the minimum of the Davies Bouldin index was
found, using the elbow criterion. In parameter optimization, the SOM net
size roughly followed the map size rule (of thumb) of Vesanto &
Alhoniemi (2000; N(nodes) = 5 x sqrt(Nrows)). Each trial
SOM consisted of 10,000 training rounds. In the preprocessing phase, the
occurrence of each fish species was dummy (zero or one, absence or
presence) coded. All predictors were then normalized with a zeroed mean
and variance of one. The learning rate function was inverse of time,
which ensures that all samples have an approximately equal influence on
the results. The statistical analyses were performed using RapidMiner
software (version Studio Large 9.7.000., https://rapidminer.com /,
Mierswa, Wurst, Klinkenberg, Scholz, & Euler, 2006).
Results
As anticipated for the small catchment areas of this study, there was
high variation among sites in the catchment land-cover variables (Table
1). The average catchment size and altitude at the sites occupied by
each of the fish species varied considerably. To illustrate this, the
positioning of three species in the
catchment size – altitude space
suggests that three-spined stickleback occupied small low-altitude
brooks, whereas brook trout, Salvelinus fontinalis (Mitchill
1814), dwelled in tributaries, and grayling, Thymallus thymallusL., in larger streams (Figure 1).
The BLR models were statistically significant (ꭓ2 (9)= 35.4 – 154.7, p<0.005) for all fish species, with the
exception of brook lamprey, Lampetra planeri (Bloch),
(ꭓ2 (9) = 10.1, p=0.340). The highest Nagelkerke
R2 values were recorded for the two stickleback
species (Table 2). In Hosmer-Lemeshow goodness-of-fit tests, the
p-values were >0.08 for all the models, indicating an
acceptable model fit for the data. The accuracy of the models was
usually high, ranging from 96.9% with ninespine stickleback to 62.8%
with brown trout. The absence of fish species was predicted by the
models much more correctly than presence, as indicated by specificity
(average 94.6%, SD 13.6%) vs. sensitivity (average 28.1%, SD 21.1%)
(Table 2).
The statistical performance of the best SOM model including all species
was good (explained variance 55.94%, Davies Bouldin index 0.71). The
net size (11 x 11 = 121 nodes) of the best-performing SOM model roughly
followed the map size rule of thumb (110 nodes). The smallest Davies
Bouldin index was attained with 4 clusters (Figure 2). Cluster 2 at the
top right of the SOM was occupied by bullhead, burbot, Lota lota(L.), grayling, and minnow, Phoxinus phoxinus (L.), and
characterized by a large catchment area, high altitude, and low mean
temperature (Figure 2, Table 3). Cluster 1 at the top left of the SOM
was occupied by perch, Perca fluviatilis L., roach, Rutilus
rutilus (L.), and northern pike, Esox Lucius L., and
characterized by a high water temperature at sampling, high annual mean
temperature, large catchment area, and low altitude (Figure 2, Table 3).
Cluster 0 at the bottom left of the SOM was occupied by the two
stickleback species, and characterized by a low altitude, high annual
mean temperature, low water temperature at sampling, and high percentage
of urban areas in the catchment. Brook trout was present in sites
clustered at the bottom right of the SOM, indicating preference for cold
high-altitude tributaries. Brown trout and stone loach, Barbatula
barbatula (L.), seemed to occupy two clusters simultaneously, whereas
the occurrence of brook lamprey could not be linked with any of the
studied environmental variables (Table 3, Figure 2).
The ranking of species in the mean air temperature gradient revealed the
two stickleback species favoured a warm environment, whereas minnow
appeared to be the ultimate cold-water species (Figure 3).
Discussion
Modelling fish species occurrence
in small boreal streams with a logistic regression and self-organizing
map indicated clear species-environment relationships. The obtained
species clusters and their associations with mainly map-based variables
appeared ecologically reasonable and largely concordant with species
groupings in the current bioassessment developed for larger boreal
streams (Vehanen, Sutela, & Korhonen, 2010). The results support the
development of fish-based bioassessment for small streams and help in
predicting fish assemblage changes in a warming climate.
The effect of small-scale local factors on controlling the occurrence of
lotic fish species has been found in numerous studies (e.g. Watson &
Hillman, 1997; Lamouroux, Capra, Pouilly, & Souchon, 1999; Wang et al.,
2003). However, the dominance of large-scale regional factors affecting
riverine fish assemblages has also been documented (e.g. Koel &
Peterka, 2003; DeRolph et al., 2015; Mitsuo, 2017). A wide variety of
hypotheses or theories has been put forward concerning the balance of
local and regional factors affecting riverine fish assemblages. It has
been hypothesized that large-scale processes determine the pool of the
fish species available to occur, whereas small-scale processes
eventually define the subset of fish species inhabiting a given site
(Pont, Hugueny, & Oberdorff, 2005). Although local habitat conditions
may be important determinants of fish abundance, they may be of limited
importance in determining presence and absence (Porter et al., 2000).
Sensitivity to local- and regional-scale processes has been found
species-specific (Pont et al., 2005). It was suggested that local
factors were most important to fish in minimally impaired watersheds,
but the effects of landscape-scale factors become increasingly important
as watersheds are increasingly modified by human activities (Wang et
al., 2003). However, a combination of local and regional variables has
often managed to explain a great deal of the variance in riverine fish
occurrence or density (e.g. Ripley, Scrimgeour, & Boyce, 2005; Pont et
al., 2005; Park, Grenouillet, Esperance, & Lek, 2006). Obviously, both
local and regional variables have an effect, and the inclusion of local
variables in our models would probably have enhanced the predictive
power. However, our results encourage the use of map-based (regional)
variables in modeling the species-environment relationships in small
streams, especially when confronting limited resources to control
site-specific local variables.
The sensitivity of the BLR model was rather poor, at least compared to
specificity (Table 2). The relatively small size of the electrofishing
area and the use of a single-run electrofishing sampling in this study
may have decreased the probability of getting all the fish species in
the catch. The information generated by single-visit surveys of fish
occurrence cannot account for intra-annual or interannual variation in
the upstream extent of fish distribution (Fransen et al., 2006). Small
streams are vulnerable to drought events inducing temporal variation in
fish assemblages (Grossman, Ratajczak, Crawford, & Freeman, 1998;
Keaton, Haney, & Andersen, 2005). Our model’s prediction of species
occurrence and absence may be of use in extending the current fish-based
bioassessment (Vehanen et al., 2010) to small brooks. For management and
inventory purposes, we recommend the application of larger data and
cross-validation in BLR.
The SOM clusters of fish species and environmental variables appeared
plausible. Cluster 0 was occupied by two stickleback species that seemed
to favour warm regions, low altitude, and the high share of urban areas
in the upper catchment. Sticklebacks have been considered to indicate
degradation in lowland brooks (Fieseler & Wolter, 2006). Freshwater
fish communities have been found sensitive to watershed urbanization
(Chen & Olden, 2020).
The occurrence of perch, roach, and northern pike (Cluster 1 in SOM) was
associated with a high annual mean temperature, a relatively large
catchment area, low altitude, and lakes in the upper catchment. These
three fish species are common lake species (Maitland & Campbell, 1992)
possibly spreading to small streams at warm water periods (Degerman &
Sers, 1994; Sutela, Vehanen, Huusko, & Mäki-Petäys, 2017). This trait
was supported by the frequent occurrence of these species with high
temperature at sampling (Table 3). The occurrence of bullhead, burbot,
grayling, and minnow was associated with a relatively large catchment
area, a high altitude, a low mean temperature, and open mires in the
catchment (Cluster 2). The fish species in this cluster can be
characterized as cold-water species (Logez, Bady, & Pont, 2012) living
in forested peatland regions. The only fish species centering cluster 3,
brook trout, favoured cold and small high-altitude streams. Brook trout
is an alien invader species in Europe, having been stocked in many
Finnish tributary streams. Brook trout also prefers small tributary
streams in its home district in North America (Kanno, Letcher, Rosner,
& O´Neil, 2015). Alien brook trout has been found to exclude brown
trout in small Finnish brooks (Korsu, Huusko, & Muotka, 2007).
The appearance of the most frequently encountered fish species, brown
trout, was centered in clusters 0 and 3 with avoidance of ditched
peatland in the upper catchment. The drainage ditching of peatland for
forestry causes the erosion and deposition of fine sediments in
headwater streams, accompanied by nutrient loading (Marttila & Kløve,
2010; Nieminen et al. 2018). Deposited sediment can diminish salmonid
embryo survival by decreasing redd gravel permeability, interstitial
water exchange, and therefore oxygen supply (Greig, Sear, & Carling,
2007; Louhi, Ovaska, Mäki-Petäys, Erkinaro, & Muotka, 2011; Michel et
al., 2014). These impacts may have suppressed the occurrence of brown
trout in catchments with a high coverage of ditched peatland in this
study.
Climate change scenarios forecast a high increase in the mean air
temperature for the European boreal ecoregion (Schneider, Laize,
Acreman, & Flörke, 2013). Fish species have evolved to fit distinct
thermal niches where they can optimize physiological, reproductive, and
ecological performance (Coutant, 1987; Graham & Harrod, 2009).
Temperature is one of the key abiotic factors affecting fish species
distribution (Matthews, 1998). Globally, fish species living in small
headwater streams are especially vulnerable to climate change (Buisson,
Thuillier, Lek, Lim, & Grenouillet, 2008; Buisson & Grenouillet,
2009). The presented ranking of the fish species along the mean air
temperature gradient can help in predicting the effects of a warming
climate on fish assemblages in the studied region. The breadth of the
thermal range largely delineates the ability of fish species to adapt to
climate change (Buisson & Grenouillet, 2009; Logez et al., 2012). In
this study, minnow expressed a relatively narrow thermal range at the
cold end of the gradient (Figure 3), suggesting high vulnerability to
the warming climate in this region. The thermal ranges of some fish
species (e.g. brown trout, perch, and northern pike) may vary, depending
on the size of the catchment area and stream power (Logez et al., 2012).
Accordingly, the inclusion of large rivers in the analyses could result
in a different outcome for fish species ordination along the mean
temperature gradient. These findings suggest that local stream
characteristics should be taken into account when predicting the effects
of climate change. Besides the increase in the mean air and river water
temperature in the European boreal ecoregion, future winter discharges
are likely to increase from the natural flow regime, while summer flows
will be less impacted (Schneider et al., 2013). The discharge aspect,
although probably of minor importance in the boreal region, should also
be taken into account when predicting the effects of a warming climate
on boreal riverine fish assemblages.
The assessment of the ecological status or integrity of surface waters
has been widely established around the world (Karr & Chu, 2000; Xu et
al., 2014; Poikane et al., 2020). In Europe, the legislation to achieve
a good ecological status in surface waters is guided by the WFD
(European Commission 2000). Bioassessment methods in rivers have been
developed using three biological groups: periphytic diatoms, benthic
invertebrates, and fish fauna. Stream biota is often impaired by
multiple pressures interacting in additive, synergistic, or antagonistic
ways (Schinegger, Trautwein, Melcher, & Schmutz, 2012). Diagnostic
tools for distinguishing the impacts of different pressures have been
called for to target the diminishing measures in water pollution control
(Lemm, Feld, & Birk, 2019; Poikane et al., 2020). In this study,
map-derived pressures of agriculture (fields), urban land cover, and
drainage ditching for forestry seemed to affect the occurrence of
certain fish species. These results encourage for the development of
diagnostic fish-based pressure-specific metrics for small boreal
streams.
A simple diagnostic tool (index) for evaluating direct effects of
climate change could be calculated as an average of two metrics, the
proportion of cold-water species (climate change intolerants, scaled to
0-1), and the proportion of warm-water species (climate change
tolerants, scaled to 0-1, inverse values) of an electrofishing sample.
Referring to Figure 3, in our case the cold-water species could be
minnow, grayling and brook trout, and the warm-water species
three-spined stickleback, ninespine stickleback, and stone loach. For a
wider use of this index, temperature preferences could be achieved like
in this study, or by using existing knowledge and references about
temperature preferences of fish species, such as Logez et al. (2012).
Possible indirect effects of climate change stemming from flushing of
nutrients (Wilby, Orr, Hefger, Forrow, & Blackmore, 2006), for
instance, could be integrated to the index following the basics
presented in Hering, Feld, Moog, & Ofenböck (2006).
In the fish-based integrity indices developed in bioassessment for
boreal and northern temperate zone, cool- or cold-water fish species are
often classified as intolerant species (Kanno, Vokoun, & Beauchene,
2010; Vehanen et al. 2010). This feature is also seen in the Figure 3,
where seven species from the left indicating favour of cold water can be
classified as intolerant (grayling, brook trout, bullhead, brook
lamprey, and brown trout) or intermediately tolerant (minnow and burbot)
referring to Holzer (2008). Respectively, the six species on the
right-hand side indicating favour of warm water can be classified as
tolerant (perch, roach, ninespine stickleback, and three-spined
stickleback) or intermediately tolerant (northern pike and stone loach).
The classification to tolerant and intolerant species by Holzer is
highly compatible with those used in fish integrity indices, such as
Pont et al. (2006), Hughes, Howlin, & Kaufman (2004), Vile & Henning
(2018), and Vehanen et al. (2010). Observed pattern in the sequence of
species in the temperature gradient in relation to tolerant–intolerant
division of species suggest that the integrity indices developed for
regions inhabited by these species should respond to the water
temperature rise in streams on itself, without the possible influence
via indirect effects such as altered discharge regime and flushing of
extra nutrients (Wilby et al., 2006). In other words, the effect of
climate change strictly as warming of the streams should be (by chance)
at least to some extent inborn in many of the present fish indices. As
an example, cool-water vs. warm-water species balance obviously affects
the FiFI index values, which can be easily approximated or calculated
based on the metrics by Vehanen et al. (2010). At any rate, when aiming
to integrate the effect of global warming to fish indices, the effect of
warming on the reference sites should be controlled by referring to the
earliest reliable electrofishing data or other historical fish data.
This somewhat different approach from the more adaptable attitude to the
direct effects of climate change in WFD (Nõges, Van de Bund, Cardoso, &
Heiskanen, 2007; Kristensen, Whalley, Néry, Zal, & Christiansen, 2018)
could be considered also with other biological quality elements.
Acknowledgments
We utilized data collected in the Life IP project FRESHABIT (LIFE
Programme of the European Union) in this study. The study reflects the
views of the authors, and neither the European Commission nor the EASME
is responsible for any use that may be made of the information it
contains. We thank Hanna Hentilä, Minna Kuoppala, and Kati
Martinmäki-Aulaskari from SYKE and Auli Immonen from Luke for catchment
delineation and other GIS work.