TY - CHAP
T1 - Prediction with artificial neural networks of diatom assemblages in headwater streams of Luxembourg
AU - Rimet, F.
AU - Ector, L.
AU - Hoffmann, L.
AU - Gevrey, M.
AU - Giraudel, J. L.
AU - Park, Y. S.
AU - Lek, S.
PY - 2005
Y1 - 2005
N2 - In rivers, benthic diatom assemblages are controlled by many environmental variables, such as climate, geology or nutrients (Stevenson 1997, Snyder et al. 2002). It is therefore often difficult to understand the relationships between the different diatom taxa and the environmental parameters. Classically, multivariate analyses, such as principal component analysis, are used to project diatom abundance data onto a two-dimensional graph. In general the first two axes are used, but they represent only a part of the explained variability. Cluster analysis based on Bray-Curtis distances (Bray and Curtis 1957) and Twinspan ordinations (Hill 1979b) are also frequently used to sort samples of diatom assemblages into groups of homogeneous composition. Self-Organizing Maps (SOM; Kohonen 1982) are able to cluster the sites and to summarise a database on a two-dimensional graph. The projection is made in a non-linear way by an artificial neural network onto a map composed of hexagons (Fig. 5.7.1), the Kohonen map. In contrast to multivariate analysis, the Kohonen map represents all the explained variability of the dataset. The network is composed of two layers: the first one (input layer) is connected to the samples, the second one (output layer) to the hexagons of the map. The projection respects the similarities between the samples: samples with similar assemblages are placed in the same hexagon or in neighbouring hexagons, and samples with very different assemblages are placed in distant hexagons. Artificial Neural Networks (ANN) are models that function somewhat like a human brain, as they have a learning ability. They establish links between information and solutions. Only a few studies have used this kind of technique and demonstrated its clustering properties for biological communities (Chon et al. 1996, Foody 1999, Giraudel and Lek 2001). 
Artificial Neural Networks using backpropagation algorithms (ANN-BP) are models that can be used to predict one or several variables from other, predictive, variables. This kind of model is composed of processing elements called "neurons", which are arranged in three layers (Fig. 5.7.2). The first layer, the "input layer", corresponds to the input variables, for instance environmental parameters. The last layer, the "output layer", is composed of neurons providing a prediction for the expected variables. The layer between the two, the "hidden layer", is composed of neurons that apply a non-linear function. These neurons receive information from the input neurons, transform it with the non-linear function (log sigmoid for instance), and send this new information to the output neurons. Connections between neurons are weighted in order to modulate the importance of the information fluxes. Output neurons collect the information and give a prediction. The network is trained with one part of the database: predictions are made and compared with the observations. The aim of the training is to find the connection weights that minimise the error between predictions and observations. The other part of the database, which is not used in the learning phase, is used to test the network with fixed weights (validation procedure). Several studies have already shown the ability of ANN-BP to predict the structure of macroinvertebrate or fish communities in rivers (Brosse and Lek 2000a, Chon et al. 2000c among others), density (Chon et al. 2000c) or species richness (Park et al. 2003a) of macroinvertebrates in rivers, Pacific sardine biomass (Cisneros-Mata et al. 2000) or fish species abundance in the Seine basin in France using stream order, slope, width, water quality, habitat quality and the ecoregion as input parameters (Boët and Fuhs 2000). 
Several studies showed that ANN-BP gives better results than other techniques such as multiple regressions (Lek et al. 1995, Brosse and Lek 2000b, Scardi 2000). This can be partly explained by the ability of artificial neural networks "to take into account the non-linear relationships between dependent variables and each independent variable" (Lek et al. 1995). Few applications exist for algal communities, besides the prediction of blue-green algae blooms in rivers (Recknagel 1997) or in lakes (French and Recknagel 1994). Scardi (2000) used ANN to predict phytoplankton productivity in the Gulf of Naples from temperature, irradiance, geographical coordinates and the day of the year as predictive variables. The structure of benthic diatom assemblages is potentially determined by many environmental factors acting at different spatial scales (Stevenson 1997, Snyder et al. 2002). Several studies established correlations between diatoms and physico-chemical variables (Stoermer and Smol 1999), and, among other applications, these observations led to the development of biotic indices to assess water quality. These biotic indices are generally calculated from the species' sensitivity to pollution, their indicator value and their relative abundance. They are now routinely used in Europe (Prygiel et al. 1999) and some of them are standardized (Kelly et al. 1998, AFNOR 2000). Recently, the European Parliament (2000) required the development of new assessment tools for inland waters, which must comply with the Water Framework Directive (WFD). This directive aims to restore surface and underground waters to a good ecological status, which implies defining typologies and reference conditions for several water bodies. The WFD specifies that the ecoregional setting has to be taken into account. Ecoregions (Omernik 1987, 1995, Wasson in Cemagref 2001) classify landscapes on the basis of climates, geology, soil topography and potential vegetation. 
River typology can be defined, for instance, on the basis of distance to the source, river width, slope and maximal annual temperature (Verneaux and Leynaud 1974). The reference conditions should be defined for each river type in each ecoregion, and this can be done using modelling techniques. The first aim of this study was to explore benthic diatom assemblages in headwater streams of Luxembourg in order to propose type assemblages for unpolluted rivers of this typological level. The interest of working on a spatially reduced region and on homogeneous typological levels lies in the homogeneity of diatom and environmental data and in the precision of the classification developed. A priori, such an approach should provide more detail than studies carried out at a wider scale using data coming from very different regions and river types. Different assemblages were defined using SOM (Kohonen 1982) and could be related to distinct regions of Luxembourg. The second objective was to test whether diatom assemblages of the headwater streams of Luxembourg can be accurately predicted from environmental parameters, in order to assess the use of predictive models for defining the reference conditions as required by the WFD. ANN-BP was used to predict the groups of samples previously defined by SOM.
UR - http://www.scopus.com/inward/record.url?scp=34147112614&partnerID=8YFLogxK
U2 - 10.1007/3-540-26894-4_28
DO - 10.1007/3-540-26894-4_28
M3 - Chapter
AN - SCOPUS:34147112614
SN - 3540239405
SN - 9783540239406
SP - 343
EP - 354
BT - Modelling Community Structure in Freshwater Ecosystems
PB - Springer Berlin Heidelberg
ER -