Potential of machine learning techniques compared to the MIKE-SHE model for drain flow predictions in tile-drained agricultural areas of Denmark
Abstract. Temporal drain flow dynamics and an understanding of their underlying controlling factors are important for water resource management in tile-drained agricultural areas. Physics-based water flow models are commonly used to understand tile-drained systems, but these models are complex, have large parameter sets, and require high computational effort. The primary goal of this study was to examine whether simpler, more efficient machine learning (ML) models can provide acceptable solutions.
The specific aim of our study was to assess the potential of ML tools for predicting drain flow time series in multiple catchments subject to a range of climatic and landscape conditions. The investigation is based on a unique dataset containing time series of daily drain flow at multiple field-scale drain sites in Denmark. The data include climate variables (precipitation, potential evapotranspiration, temperature), geological properties (clay fraction, first sand layer thickness, first clay layer thickness), and topographical indexes (curvature, topographical wetness indexes, topographical position index, elevation). Both static and dynamic variables are used to predict drain flows. Two ML algorithms, extreme gradient boosting (XGBoost) and a convolutional neural network (CNN), were examined, and their results were compared with those of a physics-based distributed model (MIKE-SHE).
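As a rough illustration of how static and dynamic variables can be combined into one predictor table, the sketch below repeats each site's static attributes on every daily row of its climate series. All names (`build_feature_rows`, the dictionary keys) and all numeric values are hypothetical and are not taken from the study's data or code. The resulting rows are in the one-record-per-site-per-day form that a tabular learner such as XGBoost typically expects.

```python
from typing import Dict, List


def build_feature_rows(
    static_by_site: Dict[str, Dict[str, float]],
    daily_by_site: Dict[str, List[Dict[str, float]]],
) -> List[Dict[str, object]]:
    """Join per-site static attributes onto each daily climate record."""
    rows: List[Dict[str, object]] = []
    for site, daily_records in daily_by_site.items():
        static = static_by_site[site]
        for day_index, record in enumerate(daily_records):
            row: Dict[str, object] = {"site": site, "day": day_index}
            row.update(record)  # dynamic variables: change from day to day
            row.update(static)  # static variables: constant for the site
            rows.append(row)
    return rows


# Made-up values, for illustration only.
static_by_site = {"site_a": {"clay_fraction": 0.18, "elevation_m": 42.0}}
daily_by_site = {
    "site_a": [
        {"precip_mm": 3.2, "pet_mm": 0.8, "temp_c": 5.1},
        {"precip_mm": 0.0, "pet_mm": 1.1, "temp_c": 6.4},
    ]
}
rows = build_feature_rows(static_by_site, daily_by_site)
```

Each element of `rows` then carries both kinds of predictor, so the same static value (for example `clay_fraction`) appears on every day of a given site.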
The results show that XGBoost performs similarly to the physics-based MIKE-SHE models, and both outperform the CNN. Both ML models required significantly less effort to build, train, and run than MIKE-SHE. In addition, the ML models support efficient feature importance analysis, which showed that climatic variables were important for both the CNN and XGBoost models. The results support the use of ML models for hydrologic applications where sufficient training data are available. Further, the insights offered by the feature importance analysis may guide further data collection and the development of physics-based models when existing data are insufficient to support ML approaches.
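The abstract does not state which feature importance method was used. One common model-agnostic option is permutation importance: shuffle one feature column and measure how much the prediction error grows. The sketch below is a minimal standard-library version under that assumption; `toy_model`, the feature names, and the data are hypothetical stand-ins, not the study's models or results.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def mean_squared_error(
    model: Callable[[Row], float], rows: List[Row], targets: List[float]
) -> float:
    """Average squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(
    model: Callable[[Row], float],
    rows: List[Row],
    targets: List[float],
    feature: str,
    n_repeats: int = 20,
    seed: int = 0,
) -> float:
    """Mean increase in MSE when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the feature under test.
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        increases.append(mean_squared_error(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats


def toy_model(row: Row) -> float:
    # Hypothetical stand-in for a trained model: uses precipitation only.
    return 0.5 * row["precip_mm"]


# Made-up data, for illustration only.
rows = [
    {"precip_mm": float(p), "clay_fraction": 0.1 + 0.05 * (p % 3)}
    for p in range(10)
]
targets = [toy_model(r) for r in rows]
importance = {
    name: permutation_importance(toy_model, rows, targets, name)
    for name in ("precip_mm", "clay_fraction")
}
```

In this toy setup the model ignores `clay_fraction`, so shuffling it leaves the error unchanged, while shuffling `precip_mm` increases it. A ranking of this kind is what would indicate that climatic variables dominate.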