https://doi.org/10.5194/egusphere-2025-4049
01 Sep 2025
Status: this preprint is open for discussion and under review for Biogeosciences (BG).

A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems

Daniel Mercado-Bettín, Ricardo Paíz, Valerie McCarthy, Eleanor Jennings, Elvira de Eyto, Angeles M. Gallegos, Mary Dillanee, Juan C. Garcia, José J. Rodríguez, and Rafael Marcé

Abstract. Predicting water quality variables in lakes is critical for effective ecosystem management under climatic and human pressures. Dissolved organic matter (DOM) serves as an energy source for aquatic ecosystems and plays a key role in their biogeochemical cycles. However, predicting DOM is challenging due to complex interactions between multiple potential drivers in the aquatic environment and its surrounding terrestrial landscape. This study establishes an open and scalable workflow to identify potential drivers and predict fluorescent DOM (fDOM) in the surface layer of lakes by exploring the use of supervised machine learning models, including random forest, extreme gradient boosting, light gradient boosting, CatBoost, k-nearest neighbors, support vector regression and a linear model. It was validated in two contrasting systems: one natural lake in Ireland with a relatively undisturbed catchment, and one reservoir in Spain with a more human-influenced catchment. A total of 24 potential drivers were obtained from global reanalysis data, and from lake and river process-based modelling. Partial dependence and SHapley Additive exPlanations (SHAP) analyses were conducted for the most influential drivers identified, with soil moisture, soil temperature, and Julian day being common to both study sites. The best prediction was found when using the CatBoost model (during the hold-out testing period, Irish site: KGE > 0.69, r² > 0.51; Spanish site: KGE > 0.66, r² > 0.54). Interestingly, when only using drivers from globally accessible climate and soil reanalysis data, the prediction capacity was maintained at both sites, showcasing potential for scalability. Our findings highlight the complex interplay of environmental drivers and processes that govern DOM dynamics in lakes, and contribute to the modelling of carbon cycling in aquatic ecosystems.
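For readers unfamiliar with the skill scores quoted in the abstract, the following is a minimal sketch of how KGE and r² could be computed for a hold-out period. It assumes the original Kling-Gupta efficiency formulation (correlation, variability ratio, and bias ratio combined as a Euclidean distance from the ideal point) and takes r² to be the squared Pearson correlation; the abstract does not state which variants the authors used, and the function name `kge_and_r2` and the sample values are purely illustrative, not taken from the manuscript or its repository.

```python
import math


def kge_and_r2(obs, sim):
    """Return (KGE, r^2) for paired observed and simulated values.

    Assumption: KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2),
    with alpha = sd_sim / sd_obs and beta = mean_sim / mean_obs.
    r^2 is taken here as the squared Pearson correlation.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have equal length >= 2")

    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n

    # Population (1/n) moments; the ratios below do not depend on 1/n vs 1/(n-1).
    var_obs = sum((o - mean_obs) ** 2 for o in obs) / n
    var_sim = sum((s - mean_sim) ** 2 for s in sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n

    sd_obs = math.sqrt(var_obs)
    sd_sim = math.sqrt(var_sim)
    if sd_obs == 0 or sd_sim == 0 or mean_obs == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")

    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio

    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r ** 2


# Illustrative values only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
kge_value, r2_value = kge_and_r2(observed, simulated)
print(f"KGE = {kge_value:.3f}, r2 = {r2_value:.3f}")
```

A KGE of 1 indicates a perfect match; because the score penalises bias and variability errors as well as poor correlation, it can differ noticeably from r² for the same predictions.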

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 13 Oct 2025)


Data sets

Data used in the manuscript for the first study site Daniel Mercado-Bettín https://github.com/danielmerbet/driver_attribution_fdom/tree/main/feeagh/data

Data used in the manuscript for the second study site Daniel Mercado-Bettín https://github.com/danielmerbet/driver_attribution_fdom/tree/main/sau/data

Model code and software

Codes used to obtain the results shown in the manuscript Daniel Mercado-Bettín https://github.com/danielmerbet/driver_attribution_fdom/tree/main


Latest update: 03 Sep 2025
Short summary
Understanding what shapes lake water quality is vital in a changing world. We studied dissolved organic matter, a key part of water quality in lakes and the carbon cycle, to analyse its environmental drivers and make predictions, by using machine learning. Tested in lakes in Ireland and Spain, it showed predictive potential, even when relying only on global climate and soil data. This helps explain how land and climate conditions influence freshwater resources. It can be reproduced worldwide.