A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems
Abstract. Predicting water quality variables in lakes is critical for effective ecosystem management under climatic and human pressures. Dissolved organic matter (DOM) serves as an energy source for aquatic ecosystems and plays a key role in their biogeochemical cycles. However, predicting DOM is challenging due to complex interactions between multiple potential drivers in the aquatic environment and its surrounding terrestrial landscape. This study establishes an open and scalable workflow to identify potential drivers and predict fluorescent DOM (fDOM) in the surface layer of lakes by exploring the use of supervised machine learning models, including random forest, extreme gradient boosting, light gradient boosting, CatBoost, k-nearest neighbors, support vector regression and a linear model. It was validated in two contrasting systems: one natural lake in Ireland with a relatively undisturbed catchment, and one reservoir in Spain with a more human-influenced catchment. A total of 24 potential drivers were obtained from global reanalysis data, and lake and river process-based modelling. Partial dependence and SHapley Additive exPlanations (SHAP) analyses were conducted for the most influential drivers identified, with soil moisture, soil temperature, and Julian day being common to both study sites. The best prediction was found when using the CatBoost model (during hold-out testing period, Irish site: KGE > 0.69, r² > 0.51; Spanish site: KGE > 0.66, r² > 0.54). Interestingly, when only using drivers from globally accessible climate and soil reanalysis data, the prediction capacity was maintained at both sites, showcasing potential for scalability. Our findings highlight the complex interplay of environmental drivers and processes that govern DOM dynamics in lakes, and contribute to the modelling of carbon cycling in aquatic ecosystems.
Disclaimer: I have strong expertise in machine learning but my primary background is in oceanography, so I am less familiar with the specific impacts and conclusions relevant to lake ecosystems. Consequently, my review emphasizes the technical and methodological aspects of the manuscript and provides fewer comments on the ecological interpretation of the results.
The study by Mercado-Bettín et al. investigates how dissolved organic matter dynamics in two contrasting freshwater lakes can be modeled using a set of machine‑learning algorithms. By assembling a comprehensive set of physicochemical and meteorological predictors, the authors train and compare several regression techniques (including linear models, random forests, support vector regression and three gradient‑boosting frameworks) to predict fluorescent dissolved organic matter (fDOM) concentrations. Their analysis explores which environmental drivers most strongly influence fDOM variability and evaluates model performance across the two lake systems, aiming to demonstrate that data‑driven approaches can capture the temporal patterns of organic matter turnover in inland waters. Furthermore, they show that reducing the set of predictors only marginally decreases the prediction performance of the ML models.
General comments
The manuscript is clearly written and follows a logical progression, which makes the authors’ objectives easy to grasp. Nonetheless, several critical steps are lacking to support the central claim.
First, the manuscript does not show whether the target variable was inspected for distributional anomalies such as skewness, outliers, or zero‑inflation before model fitting. A simple histogram or density plot of fDOM (and a note on any transformation applied) would let readers judge whether the data were appropriately conditioned.
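To make this concrete, here is a minimal sketch of the kind of pre-modelling diagnostics I have in mind, using synthetic log-normal data in place of the (non-public) fDOM series:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the fDOM series; the log-normal shape mimics
# the right skew that is common in concentration data (illustrative only).
rng = np.random.default_rng(42)
fdom = pd.Series(rng.lognormal(mean=2.0, sigma=0.8, size=1000), name="fdom")

# Basic distributional diagnostics before any model fitting
iqr = fdom.quantile(0.75) - fdom.quantile(0.25)
print("skewness (raw):", round(fdom.skew(), 2))
print("zeros:", int((fdom == 0).sum()))
print("extreme outliers:", int((fdom > fdom.quantile(0.75) + 3 * iqr).sum()))

# A log transform often symmetrises right-skewed, strictly positive data
print("skewness (log):", round(np.log(fdom).skew(), 2))
```

Reporting these few numbers (plus a histogram) would suffice to show the data were appropriately conditioned.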
Second, the choice of six machine‑learning models, including three gradient‑boosting implementations, is left unexplained. Because these three models are largely interchangeable, the authors should either justify retaining each (for example, to compare computational efficiency or regularisation strategies) or reduce the set to a smaller, well‑motivated collection, explicitly outlining the strengths and weaknesses of each algorithm and stating whether a broad model comparison is a declared aim of the study.
Third, hyper‑parameter tuning is mentioned but not described. The manuscript should specify which parameters were tuned for each model, the search space explored, the optimisation strategy (grid, random, Bayesian, etc.) and the validation split used. Detailing this process is essential for assessing model robustness and guarding against over‑fitting. An early subsection that summarises key performance metrics of the different models would also ground the subsequent discussion of variable importance in a known predictive skill.
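For reference, the level of detail I would expect can be conveyed in a few lines; a sketch with synthetic data, an illustrative search space, and chronological validation folds:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic driver matrix and target, standing in for the real data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=500)

# Reporting the search space, optimisation strategy, and validation split
# makes the tuning reproducible; TimeSeriesSplit keeps folds chronological.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [100, 300, 500],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 5, 10],
    },
    n_iter=5,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

A table with one such specification per model would fully document the tuning.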
All figures suffer from overly small text; increasing the font size for axes, legends, and annotations is essential for readability. In Figures 3 and 4 the legends contain interpretative statements, which blurs the line between description and analysis: legends should merely describe the visual content, leaving interpretation to the main text.
Finally, the code is shared in a reasonably structured way, but two key improvements would greatly enhance its usability. First, assigning a clear execution order, either by numbering the scripts or by providing a master driver script, would allow anyone to run the workflow sequentially without guesswork. Second, replacing absolute paths such as “~/Documents/intoDBP/driver_attribution_fdom/” with relative paths or a configurable settings file would make the repository portable across different machines and operating systems. Implementing these changes would substantially increase the rigor and reproducibility of the study.
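On the second point, even a minimal configuration file would solve the portability issue; a stdlib-only sketch (file and key names here are illustrative, not taken from the repository):

```python
import json
from pathlib import Path

# Write a small settings file once (illustrative names, not the repo's)...
Path("config.json").write_text(json.dumps({"data_dir": "data", "output_dir": "results"}))

# ...then every script resolves its paths from it, relative to the repo
# root, instead of hard-coding an absolute "~/Documents/..." path.
cfg = json.loads(Path("config.json").read_text())
data_dir = Path(cfg["data_dir"])
data_dir.mkdir(exist_ok=True)
print(data_dir.resolve())
```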
Specific comments
Figure 1: the caption should define DOC; otherwise the figure nicely illustrates the workflow.
L21-23: please indicate the direction of variation for each process (increase or decrease).
L59: insert the word “respectively”.
L93-96: you mention a 2-minute measurement resolution but also give a number of points that roughly matches the number of days between the dates. Clarify whether the data were averaged daily before analysis.
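If the data were indeed averaged, stating so alongside the aggregation step would remove the ambiguity; e.g. a sketch with a synthetic 2-minute series:

```python
import numpy as np
import pandas as pd

# Synthetic 2-minute sensor series spanning four days (illustrative values)
idx = pd.date_range("2021-05-01", "2021-05-04 23:58", freq="2min")
raw = pd.Series(np.random.default_rng(1).normal(10, 2, len(idx)), index=idx)

# Daily averaging would explain roughly one point per day in the dataset
daily = raw.resample("D").mean()
print(len(raw), "->", len(daily))  # 2880 -> 4
```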
Figure 2: define the acronyms “GWLF” and “GLM” (they are only explained later, at lines 114‑116). Also define “NSE”; you introduce NSE as a model‑evaluation metric here but it does not appear elsewhere in the paper.
L126-127: see also Molnar (2025) for interpretable machine learning.
L128-135: the description of how the data were split into training and test sets, and how the models were evaluated, is unclear. Align this description with the "Prediction Workflow" paragraph and distinguish clearly between validation (hyperparameter optimisation) and testing (final performance).
L136: the term “statistical” is vague (all these models belong to supervised machine learning), and the choice of models needs justification. Explain why these particular algorithms were selected, why e.g. a neural network was not considered, and what the relative strengths and weaknesses of each method are. If three gradient‑boosting frameworks (XGBoost, LightGBM, CatBoost) are used, state the reason for testing all three rather than picking one.
L140-141: consider using permutation importance as an alternative that works across all models.
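A sketch of what I mean, using SVR (which has no built-in importance attribute) on synthetic data:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.svm import SVR

# Synthetic data where only the first feature carries most of the signal
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Permutation importance is model-agnostic: it works for any fitted
# estimator, including those with no native importance measure.
model = SVR().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Using one importance measure across all seven models would make Figure 3 directly comparable.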
L165: I would like to see a short “Model performance” subsection before the analysis of important drivers, so readers first see how well the models actually performed.
Figure 3: I’m confused by the statement about the “four ML models that directly provide feature importance: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting (LGB), CatBoost (CTB)”. These models do not provide feature importance directly; it has to be computed (which you did, via node purity and gain contribution). By contrast, a linear model does provide feature importance directly, through its coefficients (see Molnar (2025)).
Table 1: RMSE should be reported with its units, since it shares the unit of the predicted variable. The fact that XGBoost yields very poor performance (R² = 11%) compared to the others (including a linear model with R² = 45%) suggests that its training could be improved substantially. Which hyperparameters did you choose for XGBoost? I dug into the code and found that only a few hundred trees (100-300) were used, which may be insufficient (although this also depends on the learning rate). Typically, boosting procedures require thousands of trees (see Hastie et al., 2009; Chapter 10). Moreover, the reported R² of 99% on the training data in Table A1 strongly suggests overfitting.
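To illustrate the trees/learning-rate interplay (using scikit-learn's GradientBoostingRegressor as a stand-in on synthetic data, not the authors' XGBoost configuration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic nonlinear signal; with a small learning rate, a few hundred
# trees underfit, while a few thousand recover most of the signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for n in (100, 2000):
    model = GradientBoostingRegressor(n_estimators=n, learning_rate=0.01,
                                      random_state=0).fit(X_tr, y_tr)
    scores[n] = model.score(X_te, y_te)  # held-out R²
    print(n, "trees: test R2 =", round(scores[n], 2))
```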
Figure 4: mixing a partial‑dependence plot (PDP) for the Random Forest with SHAP values for CatBoost makes the interpretation confusing. Choose one model and interpret it thoroughly. Do not embed interpretation inside the legend. For the SHAP plot, indicate how the variables are ordered (e.g. by mean absolute SHAP value). Also, the cosine‑scaled Julian day axis should be accompanied by a note translating the cosine values back to calendar seasons, because “cos(julian day) = ‑0.5” is not intuitive.
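On the last point, the back-translation is a one-liner; assuming the encoding is cos(2πd/365) (the manuscript does not state it explicitly), each cosine value maps to two calendar days:

```python
import numpy as np

# Assuming the scaling cos(2*pi*d/365); each value c has two solutions per year.
c = -0.5
d1 = 365 * np.arccos(c) / (2 * np.pi)  # first crossing, ~day 122 (early May)
d2 = 365 - d1                          # second crossing, ~day 243 (end of August)
print(round(d1), round(d2))  # 122 243
```

Annotating the axis with a few such calendar equivalents would make the seasonal pattern immediately readable.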
L198 - 208 & Figure 4: the text mentions an experiment using “the most influential drivers” and another using “a reduced subset of reanalysis‑based and easily accessible drivers.” In the figure the purple line is labelled “Testing with all drivers,” which appears to correspond to the reduced set described in the text. Either correct the label or clarify the distinction between the two experimental setups.
L249: discuss the implications of predicting fDOM instead of total DOC. Is the former a simpler target, and does that affect the ecological relevance of the results?
L273: good point, thank you for highlighting the limitation of dataset shift.
Figure A5: define all acronyms
References
Hastie, T., Tibshirani, R., and Friedman, J.: The elements of statistical learning: data mining, inference, and prediction, Springer Science & Business Media, 2009.
Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 3rd ed., christophm.github.io/interpretable-ml-book/, 2025.