This work is distributed under the Creative Commons Attribution 4.0 License.
A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems
Abstract. Predicting water quality variables in lakes is critical for effective ecosystem management under climatic and human pressures. Dissolved organic matter (DOM) serves as an energy source for aquatic ecosystems and plays a key role in their biogeochemical cycles. However, predicting DOM is challenging due to complex interactions between multiple potential drivers in the aquatic environment and its surrounding terrestrial landscape. This study establishes an open and scalable workflow to identify potential drivers and predict fluorescent DOM (fDOM) in the surface layer of lakes by exploring the use of supervised machine learning models, including random forest, extreme gradient boosting (XGBoost), light gradient boosting (LightGBM), CatBoost, k-nearest neighbors, support vector regression and a linear model. The workflow was validated in two contrasting systems: a natural lake in Ireland with a relatively undisturbed catchment, and a reservoir in Spain with a more human-influenced catchment. A total of 24 potential drivers were obtained from global reanalysis data and from lake and river process-based modelling. Partial dependence and SHapley Additive exPlanations (SHAP) analyses were conducted for the most influential drivers identified, with soil moisture, soil temperature, and Julian day being common to both study sites. The best prediction was obtained with the CatBoost model (during the hold-out testing period, Irish site: KGE > 0.69, r² > 0.51; Spanish site: KGE > 0.66, r² > 0.54). Interestingly, when only drivers from globally accessible climate and soil reanalysis data were used, the prediction capacity was maintained at both sites, showcasing potential for scalability. Our findings highlight the complex interplay of environmental drivers and processes that govern DOM dynamics in lakes, and contribute to the modelling of carbon cycling in aquatic ecosystems.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4049', Thelma Panaïotis, 12 Dec 2025
AC1: 'Reply on RC1: Thelma Panaïotis', Daniel Mercado-Bettín, 01 Feb 2026
We would like to thank the reviewers for the time they dedicated to this manuscript. All suggestions and comments have been carefully considered and will be taken into account in the revised version of the manuscript. Here, we propose actions to enhance our work, incorporating this feedback, and we appreciate the valuable insights provided during the review process.
Note:
In bold text: the comments, observations and suggestions of the reviewers.
In normal text: the responses of the authors of this manuscript.
Disclaimer: I have strong expertise in machine learning but my primary background is in oceanography, so I am less familiar with the specific impacts and conclusions relevant to lake ecosystems. Consequently, my review emphasizes the technical and methodological aspects of the manuscript and provides fewer comments on the ecological interpretation of the results.
The study by Mercado-Bettín et al. investigates how dissolved organic matter dynamics in two contrasting freshwater lakes can be modeled using a set of machine‑learning algorithms. By assembling a comprehensive set of physicochemical and meteorological predictors, the authors train and compare several regression techniques (including linear models, random forests, support vector regression and three gradient‑boosting frameworks) to predict fluorescent dissolved organic matter (fDOM) concentrations. Their analysis explores which environmental drivers most strongly influence fDOM variability and evaluates model performance across the two lake systems, aiming to demonstrate that data‑driven approaches can capture the temporal patterns of organic matter turnover in inland waters. Furthermore, they show that reducing the set of predictors merely decreases the prediction performance of the ML model.
We thank the reviewer for the careful reading of the manuscript and for the clear and constructive summary of our work. The focus on the technical and methodological aspects of the study is appreciated, as the reviewer's expertise in machine learning aligns closely with the objectives of the manuscript. Below, we address the specific comments in detail.
General comments
The manuscript is clearly written and follows a logical progression, which makes the authors’ objectives easy to grasp. Nonetheless, several critical steps are lacking to support the central claim.
We thank the reviewer for this observation and will address the missing steps outlined below in order to better support the central claim of the manuscript.
- First, the manuscript does not show whether the target variable was inspected for distributional anomalies such as skewness, outliers, or zero‑inflation before model fitting. A simple histogram or density plot of fDOM (and a note on any transformation applied) would let readers judge whether the data were appropriately conditioned.
We thank the reviewer for this valuable suggestion. We will include an explicit exploratory data analysis of the target variable prior to model fitting, focusing on distributional properties, skewness, outliers, and potential zero-inflation. This analysis has already been conducted, and the corresponding code is available in the public repository (codes/1_exploratory_data_analysis.R). The results will be added to the Supplementary Material of the manuscript.
Preliminary results show that fDOM at one of the study sites (Feeagh) exhibits low to moderate skewness (approximately 0.5) with relatively few outliers, whereas the other study site (Sau) displays higher skewness (greater than 1 but below 2) and several extreme values. Zero-inflation does not appear to be an issue at either site.
The corresponding distribution plots for the Feeagh and Sau study sites are provided in the attached Supplementary1.pdf.
No transformation was applied to the fDOM data prior to model fitting in the main analysis. We agree with the reviewer that transformations can be beneficial for distance-based models such as k-Nearest Neighbors (KNN), kernel-based models such as Support Vector Regression (SVR), and linear models, particularly when strong skewness is present. In the case of Feeagh, the low skewness suggests that this is not critical, while for Sau, it may be more relevant, although skewness remains moderate rather than extreme. By contrast, the models that showed the best performance in this study, Random Forest (RF), XGBoost, LightGBM (LGB), and CatBoost (CTB), are tree-based methods, which are generally robust to skewed distributions and do not require data transformation (see, e.g., the “Decision Trees – Strengths” section in Molnar, 2025). To directly address the reviewer’s point, we will additionally include, in the Supplementary Material, a comparison of model performance for all machine-learning methods after applying a log-transformation to the fDOM data.
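For transparency, a minimal sketch of these checks in R (column and file names are illustrative; the full version is in codes/1_exploratory_data_analysis.R in the repository):

```r
# Distributional diagnostics of the target variable before model fitting
library(e1071)  # provides skewness()

dat <- read.csv("data/fdom_daily.csv")  # hypothetical file; one row per day

hist(dat$fdom, breaks = 50, main = "fDOM distribution", xlab = "fDOM")
skewness(dat$fdom, na.rm = TRUE)   # ~0.5 at Feeagh, between 1 and 2 at Sau
sum(dat$fdom == 0, na.rm = TRUE)   # zero-inflation check
boxplot.stats(dat$fdom)$out        # candidate outliers

# Log-transform for the supplementary comparison; log1p() keeps any zeros defined
dat$fdom_log <- log1p(dat$fdom)
skewness(dat$fdom_log, na.rm = TRUE)
```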
- Second, the choice of six machine‑learning models, including three gradient‑boosting implementations, is left unexplained. Because these three models are largely interchangeable, the authors should either justify retaining each (for example, to compare computational efficiency or regularisation strategies) or reduce the set to a smaller, well‑motivated collection, explicitly outlining the strengths and weaknesses of each algorithm and stating whether a broad model comparison is a declared aim of the study.
We thank the reviewer for this important point. We acknowledge that the rationale for including multiple machine-learning models, particularly several gradient-boosting implementations, was not sufficiently explained in the original manuscript.
Our primary objective was not to conduct an exhaustive benchmark comparison of machine-learning algorithms. Rather, given the complexity of fDOM dynamics and the relatively high number of heterogeneous input drivers, we adopted a pragmatic modelling strategy that explored a small but diverse set of commonly used approaches. Prior to the analysis, it was not clear which modelling family would be most suitable for capturing the nonlinear and site-specific behaviour of fDOM, particularly across two contrasting lake systems.
The inclusion of multiple gradient-boosting frameworks (XGBoost, LightGBM, and CatBoost) was motivated by their known differences in tree construction, regularisation strategies, and handling of feature interactions, rather than by the assumption that they would perform similarly. Including these models allowed us to assess the robustness of the results to algorithmic choices within the same methodological family, while also comparing them against different approaches.
We will revise the manuscript to (i) explicitly state that a broad model benchmark is not the primary aim of the study, (ii) provide a concise justification for retaining each model, including their main strengths and limitations in the context of fDOM prediction, and (iii) clarify that the comparative results are intended to offer practical guidance on which modelling approaches may be prioritised for similar lake-based applications, including considerations of robustness and computational efficiency.
- Third, hyper‑parameter tuning is mentioned but not described. The manuscript should specify which parameters were tuned for each model, the search space explored, the optimisation strategy (grid, random, Bayesian, etc.) and the validation split used. Detailing this process is essential for assessing model robustness and guarding against over‑fitting. An early subsection that summarises key performance metrics of the different models would also ground the subsequent discussion of variable importance in a known predictive skill.
We thank the reviewer for this very important point. We agree that a clearer description of the hyperparameter tuning procedure will improve transparency, reproducibility, and assessment of model robustness.
To address this, we will expand the Methods section to explicitly describe, for each model, the hyperparameters that were tuned, the search space explored, the optimisation strategy applied, and the validation split used during tuning.
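As an illustration of the level of detail we intend to report, a minimal sketch of such a tuning setup in R, using a grid search over the Random Forest mtry parameter with time-ordered cross-validation (grid values, window lengths, and object names are placeholders, not the exact settings used in the study):

```r
library(caret)

train_df <- read.csv("data/train.csv")  # hypothetical chronological training split

ctrl <- trainControl(
  method = "timeslice",   # respects the temporal ordering of the daily data
  initialWindow = 365,    # train on one year per slice (assumed)
  horizon = 90,           # validate on the following 90 days
  fixedWindow = TRUE
)

rf_fit <- train(
  fdom ~ ., data = train_df,
  method = "rf",
  tuneGrid = expand.grid(mtry = c(2, 4, 8, 12)),  # example search space
  trControl = ctrl,
  metric = "RMSE"
)
rf_fit$bestTune  # selected hyperparameters; the test period stays fully held out
```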
Regarding model performance metrics, we note that the metrics currently used in the manuscript were selected because they are commonly employed in lake modelling and water quality studies. However, we agree with the reviewer that an earlier and clearer contextualisation of these metrics across models would strengthen the manuscript. We will therefore add a short subsection summarising key performance metrics for all models prior to the driver-attribution analysis, in order to ground the subsequent discussion of variable importance in terms of predictive skill.
These additions will clarify the modelling workflow and address the concerns raised.
- All figures suffer from overly small text; increasing the font size for axes, legends, and annotations is essential for readability. In Figures 3 and 4 the legends contain interpretative statements, which blurs the line between description and analysis : legends should merely describe the visual content, leaving interpretation to the main text.
We thank the reviewer for this observation. We agree that figure readability can be improved and will increase the font size of axes, legends, and annotations across all figures. In addition, we will revise the legends of all figures to remove interpretative statements, ensuring that legends are strictly descriptive and that all interpretation is confined to the main text.
- Finally, the code is shared in a reasonably structured way, but two key improvements would greatly enhance its usability. First, assigning a clear execution order, either by numbering the scripts or by providing a master driver script, would allow anyone to run the workflow sequentially without guesswork. Second, replacing absolute paths such as “~/Documents/intoDBP/driver_attribution_fdom/” with relative paths or a configurable settings file would make the repository portable across different machines and operating systems. Implementing these changes would substantially increase the rigor and reproducibility of the study.
We thank the reviewer for raising this important point, which is fully aligned with the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and good practices for reproducible research. In response, we have added a clear execution order by numbering all scripts in the repository and replaced absolute paths with relative paths to improve portability across different machines and operating systems.
These changes will ensure that the workflow is fully checked and functional at the time of resubmission, so that the code can be executed sequentially across platforms.
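For example, a minimal sketch of the path change using the here package (the file name is illustrative):

```r
library(here)  # resolves paths relative to the repository root

# Before: read.csv("~/Documents/intoDBP/driver_attribution_fdom/feeagh/data/...")
# After, portable across machines and operating systems:
feeagh_data <- read.csv(here("feeagh", "data", "input_drivers.csv"))
```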
Specific comments
Figure 1: the caption should define DOC; otherwise the figure nicely illustrates the workflow
Thank you for noticing this. The definition of DOC (dissolved organic carbon) will be added to the figure caption in the revised manuscript.
L21-23: please indicate the direction of variation for each process (increase or decrease)
We will indicate the direction of variation for each process described in these lines.
L59: insert the word “respectively”
We will insert it.
L93-96: you mention a 2-minute measurement resolution but also give a number of points that roughly matches the number of days between the dates. Clarify whether the data were averaged daily before analysis.
We agree that this requires clarification. The high-frequency (2-minute) fDOM data were averaged to daily values prior to analysis, and this will be explicitly stated in the revised manuscript.
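A minimal sketch of this aggregation step (assuming a data frame fdom_2min with a POSIXct timestamp column and a numeric fdom column):

```r
library(dplyr)
library(lubridate)

# Collapse 2-minute sensor records to daily means
fdom_daily <- fdom_2min %>%
  mutate(day = as_date(timestamp)) %>%
  group_by(day) %>%
  summarise(fdom = mean(fdom, na.rm = TRUE), .groups = "drop")
```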
Figure 2: define the acronyms “GWLF” and “GLM” (they are only explained later, at lines 114‑116). Also define “NSE”; you introduce NSE as a model‑evaluation metric here but it does not appear elsewhere in the paper.
We thank the reviewer for pointing this out. We will define the acronyms GWLF (Generalised Watershed Loading Functions model) and GLM (General Lake Model) directly in the Figure 2 caption for clarity. In addition, NSE (Nash–Sutcliffe Efficiency) will be defined at its first occurrence, and we will ensure consistent usage throughout the manuscript.
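For context, the standard definitions of both metrics, with $O_t$ the observations, $S_t$ the simulations, $r$ their linear correlation, and $\mu$ and $\sigma$ the respective means and standard deviations:

```latex
\mathrm{NSE} = 1 - \frac{\sum_t \left(O_t - S_t\right)^2}{\sum_t \left(O_t - \bar{O}\right)^2},
\qquad
\mathrm{KGE} = 1 - \sqrt{\left(r - 1\right)^2
  + \left(\frac{\sigma_S}{\sigma_O} - 1\right)^2
  + \left(\frac{\mu_S}{\mu_O} - 1\right)^2}
```

Both metrics have an upper bound of 1, which indicates a perfect match between simulations and observations.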
L126-127: see also Molnar (2025) for interpretable machine learning.
We thank the reviewer for this suggestion. We will include a citation to Molnar (2025) at this point in the manuscript to further support the statement on interpretability and clarify how supervised machine-learning approaches can provide insight into driver attribution beyond a “black box” perspective.
L128-135: the description of how the data were split into training and test sets, and how the models were evaluated, is unclear. Align this description with the "Prediction Workflow" paragraph and distinguish clearly between validation (hyperparameter optimisation) and testing (final performance).
We acknowledge that the description of the data split and model evaluation was unclear. This section will be revised to align explicitly with the Prediction Workflow paragraph and to clearly distinguish between validation (used for hyperparameter tuning) and testing (used for final performance assessment), as also outlined in our response to General Comment 3.
L136: the term “statistical” is vague (all these models belong to supervised machine learning), and the choice of models needs justification. Explain why these particular algorithms were selected, why e.g. a neural network was not considered, and what the relative strengths and weaknesses of each method are. If three gradient‑boosting frameworks (XGBoost, LightGBM, CatBoost) are used, state the reason for testing all three rather than picking one.
The term “statistical” will be replaced with “supervised machine-learning models” throughout the manuscript. The rationale for the selection and retention of the different modelling approaches, including the relative strengths and limitations of each method and the motivation for testing multiple gradient-boosting frameworks (XGBoost, LightGBM, and CatBoost), will be clarified in a revised Methods subsection, as outlined in our response to General Comment 3. In line with that revision, we will also explicitly discuss the consideration of neural networks and the reasons for their inclusion or exclusion in the updated analysis.
L140-141: consider using permutation‑importance as an alternative that works across all models
We thank the reviewer for this suggestion. We will explore permutation importance as a model-agnostic alternative for feature importance across all methods and include these results in the Supplementary Material to support the main findings.
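A sketch of this model-agnostic computation, written as a plain loop so that it applies to any fitted model with a predict method (the model object, test data, and replication count are placeholders):

```r
# Permutation importance: increase in test RMSE when one feature is shuffled
permutation_importance <- function(model, X, y, n_rep = 20) {
  base_rmse <- sqrt(mean((predict(model, X) - y)^2))
  sapply(names(X), function(feat) {
    mean(replicate(n_rep, {
      Xp <- X
      Xp[[feat]] <- sample(Xp[[feat]])  # break the feature-target association
      sqrt(mean((predict(model, Xp) - y)^2)) - base_rmse
    }))
  })
}

# Usage (placeholder objects): larger values indicate more important drivers
# imp <- sort(permutation_importance(rf_fit, test_X, test_y), decreasing = TRUE)
```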
L165: I would like to see a short “Model performance” subsection before the analysis of important drivers, so readers first see how well the models actually performed.
We agree with this suggestion and will add a short Model performance subsection prior to the driver-attribution analysis to clearly present predictive skill before discussing variable importance.
Figure 3: I’m confused by what you mean regarding the “four ML models that directly provide feature importance: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting (LGB), CatBoost (CTB)”. These models do not provide direct feature importance; you have to compute it (which you did with node purity and gain contribution). On the contrary, a linear model directly provides feature importance through its parameters (see Molnar (2025)).
We thank the reviewer for this important clarification. We agree that the wording “directly provide feature importance” is inaccurate and will be removed. In the revised manuscript, we will explicitly state that feature importance is computed for these models using metrics such as node purity (Random Forest) and gain contribution (XGBoost, LightGBM, and CatBoost).
This revision will correct the terminology and more accurately reflect the feature-importance methodology used in the study.
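As a code-level illustration of the corrected wording, importance is extracted from the fitted models rather than read off directly (model object names are hypothetical):

```r
library(randomForest)
library(xgboost)

# Random Forest regression: importance as total decrease in node purity
rf_imp <- importance(rf_model)   # "IncNodePurity" column

# Gradient boosting: importance as per-feature gain contribution
xgb_imp <- xgb.importance(model = xgb_model)  # returns features sorted by Gain
head(xgb_imp)
```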
Table 1: RMSE should be reported with its units, since it shares the same unit as the predicted variable. The fact that XGBoost yields a very poor performance (R² = 11%) compared to others (including a linear model with R² = 45%) suggests that the training could be largely improved. Which hyperparameters did you choose for the XGBoost? I dug into the code and found that only a few hundred trees (100-300) were used, which may be insufficient (but this also depends on the learning rate). Typically, boosting procedures require thousands of trees (see Hastie et al., 2009; Chapter 10). Moreover, the reported R² of 99% on the training data in Table A1 strongly suggests overfitting.
We agree that RMSE units should be reported and will add them in the revised manuscript. The reviewer is correct that the low number of trees used during XGBoost hyperparameter tuning led to poor performance. We have rerun the analysis using thousands of trees, and XGBoost performance is now comparable to the other boosting methods. The issue of overfitting, as indicated by the high training performance, is acknowledged as a limitation in the current discussion section; we will make sure that this limitation is explicitly clarified in the revised manuscript.
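For illustration, a minimal sketch of the revised strategy: many boosting rounds combined with a small learning rate and early stopping on a validation split (all values and object names are illustrative, not the final settings):

```r
library(xgboost)

dtrain <- xgb.DMatrix(data.matrix(train_X), label = train_y)  # placeholder data
dvalid <- xgb.DMatrix(data.matrix(valid_X), label = valid_y)

bst <- xgb.train(
  params = list(objective = "reg:squarederror", eta = 0.01, max_depth = 4),
  data = dtrain,
  nrounds = 5000,               # thousands of trees, cf. Hastie et al. (2009)
  watchlist = list(valid = dvalid),
  early_stopping_rounds = 100,  # stop once validation RMSE stops improving
  verbose = 0
)
bst$best_iteration  # number of trees actually retained
```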
Figure 4: mixing a partial‑dependence plot (PDP) for the Random Forest with SHAP values for CatBoost makes the interpretation confusing. Choose one model and interpret it thoroughly. Do not embed interpretation inside the legend. For the SHAP plot, indicate how the variables are ordered (e.g. by mean absolute SHAP value). Also, the cosine‑scaled Julian day axis should be accompanied by a note translating the cosine values back to calendar seasons, because “cos(julian day) = ‑0.5” is not intuitive.
We thank the reviewer for this helpful comment. We included both PDPs (Random Forest) and SHAP values (CatBoost) to provide complementary perspectives on feature influence derived from two different models. However, we agree that presenting both in the main figure can make interpretation less clear. In the revised manuscript, we will evaluate using a single model for the main interpretation and move the complementary analysis to the Supplementary Material.
In addition, we will remove interpretative statements from the figure legend, in line with our response to General Comment 4, and ensure that interpretation is presented only in the main text. We will also explicitly state how SHAP values are ordered. Finally, we will replace or annotate the cosine-transformed Julian day axis to translate values back to calendar timing (e.g., seasons/months), so that the seasonal signal is more intuitive to interpret.
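As an example of the planned annotation, and assuming the transform applied was cos(2π · Julian day / 365), the cosine values at the start of each month can be tabulated to label the axis:

```r
# Map calendar months back onto the cosine-transformed Julian day axis
month_starts <- as.Date(paste0("2023-", 1:12, "-01"))  # any non-leap year works
jday <- as.integer(format(month_starts, "%j"))
data.frame(month = months(month_starts),
           cos_jday = round(cos(2 * pi * jday / 365), 2))
# Values near +1 correspond to winter (January/December), near -1 to mid-summer
```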
L198 - 208 & Figure 4: the text mentions an experiment using “the most influential drivers” and another using “a reduced subset of reanalysis‑based and easily accessible drivers.” In the figure the purple line is labelled “Testing with all drivers,” which appears to correspond to the reduced set described in the text. Either correct the label or clarify the distinction between the two experimental setups.
We thank the reviewer for pointing out this inconsistency. We will clarify the distinction between the two experimental setups in both the text and figure labels. The violet colour corresponds to simulations using all selected influential drivers, while the green colour corresponds to the reduced subset of reanalysis-based and easily accessible drivers.
L249: discuss the implications of predicting fDOM instead of total DOC. Is the former a simpler target, and does that affect the ecological relevance of the results?
We thank the reviewer for this comment. We will add text to the Discussion section explicitly addressing the implications of focusing on fDOM as the simulated variable rather than DOC. While both fDOM and DOC are widely used indicators of organic matter in lakes, they differ in their measurement principles and in the properties of organic matter they represent. We will clarify these differences and discuss the ecological relevance accordingly.
L273: good point, thank you for highlighting the limitation of dataset shift.
We thank the reviewer for pointing this out and appreciate the comment.
Figure A5: define all acronyms
All acronyms used in Figure A5 will be clearly defined in the revised manuscript.
References:
Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Science & Business Media, 2009.
Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 3rd ed., christophm.github.io/interpretable-ml-book/, 2025.
RC2: 'Comment on egusphere-2025-4049', Anonymous Referee #2, 12 Jan 2026
This study incorporated multiple machine learning approaches to identify the key drivers among 24 environmental variables to predict the fDOM in two contrasting lentic systems. This is overall an interesting study and its methodologies may be applied for future research in similar fields. Please find my detailed comments below:
Line 69: The word “also” is confusing. Do you mean both natural processes and human activities are important for Sau? If so, this sentence is not consistent with lines 67-68. If only human activity is the driver, “also” should be removed.
Lines 84-85 and Figure 2 step (4): It is unclear why the second simulation should only use the reanalysis data, since you already know that the human drivers are also important, at least for Sau. Could you please explain the study objective or hypothesis? Also, please double-check if you did include the Julian day for the second simulation. If it was included, a purple rectangle that represents “Others – Julian day” and relevant text should be added in step 4 model 2 because the last two lines in the figure caption said so.
Lines 102-103: Was fDOM data for Lake Feeagh also corrected? Figures A2 and A3 only show the correction for Sau. Line 301: Please add the linear regression equation in Figure A2 and cite Figure A2 in A3.
Lines 106-107: “All input data variables, including their respective units and source, are displayed in Step 1 of Figure 2.” should be moved to the beginning of this paragraph.
Lines 217-218: Besides boosting biological activity, the increased soil moisture also indicated the stronger soil-lake hydrological connectivity. More terrestrial DOM can be exported into the lake during wet periods.
Line 257: the indices in Figure A5 should be spelled out instead of using abbreviations or acronyms.
Figure 1: (1) For the “land cover” legend, I do not think “organic rich soil” is one of the land cover classes in CORINE, and it is merely used as a land cover metric. Please explain which classes were accounted for in it, or replace it with a commonly used category. Does it refer to peatland according to line 72? (2) “Regular seasonality” is a vague description in the fourth line of the figure caption. What does “regular” mean? Does it have distinguished seasonality every year or have no seasonal difference every year? Both patterns can be “regular”.
Figure 2: The abbreviations in step 1 should be explained in the figure caption, such as the ERA, GLM, and GWLF.
Citation: https://doi.org/10.5194/egusphere-2025-4049-RC2
AC2: 'Reply on RC2: Anonymous Referee #2', Daniel Mercado-Bettín, 01 Feb 2026
We would like to thank the reviewers for the time they dedicated to this manuscript. All suggestions and comments have been carefully considered and will be taken into account in the revised version of the manuscript. Here, we propose actions to enhance our work, incorporating this feedback, and we appreciate the valuable insights provided during the review process.
Note:
In bold text: the comments, observations and suggestions of the reviewers.
In normal text: the responses of the authors of this manuscript.
This study incorporated multiple machine learning approaches to identify the key drivers among 24 environmental variables to predict the fDOM in two contrasting lentic systems. This is overall an interesting study and its methodologies may be applied for future research in similar fields. Please find my detailed comments below:
We thank the reviewer for the assessment. We appreciate the recognition of the applicability and reproducibility of our study, which are key objectives of this work. Below, we address the specific comments in detail.
Line 69: The word “also” is confusing. Do you mean both natural processes and human activities are important for Sau? If so, this sentence is not consistent with lines 67-68. If only human activity is the driver, “also” should be removed.
We will clarify, in the new version of the manuscript, that we intended to convey that both natural processes and human activities are important drivers at Sau, reflecting its more heavily intervened and populated catchment. We will revise the wording accordingly and ensure consistency between the referenced lines to avoid confusion.
Lines 84-85 and Figure 2 step (4): It is unclear why the second simulation should only use the reanalysis data, since you already know that the human drivers are also important, at least for Sau. Could you please explain the study objective or hypothesis? Also, please double-check if you did include the Julian day for the second simulation. If it was included, a purple rectangle that represents “Others – Julian day” and relevant text should be added in step 4 model 2 because the last two lines in the figure caption said so.
We thank the reviewer for this thoughtful comment. The rationale for using only reanalysis-based drivers in the second simulation is directly linked to one of the main objectives of the study, namely to evaluate the scalability and reproducibility of the proposed modelling workflow. Specifically, this experiment was designed to test whether fDOM dynamics can be reasonably captured using only globally available data sources, without relying on site-specific or human-activity variables that are often difficult to obtain or unavailable in many regions.
We agree that human drivers can be important at specific sites, such as Sau. Our results show that a model trained exclusively with reanalysis-based variables can still capture a substantial part of DOM dynamics. This is because reanalysis variables, such as soil moisture and temperature, can implicitly reflect aspects of land use and human influence on catchment functioning (e.g. altered soil properties or hydrological responses in more urbanised or intensively managed catchments). As such, this second simulation is not intended to replace site-specific modelling, but rather to demonstrate the utility of the workflow in data-limited contexts.
Regarding the reviewer’s second point, we confirm that Julian day was included in the second simulation. We will revise Figure 2 and its caption accordingly by explicitly adding the “Others – Julian day” component to step (4), ensuring consistency between the figure and the text.
Lines 102-103: Was fDOM data for Lake Feeagh also corrected? Figures A2 and A3 only show the correction for Sau. Line 301: Please add the linear regression equation in Figure A2 and cite Figure A2 in A3.
We thank the reviewer for raising this point. For Feeagh, the fDOM data used in this study were corrected for the temperature quenching effect following previous work. The correction follows established methods for compensating temperature-induced quenching of CDOM/fDOM fluorescence sensors using the same fDOM Feeagh data, as described by Watras et al. (2011) and Ryder et al. (2012). These corrected data are the ones used here for the modelling analysis. In the earlier studies, the temperature correction was applied for different research objectives, but the corrected time series remains valid and appropriate for the present study. This is briefly mentioned in the Supplementary Material (lines 296–297). However, we agree that additional clarity would be beneficial. We will therefore expand the Supplementary Material to provide more detail on the temperature correction applied at Feeagh, including a short description of the correction approach, the coefficient used, and its basis in the published literature. The respective correction equation will be explicitly included in Figure A2 (Sau) and in the new figure for Feeagh. In addition, we will improve the clarity of the Supplementary Material by cross-referencing Figures A2 and A3 (Sau) and equivalent figures for Feeagh, respectively.
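For clarity, a sketch of the general form of this correction following Watras et al. (2011); the coefficient rho below is a placeholder, not the site-specific value applied to the Feeagh data:

```r
# Temperature compensation of fDOM fluorescence: rescale measured values to a
# common reference temperature using the sensor's temperature coefficient rho
correct_fdom <- function(fdom_meas, temp_meas, temp_ref = 20, rho = -0.01) {
  fdom_meas / (1 + rho * (temp_meas - temp_ref))
}

fdom_corrected <- correct_fdom(fdom_raw, water_temp)  # hypothetical vectors
```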
Lines 106-107: “All input data variables, including their respective units and source, are displayed in Step 1 of Figure 2.” should be moved to the beginning of this paragraph.
We agree that this sentence provides a useful starting context and will move it to the beginning of the paragraph to improve clarity and flow.
Lines 217-218: Besides boosting biological activity, the increased soil moisture also indicated the stronger soil-lake hydrological connectivity. More terrestrial DOM can be exported into the lake during wet periods.
We thank the reviewer for highlighting this point. We agree that increased soil moisture can also indicate stronger soil–lake hydrological connectivity, particularly during flushing events triggered by high precipitation. We will expand this section of the Discussion to explicitly emphasise the role of enhanced hydrological connectivity in facilitating terrestrial DOM export to the lake.
Line 257: the indices in Figure A5 should be spelled out instead of using abbreviations or acronyms.
All indices shown in Figure A5 will be spelled out in full in the revised manuscript to improve clarity.
Figure 1: (1) For the “land cover” legend, I do not think “organic rich soil” is one of the land cover classes in CORINE, and it is merely used as a land cover metric. Please explain which classes were accounted for in it, or replace it with a commonly used category. Does it refer to peatland according to line 72? (2) “Regular seasonality” is a vague description in the fourth line of the figure caption. What does “regular” mean? Does it have distinguished seasonality every year or have no seasonal difference every year? Both patterns can be “regular”.
We thank the reviewer for the careful reading of Figure 1. (1) Yes, that is correct: “organic-rich soils” is not a standard land-cover class in the CORINE classification. In both catchments, the overlap between the catchment boundaries and the CORINE Land Cover 2018 dataset resulted in a relatively large number of individual land-cover classes. To improve readability and facilitate comparison between sites, we grouped the original CORINE classes into a smaller number of categories (the five categories displayed in Figure 1) representing dominant catchment characteristics.
In this context, the category labelled organic-rich soils aggregates CORINE classes associated with high organic matter content, including peat bogs, moors and heathlands, and related classes. Similarly, CORINE classes such as Broad-leaved forest, Coniferous forest, and Mixed forest were grouped into a single category (Forest), and analogous groupings were applied to the remaining classes. This aggregation was used solely for visualisation and interpretative clarity in Figure 1 and is not intended as a replacement for the original CORINE classification.
To avoid ambiguity, we will add a table to the Supplementary Material explicitly listing the original CORINE land-cover classes identified in each catchment and showing how they were aggregated into the grouped categories used in the figure. We will also clarify this grouping and explicitly refer to it in the revised caption. As an example, the full CORINE land-cover classification for Sau and its corresponding grouping is provided in the attached Supplementary PDF.
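A minimal sketch of how such a lookup-based grouping can be expressed (the mapping is partial and illustrative of the planned Supplementary table):

```r
# Aggregate CORINE level-3 classes into the grouped categories of Figure 1
corine_groups <- c(
  "Peat bogs"           = "Organic-rich soils",
  "Moors and heathland" = "Organic-rich soils",
  "Broad-leaved forest" = "Forest",
  "Coniferous forest"   = "Forest",
  "Mixed forest"        = "Forest"
)
catchment$group <- corine_groups[catchment$corine_class]  # hypothetical data frame
```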
(2) We agree that the term “regular seasonality” is vague in this context. We will revise the figure caption to remove this wording and replace it with a more precise description of the observed seasonal behaviour in Feeagh, which has a temperate maritime climate characterised by mild, changeable weather and four seasons that are rarely extreme.
Figure 2: The abbreviations in step 1 should be explained in the figure caption, such as the ERA, GLM, and GWLF.
We thank the reviewer for this comment. All abbreviations used in Step 1 of Figure 2, including ERA5 (ECMWF Reanalysis v5), GLM (General Lake Model), and GWLF (Generalised Watershed Loading Functions model), will be explicitly defined in the figure caption in the revised manuscript.
References:
Watras, C. J., Hanson, P. C., Stacy, T. L., Morrison, K. M., Mather, J., Hu, Y. H., and Milewski, P.: A temperature compensation method for CDOM fluorescence sensors in freshwater, Limnology and Oceanography: Methods, 9, 296–301, 2011.
Ryder, E., Jennings, E., de Eyto, E., Dillane, M., NicAonghusa, C., Pierson, D. C., Moore, K., Rouen, M., and Poole, R.: Temperature quenching of CDOM fluorescence sensors: temporal and spatial variability in the temperature response and a recommended temperature correction equation, Limnology and Oceanography: Methods, 10, 1004–1010, 2012.
Data sets
Data used in the manuscript for the first study site Daniel Mercado-Bettín https://github.com/danielmerbet/driver_attribution_fdom/tree/main/feeagh/data
Data used in the manuscript for the second study site Daniel Mercado-Bettín https://github.com/danielmerbet/driver_attribution_fdom/tree/main/sau/data
Model code and software
Codes used to obtain the results shown in the manuscript Daniel Mercado-Bettín https://github.com/danielmerbet/driver_attribution_fdom/tree/main