Diagnosing Dissolved Organic Carbon Simulation of SWAT-C model Using Machine Learning Approaches

Huang, Zehong; Chen, Shouzhi; Gong, Yufeng; Wang, Zheng; Duan, Zheng; Fu, Yongshuo H.

doi:10.5194/egusphere-2025-5503

Preprints

https://doi.org/10.5194/egusphere-2025-5503

27 Dec 2025

Abstract. Dissolved organic carbon (DOC) plays a critical role in the terrestrial carbon cycle, and accurate simulation of its dynamics is essential for understanding carbon balance and climate change mitigation. However, DOC simulations still involve large uncertainty under complex environmental conditions. To address this challenge, we proposed a Module Diagnosis Framework (MDF) that quantitatively identifies the module-level sources of uncertainty in DOC modeling. The SWAT-MDF integrates the physically based SWAT-Carbon (SWAT-C) model with a data-driven module that employs machine learning algorithms, and applies Shapley additive explanations (SHAP) and residual analysis to diagnose the sources of uncertainty in DOC simulation in the Yalong River Basin. We found that the data-driven module based on bidirectional long short-term memory (Bi-LSTM) networks achieved good performance for daily DOC predictions, with an average NSE = 0.62 and R² = 0.67, while the original SWAT-C model yielded an average NSE = 0.51 and R² = 0.61. Despite this improvement, the testing performance remains limited, suggesting that the main uncertainty arises from the structural limitations of SWAT-C and highlighting the need for further structural improvement and module-level diagnosis. The MDF results revealed that the carbon cycle module and pollutant transport module mainly regulated the magnitude and variation of DOC predictions in the original SWAT-C model, and the vegetation growth module and the carbon cycle module were major sources of DOC prediction uncertainties. We therefore proposed that further improvements in DOC prediction in the SWAT-C model should focus on the vegetation growth and carbon cycle modules. Our proposed SWAT-MDF framework significantly enhances the reliability of DOC simulations, provides a quantitative basis for improving the SWAT-C model, and offers a generalizable approach to module optimization in similar coupled modeling frameworks.
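
For reference, the two scores quoted in the abstract can be computed as in the minimal sketch below. It assumes that NSE is the standard Nash-Sutcliffe efficiency and that R² is the squared Pearson correlation; the abstract defines neither, so both readings are assumptions. The sample series are invented for illustration and are not data from the study.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values (assumed definition)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Invented example: the simulation tracks the observations but is biased by +1.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
nse_value = nse(obs, sim)        # penalized by the constant bias
r2_value = r_squared(obs, sim)   # insensitive to the constant bias
print(nse_value, r2_value)
```

Under these definitions a constant bias lowers NSE but leaves R² unchanged, which is one reason the two scores can differ for the same simulation.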

Received: 06 Nov 2025 – Discussion started: 27 Dec 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-5503', Anonymous Referee #1, 27 Jan 2026
Review on “Diagnosing Dissolved Organic Carbon Simulation of SWAT-C model Using Machine Learning Approaches” by Huang et al 2026.
The authors of the article by Huang et al aim to improve the simulation of dissolved organic carbon (DOC) in the Yalong River Basin using a coupled modeling framework called SWAT-MDF. The study integrates a process-based hydrological model (SWAT-C) with a data-driven learning module implemented through machine learning algorithms. The authors used SHAP and residual analysis to conduct a comprehensive diagnosis of model components and identify the structural sources of uncertainty in DOC simulation.
The authors find that the Bi-LSTM-based calibration showed the most reliable performance in simulating DOC dynamics with an average NSE of 0.67, which improved the original calibrated SWAT-C results slightly (NSE of 0.51). The SHAP-based global interpretation identified DOC_Simulate, TOT_P, and PRE as the most important predictors of DOC, and a residual analysis revealed that LAI, RH, and DOC_Simulate were the most significant contributors to prediction errors.
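
The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. This is a minimal sketch, not the manuscript's workflow: the feature names are taken from the comment above, but the toy model, its coefficients, the input values, and the all-zero baseline are hypothetical. SHAP implementations typically average over a background dataset rather than a single baseline.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)      # start with every feature at its baseline value
        previous = predict(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its actual value
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / factorial(n) for total in phi]


def toy_model(v):
    """Hypothetical stand-in: v = [DOC_Simulate, TOT_P, PRE]; coefficients are invented."""
    return 0.5 * v[0] + 0.2 * v[1] * v[2]


x = [2.0, 1.0, 3.0]          # hypothetical feature values
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is shared equally between the two features involved. Enumerating all orderings is only feasible for a handful of features, which is why practical SHAP tools rely on approximations.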

General comments
Overall, the authors present an impressive amount of analysis and go very deep into the technical implications of their research. They surely provide a good piece of work here. As nice as this reads, I have two main concerns:
Underlying DOC processes: The article provides very little information on which processes are actually driving the DOC exports from the study area, nor does it provide any information about the trends of these exports, which seem to be relevant (Xu et al., 2024). The authors work with DOC loads, which are mainly driven by the streamflow of the catchment. Why not use the measured DOC concentration data? This would allow a better process analysis, in general and specifically for the model. So in a nutshell: Which processes drive the DOC export of the particular catchment and is this process also the relevant one in the model? Only then one can start thinking about improving the model performance.

Underlying method: The complex method builds on 72 measurement points in one catchment. I acknowledge the author's trials on training the machine learning methods before applying it on the measured data, but still, this is not a sufficiently rich dataset for a machine learning study (Kratzert et al., 2024).

Working on these general concerns would require going deeper into the rich literature body on DOC processes and how the export can be explained in general in this study catchment (before starting any of the modelling exercises). And it would require more data. As more data is always easy to ask for, but hard to get, why not focus on the observed discharge data as well? Or test the method in more catchments with DOC concentration observations?
At the current state, I can not recommend this manuscript for publication.

Specific comments
There is a lack of discussion of the limitations of this study. The article mentions some limitations, but it does not provide a comprehensive discussion of the potential limitations and biases of the research.
Overemphasis on technical details: The article focuses heavily on the technical details of the methodology and models used, which may be of interest to experts in the field but may not be as relevant to a broader audience. It would rather benefit from studying the processes being actually relevant for the DOC export. Where does it come from, why does it get exported, and is there a trend? If so, can the model reproduce the trend? There are not that many studies out there predicting DOC exports in streams, so why make it overly complicated with the additional layer of machine learning?
Limited discussion of practical implications: The article does not provide a clear discussion of the practical implications of the research findings, such as how they can be applied in real-world settings, how other researchers can benefit from the shown results?
Lack of transparency in data analysis: The article does not provide a clear description of the data analysis procedures used, which may make it difficult for readers to understand how the results were obtained. There is some information hidden in the supplement material, and the authors also provide their source code (which I highly appreciate), still it remains unclear at key points what has actually been done. There are no results shown of the sensitivity analysis and there is no information on the model calibration (e.g., which algorithm has been used, how many model runs were performed, which performance criterion has been used for the calibration, etc.).
Limited discussion of uncertainty: The article does not provide a clear discussion of the uncertainty associated with the research findings; in too many places, uncertainty is confused with a lack of model performance. After reading the article, I have the feeling that the authors interpret uncertainty as some sort of reason for decreasing model performance. However, one should be specific here. There are several sources of uncertainty and they should always be clearly stated. Are we talking about input data uncertainty, model process uncertainty, model parameter uncertainty, model structure uncertainty, etc.?
Lack of clear conclusions and recommendations: The article does not provide clear conclusions and recommendations based on the research findings. What is the main benefit of applying this additional layer of machine learning? What new knowledge can be gained here and in general? There are some elements of this in the manuscript already, but I think a separate section in the discussion and putting the results in the context of available literature would help the readers.

Technical corrections
L46: I do not see the argument why the lack of observation data justifies the need for DOC modelling.
L60: What is meant by “incomplete module designs and vague parameter representations”? Please provide examples.
L63: Please provide references to this statement.
L66: Please state again what is meant by the “coupled modelling approach”.
L68: Which “variables” are meant here?
L75: Provide reference
Line 107: Please provide the runoff amount also as an annual average sum in mm to allow for comparison with other catchments.
Line 108: What exactly is meant by particulate matter? How was it measured? Please provide a reference.
Line 110: Please provide an overview of the temperature in the study area as well, to give the reader an impression of the local climate.
L127: State the temporal resolution of the measured DOC data.
L129: Sentence out of context.
Table 1: Please state the year for the sources of Runoff and Sediment data. Please also try to provide a link where information about this source can be found.
Table 1: The reference of Xu et al 2024 for DOC data does not provide any daily observations, as the table implies here. Xu et al measured monthly, and increased the sampling to weekly in the monsoon season. Additionally, I have my doubts that there are daily sediment observations, which can not be double checked however, as there is no reference.
Table 2: There are no parameters shown here, even though the table title suggests it. I think those are model inputs and outputs? And Table S4 is incomplete as it shows “etc.” at some points, e.g. at the core of the study, the DOC simulation. As such it is not possible to judge whether the selection of the parameters is meaningful. I have my doubts, e.g. because why is the meteorological forcing part of the calibration? I do not see any justification for this in the manuscript.
Table S3: What is YLR? Wouldn’t it be interesting to study here which model processes are relevant for DOC export?
L162: What is meant with “Bayesian optimization was applied”? Which algorithm was applied and on what? Keep in mind that Bayesian optimization is not only used in Machine Learning, it can also be used for model calibration, so it is important to state specifically what has been done.
Structure: I think it would be more intuitive to explain SWAT-C first (currently chapter 2.4) and then the additional layer of SWAT-MDF (currently chapter 2.3)
Chapter 2.3: This chapter needs to be better explained and better linked with Figure 2. Why are there so many transfers necessary? As this method is not a standard approach in hydrology (yet), try to keep it understandable for readers who are not familiar with it.
Line 227: There is no information on the sensitivity analysis in the supplement material. It is only stated that it was performed, without providing any details. Please provide those details.
Figure 3: What is meant by the unit DOC/kg? A kilogram of what? Sediment or discharge, or? Is this supposed to quantify the DOC load? Why not work with DOC concentration? I doubt that any DOC-related process can be fitted by working with DOC loads, as it seems mainly discharge driven (Figure 3e).
Figure 5 and 6: Why is the LAI the least important in Figure 5 and the most important in Figure 6b?

References
Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.: HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrology and Earth System Sciences, 28, 4187–4201, https://doi.org/10.5194/hess-28-4187-2024, 2024.
Xu, S., Li, S.-L., Bufe, A., Klaus, M., Zhong, J., Wen, H., Chen, S., and Li, L.: Escalating Carbon Export from High-Elevation Rivers in a Warming Climate, Environ. Sci. Technol., 58, 7032–7044, https://doi.org/10.1021/acs.est.3c06777, 2024.
Citation: https://doi.org/10.5194/egusphere-2025-5503-RC1
- AC1: 'Reply on RC1', Zehong Huang, 16 Mar 2026
  
  Response to Referee #1
  [Comment 1] The authors of the article by Huang et al aim to improve the simulation of dissolved organic carbon (DOC) in the Yalong River Basin using a coupled modeling framework called SWAT-MDF. The study integrates a process-based hydrological model (SWAT-C) with a data-driven learning module implemented through machine learning algorithms. The authors used SHAP and residual analysis to conduct a comprehensive diagnosis of model components and identify the structural sources of uncertainty in DOC simulation.
  The authors find that the Bi-LSTM-based calibration showed the most reliable performance in simulating DOC dynamics with an average NSE of 0.67, which improved the original calibrated SWAT-C results slightly (NSE of 0.51). The SHAP-based global interpretation identified DOC_Simulate, TOT_P, and PRE as the most important predictors of DOC, and a residual analysis revealed that LAI, RH, and DOC_Simulate were the most significant contributors to prediction errors.
  [Response 1] We thank the referee for the supportive comments. Please see below our responses to each comment.
  General Comment:
  [Comment 2] Underlying DOC processes: The article provides very little information on which processes are actually driving the DOC exports from the study area, nor does it provide any information about the trends of these exports, which seem to be relevant (Xu et al., 2024). The authors work with DOC loads, which are mainly driven by the streamflow of the catchment. Why not use the measured DOC concentration data? This would allow a better process analysis, in general and specifically for the model. So in a nutshell: Which processes drive the DOC export of the particular catchment and is this process also the relevant one in the model? Only then one can start thinking about improving the model performance.
  [Response 2] We thank the reviewer for this constructive comment. Following the reviewer's suggestions, the revised manuscript now includes a trend analysis of dissolved organic carbon export in Section 3.2 (see our response to Comment 5). We completely replaced the target variable, switching from DOC load to DOC concentration (see our response to Comment 29). Furthermore, we added a description of the model driving mechanisms to the discussion in Section 4.1 (see our response to Comment 23).
  [Comment 3] Underlying method: The complex method builds on 72 measurement points in one catchment. I acknowledge the author's trials on training the machine learning methods before applying it on the measured data, but still, this is not a sufficiently rich dataset for a machine learning study (Kratzert et al., 2024).
  [Response 3] We thank the referee for these thoughtful comments and suggestions. We clarified that the machine learning inputs are feature variables from the calibrated SWAT-C model and that the limited observations are used solely for fine-tuning. We also added four years of runoff data to calibrate the SWAT-C model more accurately:
  “DOC observations from 2013-2014 were used as training data, with 20% randomly withheld for validation, while observations from 2019-2020 were used for testing. Because DOC observations are limited, the ML models were not trained directly from observations alone. Instead, the input features were mainly derived from long-term simulated feature parameters generated by the SWAT-C model. Given the limited availability of DOC measurements, a transfer learning strategy was adopted. First, models were pretrained on long-term simulated feature parameters from the SWAT-C model (1972-2020) to establish fundamental relationships between inputs and DOC outputs. Second, the pretrained models were fine-tuned using observed data to better match the true data distribution. In this way, the long-term SWAT-C simulations provide physically consistent training features, while the observed DOC data are primarily used to adjust the model to local conditions. This process mitigated overfitting caused by data scarcity by leveraging generalized patterns learned from simulations.
  Please see the supplementary material for the corresponding figures.
  Figure S3. Comparison between observed and simulated daily streamflow used for calibration of the SWAT-C model. ”
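
The two-stage strategy described in the quoted passage (pretraining on long simulated records, then fine-tuning on a small observed set with 20% withheld for validation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a one-variable linear model stands in for the Bi-LSTM, and both the "simulated" and "observed" series are synthetic, generated from invented relations.

```python
import random


def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.05, epochs=200):
    """Gradient descent on mean squared error for y ~ w * x + b, starting from (w, b)."""
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)

# Stage 1: pretrain on a long synthetic record standing in for SWAT-C output.
sim_x = [rng.uniform(0.0, 1.0) for _ in range(500)]
sim_y = [0.8 * x + 0.5 + rng.gauss(0.0, 0.05) for x in sim_x]
w_pre, b_pre = fit_linear(sim_x, sim_y, epochs=1000)

# Stage 2: a small synthetic "observed" set whose true relation differs slightly.
obs_x = [rng.uniform(0.0, 1.0) for _ in range(30)]
obs_y = [1.0 * x + 0.6 + rng.gauss(0.0, 0.05) for x in obs_x]

# Randomly withhold 20% of the observed samples for validation.
idx = list(range(len(obs_x)))
rng.shuffle(idx)
n_val = int(0.2 * len(idx))
val_idx, train_idx = idx[:n_val], idx[n_val:]
train_x = [obs_x[i] for i in train_idx]
train_y = [obs_y[i] for i in train_idx]
val_x = [obs_x[i] for i in val_idx]
val_y = [obs_y[i] for i in val_idx]

# Fine-tune: continue training from the pretrained weights on the observed subset.
w_ft, b_ft = fit_linear(train_x, train_y, w=w_pre, b=b_pre)

val_before = mse(val_x, val_y, w_pre, b_pre)
val_after = mse(val_x, val_y, w_ft, b_ft)
print(val_before, val_after)
```

The sketch shows only the data flow: the fine-tuning step starts from the pretrained weights rather than from scratch, and the withheld samples are used solely to compare the model before and after fine-tuning.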
  Specific Comments
  [Comment 4] There is a lack of discussion of the limitations of this study. The article mentions some limitations, but it does not provide a comprehensive discussion of the potential limitations and biases of the research.
  [Response 4] We thank the reviewer for this comment. We expanded Section 4.4 to comprehensively discuss the study limitations and specific uncertainty sources, as detailed in our response to Comment 8.
  [Comment 5] The article focuses heavily on the technical details of the methodology and models used, which may be of interest to experts in the field but may not be as relevant to a broader audience. It would rather benefit from studying the processes being actually relevant for the DOC export. Where does it come from, why does it get exported, and is there a trend? If so, can the model reproduce the trend? There are not that many studies out there predicting DOC exports in streams, so why make it overly complicated with the additional layer of machine learning?
  [Response 5] We thank the referee for this helpful comment. We added Section 3.2 to analyze the spatial distribution and long-term temporal trends of dissolved organic carbon export. Furthermore, we clarified the role of the machine learning component as a diagnostic tool grounded in physical processes in the revised manuscript:
  “3.2. Spatiotemporal evolution characteristics of simulated DOC
  The simulated DOC concentrations in the Yalong River Basin using the SWAT-C model demonstrate significant spatial heterogeneity and pronounced seasonal temporal dynamics (Fig. 4). Spatially, the DOC concentration exhibits a general increasing trend from the northwestern headwaters to the southeastern middle and lower reaches. The northwestern headwater region is characterized by relatively low concentrations, whereas the southeastern downstream region constitutes a high-value zone, with localized maximum concentrations reaching up to 1.75 mg/L. A comparison of the spatial distribution patterns across the years 2000, 2010, and 2020 reveals a gradual contraction in the spatial extent of the high-concentration zones in the lower reaches, accompanied by a trend toward concentration homogenization over time.
  Temporally, Figure 4b illustrates the long-term variations in DOC concentrations spanning the period from 2000 to 2020. The monthly concentration curve reveals substantial intra-annual fluctuations, with concentration peaks occurring consistently during the monsoon season, reaching maximum values exceeding 1.8 mg/L. During the non-monsoon season, the DOC concentration recedes and stabilizes at a baseline level of approximately 0.6 mg/L. The long-term trend line, indicative of interannual variability, demonstrates that the mean annual DOC concentration fluctuates mildly within the range of 0.8 to 1.1 mg/L, notably experiencing a distinct trough period around the year 2012.
  Please see the supplementary material for the corresponding figures.
  Figure 4. Spatiotemporal evolution of simulated DOC concentrations in the Yalong River Basin. (a) Spatial distribution patterns of DOC concentrations for the representative years 2000, 2010, and 2020. (b) Long-term temporal dynamics of DOC concentrations from 2000 to 2020, illustrating both the monthly fluctuations and the interannual trend.”
  [Comment 6] Limited discussion of practical implications: The article does not provide a clear discussion of the practical implications of the research findings, such as how they can be applied in real-world settings, how other researchers can benefit from the shown results?
  [Response 6] We thank the reviewer for this comment. We expanded Section 4.4 to discuss the practical value of the SWAT-MDF framework as a diagnostic tool for identifying model structural bottlenecks and serving as a transferable reference for developing future carbon modules in the revised manuscript:
  “More importantly, the SWAT-MDF framework offers a systematic diagnostic tool for identifying performance bottlenecks in process-based models. Such diagnostic capability has practical value for model development, as it helps researchers identify which modules or environmental drivers contribute most to simulation errors and therefore where model improvements should be prioritized. Beyond improving DOC simulation in the SWAT-C model, the proposed strategy also shows potential for application in other hydrological and biogeochemical modeling contexts. In principle, the framework is not restricted to SWAT-C and could be extended to models such as SWAT+. Although SWAT+ represents the latest version of the SWAT family, carbon-related functionalities comparable to SWAT-C are still under development. In this context, the SWAT-MDF framework may serve as a transferable diagnostic reference to support the development and validation of future SWAT+ carbon modules and to assist researchers in evaluating and improving carbon simulation modules in other watershed models.”
  [Comment 7] Lack of transparency in data analysis: The article does not provide a clear description of the data analysis procedures used, which may make it difficult for readers to understand how the results were obtained. There is some information hidden in the supplement material, and the authors also provide their source code (which I highly appreciate), still it remains unclear at key points what has actually been done. There are no results shown of the sensitivity analysis and there is no information on the model calibration (e.g., which algorithm has been used, how many model runs were performed, which performance criterion has been used for the calibration, etc.).
  [Response 7] We thank the reviewer for this comment. We clarified the calibration workflow using the SUFI-2 algorithm in SWAT-CUP and incorporated parameter sensitivity results, as detailed in our response to Comment 28.
  [Comment 8] Limited discussion of uncertainty: The article does not provide a clear discussion of the uncertainty associated with the research findings; in too many places, uncertainty is confused with a lack of model performance. After reading the article, I have the feeling that the authors interpret uncertainty as some sort of reason for decreasing model performance. However, one should be specific here. There are several sources of uncertainty and they should always be clearly stated. Are we talking about input data uncertainty, model process uncertainty, model parameter uncertainty, model structure uncertainty, etc.?
  [Response 8] We thank the reviewer for this comment. We expanded Section 4.4 to explicitly differentiate model performance limitations from specific sources of uncertainty, including input data, model parameters, and model structural uncertainty in the revised manuscript:
  “Despite these contributions, several limitations and sources of uncertainty should be acknowledged. First, long-term DOC observations in the study basin remain limited, which introduces input data uncertainty and constrains the direct training of data-driven models, thereby necessitating the use of simulated variables from the SWAT-C model as input features. Second, parameter uncertainty may arise from the calibration process of the SWAT-C model, as different parameter combinations may yield similar model performance. Third, the current SWAT-C structure simplifies several carbon-related processes, such as vegetation-derived carbon inputs, soil organic matter decomposition, and their interactions with hydrological transport, which may introduce model structural uncertainty in DOC simulations. Although these uncertainties may influence the interpretation of the results, the combined SHAP and residual analysis still provides a useful diagnostic perspective for identifying key drivers and potential structural weaknesses in the model, offering valuable guidance for future process refinement and model development.”
  [Comment 9] Lack of clear conclusions and recommendations: The article does not provide clear conclusions and recommendations based on the research findings. What is the main benefit of applying this additional layer of machine learning? What new knowledge can be gained here and in general? There are some elements of this in the manuscript already, but I think a separate section in the discussion and putting the results in the context of available literature would help the readers.
  [Response 9] We thank the reviewer for this comment. We expanded Section 4.2 to clarify the specific benefits of the machine learning component and highlighted new insights gained from SHAP and residual analyses in the context of watershed carbon modeling in the revised manuscript:
  “Coupling data-driven learning with the process-based SWAT-C model enables these two approaches to complement each other. Within the SWAT-MDF framework, the SWAT-C model provides physically meaningful feature parameters representing watershed processes, which serve as informative inputs for the ML component. The data-driven module then learns nonlinear relationships between these variables and DOC outputs, allowing the framework to capture complex DOC dynamics under varying hydrological and environmental conditions. This hybrid strategy does not merely aim to improve predictive performance; instead, it introduces a data-driven diagnostic layer that helps reveal nonlinear relationships and identify structural bottlenecks within the SWAT-C model.”
  Response to Technical corrections
  [Comment 10] L46: I do not see the argument why the lack of observation data justifies the need for DOC modelling.
  [Response 10] We thank the reviewer for this comment. We clarified that modeling serves to provide continuous spatiotemporal estimates of carbon dynamics across the basin to complement limited observational data in the revised manuscript:
  “Currently, watershed-scale DOC monitoring predominantly relies on water-quality sampling. However, the limited spatial distribution of fixed monitoring stations and the lack of long-term observational data hinder comprehensive characterization of DOC’s spatiotemporal variability (Wang et al., 2025). In this context, process-based modeling approaches can complement field observations by providing spatially and temporally continuous estimates of DOC dynamics, facilitating spatiotemporal analysis of DOC variability and enabling investigation of the underlying hydrological and biogeochemical processes at the basin scale.”
  [Comment 11] L60: What is meant by “incomplete module designs and vague parameter representations”? Please provide examples.
  [Response 11] We thank the reviewer for this comment. We clarified this statement by providing concrete examples of structural simplifications, such as simplified soil organic matter decomposition and vegetation carbon input representations, in the revised manuscript:
  “In addition, structural simplifications in certain model components and uncertainties in parameter representations may further increase prediction uncertainty. For example, the representation of vegetation-derived carbon inputs, soil organic matter decomposition, and DOC mobilization during hydrological transport is often simplified in process-based watershed models (Qi et al., 2020a). Moreover, several parameters controlling these processes are empirical or poorly constrained by observations, which may introduce additional uncertainty in DOC simulations.”
  [Comment 12] L63: Please provide references to this statement.
  [Response 12] Thank you for pointing this out. In the revised manuscript, we have added an appropriate reference to support this statement.
  [Comment 13] L66: Please state again what is meant by the “coupled modelling approach”.
  [Response 13] We thank the reviewer for this comment. We removed the term and emphasized the need for quantitative diagnostic frameworks to identify key processes and uncertainties in process-based watershed models in the revised manuscript:
  “Therefore, developing quantitative module diagnosis frameworks is essential for identifying the key processes and sub-modules controlling DOC simulations. Such frameworks can help reveal the sources of model uncertainty and provide guidance for targeted improvements in process-based watershed models.”
  [Comment 14] L68: Which “variables” are meant here?
  [Response 14] We thank the reviewer for this comment. We clarified this sentence by specifying that the variables refer to hydrological, meteorological, and carbon-related factors influencing carbon dynamics in the revised manuscript:
  “In recent years, machine learning (ML) has demonstrated substantial potential in hydrological modeling, particularly for capturing complex nonlinear interactions among hydrological, meteorological, and carbon-related variables influencing watershed processes (Fan et al., 2020).”
  [Comment 15] L75: Provide reference
  [Response 15] Thank you for pointing this out. In the revised manuscript, we have added an appropriate reference to support this statement.
  [Comment 16] L107: Please provide the runoff amount also as an annual average sum in mm to allow for comparison with other catchments.
  [Response 16] We thank the reviewer for this comment. We added the annual average runoff depth in mm based on the mean discharge and basin area to facilitate comparison with other catchments in the revised manuscript:
  “The average annual runoff of the basin is 1,914 m³/s, corresponding to an annual runoff depth of about 444 mm yr⁻¹.”
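
The conversion behind the quoted figures can be checked with a short calculation. The sketch below is a consistency check under assumptions rather than the authors' computation: the basin area is not stated in this reply, so it is back-calculated from the two quoted values, and a mean year of 365.25 days is assumed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed mean year length in seconds


def runoff_depth_mm(discharge_m3s, area_km2):
    """Convert a mean discharge (m^3/s) into an annual runoff depth (mm/yr)."""
    volume_m3 = discharge_m3s * SECONDS_PER_YEAR
    area_m2 = area_km2 * 1e6
    return volume_m3 / area_m2 * 1000.0


def implied_area_km2(discharge_m3s, depth_mm):
    """Drainage area (km^2) implied by a mean discharge and an annual runoff depth."""
    volume_m3 = discharge_m3s * SECONDS_PER_YEAR
    return volume_m3 / (depth_mm / 1000.0) / 1e6


# Values quoted in the response: 1,914 m^3/s and about 444 mm/yr.
area = implied_area_km2(1914.0, 444.0)
depth = runoff_depth_mm(1914.0, area)
print(round(area), round(depth, 1))
```

Under these assumptions the two quoted numbers imply a drainage area of roughly 1.36 × 10⁵ km²; comparing that value with the basin area used in the manuscript would confirm the conversion.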
  [Comment 17] L108: What exactly is meant by particulate matter? How was it measured? Please provide a reference.
  [Response 17] We thank the reviewer for this comment. We clarified that particulate matter refers to suspended sediment and added references detailing the measurement methods and data sources from the hydrological station in the revised manuscript:
  “the average annual export of suspended sediment matter reaches approximately 2.55*10¹⁰ kg, based on long-term observations from hydrological stations in the basin(Liu et al., 2019).”
  [Comment 18] L110:Please provide an overview of the temperature in the study area as well, to give the reader an impression of the local climate.
  [Response 18] Following the referee’s suggestion, we added a description of the temperature conditions to provide a clearer overview of the regional climate in the revised manuscript: “The mean annual temperature in the basin ranges from −4.9℃ to 19.7℃, decreasing from south to north and with increasing elevation. ”
  [Comment 19] L127:State the temporal resolution of the measured DOC data.
  [Response 19] Following the referee’s suggestion, we clarified the temporal resolution and nature of the DOC observations as discrete measurements collected on specific sampling dates in the revised manuscript: “DOC observations for 2013–2014 and 2019–2020 were derived from discrete DOC measurements collected on specific sampling dates at the Tongzilin Hydrological Station, as reported by Xu et al., (2024).”
  [Comment 20] L129: Sentence out of context.
  [Response 20] Following the referee’s suggestion, we removed the disconnected sentence to improve the clarity and coherence of the text in the revised manuscript.
  [Comment 21] Table 1: Please state the year for the sources of Runoff and Sediment data. Please also try to provide a link where information about this source can be found.
  [Response 21] Following the referee’s suggestion, we added the temporal coverage and specified the data source as the Hydrological Yearbook of the People's Republic of China, as detailed in our response to Comment 22.
  [Comment 22] Table 1: The reference to Xu et al. (2024) for DOC data does not provide any daily observations, as the table implies. Xu et al. sampled monthly and increased the sampling to weekly in the monsoon season. Additionally, I doubt that there are daily sediment observations; this cannot be double-checked, however, as there is no reference.
  [Response 22] Following the referee’s suggestion, we clarified that DOC data consist of discrete daily measurements on specific sampling dates and specified the official source for daily sediment observations as the Hydrological Yearbook of the People's Republic of China in the revised manuscript:
  Please see the supplementary material for the corresponding tables.
  
  [Comment 23] Table 2: There are no parameters shown here, even though the table title suggests it. I think those are model inputs and outputs? And Table S4 is incomplete as it shows “etc.” at some points, e.g. at the core of the study, the DOC simulation. As such it is not possible to judge whether the selection of the parameters is meaningful. I have my doubts, e.g. because why is the meteorological forcing part of the calibration? I do not see any justification for this in the manuscript.
  
  [Response 23] We thank the referee for this helpful comment. We clarified that Table 2 lists simulated state variables used as machine learning features rather than calibration parameters, expanded Table S4 with precise source code locations, and justified the variable selection by linking each variable to physical processes in the newly added Section 4.1 of the revised manuscript:
  
  “Table 2. SWAT-C simulated variables used as machine learning feature variables
  
  Please see the supplementary material for the corresponding tables.
  
  Note: The specific locations of the SWAT-C modules in the source code are provided in Table S4 in the Supplementary Materials. All feature parameters were derived from the calibrated SWAT-C model.
  
  4.1. Spatiotemporal variation of dissolved organic carbon and its driving mechanisms
  
  The spatiotemporal evolution of DOC concentrations in the Yalong River Basin provides a direct reflection of how climatic and hydrological conditions drive the carbon cycle. Spatially, DOC concentrations exhibit a progressive increase from the northwest to the southeast, aligning closely with the hydrothermal gradients of the basin. The relatively warm and humid climate in the southeastern region not only promotes vegetation productivity but also enhances microbial activity in decomposing organic matter, thereby increasing local carbon release (Davidson and Janssens, 2006). This elevated local supply, combined with the downstream accumulation of streamflow, collectively shapes the high concentration zones observed in the downstream reaches. Temporally, the seasonal distribution of precipitation governs the annual fluctuations of DOC (Blaurock et al., 2025). Concentrated rainfall during the monsoon season triggers a pronounced flushing effect, rapidly leaching soil DOC into the river channel and resulting in significant concentration peaks (Yan et al., 2024).
  
  Whether through spatial biogeochemical control by hydrothermal conditions and vegetation or temporal physical flushing by precipitation, these actual catchment processes driving DOC export are accurately represented within the underlying computing equations of the SWAT-C model. As illustrated in Figure 8, the model characterizes these complex physical and biochemical processes by partitioning them into four core stages, thereby clarifying the specific physical significance of each key feature parameter (as detailed in Table 2) within the model code. This integration of natural phenomena with physical equations demonstrates that the selected feature parameters possess a robust mechanistic foundation.
  
  In the specific computational logic of the model, (1) LAI and (2) BIOM within the vegetation growth and carbon input stage reflect the control of plant growth by the vegetation and biomass modules; these variables determine the quantity of litter and plant residues, providing the initial carbon source for DOC generation (Ji et al., 2022). In the dynamic environmental regulation stage, meteorological forcing conditions including (3) TMAX/TMIN, (5) SOLAR, (7) PCP, and (8) WIND are coupled with (4) SW. Within the underlying code, these are converted into dynamic regulatory factors that continuously modulate the rate of microbial decomposition (Davidson and Janssens, 2006). In the hydrological processes and dissolved transport stage, the water balance calculated by (9) FLOW_OUT and (6) ET, combined with the (12) DOC simulation results, jointly drives the leaching and routing of soil carbon into the river network (Laudon et al., 2011). Furthermore, in the soil erosion and associated transport stage, (11) SED_OUT and (10) TOT_P simulate soil erosion processes, quantifying the amount of organic carbon that enters the river while attached to sediment particles (Galy et al., 2007). The complete closed loop formed by these four stages ensures that the model accurately captures the comprehensive impacts of environmental changes on the total carbon.
  
  Please see the supplementary material for the corresponding figures.
  
  Figure 8. Conceptual diagram of physical mechanisms for DOC in the SWAT-C model. Numbers (1) through (12) correspond to the key feature parameters and their specific roles in the catchment DOC export process.
  
  Table S4. Source code locations of the selected SWAT-C process variables
  
  Please see the supplementary material for the corresponding tables.
  
  [Comment 24] Table S3: What is YLR? Wouldn’t it be interesting to study here which model processes are relevant for DOC export?
  
  [Response 24] Following the referee’s suggestion, we updated the abbreviation YLR to Yalong River in Table S3 and added a mechanistic analysis of relevant model processes for dissolved organic carbon export alongside a conceptual diagram in Section 4.1, as detailed in our response to Comment 23.
  
  [Comment 25] L162: What is meant with “Bayesian optimization was applied”? Which algorithm was applied and on what? Keep in mind that Bayesian optimization is not only used in Machine Learning, it can also be used for model calibration, so it is important to state specifically what has been done.
  
  [Response 25] Following the referee’s suggestion, we clarified that Bayesian optimization via the Tree-structured Parzen Estimator (TPE) algorithm was applied strictly to machine learning hyperparameter tuning rather than physical model calibration, and listed the optimal values in Table S2 in the revised manuscript:
  
  “To ensure model robustness and prevent overfitting during this fine-tuning phase, Bayesian optimization utilizing the Tree-structured Parzen Estimator (TPE) algorithm was applied strictly to tune the hyperparameters of the machine learning models. This optimization process targeted the minimization of validation loss by systematically searching the parameter space. The finalized optimal hyperparameter values are presented in Table S2. This established a stable and reliable foundation for the subsequent SHAP-based and residual analyses.”
  
  [Comment 26] Structure: I think it would be more intuitive to explain SWAT-C first (currently chapter 2.4) and then the additional layer of SWAT-MDF (currently chapter 2.3).
  
  [Response 26] Following the referee’s suggestion, we reorganized the methodology to introduce the SWAT-C model in Section 2.3 before presenting the machine learning SWAT-MDF framework in Section 2.4, and revised the transitional paragraphs for coherent logical progression in the revised manuscript.
  
  [Comment 27] 2.3: This chapter needs to be better explained and better linked with Figure 2. Why are there so many transfers necessary? As this method is not a standard approach in hydrology (yet), try to keep it understandable for readers who are not familiar with it.
  
  [Response 27] We thank the reviewer for this constructive suggestion. We clarified that transfer learning prevents overfitting on sparse observations by pretraining the machine learning models on a large volume of physical simulation data to capture mechanisms, and then fine-tuning them on observations to correct biases. We also explicitly linked this workflow to Figure 2 in the revised manuscript:
  
  “Because deep learning methods are not yet standard approaches for catchment DOC modeling when observed field data are highly limited, a transfer learning strategy was implemented to bridge the gap between abundant physical simulations and sparse field observations. During the transfer process, the models were initially pretrained using long-term simulation outputs from the SWAT-C model. This pretraining step forces the machine learning algorithms to learn the fundamental physical and hydrological baseline patterns of the basin. Subsequently, the models were fine-tuned using the limited observed DOC data. This transfer step is necessary because it allows the algorithms to correct the systematic biases of the physical model using real-world observations while preserving the physical logic learned during pretraining.”
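The pretrain-then-fine-tune sequence described above can be illustrated with a deliberately small stand-in. The sketch below is not the authors' Bi-LSTM workflow: it uses a one-variable linear model, synthetic data, and hypothetical learning rates and epoch counts, purely to show how weights learned from abundant simulated pairs become the starting point for a short, low-learning-rate fit to a handful of observations.

```python
def train(w, b, xs, ys, lr, epochs):
    """Full-batch gradient descent on mean squared error for y = w * x + b."""
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the linear model on the given pairs."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Step 1: abundant synthetic "simulation" pairs (stand-in for SWAT-C output).
sim_x = [i / 100 for i in range(200)]
sim_y = [2.0 * x + 1.0 for x in sim_x]

# Step 2: sparse synthetic "observations" that differ from the simulation
# by a systematic offset of +0.5 (stand-in for physical-model bias).
obs_x = [0.2, 0.8, 1.4, 1.9]
obs_y = [2.0 * x + 1.5 for x in obs_x]

# Pretraining: learn the baseline relationship from the simulated data.
w_pre, b_pre = train(0.0, 0.0, sim_x, sim_y, lr=0.1, epochs=2000)

# Fine-tuning: start from the pretrained weights, use a smaller learning
# rate and fewer epochs so the fit stays close to the pretrained solution.
w_ft, b_ft = train(w_pre, b_pre, obs_x, obs_y, lr=0.05, epochs=200)

print("error on observations before fine-tuning:", mse(w_pre, b_pre, obs_x, obs_y))
print("error on observations after fine-tuning: ", mse(w_ft, b_ft, obs_x, obs_y))
```

The design choice to illustrate is the initialization: fine-tuning starts from the pretrained weights rather than from scratch, so four observations only need to correct the offset instead of determining the whole relationship.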
  
  [Comment 28] L227: There is no information on the sensitivity analysis in the supplement material. It is only stated that it was performed, without providing any details. Please provide those details.
  
  [Response 28] We thank the reviewer for pointing out this omission. We added the methodological details of the global sensitivity analysis using the SUFI-2 algorithm within SWAT-CUP and summarized the calibrated parameters alongside their t-statistics and p-values in Table S3 and Text S1 in the revised manuscript:
  
  “The 2022 release of the SWAT-C model was employed in this study (Yang and Zhang, 2016; Zhang et al., 2013). Model calibration and global sensitivity analysis were conducted using the SUFI-2 algorithm implemented in SWAT-CUP. Model performance during calibration was evaluated using three widely used statistical indicators: the coefficient of determination (R²), Nash–Sutcliffe efficiency (NSE), and Kling–Gupta efficiency (KGE). Parameter sensitivity was quantified based on the t-statistic and corresponding p-value generated by the SUFI-2 algorithm. The final parameter values and their sensitivity statistics (t-statistic and p-value) for the Yalong River Basin are summarized in Table S3.
  
  Table S3. Sensitivity analysis and calibration results of SWAT-C parameters.
  
  Please see the supplementary material for the corresponding tables.
  
  [Comment 29] Figure 3: What is meant by the unit DOC/kg? A kilogram of what? Sediment or discharge, or? Is this supposed to quantify the DOC load? Why not work with DOC concentration? I doubt that any DOC-related process can be fitted by working with DOC loads, as it seems mainly discharge driven (Figure 3e).
  
  [Response 29] We thank the reviewer for pointing out this critical issue. We replaced the simulation target from dissolved organic carbon load to concentration, retrained all machine learning models, and updated all related figures and statistics based on the new concentration data in the revised manuscript:
  
  “In contrast, the model's ability to simulate daily DOC was relatively limited, with lower performance (calibration: NSE = 0.57, R² = 0.75, KGE = 0.74; validation: NSE = 0.44, R² = 0.71, KGE = 0.81). Specifically, the model tended to overestimate DOC concentrations during certain high-flow periods, while underestimation was observed under low-flow conditions. This flow-dependent bias contributed to the overall reduction in model performance at the daily scale.
  
  Please see the supplementary material for the corresponding figures.
  
  Figure 3. Simulation performance of the SWAT-C model. Line plots (a) streamflow (Q), (c) sediment yield (SED), and (e) dissolved organic carbon (DOC). Scatter plots (b), (d), and (f) show the correlations between observed and simulated values.”
  
  [Comment 30] Figures 5 and 6: Why is the LAI the least important in Figure 5 and the most important in Figure 6b?
  
  [Response 30] We thank the reviewer for this observation. We clarified that Figure 5 measures direct driving effects on predicted concentration via feature importance, whereas Figure 6b measures structural uncertainty via residual attribution analysis, identifying the LAI vegetation growth module as the largest source of simulation error in the revised manuscript:
  
  “The vegetation growth module, represented by LAI, exhibited low importance in the global SHAP analysis but emerged as a dominant factor in the residual analysis. To investigate this discrepancy, a single feature SHAP analysis was conducted for LAI (Text S4 in the Supplementary Materials), revealing a negative effect on the model output. Because LAI possesses relatively low temporal variability and provides a stable numerical contribution over time, its global SHAP importance remained limited in determining the absolute dissolved organic carbon concentration. However, the exceptionally high residual contribution associated with LAI indicates severe structural uncertainties within the vegetation growth module. This explicit contrast highlights that while vegetation dynamics do not numerically dominate the prediction, the underlying physical mechanism representation requires targeted refinement to improve the overall simulation reliability.”
  
  Citation: https://doi.org/10.5194/egusphere-2025-5503-AC1
RC2:
'Comment on egusphere-2025-5503', Anonymous Referee #2, 28 Feb 2026
This manuscript proposes a hybrid framework (SWAT-MDF) that combines the physically based SWAT-C model with machine-learning approaches and SHAP-based interpretation to diagnose module-level uncertainty in dissolved organic carbon (DOC) simulations. The topic is relevant, as DOC modelling remains challenging and improved diagnostic tools for process-based models are indeed needed.
However, in its current form, the study does not convincingly demonstrate sufficient methodological rigor nor provide results that are practically useful or actionable. While the conceptual idea is potentially interesting, the implementation raises substantial concerns regarding conceptual consistency, experimental design, interpretability, validation strategy, and scientific contribution. Consequently, several conclusions appear overstated relative to the evidence presented.
Major comments
1. Conceptual framework and interpretation
The central claim of the manuscript is that machine learning can be used to diagnose structural uncertainty in the SWAT-C model. However, the proposed workflow effectively trains a machine-learning surrogate model using SWAT outputs as predictors and observed DOC as targets (Section 2.3). This creates several conceptual issues.
First, the ML model learns statistical relationships between SWAT outputs and observations, rather than diagnosing model structure itself. The authors assume that selected SWAT-C outputs represent individual model modules, and therefore that higher feature importance in the ML model implies higher uncertainty or relevance of the corresponding module. This logical step is not sufficiently justified.
For example:
Simulated DOC (DOC_Simulate) is a final model output resulting from multiple interacting modules and processes. It cannot uniquely represent the carbon cycle module alone.

The manuscript concludes that the vegetation (LAI) module introduces structural uncertainty in DOC simulations. However, LAI is an independent state variable that can be calibrated and validated separately. High statistical importance or residual contribution does not necessarily imply structural deficiencies in that module.

Therefore, SHAP importance reflects feature usefulness for the ML predictor, not causal deficiencies of SWAT modules. The manuscript does not provide theoretical or empirical evidence demonstrating that feature attribution can be translated into module-level structural errors.
A related issue arises in the SHAP results. The analysis identifies DOC_Simulate as the dominant feature. This outcome is expected because DOC_Simulate is already the SWAT-simulated DOC output and therefore strongly correlated with observations. This result is essentially tautological rather than diagnostic. In fact, it may indicate that SWAT-C already captures a substantial portion of DOC variability. Consequently, the conclusion that the carbon cycle module is the primary uncertainty source is not convincingly supported.
Finally, the study concludes that future improvements should focus on the carbon cycle module, vegetation module, and relative humidity inputs. These recommendations remain generic and largely consistent with existing DOC modelling literature. No specific parameters, process representations, or structural modifications are identified. As such, the framework currently provides limited actionable guidance for model development.
2. Data
DOC calibration and validation/testing rely on only 34 + 38 samples. This is critically insufficient for training deep learning models, even though the authors used transfer learning. Given such sparse observations, ML results are likely dominated by the SWAT simulations used in pretraining. The authors acknowledge data scarcity but proceed with complex ML architectures that require large datasets. This contradiction undermines the credibility of the analysis.
Minor comments
Line 87: SWAT-MDF is introduced without providing its full name at first mention.
Line 129: The sentence appears incomplete and should be revised.
Line 148: Terminology alternates between parameters, outputs, and features without clear definitions; consistent terminology is needed.
Table 2: The criteria for selecting outputs as representatives of specific modules are unclear. For example, justification is needed for why TOT_P represents the pollutant transport module.
Figure 4: Panels (a) and (b) are too small to read and should be enlarged for clarity.
Citation: https://doi.org/10.5194/egusphere-2025-5503-RC2
- AC2: 'Reply on RC2', Zehong Huang, 16 Mar 2026
  
  Response to Referee #2
  [Comment 1] This manuscript proposes a hybrid framework (SWAT-MDF) that combines the physically based SWAT-C model with machine-learning approaches and SHAP-based interpretation to diagnose module-level uncertainty in dissolved organic carbon (DOC) simulations. The topic is relevant, as DOC modelling remains challenging and improved diagnostic tools for process-based models are indeed needed.
  However, in its current form, the study does not convincingly demonstrate sufficient methodological rigor nor provide results that are practically useful or actionable. While the conceptual idea is potentially interesting, the implementation raises substantial concerns regarding conceptual consistency, experimental design, interpretability, validation strategy, and scientific contribution. Consequently, several conclusions appear overstated relative to the evidence presented.
  [Response 1] We thank the reviewer for the detailed evaluation and for recognizing the relevance of the framework. We restructured the methodology, incorporated four years of daily runoff observations for rigorous model calibration, and added a conceptual diagram explicitly linking input variables to physical carbon processes. Furthermore, we emphasized that module diagnosis does not rely solely on SHAP: polynomial feature expansion and ridge regression are applied to the model residuals to demonstrate that the machine learning diagnosis identifies actual physical flaws rather than mere statistical patterns. Please see below our responses to each comment.
  Major comments
  [Comment 2] 1. Conceptual framework and interpretation
  The central claim of the manuscript is that machine learning can be used to diagnose structural uncertainty in the SWAT-C model. However, the proposed workflow effectively trains a machine-learning surrogate model using SWAT outputs as predictors and observed DOC as targets (Section 2.3). This creates several conceptual issues.
  First, the ML model learns statistical relationships between SWAT outputs and observations, rather than diagnosing model structure itself. The authors assume that selected SWAT-C outputs represent individual model modules, and therefore that higher feature importance in the ML model implies higher uncertainty or relevance of the corresponding module. This logical step is not sufficiently justified.
  For example:
  Simulated DOC (DOC_Simulate) is a final model output resulting from multiple interacting modules and processes. It cannot uniquely represent the carbon cycle module alone.
  The manuscript concludes that the vegetation (LAI) module introduces structural uncertainty in DOC simulations. However, LAI is an independent state variable that can be calibrated and validated separately. High statistical importance or residual contribution does not necessarily imply structural deficiencies in that module.
  Therefore, SHAP importance reflects feature usefulness for the ML predictor, not causal deficiencies of SWAT modules. The manuscript does not provide theoretical or empirical evidence demonstrating that feature attribution can be translated into module-level structural errors.
  A related issue arises in the SHAP results. The analysis identifies DOC_Simulate as the dominant feature. This outcome is expected because DOC_Simulate is already the SWAT-simulated DOC output and therefore strongly correlated with observations. This result is essentially tautological rather than diagnostic. In fact, it may indicate that SWAT-C already captures a substantial portion of DOC variability. Consequently, the conclusion that the carbon cycle module is the primary uncertainty source is not convincingly supported.
  Finally, the study concludes that future improvements should focus on the carbon cycle module, vegetation module, and relative humidity inputs. These recommendations remain generic and largely consistent with existing DOC modelling literature. No specific parameters, process representations, or structural modifications are identified. As such, the framework currently provides limited actionable guidance for model development.
  [Response 2] We thank the reviewer for the detailed critique regarding the conceptual framework. We highlighted the residual analysis using polynomial feature expansion and ridge regression to quantify how intermediate physical state variables drive systematic simulation errors, and introduced a conceptual diagram linking input variables to carbon processes to demonstrate the framework identifies actual structural flaws rather than mere statistical patterns. Please see below our detailed responses to each specific point.
  1. Regarding the overall conceptual framework and residual analysis
  In the revised manuscript, we highlighted the application of polynomial feature expansion and ridge regression to the model residuals, which explicitly quantifies how intermediate physical state variables drive systematic simulation errors. We also added a new discussion in Section 4.1, with a conceptual diagram linking input variables to carbon processes, so that the framework diagnoses actual structural flaws rather than mere statistical patterns:
  “Residuals, which are defined as the differences between predicted and observed values, were further analyzed to assess the model bias and predictive uncertainty. To quantify the influence of input features on prediction errors, residuals were computed for each sample, and polynomial feature expansion was applied to generate nonlinear interaction terms. Ridge regression was subsequently used to explore the relationship between these terms and the residuals (Santos Nobre and Da Motta Singer, 2007; Tyagi et al., 2022). The absolute values of the regression coefficients served as indicators of each feature's contribution to the residuals (Eq. 1). This allowed for the quantification of both individual feature effects and interaction-driven contributions to model errors. To evaluate the robustness of these estimates, a bootstrapping approach was used to repeatedly resample the training dataset (Hongyi Li and Maddala, 1996), and the mean and standard deviation of each feature’s contribution rate were computed to characterize uncertainty. Two residual analysis strategies were adopted: (1) Interaction-based: capturing nonlinear feature interactions via polynomial expansion and ridge regression; (2) Single-feature: assessing individual feature contributions without interactions.
  Please see the supplementary material for the corresponding formulas.
  where Cj denotes the contribution rate of the j-th feature to the residuals, expressed as a percentage; βj denotes the ridge regression coefficient corresponding to the j-th feature; n denotes the total number of input features.
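The interaction-based residual analysis described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: Eq. 1 is not reproduced in this response, so the sketch assumes it normalizes absolute ridge coefficients as C_j = 100 · |β_j| / Σ_i |β_i|; the expansion keeps only pairwise products (no squared terms); the data are synthetic; the regularization strength `lam` is a hypothetical value; and the bootstrapping step is omitted.

```python
from itertools import combinations


def add_pairwise_terms(rows, names):
    """Append every pairwise product (interaction term) to each sample."""
    pairs = list(combinations(range(len(names)), 2))
    out_names = list(names) + [f"{names[i]}-{names[j]}" for i, j in pairs]
    out_rows = [list(r) + [r[i] * r[j] for i, j in pairs] for r in rows]
    return out_rows, out_names


def standardize(rows):
    """Scale columns to zero mean and unit variance so coefficients are comparable."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ridge_coefficients(x_rows, y, lam):
    """Closed-form ridge: (X^T X + lam * I) beta = X^T y, with y centered."""
    n, p = len(x_rows), len(x_rows[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[i] * r[j] for r in x_rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(x_rows, yc)) for i in range(p)]
    return solve_linear(xtx, xty)


def contribution_rates(betas):
    """Assumed form of Eq. 1: share of each |beta_j| in the total, in percent."""
    total = sum(abs(b) for b in betas)
    return [100.0 * abs(b) / total for b in betas]


# Synthetic samples (not study data): three features on a full grid.
names = ["LAI", "TMIN", "SW"]
rows = [[i % 5, i % 3, i % 7] for i in range(105)]
# Synthetic residuals driven by LAI and by the TMIN-SW interaction.
residuals = [0.5 * a + 0.2 * b * c for a, b, c in rows]

expanded, term_names = add_pairwise_terms(rows, names)
betas = ridge_coefficients(standardize(expanded), residuals, lam=1e-3)
rates = dict(zip(term_names, contribution_rates(betas)))
for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.2f}%")
```

In this synthetic case the two terms used to build the residuals dominate the ranking. Note that such a ranking shows which terms covary with the residuals; whether that identifies a structural deficiency in the corresponding module is the separate question raised by the referee.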
  4.1. Spatiotemporal evolution of dissolved organic carbon and its driving mechanisms
  The spatiotemporal evolution of DOC concentrations in the Yalong River Basin provides a direct reflection of how climatic and hydrological conditions drive the carbon cycle. Spatially, DOC concentrations exhibit a progressive increase from the northwest to the southeast, aligning closely with the hydrothermal gradients of the basin. The relatively warm and humid climate in the southeastern region not only promotes vegetation productivity but also enhances microbial activity in decomposing organic matter, thereby increasing local carbon release (Davidson and Janssens, 2006). This elevated local supply, combined with the downstream accumulation of streamflow, collectively shapes the high concentration zones observed in the downstream reaches. Temporally, the seasonal distribution of precipitation governs the annual fluctuations of DOC (Blaurock et al., 2025). Concentrated rainfall during the monsoon season triggers a pronounced flushing effect, rapidly leaching soil DOC into the river channel and resulting in significant concentration peaks (Yan et al., 2024).
  Whether through spatial biogeochemical control by hydrothermal conditions and vegetation or temporal physical flushing by precipitation, these actual catchment processes driving DOC export are accurately represented within the underlying computing equations of the SWAT-C model. As illustrated in Figure 8, the model characterizes these complex physical and biochemical processes by partitioning them into four core stages, thereby clarifying the specific physical significance of each key feature parameter (as detailed in Table 2) within the model code. This integration of natural phenomena with physical equations demonstrates that the selected feature parameters possess a robust mechanistic foundation.
  In the specific computational logic of the model, (1) LAI and (2) BIOM within the vegetation growth and carbon input stage reflect the control of plant growth by the vegetation and biomass modules; these variables determine the quantity of litter and plant residues, providing the initial carbon source for DOC generation (Ji et al., 2022). In the dynamic environmental regulation stage, meteorological forcing conditions including (3) TMAX/TMIN, (5) SOLAR, (7) PCP, and (8) WIND are coupled with (4) SW. Within the underlying code, these are converted into dynamic regulatory factors that continuously modulate the rate of microbial decomposition (Davidson and Janssens, 2006). In the hydrological processes and dissolved transport stage, the water balance calculated by (9) FLOW_OUT and (6) ET, combined with the (12) DOC simulation results, jointly drives the leaching and routing of soil carbon into the river network (Laudon et al., 2011). Furthermore, in the soil erosion and associated transport stage, (11) SED_OUT and (10) TOT_P simulate soil erosion processes, quantifying the amount of organic carbon that enters the river while attached to sediment particles (Galy et al., 2007). The complete closed loop formed by these four stages ensures that the model accurately captures the comprehensive impacts of environmental changes on the total carbon.
  Please see the supplementary material for the corresponding figures.
  Figure 8. Conceptual diagram of physical mechanisms for DOC in the SWAT-C model. Numbers (1) through (12) correspond to the key feature parameters and their specific roles in the catchment DOC export process.”
  2. Regarding the representation of simulated dissolved organic carbon
  We agree that DOC_Simulate is a final output shaped by multiple interacting processes and cannot uniquely represent the carbon cycle module alone. For this reason, we further conducted pairwise interaction analysis in the residual framework. By generating interaction terms among all features, we decomposed this integrated output into identifiable structural components and thereby diagnosed more specific process-related weaknesses:
  “3.4.2. Residual Analysis of Input Features
  Residual analysis was further conducted to quantify the contribution of input features to DOC prediction residuals within the coupled model. The three terms with the highest contributions were identified as key factors. As illustrated in the residual interaction analysis (Fig. 7a), the dominant contributors were DOC_Simulate-TMIN, SW-RH, and LAI, with contribution rates of 2.83%, 2.21%, and 2.03%, respectively. Notably, the identified interaction terms involve variables related to climatic and hydrological conditions, indicating that environmental factors jointly influence the residual patterns of DOC simulations. Additionally, LAI emerged as an important individual contributor, suggesting that vegetation conditions play a notable role in shaping the distribution of model residuals.
  Please see the supplementary material for the corresponding figures.
  Figure 7. Residual analysis of characteristic parameters. Circular bar plots illustrate the residual contribution rates of input parameters and their pairwise interactions influencing model DOC predictions: (a) presents the detailed contribution rates for all parameter interactions, while (b) summarizes the primary individual contributions of key parameters.”
  3. Regarding the calibration of leaf area index and structural uncertainty
  We agree that LAI is an independent state variable and can be calibrated. To reduce parameter uncertainty, we first calibrated the key physical processes, including runoff, sediment, and DOC, and further clarified this issue in the revised Discussion section. Under this optimized parameter space, the dominant residual contribution from LAI dynamics points to structural limitations in the vegetation module. Following the referee’s suggestion, we revised the relevant expression in the revised manuscript as:
  “3.1. Model performance of SWAT-C Model in DOC simulations
  The first component of the coupled model, the SWAT-C model, achieved excellent performance in simulating daily runoff and sediment for the Yalong River Basin (Fig. 3). For daily runoff, the model achieved high accuracy in both periods (calibration: NSE = 0.91, R² = 0.92, KGE = 0.94; validation: NSE = 0.90, R² = 0.93, KGE = 0.85). For sediment, both calibration and validation phases yielded excellent results (calibration: NSE = 0.92, R² = 0.92, KGE = 0.95; validation: NSE = 0.89, R² = 0.93, KGE = 0.79). In contrast, the model's ability to simulate daily DOC was relatively limited, with lower performance (calibration: NSE = 0.57, R² = 0.75, KGE = 0.74; validation: NSE = 0.44, R² = 0.71, KGE = 0.81). Specifically, the model tended to overestimate DOC concentrations during certain high-flow periods, while underestimation was observed under low-flow conditions. This flow-dependent bias contributed to the overall reduction in model performance at the daily scale.
  Please see the supplementary material for the corresponding figures.
  Figure 3. Simulation performance of the SWAT-C model. Line plots (a) streamflow (Q), (c) sediment yield (SED), and (e) dissolved organic carbon (DOC). Scatter plots (b), (d), and (f) show the correlations between observed and simulated values.
  Despite these contributions, several limitations and sources of uncertainty should be acknowledged. First, long-term DOC observations in the study basin remain limited, which introduces input data uncertainty and constrains the direct training of data-driven models, thereby necessitating the use of simulated variables from the SWAT-C model as input features. Second, parameter uncertainty may arise from the calibration process of the SWAT-C model, as different parameter combinations may yield similar model performance. Third, the current SWAT-C structure simplifies several carbon-related processes, such as vegetation-derived carbon inputs, soil organic matter decomposition, and their interactions with hydrological transport, which may introduce model structural uncertainty in DOC simulations. Although these uncertainties may influence the interpretation of the results, the combined SHAP and residual analysis still provides a useful diagnostic perspective for identifying key drivers and potential structural weaknesses in the model, offering valuable guidance for future process refinement and model development.”
  4. Regarding the interpretation of SHAP results
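For readers unfamiliar with the scores quoted in Section 3.1 above, the following is a minimal sketch of how NSE, R² and KGE are commonly computed from paired observed and simulated series. It rests on assumptions that the text does not confirm: R² is taken as the squared Pearson correlation, KGE follows the original 2009 formulation (correlation, variability ratio, bias ratio), and the function names are illustrative only and are not taken from the manuscript or from SWAT-C.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of obs."""
    mean_obs = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_r(obs, sim):
    """Pearson correlation coefficient between obs and sim."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def r_squared(obs, sim):
    """R² taken here as the squared Pearson correlation (an assumption)."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency, 2009 form (assumed variant)."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / len(obs))
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / len(sim))
    r = pearson_r(obs, sim)
    alpha = std_sim / std_obs    # variability ratio
    beta = mean_sim / mean_obs   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Under these definitions, a simulation with a systematic bias can keep a high R² while NSE and KGE drop, because R² only measures linear association whereas NSE and KGE also penalise errors in magnitude. This is one way a model can show a higher R² than NSE, as in the daily DOC scores quoted above.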
  We agree that the high global SHAP importance of DOC_Simulate is expected and mainly reflects the baseline predictive information provided by SWAT-C, rather than a structural deficiency. Our inference regarding carbon-module uncertainty is therefore not based on SHAP ranking alone, but on the combined evidence from global SHAP interpretation and residual interaction analysis. Following the referee’s suggestion, this point has been clarified in our response to Point 2 above.
  5. Regarding actionable guidance and specific structural modifications
  We appreciate this helpful comment. We agree that identifying broad modules alone is insufficient for guiding model development. In the revised manuscript, we therefore expanded the Discussion section to link each selected feature to its corresponding DOC-related physical process, and strengthened the reference to Table S4, which provides the relevant subroutines. The added discussion in Section 4.1 is presented in our response to Point 1. Following the referee’s suggestion, we revised the relevant text as:
  “Table S4. Source code locations of the selected SWAT-C process variables”
  Please see the supplementary material for the corresponding tables.
  [Comment 2] 2. Data: DOC calibration and validation/testing rely on only 34 + 38 samples. This is critically insufficient for training deep learning models, although the authors used transfer learning validation. Given such sparse observations, ML results are likely dominated by the SWAT simulations used in pretraining. The authors acknowledge data scarcity but proceed with complex ML architectures that require large datasets. This contradiction undermines the credibility of the analysis.
  [Response 2] We thank the referee for this important comment. We agree that the limited DOC observations are insufficient for training a purely data-driven deep learning model. However, our framework is not intended as a standalone predictor. The pretraining stage is designed to preserve the physically based structure of SWAT-C, while the limited observations are used only for fine-tuning to reveal systematic discrepancies between model simulations and real-world processes. We also added four years of observed runoff data to improve SWAT-C calibration and strengthen the physical basis of the analysis:
  “DOC observations from 2013–2014 were used as training data, with 20% randomly withheld for validation, while observations from 2019–2020 were used for testing. Because DOC observations are limited, the ML models were not trained directly from observations alone. Instead, the input features were mainly derived from long-term simulated feature parameters generated by the SWAT-C model. Given the limited availability of DOC measurements, a transfer learning strategy was adopted. First, models were pretrained on long-term simulated feature parameters from the SWAT-C model (1972–2020) to establish fundamental relationships between inputs and DOC outputs. Second, the pretrained models were fine-tuned using observed data to better match the true data distribution. In this way, the long-term SWAT-C simulations provide physically consistent training features, while the observed DOC data are primarily used to adjust the model to local conditions. This process mitigated overfitting caused by data scarcity by leveraging generalized patterns learned from simulations. Text S3 in the Supplementary Materials details the principles of the selected ML methods and the procedures for Bayesian-based hyperparameter optimization.
  To further demonstrate the robustness of the hydrological processes represented in the SWAT-C model, daily observed streamflow during 2013, 2014, 2019, and 2020 was additionally used for model calibration and evaluation. The model achieved satisfactory performance with NSE = 0.67 and R² = 0.67, indicating that the model reasonably captures the daily runoff dynamics of the basin. The comparison between observed and simulated streamflow is provided in Figure S3 in the Supplementary Materials.
  Please see the supplementary material for the corresponding figures.
  Figure S3. Comparison between observed and simulated daily streamflow used for calibration of the SWAT-C model.”
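The two-stage strategy described in Response 2 (pretraining on long simulated series, then fine-tuning on a small observed set with 20% withheld for validation) can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: a one-variable linear model stands in for the Bi-LSTM, the data are synthetic toy numbers rather than SWAT-C outputs or DOC observations, and the function names, learning rates and epoch counts are hypothetical.

```python
import random


def train_linear(xs, ys, w=0.0, b=0.0, lr=0.05, epochs=200):
    """Fit y ~ w*x + b by full-batch gradient descent on mean squared error."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def split_train_val(pairs, val_fraction=0.2, seed=0):
    """Randomly withhold a fraction of the samples for validation."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Synthetic stand-ins (NOT real data): a long "simulated" series and a
# short "observed" series that follows a shifted relationship plus noise.
rng = random.Random(42)
sim_x = [rng.uniform(0, 1) for _ in range(500)]
sim_y = [2.0 * x + 0.5 for x in sim_x]
obs_x = [rng.uniform(0, 1) for _ in range(30)]
obs_y = [3.0 * x + rng.gauss(0, 0.05) for x in obs_x]

train, val = split_train_val(list(zip(obs_x, obs_y)), val_fraction=0.2)
tx, ty = zip(*train)
vx, vy = zip(*val)

# Stage 1: pretrain on the long simulated series.
w0, b0 = train_linear(sim_x, sim_y, lr=0.05, epochs=2000)
# Stage 2: fine-tune on the few observations, starting from pretrained weights.
w1, b1 = train_linear(tx, ty, w=w0, b=b0, lr=0.05, epochs=1000)

print("validation MSE before fine-tuning:", mse(vx, vy, w0, b0))
print("validation MSE after fine-tuning: ", mse(vx, vy, w1, b1))
```

The point of the sketch is the workflow rather than the model: the fine-tuning stage starts from the pretrained weights instead of from scratch, so the small observed set only has to correct the mismatch between the simulated and observed relationships.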
  Minor comments
  [Comment 3] Line 87: SWAT-MDF is introduced without providing its full name at first mention.
  [Response 3] We thank the reviewer for pointing out this omission. We have corrected it in the revised manuscript by providing the full name, Soil and Water Assessment Tool Model Diagnostic Framework (SWAT-MDF), at its first occurrence. The corresponding revised text is as follows:
  “The SWAT-MDF (Soil and Water Assessment Tool Model Diagnostic Framework) quantitatively diagnoses module-level sources of uncertainty in SWAT-C and evaluates their relative contributions under complex environmental conditions.”
  [Comment 4] Line 129: The sentence appears incomplete and should be revised.
  [Response 4] We appreciate your careful reading. We agree that the original sentence was incomplete and lacked a clear connection to the surrounding context. To improve the overall clarity and coherence of the paragraph, we have removed this sentence entirely in the revised manuscript.
  [Comment 5] Line 148: Terminology alternates between parameters, outputs, and features without clear definitions; consistent terminology is needed.
  [Response 5] We thank the reviewer for pointing out this terminology inconsistency. We have standardized the terminology throughout the revised manuscript. Specifically, parameters refers to the calibration settings of SWAT-C, simulated variables to the physical outputs of SWAT-C, and feature variables to these outputs when used as inputs to the data-driven module. The corresponding paragraph and Table 2 have been revised accordingly. The revised text is as follows:
  “The first component (Fig. 2a), the SWAT-C model, provides the physical and mechanistic foundation of the diagnostic framework (Section 2.3.1). Its simulations are driven by spatial datasets such as the digital elevation model (DEM), soil type, land use and slope, as well as meteorological variables including precipitation, relative humidity, maximum and minimum temperature, wind speed and solar radiation. After model calibration (see Text S1 in the Supplementary Materials for details), a subset of key simulated variables that govern DOC dynamics within the watershed was identified.”
  [Comment 6] Table 2: The criteria for selecting outputs as representatives of specific modules are unclear. For example, justification is needed for why TOT_P represents the pollutant transport module.
  [Response 6] We thank the reviewer for this helpful comment. We have clarified that the selected variables were chosen based on their direct relevance to DOC generation and transport in SWAT-C. Specifically, TOT_P was used to represent the pollutant transport module because it is closely linked to sediment and surface-runoff transport pathways and also reflects biogeochemical processes affecting DOC production. To improve clarity, we added a dedicated explanation of the selection rationale for all variables in the methodology section. The revised text is as follows:
  “These simulated variables, each linked to specific physical modules in the SWAT-C structure, are used as feature variables for the second component of the framework. Table 2 provides detailed definitions of these feature variables and their associated modules within the SWAT-C model. The selection of these feature variables is strictly based on their physical connections to carbon cycling. Meteorological forcings act as fundamental climatic drivers. Streamflow, sediment, soil moisture, and evapotranspiration represent the physical water and mass transport pathways. Biomass and leaf area index govern the biological production and storage of carbon. Total phosphorus is explicitly chosen to represent the pollutant transport module because it shares identical physical transport pathways with organic carbon and acts as a limiting nutrient for biological carbon generation. This comprehensive selection ensures that the diagnostic framework evaluates the complete chain of internal processes governing watershed carbon dynamics.”
  Table 2. SWAT-C simulated variables used as machine learning feature variables
  Please see the supplementary material for the corresponding tables.
  Note: The specific locations of the SWAT-C modules in the source code are provided in Table S4 in the Supplementary Materials. All feature variables were derived from the calibrated SWAT-C model.
  [Comment 7] Figure 4: Panels (a) and (b) are too small to read and should be enlarged for clarity.
  [Response 7] We thank the reviewer for pointing out the legibility issue with Figure 4. We agree that panels (a) and (b) were too small for proper visual inspection. In the revised manuscript, we have significantly enlarged these two panels and optimized the overall layout of the figure.
  Please see the supplementary material for the corresponding figures.
  
  Citation: https://doi.org/10.5194/egusphere-2025-5503-AC2

Supplement

https://doi.org/10.5194/egusphere-2025-5503-supplement

Viewed

Total article views: 485 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
331	113	41	485	59	44	41

Views and downloads (calculated since 27 Dec 2025)

Month	HTML	PDF	XML	Total
Dec 2025	79	31	4	114
Jan 2026	78	46	11	135
Feb 2026	63	15	5	83
Mar 2026	111	21	21	153

Latest update: 23 Mar 2026

Short summary

Understanding how carbon moves through rivers is vital for managing water quality and climate change. This study developed a new diagnostic framework that combines computer modeling and data analysis to explore why current river carbon simulations are often inaccurate. The results show that parts of the model describing plant growth and carbon cycling cause most errors, and improving these areas can make future environmental predictions more reliable.

