the Creative Commons Attribution 4.0 License.
Multi-decadal Streamflow Projections for Catchments in Brazil based on CMIP6 Multi-model Simulations and Neural Network Embeddings for Linear Regression Models
Abstract. A linear regression model is developed to link anomalies of streamflow to anomalies of precipitation amounts and temperature with the goal of making multi-decadal streamflow projections based on CMIP6 multi-model simulations. Regression coefficients estimated separately for each catchment and each month show physically implausible spatial patterns and indicate issues with overfitting. An alternative approach is therefore explored in which all regression coefficients are estimated simultaneously through a neural network that retains the original linear model structure, but uses embeddings to map each combination of catchment and month to a set of regression coefficients. The model is demonstrated over a set of catchments in Brazil, where the estimated relationships are used to make streamflow projections for the next decades based on CMIP6 multi-model simulations. It yields physically more plausible relationships between streamflow, precipitation amounts, and temperature for our study area than the locally fitted regression models. The resulting projections indicate reduced streamflow over northern, north-eastern, central, and south-eastern Brazil, especially for the austral spring and summer season. The signal is less clear during austral winter. In southern Brazil, an increase in streamflow is expected.
Status: closed
-
CC1: 'Comment on egusphere-2025-1603', Ingrid Petry, 21 May 2025
I recommend the authors to check the following three references, which may be relevant to their work:
- Brêda, J.P.L.F., de Paiva, R.C.D., Collischonn, W. et al. Climate change impacts on South American water balance from a continental-scale hydrological model driven by CMIP5 projections. Climatic Change 159, 503–522 (2020). https://doi.org/10.1007/s10584-020-02667-9
- Brêda, J.P.L.F., et al., 2020. Climate change impacts on South American water balance from a continental-scale hydrological model driven by CMIP5 projections. Climatic Change, 159 (4), 503–522. doi:10.1007/s10584-020-02667-9
- Petry, I., Miranda, P. T., Paiva, R. C. D., Collischonn, W., Fan, F. M., Fagundes, H. O., et al. (2025). Changes in flood magnitude and frequency projected for vulnerable regions and major wetlands of South America. Geophysical Research Letters, 52, e2024GL112436. https://doi.org/10.1029/2024GL112436
Citation: https://doi.org/10.5194/egusphere-2025-1603-CC1 -
AC1: 'Reply on CC1', Michael Scheuerer, 02 Jun 2025
Thanks for sharing the references! I have now had a chance to read the two (1 and 2 are identical) papers and they are relevant indeed and allow us to compare the projected changes we obtained with our ML model with projections with a hydrological model.
Citation: https://doi.org/10.5194/egusphere-2025-1603-AC1 -
CC2: 'Reply on AC1', Ingrid Petry, 02 Jun 2025
I'm sorry, the second paper was supposed to be the following:
2. Brêda, J. P. L., de Paiva, R. C. D., Siqueira, V. A., & Collischonn, W. (2023). Assessing climate change impact on flood discharge in South America and the influence of its main drivers. Journal of Hydrology, 619, 129284. doi: https://doi.org/10.1016/j.jhydrol.2023.129284
I'm glad to help!
Citation: https://doi.org/10.5194/egusphere-2025-1603-CC2
RC1: 'Comment on egusphere-2025-1603', Anonymous Referee #1, 06 Jun 2025
- I suggest revising the study area map, adding coordinates, north arrow, scale.
- Some literature review can be added about hybrid statistical-physical models in the introduction.
- You are using different data sources (CHIRPS vs. ERA5); did you do any sensitivity analyses?
- Have you thought about physical relationships between temperature and the vapor pressure of water (e.g., considering the Clausius-Clapeyron equation)?
- Considering the nearest grid point, did you account for orographic effects?
- I suggest adding skill scores for climatology as well.
Citation: https://doi.org/10.5194/egusphere-2025-1603-RC1 -
AC2: 'Reply on RC1', Michael Scheuerer, 01 Jul 2025
Thanks for your comments! I'll reply to your comments below; suggested changes will be included in the revised version of the manuscript.
- I suggest revising the study area map, adding coordinates, north arrow, scale.
Thanks for your suggestions, we'll add these features in the revised version of the manuscript.
- Some literature reviews can be added about hybrid statistical-physical models in introduction.
We'll add some references about hybrid statistical-physical models in the revised version of the manuscript.
- You are using different data sources CHIRPS vs. ERA5, did you do some sensitivity check analyses?
Yes. We had originally used ERA5 data for both temperature and precipitation but had then discovered discrepancies between historical trends for ERA5 precipitation and streamflow (see attached images). Encouraged by comparisons of ERA5 and CHIRPS precipitation data over Brazil reported in the literature, we tested CHIRPS data as an alternative to ERA5 data and obtained better results with CHIRPS (then still using only the constrained regression framework with slightly different predictors, see attached images). The downside of this change, a reduction in the number of years available for fitting the model, led to the development of the neural network regression approach.
- Have you thought about physical relationships between temperature and vapor pressure of water (like considering the Clausius-Clapeyron equation)?
In earlier stages of the project, we have discussed and tested more sophisticated ways to model the impact of temperature changes (due to global warming) on streamflow through increased evapotranspiration, which also take interactions between temperature and precipitation changes into account. However, these attempts with more complex (non-linear) approaches were not successful, likely due to the limited data and often poor signal-to-noise ratio, and we therefore decided to move forward with a simple linear model.
- Considering nearest grid point, did you use orographic effects?
No, we have not corrected for orographic effects. While discrepancies between model grid and real orography may indeed entail biases for both temperature and precipitation, we use the same rationale that we describe in section 3.1 ('Data standardization') in the context of possible biases in climate model simulations: by using only standardized anomalies in our model, systematic biases are corrected implicitly as they cancel out in the standardized anomalies. Of course, this rationale does not work for non-linear bias effects, but as explained above, such effects are too complex to identify and correct in a robust way with the available data.
- I suggest adding skill scores as well for climatology.
By 'skill scores', are you referring to the fraction of explained variability depicted in Figure 7? This quantity is effectively a mean squared error (MSE) skill score, and climatology in this context would be a constant prediction of zero anomaly (since the climatological signal is removed in the standardization described in section 3.1). So, by definition, the explained variability (MSE skill score) for climatology is zero everywhere.
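The two arguments in this reply (linear biases cancel in standardized anomalies, and a zero-anomaly climatology forecast has an MSE skill score of zero) can be checked numerically. The following is only a minimal sketch and is not taken from the manuscript: it assumes that standardization means subtracting the mean and dividing by the standard deviation, which may differ from the exact procedure in section 3.1, and the synthetic series and the bias values (1.3 and 15.0) are made up for illustration.

```python
import numpy as np

def standardized_anomalies(x):
    """Subtract the mean and divide by the standard deviation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def mse_skill_score(obs, pred):
    """1 - MSE(pred) / MSE(reference), with a constant zero anomaly as reference."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mse_pred = np.mean((obs - pred) ** 2)
    mse_clim = np.mean(obs ** 2)  # climatology forecast: zero anomaly everywhere
    return 1.0 - mse_pred / mse_clim

rng = np.random.default_rng(0)
truth = rng.gamma(shape=2.0, scale=50.0, size=40)  # synthetic monthly series
biased = 1.3 * truth + 15.0                        # hypothetical scale and offset bias

z_truth = standardized_anomalies(truth)
z_biased = standardized_anomalies(biased)

# A linear bias leaves the standardized anomalies unchanged.
bias_cancels = np.allclose(z_truth, z_biased)

# Predicting zero anomaly is the reference itself, so its skill score is zero.
clim_score = mse_skill_score(z_truth, np.zeros_like(z_truth))
```

A non-linear bias (for example, one that depends on the magnitude of the precipitation amount) would not pass the `bias_cancels` check, which is the limitation acknowledged in the reply.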
RC2: 'Comment on egusphere-2025-1603', Anonymous Referee #2, 23 Jun 2025
I would like to commend the authors for a clear and rigorous manuscript that adheres well to the principle of Ockham’s razor. The attention given to model interpretability is particularly appreciated. I believe the manuscript is suitable for publication in its current form. I have only one minor comment: could the authors clarify why the CatchmendID pipeline was separated from the Month pipeline in the neural network architecture? A brief explanation would be helpful.
Citation: https://doi.org/10.5194/egusphere-2025-1603-RC2 -
AC3: 'Reply on RC2', Michael Scheuerer, 02 Jul 2025
Thank you for the positive assessment! We have indeed experimented with alternative architectures where the catchment and month information is combined at an earlier stage. The performance was very similar; we chose the architecture with separate pipelines in the article because we like the interpretation, briefly explained in section 3.3.2, of one pipeline learning a set of spatial patterns and the other pipeline learning how to weight these patterns differently over the course of the year. One can visualize this as in the figures attached to this comment, where we depicted the first 4 (of 25) spatial patterns for the first cross-validation fold (see section 3.3.5) and the associated month-specific coefficients. We find the possibility to visualize and interpret the result of the embeddings in this way an advantage of our chosen architecture. If interpretability is the main focus, further improvements could be achieved by
a) choosing activation functions after the respective first dense layer that entail non-negative coefficients, and
b) reducing the number of nodes in the first hidden layer (output dimension of these dense layers) which could potentially lead to more unique individual patterns while likely (see figure A1) only having a minor negative impact on the model's performance.
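To make the "separate pipelines" idea more concrete, here is a rough forward-pass sketch in NumPy. It is not the authors' implementation: the weights are random and untrained, and the sizes (`n_catchments`, `emb_dim`, `n_predictors`), the `tanh` activations, and the way the two pipelines are combined (`W_out`) are all assumptions made for illustration. Only the number of patterns (25) comes from the reply above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_catchments, n_months = 30, 12  # hypothetical sizes
emb_dim = 8                      # hypothetical embedding dimension
n_patterns = 25                  # number of patterns mentioned in the reply
n_predictors = 2                 # assumed: precipitation and temperature anomalies

# Pipeline 1: catchment embedding -> dense layer -> spatial pattern values
catch_emb = rng.normal(size=(n_catchments, emb_dim))
W_catch = rng.normal(size=(emb_dim, n_patterns))

# Pipeline 2: month embedding -> dense layer -> month-specific pattern weights
month_emb = rng.normal(size=(n_months, emb_dim))
W_month = rng.normal(size=(emb_dim, n_patterns))

# Assumed combination step: weighted patterns -> one coefficient per predictor
W_out = rng.normal(size=(n_patterns, n_predictors))

def regression_coefficients(catchment_id, month):
    """Map a (catchment, month) pair to a vector of regression coefficients."""
    patterns = np.tanh(catch_emb[catchment_id] @ W_catch)  # shape (n_patterns,)
    weights = np.tanh(month_emb[month] @ W_month)          # shape (n_patterns,)
    return (patterns * weights) @ W_out                    # shape (n_predictors,)

def predict_streamflow_anomaly(catchment_id, month, predictors):
    """The prediction stays linear in the predictors for a fixed catchment and month."""
    beta = regression_coefficients(catchment_id, month)
    return float(beta @ np.asarray(predictors, dtype=float))

beta_example = regression_coefficients(catchment_id=3, month=6)
y_example = predict_streamflow_anomaly(3, 6, [0.5, -1.0])
```

In this sketch, column `k` of `np.tanh(catch_emb @ W_catch)` plays the role of the k-th spatial pattern and row `m` of `np.tanh(month_emb @ W_month)` holds the weights for month `m`, which mirrors the kind of visualization described in the reply. Suggestion a) would correspond to replacing `tanh` with an activation that only produces non-negative values.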
RC3: 'Comment on egusphere-2025-1603', Anonymous Referee #3, 03 Jul 2025
This paper presents a linear regression model to project multi-decadal streamflow anomalies across Brazilian catchments. The study used CMIP6 multi-model climate simulations, CHIRPS precipitation, and ERA5 temperature data to build interpretable predictive models. The paper is well-written and contributes an interpretable alternative to black-box deep learning models in climate impact assessments. Below, I provide detailed comments and suggestions.
Introduction
- The authors provide an overview of process-based hydrological models and data-driven approaches, including LSTMs. One potential improvement is to elaborate slightly more on why existing hydrological models could not be adapted in this context.
- In the introduction, the authors emphasize the limitations of LSTM models, characterizing them as “black boxes.” However, in this study, LSTMs or neural networks are not applied to make direct predictions but rather to learn coefficient embeddings for a linear regression model. Moreover, there are many established approaches to improve the interpretability of neural networks. I suggest that the authors compare existing explainability methods (e.g., attention mechanisms, feature attribution techniques) with the embedding approach adopted in this study, to more clearly situate the method within the broader context of explainable machine learning.
Method
Please briefly explain how missing months or gaps in time series were handled in the regressions.
Figure 5 is helpful, but it would improve understanding to provide a clearer description of the dimensionality of embeddings and dense layer outputs in the main text rather than in the appendix.
Results
Figure 7 shows the percentiles of the coefficients of determination across all catchments. It is noted that in July, the model performance is relatively poor, with many catchments having values even below 0.03. Please check whether this is correct, and provide an explanation for this phenomenon. How does this affect the robustness of long-term projections?
The horizontal and vertical axis labels and units in Figure 8 and Figure 9 are missing.
Figure 11 analyzed the predictor contributions which is insightful. It is noted that this decomposition does not capture interaction effects, which could be a limitation. For example, the temperature emerges as the primary driver of projected declines in streamflow. However, this may be partly an artifact of the linear model structure. Have the authors tested alternative formulations, such as including interaction terms (e.g., temperature × precipitation) or exploring nonlinear relationships?
While smoothing year-to-year variability is understandable, applying centered 30-year moving averages can mask decadal shifts and dampen trends, particularly in non-stationary time series. It could be informative to provide supplementary figures with alternative window lengths (e.g., 10 or 20 years) or without smoothing, to demonstrate the stability of trends.
Citation: https://doi.org/10.5194/egusphere-2025-1603-RC3 -
AC4: 'Reply on RC3', Michael Scheuerer, 08 Jul 2025
Thank you for your comments and suggestions! I'll reply to the individual comments below; suggested changes will be included in the revised version of the manuscript:
- The authors provide an overview of process-based hydrological models and data-driven approaches, including LSTMs. One potential improvement is to elaborate slightly more on why existing hydrological models could not be adapted in this context.
Using hydrological models is certainly possible, and the first community comment has pointed us to papers that calculate projections of future streamflow over South America using that approach. However, this requires both expertise with the respective local hydroclimate and a substantial amount of time to calibrate that model for all catchments. Since our project partners at Statkraft want to use this model in several different parts of the world and often have to provide a first iteration of future streamflow simulations rather quickly, there was a desire for an approach that can more easily be transferred to different regions. This was one of the primary motivations for this work, and we will expand our explanations to make this more clear.
- In the introduction, the authors emphasize the limitations of LSTM models, characterizing them as “black boxes.” However, in this study, LSTMs or neural networks are not applied to make direct predictions but rather to learn coefficient embeddings for a linear regression model. Moreover, there are many established approaches to improve the interpretability of neural networks. I suggest that the authors compare existing explainability methods (e.g., attention mechanisms, feature attribution techniques) with the embedding approach adopted in this study, to more clearly situate the method within the broader context of explainable machine learning.
We will expand this section and add some comments and references on explainability and interpretability. In our understanding, explainability methods can help better understand the sensitivity of the output to the various inputs, but cannot make it interpretable to the same degree as a process-based hydrological or linear statistical model where one has a clear, intuitive understanding of the model’s decisions. We’ll also add a sentence that makes it clear where our proposed method is situated w.r.t. explainable machine learning approaches.
- Please briefly explain how missing months or gaps in time series were handled in the regressions.
In our setup, missing values only occurred in the context that the streamflow data looked suspicious (identical values across the majority of years) for a small number of combinations of months and catchments. In those cases, we removed the entire month-catchment combination from the analysis. The case where only a few years for a given month-catchment combination are missing did not occur in our study, but should not pose any problems as long as the standardization (section 3.1) can be calculated in a robust way. The regression model (especially the one fitted within a neural network framework using month and catchment embeddings) can be fitted with the missing years removed from the training data set.
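The two cases described in this reply could be handled along the following lines. This is a hedged sketch rather than the procedure actually used: the function names and the `max_share=0.5` threshold (one reading of "identical values across the majority of years") are assumptions.

```python
import numpy as np

def is_suspicious(series, max_share=0.5):
    """Flag a series if one single value accounts for more than max_share of the years."""
    values = np.asarray(series, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        return True  # nothing usable for this month-catchment combination
    _, counts = np.unique(values, return_counts=True)
    return bool(counts.max() / values.size > max_share)

def training_rows(streamflow, predictors):
    """Keep only the years in which streamflow and all predictors are available."""
    streamflow = np.asarray(streamflow, dtype=float)
    predictors = np.asarray(predictors, dtype=float)
    keep = ~np.isnan(streamflow) & ~np.isnan(predictors).any(axis=1)
    return streamflow[keep], predictors[keep]

# Case 1: a month-catchment series dominated by one repeated value is dropped entirely.
flat_series = [5.0, 5.0, 5.0, 5.0, 5.0, 7.0, 8.0]
normal_series = [1.0, 2.0, 3.0, 4.0, 5.0]
drop_flat = is_suspicious(flat_series)
drop_normal = is_suspicious(normal_series)

# Case 2: individual missing years are removed from the training data.
flow = [1.0, np.nan, 3.0, 4.0]
X = [[0.1, 0.2], [0.3, 0.4], [np.nan, 0.5], [0.6, 0.7]]
y_train, X_train = training_rows(flow, X)
```

With embeddings, the rows that survive this filtering for all catchments and months can simply be stacked into one training set, so a gap in one series does not prevent the remaining data from being used.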
- Figure 5 is helpful, but it would improve understanding to provide a clearer description of the dimensionality of embeddings and dense layer outputs in the main text rather than in the appendix.
We will provide that information in the main text discussing Figure 5 to give an idea about the typical hyperparameter values early on, while still referring to the ‘Hyperparameter’ subsection and the Appendix for the technical details of how these values were obtained.
- Figure 7 shows the percentiles of the coefficients of determination across all catchments. It is noted that in July, the model performance is relatively poor, with many catchments having values even below 0.03. Please check whether this is correct, and provide an explanation for this phenomenon. How does this affect the robustness of long-term projections?
We believe that this is correct, and is a result of typically low precipitation amounts in July over large parts of Brazil (see Figure 2), which makes modeling the rainfall-runoff relation more difficult than in other months, and more dependent on long-term storage mechanisms and possibly other factors (e.g. more impact from reservoir operations that was not fully accounted for in the available streamflow series). We will expand the discussion of the negative implications for long-term projections and associated uncertainty (section 5), and add a sentence to the conclusion to make it clear that this is where more complex approaches may have the most potential for improvement.
- The horizontal and vertical axis labels and units in Figure 8 and Figure 9 are missing.
The horizontal axis label (year) should be self-explanatory; for the vertical axis, we will add labels and units to the two left panels in the revised version.
- Figure 11 analyzed the predictor contributions which is insightful. It is noted that this decomposition does not capture interaction effects, which could be a limitation. For example, the temperature emerges as the primary driver of projected declines in streamflow. However, this may be partly an artifact of the linear model structure. Have the authors tested alternative formulations, such as including interaction terms (e.g., temperature × precipitation) or exploring nonlinear relationships?
In earlier stages of the project, we have discussed and tested more sophisticated ways to model the impact of temperature changes on streamflow through increased evapotranspiration which also take interactions between temperature and precipitation changes into account. However, these attempts with more complex (non-linear) approaches were not successful, likely due to the limited data and often poor signal-to-noise ratio, and we therefore decided to move forward with a simple linear model. We agree though that there is a potential danger of an omitted variable bias with our model, and we will add this caveat to the discussion of Figure 11.
- While smoothing year-to-year variability is understandable, applying centered 30-year moving averages can mask decadal shifts and dampen trends, particularly in non-stationary time series. It could be informative to provide supplementary figures with alternative window lengths (e.g., 10 or 20 years) or without smoothing, to demonstrate the stability of trends.
We have attached supplementary figures with the suggested alternative window lengths.
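For readers who want to reproduce this kind of comparison on their own series, a centered moving average with a configurable window can be sketched as follows. This is an illustrative assumption about the smoothing, not the code behind the attached figures; the synthetic series, the year range, and the convention for even window lengths are made up.

```python
import numpy as np

def centered_moving_average(values, window):
    """Centered moving average; years without a full window are returned as NaN.

    For an even window there is no exact center, so the window covering
    indices i - (window - 1) // 2 to i + window // 2 is assigned to index i.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    smoothed = np.convolve(values, np.ones(window) / window, mode="valid")
    start = (window - 1) // 2
    out[start:start + smoothed.size] = smoothed
    return out

rng = np.random.default_rng(0)
years = np.arange(1981, 2101)  # hypothetical 120-year period
series = -0.01 * (years - years[0]) + rng.normal(scale=0.5, size=years.size)

# Same series smoothed with the window lengths discussed in the review.
smoothed = {w: centered_moving_average(series, w) for w in (10, 20, 30)}
valid_years = {w: int(np.sum(~np.isnan(s))) for w, s in smoothed.items()}
```

Shorter windows keep more years at both ends of the record and retain more decadal variability, while longer windows give a smoother curve, which is the trade-off raised in the referee comment.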
Status: closed
-
CC1: 'Comment on egusphere-2025-1603', Ingrid Petry, 21 May 2025
I recommend the authors to check the following three references, which may be relevant to their work:
-
Brêda, J.P.L.F., de Paiva, R.C.D., Collischon, W. et al. Climate change impacts on South American water balance from a continental-scale hydrological model driven by CMIP5 projections. Climatic Change 159, 503–522 (2020). https://doi.org/10.1007/s10584-020-02667-9
- Brêda, J.P.L.F., et al., 2020. Climate change impacts on South American water balance from a continental-scale hydrological model driven by CMIP5 projections. Climatic Change, 159 (4), 503–522. doi:10.1007/s10584-020-02667-9
- Petry, I., Miranda, P. T., Paiva, R. C. D., Collischonn, W., Fan, F. M., Fagundes, H. O., et al. (2025). Changes in flood magnitude and frequency projected for vulnerable regions and major wetlands of south America. Geophysical Research Letters, 52, e2024GL112436. https://doi.org/10.1029/2024GL112436
Citation: https://doi.org/10.5194/egusphere-2025-1603-CC1 -
AC1: 'Reply on CC1', Michael Scheuerer, 02 Jun 2025
Thanks for sharing the references! I have now had a chance to read the two (1 and 2 are identical) papers and they are relevant indeed and allow us to compare the projected changes we obtained with our ML model with projections with a hydrological model.
Citation: https://doi.org/10.5194/egusphere-2025-1603-AC1 -
CC2: 'Reply on AC1', Ingrid Petry, 02 Jun 2025
I'm sorry, the second paper was supposed to be the following:
2. Brêda, J. P. L., de Paiva, R. C. D., Siqueira, V. A., & Collischonn, W. (2023). Assessing climate change impact on flood discharge in South America and the influence of its main drivers. Journal of Hydrology, 619, 129284. doi: https://doi.org/10.1016/j.jhydrol.2023.129284I'm glad to help!
Citation: https://doi.org/10.5194/egusphere-2025-1603-CC2
-
CC2: 'Reply on AC1', Ingrid Petry, 02 Jun 2025
-
-
RC1: 'Comment on egusphere-2025-1603', Anonymous Referee #1, 06 Jun 2025
- I suggest revising the study area map, adding coordinates, north arrow, scale.
-Some litreture reviews can be added about hybrid statistical-physical models in introduction.
-You are using different data sources CHIRPS vs. ERA5, dis you do some sensivity check analyses?
-Have you think about physical relationships between temperature and vapor pressure of water (like considering clausius-claperyon equation)?
-Considering nearest grid point, did you use orographic effects?
-I suggest adding skill scores as well for climatology.
Citation: https://doi.org/10.5194/egusphere-2025-1603-RC1 -
AC2: 'Reply on RC1', Michael Scheuerer, 01 Jul 2025
Thanks for your comments! I'll reply to your comments below, suggested changes will be included in the revised version of the manuscript.
- I suggest revising the study area map, adding coordinates, north arrow, scale.
Thanks for your suggestions, we'll add these features in the revised version of the manuscript.- Some literature reviews can be added about hybrid statistical-physical models in introduction.
We'll add some references about hybrid statistical-physical models in the revised version of the manuscript.- You are using different data sources CHIRPS vs. ERA5, did you do some sensitivity check analyses?
Yes. We had originally used ERA5 data for both temperature and precipitation but had then discovered discrepancies between historical trends for ERA5 precipitation and streamflow (see attached images). Encouraged by comparisons of ERA5 and CHIRPS precipitation data over Brazil reported in the literature, we tested CHIRPS data as an alternative to ERA5 data and obtained better results with CHIRPS (then still using only the constrained regression framework with slightly different predictors, see attached images). The downside of this change, a reduction in the number of years available for fitting the model, led to the development of the neural network regression approach.- Have you think about physical relationships between temperature and vapor pressure of water (like considering clausius-claperyon equation)?
In earlier stages of the project, we have discussed and tested more sophisticated ways to model the impact of temperature changes (due to global warming) on streamflow through increased evapotranspiration, which also take interactions between temperature and precipitation changes into account. However, these attempts with more complex (non-linear) approaches were not successful, likely due to the limited data and often poor signal-to-noise ratio, and we therefore decided to move forward with a simple linear model.- Considering nearest grid point, did you use orographic effects?
No, we have not corrected for orographic effects. While discrepancies between model grid and real orography may indeed entail biases for both temperature and precipitation, we use the same rationale that we describe in section 3.1 ('Data standardization') in the context of possible biases in climate model simulations: by using only standardized anomalies in our model, systematic biases are corrected implicitly as they cancel out in the standardized anomalies. Of course, this rationale does not work for non-linear bias effects, but as explained above, such effects are too complex to identify and correct in a robust way with the available data.- I suggest adding skill scores as well for climatology.
By 'skill scores', are you referring to the fraction of explained variability depicted in Figure 7? This quantity is effectively a mean squared error (MSE) skill score, and climatology in this context would be constant prediction of zero anomaly (since the climatological signal is removed in the standardization described in section 3.1). So, by definition, the explained variability (MSE skill score) for climatology is zero everywhere.
-
AC2: 'Reply on RC1', Michael Scheuerer, 01 Jul 2025
-
RC2: 'Comment on egusphere-2025-1603', Anonymous Referee #2, 23 Jun 2025
I would like to commend the authors for a clear and rigorous manuscript that adheres well to the principle of Ockham’s razor. The attention given to model interpretability is particularly appreciated. I believe the manuscript is suitable for publication in its current form. I have only one minor comment: could the authors clarify why the CatchmendID pipeline was separated from the Month pipeline in the neural network architecture? A brief explanation would be helpful.
Citation: https://doi.org/10.5194/egusphere-2025-1603-RC2 -
AC3: 'Reply on RC2', Michael Scheuerer, 02 Jul 2025
Thank you for the positive assessment! We have indeed experimented with alternative architectures where the catchment and month information is combined at an earlier stage. The performance was very similar, the reason why we chose the architecture with separate pipelines in the article is that we like the interpretation briefly explained in section 3.3.2 as one pipeline learning a set of spatial patterns and the other pipeline learning how to weigh these patterns differently over the course of the year. One can visualize this as in the figures attached to this comment, where we depicted the first 4 (of 25) spatial patterns for the first cross validation fold (see section 3.3.5) and the associated month-specific coefficients. We find the possibility to visualize and interpret the result of the embeddings in this way an advantage of our chosen architecture. If interpretability is the main focus, further improvements could be achieved by
a) choosing activation functions after the respective first dense layer that entail non-negative coefficients, and
b) reducing the number of nodes in the first hidden layer (output dimension of these dense layers) which could potentially lead to more unique individual patterns while likely (see figure A1) only having a minor negative impact on the model's performance.
-
AC3: 'Reply on RC2', Michael Scheuerer, 02 Jul 2025
-
RC3: 'Comment on egusphere-2025-1603', Anonymous Referee #3, 03 Jul 2025
This paper presents a linear regression model to project multi-decadal streamflow anomalies across Brazilian catchments. The study used CMIP6 multi-model climate simulations, CHIRPS precipitation, and ERA5 temperature data to build interpretable predictive models. The paper is well-written and contributes an interpretable alternative to black-box deep learning models in climate impact assessments. Below, I provide detailed comments and suggestions.
Introduction
- The authors provide an overview of process-based hydrological models and data-driven approaches, including LSTMs. One potential improvement is to elaborate slightly more on why existing hydrological models could not be adapted in this context.
- In the introduction, the authors emphasize the limitations of LSTM models, characterizing them as “black boxes.” However, in this study, LSTMs or neural networks are not applied to make direct predictions but rather to learn coefficient embeddings for a linear regression model. Moreover, there are many established approaches to improve the interpretability of neural networks. I suggest that the authors compare existing explainability methods (e.g., attention mechanisms, feature attribution techniques) with the embedding approach adopted in this study, to more clearly situate the method within the broader context of explainable machine learning.
Method
Please briefly explain how missing months or gaps in time series were handled in the regressions.
Figure 5 is helpful, but it would improve understanding to provide a clearer description of the dimensionality of embeddings and dense layer outputs in the main text rather than in the appendix.
Results
Figure 7 shows the percentiles of the coefficients of determination across all catchments. It is noted that in July, the model performance is relatively poor, with many catchments having values even below 0.03. Please check whether this is correct, and provide an explanation for this phenomenon.How this affects the robustness of long-term projections
The horizontal and vertical axis labels and units in Figure 8 and Figure 9 are missing.
Figure 11 analyzed the predictor contributions which is insightful. It is noted that this decomposition does not capture interaction effects, which could be a limitation. For example, the temperature emerges as the primary driver of projected declines in streamflow. However, this may be partly an artifact of the linear model structure. Have the authors tested alternative formulations, such as including interaction terms (e.g., temperature × precipitation) or exploring nonlinear relationships?
While smoothing year-to-year variability is understandable, applying centered 30-year moving averages can mask decadal shifts and dampen trends, particularly in non-stationary time series. It could be informative to provide supplementary figures with alternative window lengths (e.g., 10 or 20 years) or without smoothing, to demonstrate the stability of trends.
Citation: https://doi.org/10.5194/egusphere-2025-1603-RC3 -
AC4: 'Reply on RC3', Michael Scheuerer, 08 Jul 2025
Thank you for your comments and suggestions! I'll reply to the individual comments below, suggested changes will be included in the revised version of the manuscript:
- The authors provide an overview of process-based hydrological models and data-driven approaches, including LSTMs. One potential improvement is to elaborate slightly more on why existing hydrological models could not be adapted in this context.
Using hydrological models is certainly possible, and the first community comment has pointed us to papers that calculate projections of future streamflow over South America using that approach. However, this requires both expertise with the respective local hydroclimate and a substantial amount of time to calibrate that model for all catchments. Since our project partners at Statkraft want to use this model in several different parts of the world and often have to provide a first iteration of future streamflow simulations rather quickly, there was a desire for an approach that can more easily be transferred to different regions. This was one of the primary motivations for this work, and we will expand our explanations to make this more clear.
- In the introduction, the authors emphasize the limitations of LSTM models, characterizing them as “black boxes.” However, in this study, LSTMs or neural networks are not applied to make direct predictions but rather to learn coefficient embeddings for a linear regression model. Moreover, there are many established approaches to improve the interpretability of neural networks. I suggest that the authors compare existing explainability methods (e.g., attention mechanisms, feature attribution techniques) with the embedding approach adopted in this study, to more clearly situate the method within the broader context of explainable machine learning.
We will expand this section and add some comments and references on explainability and interpretability. In our understanding, explainability methods can help to better understand the sensitivity of the output to the various inputs, but they cannot make a model interpretable to the same degree as a process-based hydrological or linear statistical model, where one has a clear, intuitive understanding of the model’s decisions. We will also add a sentence that makes it clear where our proposed method is situated with respect to explainable machine learning approaches.
- Please briefly explain how missing months or gaps in time series were handled in the regressions.
In our setup, missing values only arose where the streamflow data looked suspicious (identical values across the majority of years), which was the case for a small number of combinations of months and catchments. In those cases, we removed the entire month-catchment combination from the analysis. The case where only a few years are missing for a given month-catchment combination did not occur in our study, but it should not pose any problems as long as the standardization (section 3.1) can be calculated in a robust way. The regression model (especially the one fitted within a neural network framework using month and catchment embeddings) can be fitted with the missing years removed from the training data set.
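As an illustration of the handling described above, the following is a minimal Python sketch, not the code used in the study. The function names, the 50 % threshold for "majority of years", the toy data, and the use of a plain mean and standard deviation for the standardization are all assumptions made for this example; the actual standardization is the one described in section 3.1.

```python
from collections import Counter
from statistics import mean, stdev

def is_suspicious(values, threshold=0.5):
    """Flag a series in which a single value accounts for more than
    `threshold` of the available years (threshold is an assumption)."""
    valid = [v for v in values if v is not None]
    if not valid:
        return True
    most_common_count = Counter(valid).most_common(1)[0][1]
    return most_common_count / len(valid) > threshold

def standardized_anomalies(values):
    """Standardize using only the available years; missing years stay None.
    Mean/standard deviation is an assumed stand-in for section 3.1."""
    valid = [v for v in values if v is not None]
    mu, sigma = mean(valid), stdev(valid)
    return [None if v is None else (v - mu) / sigma for v in values]

# Hypothetical data: (catchment, month) -> one streamflow value per year
series = {
    ("A", 7): [12.0, 12.0, 12.0, 12.0, 9.5],   # mostly identical values
    ("A", 1): [80.0, None, 95.0, 70.0, 88.0],  # one missing year
}

# Drop suspicious month-catchment combinations entirely;
# keep the others and standardize them on the years that are present.
kept = {
    key: standardized_anomalies(values)
    for key, values in series.items()
    if not is_suspicious(values)
}
```

In this sketch the suspicious combination is removed as a whole, while the combination with a single missing year is kept and that year simply does not enter the training data.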
- Figure 5 is helpful, but it would improve understanding to provide a clearer description of the dimensionality of embeddings and dense layer outputs in the main text rather than in the appendix.
We will provide that information in the main text discussing Figure 5 to give an idea about the typical hyperparameter values early on, while still referring to the ‘Hyperparameter’ subsection and the Appendix for the technical details of how these values were obtained.
- Figure 7 shows the percentiles of the coefficients of determination across all catchments. It is noted that in July, the model performance is relatively poor, with many catchments having values even below 0.03. Please check whether this is correct, and provide an explanation for this phenomenon. How this affects the robustness of long-term projections.
We believe that this is correct and that it results from the typically low precipitation amounts in July over large parts of Brazil (see Figure 2). This makes modeling the rainfall-runoff relation more difficult than in other months, and more dependent on long-term storage mechanisms and possibly other factors (e.g. a larger impact of reservoir operations that was not fully accounted for in the available streamflow series). We will expand the discussion of the negative implications for long-term projections and the associated uncertainty (section 5), and add a sentence to the conclusion to make it clear that this is where more complex approaches may have the most potential for improvement.
- The horizontal and vertical axis labels and units in Figure 8 and Figure 9 are missing.
The horizontal axis label (year) should be self-explanatory; for the vertical axis, we will add labels and units to the two left panels in the revised version.
- Figure 11 analyzed the predictor contributions which is insightful. It is noted that this decomposition does not capture interaction effects, which could be a limitation. For example, the temperature emerges as the primary driver of projected declines in streamflow. However, this may be partly an artifact of the linear model structure. Have the authors tested alternative formulations, such as including interaction terms (e.g., temperature × precipitation) or exploring nonlinear relationships?
In earlier stages of the project, we discussed and tested more sophisticated ways to model the impact of temperature changes on streamflow through increased evapotranspiration, which also take interactions between temperature and precipitation changes into account. However, these attempts with more complex (non-linear) approaches were not successful, likely due to the limited data and the often poor signal-to-noise ratio, and we therefore decided to move forward with a simple linear model. We agree, though, that there is a potential danger of omitted variable bias with our model, and we will add this caveat to the discussion of Figure 11.
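For readers unfamiliar with the formulation raised in the comment, the following Python sketch shows what adding a temperature × precipitation interaction term to a linear regression looks like. It is not one of the approaches tested in the project. The anomaly values, the coefficients 0.7, -0.3 and 0.2, the omission of an intercept, and the helper names `solve` and `fit_ols` are all hypothetical and chosen only so that the example is self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients via the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical standardized anomalies of precipitation (P) and temperature (T)
P = [-1.2, -0.4, 0.3, 0.9, 1.5, -0.8, 0.1, 0.6]
T = [0.5, -0.9, 1.1, -0.3, 0.2, 1.4, -1.0, -0.6]

# Synthetic, noise-free response with a known interaction effect
Q = [0.7 * p - 0.3 * t + 0.2 * p * t for p, t in zip(P, T)]

# Design matrix with main effects and the product term P * T
X_interaction = [[p, t, p * t] for p, t in zip(P, T)]
beta = fit_ols(X_interaction, Q)  # [b_P, b_T, b_PT]
```

With noise-free synthetic data the three coefficients are recovered essentially exactly; with short, noisy records the product term is much harder to estimate, which is consistent with the signal-to-noise concern mentioned above.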
- While smoothing year-to-year variability is understandable, applying centered 30-year moving averages can mask decadal shifts and dampen trends, particularly in non-stationary time series. It could be informative to provide supplementary figures with alternative window lengths (e.g., 10 or 20 years) or without smoothing, to demonstrate the stability of trends.
We have attached supplementary figures with the suggested alternative window lengths.
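A minimal Python sketch of a centered moving average with the window lengths mentioned in the comment is given below. It is only an illustration, not the code behind the supplementary figures; the synthetic series, the function name, and the treatment of the edges (positions without a full window are left empty) are assumptions.

```python
def centered_moving_average(values, window):
    """Centered moving average; positions without a full window are None.
    For even windows the extra point is taken on the left side."""
    half_left = window // 2
    half_right = window - half_left - 1
    out = []
    for i in range(len(values)):
        lo, hi = i - half_left, i + half_right
        if lo < 0 or hi >= len(values):
            out.append(None)
        else:
            out.append(sum(values[lo:hi + 1]) / window)
    return out

# Hypothetical annual series: linear decline plus alternating year-to-year noise
years = list(range(2015, 2101))
flow = [100.0 - 0.2 * (y - 2015) + (3.0 if y % 2 else -3.0) for y in years]

# Smooth the same series with 10-, 20- and 30-year windows for comparison
smoothed = {w: centered_moving_average(flow, w) for w in (10, 20, 30)}
```

Longer windows leave fewer valid values at either end of the series and average over a longer stretch of the underlying trend, which is the trade-off the comment refers to.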