the Creative Commons Attribution 4.0 License.
Regionalization of GR4J model parameters for river flow prediction in Paraná, Brazil
Abstract. Model-dependent regionalization methods comprise techniques for transferring parameters calibrated in instrumented watersheds (donor basins) to non-instrumented watersheds (target basins). This study evaluates regionalization methods for transferring GR4J parameters and predicting river flow in catchments in southern Brazil. We created a dataset for Paraná State with daily hydrological time series (precipitation, evapotranspiration, and river flow) and physiographic and climatological indices for 126 catchments. Rigorous quality-control techniques were applied to recover the rainfall record from 1979 to 2020, and the fluviometric stations were georeferenced manually. The regionalization methods compared in this study are based on simple spatial proximity, physiographic-climatic similarity, and regression by Random Forest. The Q95 reference flow was also estimated by direct regression with Random Forest and compared with the indirect methods, i.e. those based on regionalized GR4J parameters. A set of 100 basins was used to train the regionalization models, and the remaining 26 catchments, treated as pseudo non-instrumented, were used to evaluate and compare the regionalizations. The GR4J model showed acceptable performance for the sample of 126 catchments: 65 % of the watersheds had a log-transformed Nash-Sutcliffe efficiency greater than 0.70 during the validation period. According to the evaluation on the sample of 26 basins, regionalization based on physiographic-climatic similarity proved to be the most robust method for predicting daily flow and the Q95 reference flow in basins of Paraná State. When the number of donor basins increases, the method based on spatial proximity performs comparably to the method based on physiographic-climatic similarity. Based on the physiographic-climatic characteristics of the basins, six distinct groups of watersheds were identified in Paraná.
The basins in each group are similar in size, forest cover, urban area, number of days with more than 150 mm of precipitation, and average duration of consecutive dry days.
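Both the direct Random Forest regression and the GR4J-based methods above are scored on the Q95 reference flow, i.e. the flow equalled or exceeded 95 % of the time. Below is a minimal sketch of reading Q95 from the empirical flow duration curve of a daily series. It assumes a Weibull plotting position and linear interpolation; the estimator used in the paper is not stated here, and `q95` is a hypothetical helper name.

```python
def q95(flows):
    """Flow equalled or exceeded 95 % of the time (empirical flow duration curve).

    Assumptions: missing days are given as None and skipped; the exceedance
    probability of the i-th largest flow is i / (n + 1) (Weibull plotting
    position); values between plotting positions are linearly interpolated.
    """
    q = sorted((f for f in flows if f is not None), reverse=True)
    n = len(q)
    if n == 0:
        raise ValueError("no valid flow values")
    probs = [(i + 1) / (n + 1) for i in range(n)]  # exceedance probabilities
    target = 0.95
    if target <= probs[0]:
        return q[0]
    if target >= probs[-1]:
        return q[-1]  # record too short to resolve the 95 % exceedance
    for i in range(n - 1):
        if probs[i] <= target <= probs[i + 1]:
            w = (target - probs[i]) / (probs[i + 1] - probs[i])
            return q[i] + w * (q[i + 1] - q[i])
```

For example, `q95(list(range(1, 100)))` returns 5.0. With 99 values, the fifth-smallest flow sits exactly at the 95 % exceedance position.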
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
CC1: 'Comment on egusphere-2023-1755', John Ding, 23 Aug 2023
Nash-Sutcliffe efficiency and its logNSE and sqrtNSE variants
For GR4J model calibration, the authors apply the logarithmic variant (logNSE) of the classical Nash-Sutcliffe coefficient or efficiency, NSE = 1 - F/F0, where F and F0 are the residual and initial variance, respectively. The logNSE variant applies a log transformation to the streamflow Q, and sqrtNSE a square-root transformation (Sect. 3.4).
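Here is a minimal sketch of the three efficiencies as defined in this comment: NSE = 1 - F/F0, where logNSE and sqrtNSE apply their transform to both observed and simulated flows before F and F0 are computed. The small offset `eps`, added before taking logarithms so that zero flows are handled, is an assumption. Conventions differ between studies; some use a fraction of the mean observed flow instead.

```python
import math

def nse(obs, sim, transform=None, eps=0.0):
    """Nash-Sutcliffe efficiency, NSE = 1 - F/F0.

    F is the sum of squared residuals (sim - obs) and F0 the sum of squared
    deviations of obs from its mean. If `transform` is given (math.log for
    logNSE, math.sqrt for sqrtNSE) it is applied to (flow + eps) for both series.
    """
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    if transform is not None:
        obs = [transform(o + eps) for o in obs]
        sim = [transform(s + eps) for s in sim]
    mean_obs = sum(obs) / len(obs)
    f = sum((s - o) ** 2 for o, s in zip(obs, sim))
    f0 = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - f / f0

def log_nse(obs, sim, eps=0.01):
    # eps is an assumed offset that keeps log() defined for zero flows
    return nse(obs, sim, transform=math.log, eps=eps)

def sqrt_nse(obs, sim):
    return nse(obs, sim, transform=math.sqrt)
```

Take `obs = [1, 2, 4, 8, 16]`. Raising the lowest flow from 1 to 2, or the highest from 16 to 17, gives the same NSE, because both squared errors equal 1. However, logNSE penalizes the low-flow error far more, which illustrates the low-flow sensitivity of the log variant.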
Table 2 shows a median calibrated NSE value of 0.621 for the 100 training watersheds, out of a total of 126 in the State of Paraná, Brazil.
In my view, NSE lacks an additional benchmark model (BMM) that is physically more realistic than its implicit benchmark, the observed mean flow, in Equation 1. (Note: the leading "1-" is missing from the right-hand side of that equation.) Such a benchmark would help interpret intermediate NSE values between 0 and 1 (Sect. 5.1 and Figure 5).
As such a BMM, I have put forward a one-step-ahead forecast from the simplest second-order autoregressive (AR) process of the streamflow Q, called AR(2) or AR2: Qar2[t+1] = 2Qobs[t] - Qobs[t-1] (see, e.g., Mizukami et al., 2019, SC1 therein; Cinkus et al., 2023, CC2 therein).
In a future study, the authors may wish to explore the utility of this AR2 alternative, following Cinkus et al. (2023), AC2, page 3. A dual NSE/AR2 efficiency scale, rather than NSE alone, would better measure the performance of a model simulation. For the purpose of this open discussion, a pilot study would suffice. It could cover the watershed with the median NSE value of 0.621, over one year or more of the 1979-2020 period.
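Below is a minimal sketch of the AR2 benchmark as written in this comment, where the forecast for day t is built from the two preceding observed flows. It also includes a comparison score, `ar2_skill`: 1 minus the ratio of the model's squared errors to the benchmark's over the same days. This score is one assumed way of building the suggested dual NSE/AR2 scale, not a formula given by the commenter. Both function names are hypothetical.

```python
def ar2_forecast(obs):
    """One-step AR2 benchmark, Qar2[t+1] = 2*Qobs[t] - Qobs[t-1].

    Returns forecasts for time steps 2 .. n-1 (no forecast for the first two).
    Uses observed flow at the site itself, so it exists only where flow is gauged.
    Values are not clipped, so steep recessions can yield negative forecasts.
    """
    return [2 * obs[t - 1] - obs[t - 2] for t in range(2, len(obs))]

def ar2_skill(obs, sim):
    """Assumed benchmark skill: 1 - SSE(model) / SSE(AR2) over steps 2 .. n-1.

    Positive: the model beats the AR2 benchmark; zero: equal; negative: worse.
    """
    bench = ar2_forecast(obs)
    target = obs[2:]
    sse_model = sum((s - o) ** 2 for s, o in zip(sim[2:], target))
    sse_bench = sum((b - o) ** 2 for b, o in zip(bench, target))
    if sse_bench == 0:
        raise ValueError("AR2 is perfect on this series; skill is undefined")
    return 1.0 - sse_model / sse_bench
```

For instance, `ar2_forecast([1, 2, 3, 5, 4])` returns `[3, 4, 7]`. Because the forecast needs observed flows at the site itself, it can only serve as a benchmark where flow is gauged.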
References
Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023.
Mizukami, N., Rakovec, O., Newman, A. J., Clark, M. P., Wood, A. W., Gupta, H. V., and Kumar, R.: On the choice of calibration metrics for “high-flow” estimation using hydrologic models, Hydrol. Earth Syst. Sci., 23, 2601–2614, https://doi.org/10.5194/hess-23-2601-2019, 2019.
Citation: https://doi.org/10.5194/egusphere-2023-1755-CC1
AC1: 'Reply on CC1', Emilio Graciliano Ferreira Mercuri, 28 Aug 2023
Here we present the authors' response to CC1: 'Comment on egusphere-2023-1755', John Ding, 23 Aug 2023, about Nash-Sutcliffe efficiency and its logNSE and sqrtNSE variants.
We decided to use logNSE for both calibration and evaluation of the model. We chose it because logNSE is more sensitive to low flows (Oudin et al., 2008), which seemed appropriate given the recent droughts in Paraná State, Brazil. However, we also evaluated the model with other criteria: the Pearson correlation coefficient (R), the Nash-Sutcliffe efficiency (NSE), and sqrtNSE.
Table 2 shows a median calibrated NSE value of 0.621 for the 26 watersheds of the validation set, out of a total of 126 in the State of Paraná, Brazil. This was not clearly stated in the manuscript, and we have rewritten it in the revised version, which will be uploaded soon.
Thank you for the correction to Equation 1: the leading "1-" was indeed missing, and it has been added in the revised manuscript.
The main aim of the article was to compare regionalization techniques, not model performance metrics. We agree with Cinkus et al. (2023) and Mizukami et al. (2019) that the choice of performance metric matters for model evaluation, which is why we used four metrics: R, NSE, logNSE, and sqrtNSE. Our goal, however, was to determine which regionalization procedure for GR4J parameters was more efficient. We showed that for Paraná catchments the Similarity approach was slightly better.
Using an autoregressive (AR) benchmark, as suggested, would require observed flow in the watershed that we treat as having no flow data. We therefore consider it beyond the scope of this research and manuscript, although it may be explored in future work.
Thank you for the suggestions; we will consider them in a future study.
References:
Oudin, L., Andréassian, V., Perrin, C., Michel, C., and Le Moine, N.: Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments, Water Resour. Res., 44(3), 2008.
Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023.
Mizukami, N., Rakovec, O., Newman, A. J., Clark, M. P., Wood, A. W., Gupta, H. V., and Kumar, R.: On the choice of calibration metrics for “high-flow” estimation using hydrologic models, Hydrol. Earth Syst. Sci., 23, 2601–2614, https://doi.org/10.5194/hess-23-2601-2019, 2019.
Citation: https://doi.org/10.5194/egusphere-2023-1755-AC1
RC1: 'Comment on egusphere-2023-1755', Anonymous Referee #1, 06 Oct 2023
The current manuscript presents the results of the parameter regionalisation of the GR4J model based on 126 catchments in Parana, Brazil. Although the parameter regionalisation of a hydrological model is important, the current manuscript is more of a technical report than a scientific paper.
1. The comparison between the three regionalisation approaches was presented without any interpretation of why one is better than the others.
2. In addition, the scientific issues and innovations were not clearly presented in the introduction and conclusion. What additional knowledge does the current work bring to the literature? Are the results general, or do they apply only to the GR4J model? These issues are not clarified or discussed.
3. The random forest results are not clearly presented. For example, is the model overfitted? How did the authors choose the training, validation and test examples?
4. Furthermore, it is not investigated whether the results are sensitive to the choice of training and validation sets in Figure 4.
Therefore, although the authors have done extensive work on the data processing, the scientific questions are not sufficient for publication in the HESS journal.
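The sensitivity asked about in point 4 can be probed with repeated random splits. This means drawing several training/validation partitions of the catchments with different seeds, rerunning the evaluation for each, and reporting the spread of the validation score. The sketch below covers only the partitioning and the summary. `evaluate` is a hypothetical stand-in for the full calibration, regionalisation and validation chain, and the number of repeats is an arbitrary choice.

```python
import random
from statistics import mean, stdev

def split_basins(basin_ids, train_fraction=0.8, seed=0):
    """Random train/validation partition of catchment IDs."""
    ids = list(basin_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

def split_sensitivity(basin_ids, evaluate, n_repeats=20, train_fraction=0.8):
    """Repeat the split with different seeds and summarise the validation scores.

    `evaluate(train_ids, valid_ids)` is a hypothetical callable returning one
    score, e.g. the median validation logNSE of the regionalised model.
    Returns (mean, standard deviation, minimum, maximum) of the scores.
    """
    scores = []
    for seed in range(n_repeats):
        train, valid = split_basins(basin_ids, train_fraction, seed)
        scores.append(evaluate(train, valid))
    return mean(scores), stdev(scores), min(scores), max(scores)
```

With 126 catchments, `train_fraction=0.8` gives a 101/25 split, while `train_fraction=100/126` reproduces the 100/26 partition described in the abstract. A large spread across seeds would indicate that conclusions drawn from a single split are fragile.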
Citation: https://doi.org/10.5194/egusphere-2023-1755-RC1
AC2: 'Reply on RC1', Emilio Graciliano Ferreira Mercuri, 13 Oct 2023
Here we present the authors' response to RC1: 'Comment on egusphere-2023-1755', Anonymous Referee #1, 06 Oct 2023.
We are grateful for the comments and contributions made by Anonymous Referee #1. However, we respectfully disagree with the argument that the manuscript is more of a technical report than a scientific paper. We have proposed a scientific study to compare hydrological regionalization methods for ungauged basins and to understand watershed physiographic-climatic similarity. Here we present the response to each specific comment:
- By comparing three different regionalisation approaches, we have shown that with fewer donor basins the technique based on physiographic-climatic similarity is more accurate for predicting daily and Q95 reference flows in basins in Paraná State, Brazil. When the number of donor basins increases, the method based on spatial proximity performs comparably to the method based on physiographic-climatic similarity. Because few donor basins are usually available, we conclude that the physiographic-climatic similarity method is superior to the others for the study site.
- Thank you for the comment; we agree that some scientific issues and innovations were not clearly presented in the introduction and conclusion. Nevertheless, our work adds to the literature a characterization of six different types of catchments in Paraná State, based on catchment descriptors. Gathering data (39 descriptive indices divided into four categories: physiographic, climatological, land use / land cover, and soil type) is an innovative contribution to the scientific understanding of one of the most water-rich regions of the world. So is grouping the watersheds by physiographic-climatic indices. The transfer of GR4J constants and model parameters between basins of the same cluster was evaluated with a rigorous criterion, since we divided the basins into 100 training basins and 26 validation basins. One major innovation is the use of machine learning techniques (K-means and Random Forest) to perform the regionalisation procedure. The results obtained in our work are probably valid for other hydrological models, since they are based on the physiographic-climatic similarity of the watersheds. Verifying this hypothesis is future work, mentioned in the last paragraph of the conclusion. These innovations and scientific contributions were added to the introduction and conclusion of the revised manuscript.
- Random Forests (RF) are an ensemble learning technique that combines the predictions of multiple decision trees to make more robust and accurate predictions. They incorporate two key mechanisms to reduce overfitting (Hastie et al., 2009):
  i) Bagging (Bootstrap Aggregating): each decision tree is trained on a random sample of the data drawn with replacement. Each tree therefore sees a slightly different portion of the data, which reduces the likelihood of any single tree overfitting the training data.
  ii) Feature randomization: in addition to using bootstrapped samples, random forests consider only a random subset of the features (variables) at each split of a decision tree. This further reduces the tendency of the model to overfit specific features.
  The original article by Breiman (2001) states: “use of the Strong Law of Large Numbers shows that RF always converge so that overfitting is not a problem”. All RF models in our study used 80 % of the data for training and 20 % for validation. We used 1,000 decision trees and retained only the variables with importance weights greater than 0.01. Regarding tree depth, nodes were expanded until all leaves were pure or until all leaves contained fewer than two samples. This information was reinforced in the revised manuscript.
- Splitting the data into training and validation sets, as shown in Figure 4, is common practice in machine learning. We chose the proportion allocated to each set so that the generalization of our regionalisation to unseen basins could be assessed. This procedure is a precaution that few regionalisation studies adopt. In summary, we used the typical split of 80 % of the data for training and 20 % for validation.
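To make the two overfitting-control mechanisms described in this reply concrete, the sketch below grows a small forest of regression trees in plain Python. Each tree is trained on a bootstrap sample of the basins (bagging), and each split considers only `max_features` randomly ordered descriptors (feature randomization). The search continues past that limit only if no valid split has been found yet. Trees are grown until leaves are pure or hold fewer than `min_samples_split` samples, and the forest prediction is the average of the trees. This is an illustration under simplifying assumptions (exhaustive threshold search, tiny data), not the implementation used in the study. The quoted settings (1,000 trees, nodes expanded until leaves are pure or contain fewer than two samples) match the `n_estimators`, `max_depth=None` and `min_samples_split=2` arguments of scikit-learn's `RandomForestRegressor`. Its `feature_importances_` attribute would support the 0.01 importance filter, if that library was used; the reply does not say.

```python
import random
from statistics import mean

def _sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, max_features, rng, min_samples_split=2):
    """Regression tree grown until leaves are pure or too small to split."""
    if len(y) < min_samples_split or len(set(y)) == 1:
        return mean(y)  # leaf: mean target of the samples reaching it
    n_features = len(X[0])
    order = rng.sample(range(n_features), n_features)  # random feature order
    best = None
    for k, f in enumerate(order):
        # feature randomization: stop after max_features candidates,
        # unless none of them has produced a valid split yet
        if k >= max_features and best is not None:
            break
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # identical descriptors but different targets
        return mean(y)
    _, f, thr = best
    go_left = [row[f] <= thr for row in X]
    XL = [r for r, g in zip(X, go_left) if g]
    yL = [t for t, g in zip(y, go_left) if g]
    XR = [r for r, g in zip(X, go_left) if not g]
    yR = [t for t, g in zip(y, go_left) if not g]
    return (f, thr,
            grow_tree(XL, yL, max_features, rng, min_samples_split),
            grow_tree(XR, yR, max_features, rng, min_samples_split))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=100, max_features=None, seed=0):
    rng = random.Random(seed)
    max_features = max_features or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        # bagging: bootstrap sample of the training basins, with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                max_features, rng))
    return forest

def predict_forest(forest, x):
    return mean(predict_tree(tree, x) for tree in forest)  # average of trees
```

The `n_trees=100` default keeps the pure-Python sketch fast; the reply's setting would correspond to `n_trees=1000`.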
Some of the research questions that our study can answer are:
- Which is the best streamflow regionalisation method for Brazilian watersheds with humid subtropical (Cfa) and oceanic (Cfb) climates?
- If there is physiographic-climatic similarity between two basins (one with flow data and another without flow measurements): can we transfer hydrological model constants from the monitored catchment to calculate flow in the unmonitored basin?
- Are the catchments from Paraná State similar? Is it possible to group them into homogeneous clusters with the same hydrological characteristics?
The extensive data processing performed by the authors provided a state-of-the-art dataset for our study site, adding novelty to the field of hydrological characterization and classification. Our research shows that basins with similar physiographic-climatic characteristics are better candidates as donors for flow regionalization. We provided a proof of concept that streamflow in basins without flow monitoring can be approximated well if other physiographic-climatic indices are available. Furthermore, we have shown that machine learning algorithms perform better with physiographic-climatic indices as inputs. All these arguments will be incorporated in the revised manuscript.
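The donor-selection rules compared in this discussion can be sketched as nearest-neighbour searches that differ only in the distance they use. Spatial proximity uses geographic distance between outlets; similarity uses Euclidean distance between standardized physiographic-climatic descriptors. This sketch rests on several assumptions, none taken from the paper: z-score standardization, the haversine distance on outlet coordinates, and unweighted averaging of the flows simulated at the target with each donor's GR4J parameters. All function names are hypothetical, and the paper's exact weighting and descriptor set may differ.

```python
import math
from statistics import mean, pstdev

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def standardize(descriptors):
    """Z-score each descriptor column so that no index dominates the distance."""
    cols = list(zip(*descriptors.values()))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard constant columns
    return {b: [(v - m) / s for v, (m, s) in zip(row, stats)]
            for b, row in descriptors.items()}

def donors_by_proximity(target, coords, k):
    """k nearest basins by outlet distance; coords maps basin -> (lat, lon)."""
    lat, lon = coords[target]
    others = [b for b in coords if b != target]
    return sorted(others, key=lambda b: haversine_km(lat, lon, *coords[b]))[:k]

def donors_by_similarity(target, descriptors, k):
    """k nearest basins in standardized descriptor space."""
    z = standardize(descriptors)
    others = [b for b in z if b != target]
    return sorted(others, key=lambda b: math.dist(z[target], z[b]))[:k]

def regionalized_flow(donors, simulated):
    """Day-by-day mean of flows simulated at the target with each donor's parameters."""
    return [mean(day) for day in zip(*(simulated[d] for d in donors))]
```

In this framing, increasing `k` reflects the comparison with more donor basins discussed in the reply to RC1.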
With respect, we find the reviewer's feedback superficial. None of the points raised by the reviewer indicate misinterpretations in our work. The scientific value is discussed above. Our work falls within the journal's aims: “HESS encourages and supports fundamental and applied research that advances the understanding of hydrological systems…”. We therefore consider the reviewer's final statement questionable.
References:
- Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
- Hastie, T., Tibshirani, R., and Friedman, J. (2009). Random forests. In: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 587-604.
Citation: https://doi.org/10.5194/egusphere-2023-1755-AC2
RC2: 'Comment on egusphere-2023-1755', Juraj Parajka, 20 Oct 2023
General comments
The study examines performance of selected regionalisation methods for prediction of runoff hydrographs and low flows in 126 catchments in Parana region (Brazil). The results show that regionalization based on physiographic-climatic similarity has the best performance.
In my opinion, the prediction of runoff hydrographs and low flows in subtropical and hot temperate climates is a very interesting topic, worth publishing. However, the manuscript in its current form needs a very substantial revision to clearly demonstrate its novel scientific contribution. As it stands, it is likely more suitable for journals presenting regional studies and comparative hydrology; otherwise, the presentation and demonstration of the novel scientific and methodological contribution need to be improved. The Introduction needs a very precise formulation of the current research gaps and of how this study goes beyond the existing literature. I would suggest highlighting the need to better understand the transfer of hydrologic models in a subtropical climate, which has very distinct and specific runoff generation mechanisms (compared to most previous studies). The need and context for examining low flows must be clarified and justified. Existing studies and methods related to the prediction of low flows in ungauged basins are not presented in sufficient detail.
The manuscript has a non-traditional structure and is thus very difficult to read. The logic of the section order is not clear. I would suggest structuring the manuscript as Introduction -> Data (Study region) -> Methods -> Results -> Discussion -> Conclusions. In its current form, Section 2 could become part of the Introduction (and partly of the Methods), but I would suggest that the review focus on the research objectives (prediction of daily runoff hydrographs and low flows). The study region description could give more precise context on the climate and the main runoff generation mechanisms of the study region. A description of the seasonality of low flows can help to understand the hydrological processes in the study region. In its current form, it is not clear why it is important to examine the regionalisation of low flows in the study region.
The Method section (indicated as Section 3) and its subsections can be improved. In its current form, the structure of the methods is confusing. For example, it includes catchment descriptors, which belong in the Data section. The regionalisation methods are a separate section, but in my opinion they are part of the methodology. A description of the model is completely missing. The structure of this part of the manuscript definitely needs to be revised and improved.
It would be interesting to interpret the results in more detail from a hydrological perspective, i.e. in the context of the dominant runoff generation processes and how these are described by the different regionalisation methods.
I completely missed a Discussion section, yet this part is essential to demonstrate the novel scientific contribution of the study. The results need to be compared and linked with relevant previous studies, which will allow readers to clearly see the novel contribution of the study.
I believe the manuscript has the potential to be turned into an interesting and novel scientific contribution. Still, the manuscript needs a very substantial revision to demonstrate the current research gaps and the novel contributions of the presented analyses.
Citation: https://doi.org/10.5194/egusphere-2023-1755-RC2
AC3: 'Reply on RC2', Emilio Graciliano Ferreira Mercuri, 10 Nov 2023
Here we present the authors' response to RC2: https://doi.org/10.5194/egusphere-2023-1755-RC2, Juraj Parajka, 20 Oct 2023.
We are grateful for the comments and contributions made by Juraj Parajka. We agree with the points highlighted by RC2 and we will incorporate the suggestions provided, according to the following:
The introduction will be improved to highlight the research gaps and how our study goes beyond the existing literature, which consists of the following points:
1. The construction of a state-of-the-art dataset for 126 watersheds in Paraná State. For each catchment, it contains daily streamflow, precipitation and evapotranspiration data and 39 descriptive indices divided into four categories: physiographic, climatological, land use / land cover, and soil type.
2. The need to better understand regionalization techniques in a subtropical climate, which has very distinct and specific runoff generation mechanisms compared with those of previous studies.
3. The classification of Paraná catchments into six groups based on physiographic-climatic similarity. Our results show that, although the basins are located in the same region of Brazil, there are heterogeneities in relief and land use that are important for hydrological processes.
4. The use of machine learning methods to improve the transfer of hydrological model constants between similar basins.
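The reply to RC1 lists K-means among the machine learning techniques used, presumably for the grouping behind item 3. As an illustration only, here is a plain-Python version of Lloyd's K-means. It assumes the descriptors have already been put on a common scale (for instance z-scores of the 39 indices). The random initialization with a fixed seed and the convergence rule are also assumptions. Because K-means only finds a local optimum, in practice one would try several seeds and choose k with a criterion such as the silhouette score.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means on equally scaled descriptor vectors.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k points as seeds
    labels = None
    for _ in range(n_iter):
        # assignment step: each catchment joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On six toy points forming two well-separated triplets, `kmeans(points, 2)` assigns each triplet to its own cluster.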
The study of low flows and droughts is critical for water availability in Brazil. River dams and reservoirs there are used for hydroelectric generation (70 % of the Brazilian energy sector), drinking water supply, crop irrigation, and industrial water distribution. The Paraná basin, a major hydroelectric region home to 32 % (60 million people) of Brazil's population, recently experienced the most severe drought since the 1960s, compromising the water supply for 11 million people in São Paulo (Melo et al., 2016). High flows are also important in the region because of recurrent floods (Stevaux et al., 2009). The influence of climate change on the streamflow regime of Paraná State is unclear, which reinforces the need for further research. Blöschl et al. (2019) have shown that changing climate both increases and decreases European river floods; however, we have not found any such study analysing data from southern Brazil. We agree that a description of the seasonality of low flows can help to understand the hydrological processes in the study region. Our focus, however, was motivated by the recent droughts in southern Brazil (Cunha et al., 2019) and their impact on the population and the economy of the country.
We agree with the suggestion to restructure the manuscript as Introduction -> Data (Study region) -> Methods -> Results -> Discussion -> Conclusions. This includes moving the catchment descriptors into the Data section and placing the regionalisation methods and the GR4J model description in the Methods section. It also includes expanding the Discussion with our novel contributions. We will also focus the literature review more on the research objectives: prediction of daily runoff hydrographs and low flows.
We thank the reviewer for the suggestion to interpret the results in more detail in the context of the dominant runoff generation processes and how these are described by the different regionalisation methods. However, we think this could be addressed in a future study by our research group.
References:
Melo, D. D. C., Scanlon, B. R., Zhang, Z., Wendland, E., & Yin, L. (2016). Reservoir storage and hydrologic responses to droughts in the Paraná River basin, south-eastern Brazil. Hydrology and Earth System Sciences, 20(11), 4673-4688.
Blöschl, G., Hall, J., Viglione, A., Perdigão, R. A., Parajka, J., Merz, B., ... & Živković, N. (2019). Changing climate both increases and decreases European river floods. Nature, 573(7772), 108-111.
Stevaux, J. C., Latrubesse, E. M., Hermann, M. L. D. P., & Aquino, S. (2009). Floods in urban areas of Brazil. Developments in Earth Surface Processes, 13, 245-266.
Cunha, A. P. M., Zeri, M., Deusdará Leal, K., Costa, L., Cuartas, L. A., Marengo, J. A., ... & Ribeiro-Neto, G. (2019). Extreme drought events over Brazil from 2011 to 2019. Atmosphere, 10(11), 642.
Citation: https://doi.org/10.5194/egusphere-2023-1755-AC3
-
AC3: 'Reply on RC2', Emilio Graciliano Ferreira Mercuri, 10 Nov 2023
Interactive discussion
Status: closed
-
CC1: 'Comment on egusphere-2023-1755', John Ding, 23 Aug 2023
Nash-Sutcliffe efficiency and its logNSE and sqrtNSE variants
For GR4J model calibration, the authors apply the logarithmic variant (logNSE) of the classical Nash-Sutcliffe coefficient or efficiency: NSE=1-F/F0, where F and F0 are residual and initial variance, respectively. logNSE variant uses log transformation of the streamflow Q, and sqrtNSE the square root one (Sect. 3.4).
Table 2 shows the median calibrated NSE value of 0.621 for 100 training watersheds out of a total of 126 in State of Paraná, Brazil.
In my view, what NSE lacks is an additional benchmark (model), BMM, which is physically more realistic than the implicit one of an observed mean flow shown in Equation 1. (Note: the leading expression "1-" is missing from the right hand side of the equation.) This will help interpret the intermediate NSE values between 0 and 1 (Sect. 5.1 and Figure 5).
I’ve put forward a 1-step forecast as such a BMM, a simplest second-order autoregressive (AR) process of the streamflow Q, called AR(2) or AR2 . This is expressed as: Qar2[t+1] =2Qobs[t]-Qobs[t-1], e.g. , Mizukami et al, 2019, SC1 therein; Cinkus et al., 2023, CC2 therein.
In a future study, the authors may wish to explore the utility of this AR2 alternative, following Cinkus et al., 2023, AC2, Page 3. A dual NSE-AR2 efficiency scale than a sole NSE one would better measure the performance of a model simulation. For the purpose of this open discussion, a pilot study would suffice for the watershed having the median NSE value of 0.621 for one year or more from 1979-2020.
References
Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023.
Mizukami, N., Rakovec, O., Newman, A. J., Clark, M. P., Wood, A. W., Gupta, H. V., and Kumar, R.: On the choice of calibration metrics for “high-flow” estimation using hydrologic models, Hydrol. Earth Syst. Sci., 23, 2601–2614, https://doi.org/10.5194/hess-23-2601-2019, 2019.
Citation: https://doi.org/10.5194/egusphere-2023-1755-CC1 -
AC1: 'Reply on CC1', Emilio Graciliano Ferreira Mercuri, 28 Aug 2023
Here we present the authors response to CC1: 'Comment on egusphere-2023-1755', John Ding, 23 Aug 2023, about Nash-Sutcliffe efficiency and its logNSE and sqrtNSE variants.
We decided to use the logNSE for both calibration and evaluation of the model. This choice was made because logNSE is more sensitive to low flows (Oudin et al., 2008) and since we are experiencing draughts recently in Paraná State, Brazil, this sounded like a good choice. However, we also used the following other criteria to evaluate the model: Pearson Correlation Coefficient (R), Nash-Sutcliffe Coefficient (NSE) and sqrtNSE.
Table 2 shows the median calibrated NSE value of 0.621 for 26 watersheds (from validation set) out of a total of 126 in State of Paraná, Brazil. This information wasn’t totally clear in the manuscript, and we have rewritten it in the new version that will be uploaded soon.
Thanks for correcting Equation 1, the leading expression "1-" was truly missing and the new version of the manuscript is corrected.
The main idea of the article was to compare regionalization techniques, not to compare model performance metrics. We agree that according to Cinkus et al. (2023) and Mizukami et al., (2019) the choice of performance metric matters for model evaluation, that’s why we have chosen 4 metrics: R, NSE, logNSE, and sqrtNSE. However, we wanted to see which regionalization procedure for GR4J parameters was more efficient, and we showed that for Paraná catchments the Similarity approach was slightly better.
To use an autoregressive (AR) model, as suggested, we would need to know the flow in the watershed where we suppose we don’t have flow data. So, we don’t think it is a good idea to be implemented as extra work for this research/manuscript, maybe some future work.
Thanks for the suggestions, we will consider them for a future study.
References:
Oudin, L., Andréassian, V., Perrin, C., Michel, C., & Le Moine, N. (2008). Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments. Water resources research, 44(3).
Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023.
Mizukami, N., Rakovec, O., Newman, A. J., Clark, M. P., Wood, A. W., Gupta, H. V., and Kumar, R.: On the choice of calibration metrics for “high-flow” estimation using hydrologic models, Hydrol. Earth Syst. Sci., 23, 2601–2614, https://doi.org/10.5194/hess-23-2601-2019, 2019.
Citation: https://doi.org/10.5194/egusphere-2023-1755-AC1
-
AC1: 'Reply on CC1', Emilio Graciliano Ferreira Mercuri, 28 Aug 2023
-
RC1: 'Comment on egusphere-2023-1755', Anonymous Referee #1, 06 Oct 2023
The current manuscript presents the results of the parameter regionalisation of the GR4J model based on 126 catchments in Parana, Brazil. Although the parameter regionalisation of a hydrological model is important, the current manuscript is more of a technical report than a scientific paper.
1.The comparison between three different regionalisation approaches was presented without any interpretation as to why one is better than the other.2.In addition, the scientific issues and innovations were not clearly presented in the introduction and conclusion. What is the additional knowledge the current work bring to the literature? Whether the results are common or are only fit for GR4J model? These issues are not clarified or discussed.
3.The results for the random forest are not clearly presented, e.g. whether the model is overfitted? how the author chose the training, validation and test examples?
4.Furthermore, it is not investigated whether the result is sensitive to the choice of training and validation sets in Figure 4.Therefore, although the author has done extensive work on the data processing, the scientific questions are not sufficient for its publishing in HESS journal.
Citation: https://doi.org/10.5194/egusphere-2023-1755-RC1 -
AC2: 'Reply on RC1', Emilio Graciliano Ferreira Mercuri, 13 Oct 2023
Here we present the authors response to RC1: 'Comment on egusphere-2023-1755', Anonymous Referee #1, 06 Oct 2023.
We are grateful for the comments and contributions made by Anonymous Referee #1. However, we respectfully disagree with the argument that the manuscript is more of a technical report than a scientific paper. We have proposed a scientific study to compare hydrological regionalization methods for ungauged basins and to understand watershed physiographic-climatic similarity. Here we present the response to each specific comment:
- We have shown by the comparison between three different regionalisation approaches that with a fewer number of donor basins the technique based on physiographic-climatic similarity is more accurate for prediction of daily and Q95 reference flows in basins from Paraná state, Brazil. When increasing the number of donor basins, the method based on spatial proximity has a comparable performance to the method based on physiographic-climatic similarity. In that sense and because there are usually not many donor basins available, we can state that the physiographic-climatic similarity method is superior to the others for the study site.
- Thanks for the comment, we agree that some scientific issues and innovations were not clearly presented in the introduction and conclusion. Nevertheless, the additional knowledge our current work brings to the literature is that we have characterized 6 different types of catchments in Paraná State, based on catchment descriptors. Gathering data (39 descriptive indices divided in 4 categories: physiographic, climatological, land use / land cover, and soil type) and grouping the watersheds based on physiographic-climatic indices is an innovative contribution to scientifically understand the hydrology of one of the water richest regions of the world. The transfer of GR4J constants and model parameters between basins of the same cluster was made with a rigorous criterion since we have divided the clusters into 100 training basins and 26 validation basins. One major innovation is that we have used techniques from machine learning (K-mean and Random Forest methods) to perform the regionalisation procedure. The results obtained in our work are probably valid for other hydrological models since they are based on the watersheds physiographic-climatic similarity. The verification of this hypothesis is a future work that is mentioned in the last paragraph of the conclusion. These innovations and scientific contributions were added to the introduction and conclusion of the new version of the manuscript.
- Random Forests (RF) are an ensemble learning technique that combines the predictions of multiple decision trees to produce more robust and accurate predictions. They incorporate two key mechanisms to reduce overfitting: i) Bagging (bootstrap aggregating): each decision tree is trained on a bootstrap sample of the data, i.e. a random sample drawn with replacement. Each tree therefore sees a slightly different portion of the data, which reduces the likelihood of any single tree overfitting to the training data; ii) Feature randomization: in addition to using bootstrapped samples, random forests consider only a random subset of the features (variables) at each split of a decision tree, which further reduces the tendency of the model to overfit to specific features (Hastie et al., 2009). The original article by Breiman (2001) states: “use of the Strong Law of Large Numbers shows that RF always converge so that overfitting is not a problem”. All RF models applied in our study used 80% of the data for the training set and 20% for the validation set. We used 1,000 decision trees and restricted the input variables to those with an importance (weight) greater than 0.01. Tree growth was controlled by the following stopping criterion: nodes were expanded until all leaves were pure or contained fewer than two samples. This information was reinforced in the new version of the manuscript.
- The choice of training and validation sets shown in Figure 4 follows common practice in machine learning. We chose the percentage allocated to each set to ensure that our regionalisation generalizes well to unseen basins, a precaution that few regionalisation studies adopt. In summary, we used the typical split of 80% of the data for the training set and 20% for the validation set.
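The sampling and selection steps described in the two replies above (an approximately 80/20 hold-out of basins, bootstrap sampling of rows for each tree, a random subset of descriptors at each split, and an importance threshold of 0.01) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names, the random seed, the use of a simple random hold-out (rather than the exact selection shown in Figure 4), the descriptor names, the importance values and the `max_features` value are hypothetical. Note that 20% of 126 basins is 25.2, so holding out 26 basins (26/126 ≈ 20.6%) corresponds to the roughly 80/20 split quoted above.

```python
import random


def holdout_split(basin_ids, n_validation, rng):
    """Randomly hold out n_validation basins as pseudo-ungauged targets (assumed random)."""
    shuffled = list(basin_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[n_validation:]), sorted(shuffled[:n_validation])


def bootstrap_rows(n_rows, rng):
    """Bagging: draw n_rows row indices with replacement for one tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_candidates(feature_names, max_features, rng):
    """Feature randomization: the descriptors a tree may consider at one split."""
    return rng.sample(feature_names, max_features)


def keep_important(importances, threshold=0.01):
    """Keep descriptors whose importance (weight) exceeds the threshold."""
    return [name for name, weight in importances.items() if weight > threshold]


if __name__ == "__main__":
    rng = random.Random(42)  # hypothetical seed, for reproducibility only
    basins = list(range(1, 127))  # 126 catchments labelled 1..126
    train, valid = holdout_split(basins, n_validation=26, rng=rng)
    print(len(train), len(valid))  # 100 26

    rows = bootstrap_rows(len(train), rng)
    print(len(set(rows)), "distinct training basins seen by one tree")

    # Hypothetical descriptor names and importance values.
    descriptors = ["area", "mean_slope", "mean_precip", "aridity", "forest_frac"]
    print(split_candidates(descriptors, max_features=2, rng=rng))

    weights = {"area": 0.40, "mean_slope": 0.005, "mean_precip": 0.30, "aridity": 0.02}
    print(keep_important(weights))  # ['area', 'mean_precip', 'aridity']
```

Because each tree is grown on a different bootstrap sample and sees different candidate descriptors at each split, the trees make partly independent errors, which averaging over many trees (1,000 in the study) then reduces.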
Some of the research questions that our study can answer are:
- Which is the best streamflow regionalisation method for Brazilian watersheds with humid subtropical (Cfa) and oceanic (Cfb) climates?
- If there is physiographic-climatic similarity between two basins (one with flow data and another without flow measurements): can we transfer hydrological model constants from the monitored catchment to calculate flow in the unmonitored basin?
- Are the catchments from Paraná State similar? Is it possible to group them into homogeneous clusters with the same hydrological characteristics?
The extensive data processing performed by the authors produced a state-of-the-art dataset for our study site, adding novelty to the field of hydrological characterization and classification. Our research shows that basins with similar physiographic-climatic characteristics are better candidates for flow regionalization. We provide a proof of concept that streamflow in basins without flow monitoring can be approximated well if physiographic-climatic indices are available. Furthermore, we have shown that machine learning algorithms perform better with physiographic-climatic indices as inputs. All these arguments will be incorporated into the new version of the manuscript.
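To make the two donor-selection rules compared in this study concrete, the sketch below contrasts physiographic-climatic similarity (nearest donors in standardized descriptor space) with spatial proximity (nearest donors in geographic space), followed by a simple transfer step. It is a hypothetical standard-library illustration, not the authors' implementation: the descriptor names and values, the coordinates, the Euclidean distance, the number of donors `k` and the averaging of donor GR4J parameter sets are all assumptions.

```python
import math
from statistics import mean, pstdev


def standardize(table):
    """Z-score each descriptor column so that no single index dominates the distance."""
    cols = list(zip(*table.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return {k: [(x - m) / s for x, m, s in zip(v, mus, sds)] for k, v in table.items()}


def nearest_donors(target, donors, k):
    """Return the k donor ids closest to the target (Euclidean distance)."""
    dist = {d: math.dist(target, vec) for d, vec in donors.items()}
    return sorted(dist, key=dist.get)[:k]


def transfer(donor_ids, donor_params):
    """Average the donors' values element-wise (here, assumed GR4J parameter sets)."""
    return [mean(vals) for vals in zip(*(donor_params[d] for d in donor_ids))]


if __name__ == "__main__":
    # Hypothetical descriptors: [area (km2), annual precipitation (mm), mean slope (%)].
    descriptors = {
        "A": [500.0, 1600.0, 8.0],
        "B": [520.0, 1650.0, 9.0],
        "C": [3000.0, 1300.0, 3.0],
        "T": [510.0, 1620.0, 8.5],  # pseudo-ungauged target basin
    }
    coords = {"A": (0.0, 50.0), "B": (0.0, 60.0), "C": (0.0, 5.0)}  # km, hypothetical
    target_xy = (0.0, 0.0)
    # Hypothetical calibrated GR4J parameters [X1, X2, X3, X4] of each donor.
    params = {"A": [350.0, 0.5, 90.0, 1.5], "B": [450.0, -0.5, 110.0, 2.5],
              "C": [900.0, -2.0, 40.0, 3.0]}

    z = standardize(descriptors)
    target_z = z.pop("T")
    by_similarity = nearest_donors(target_z, z, k=2)
    by_proximity = nearest_donors(target_xy, coords, k=2)
    print(by_similarity)  # ['A', 'B']
    print(by_proximity)   # ['C', 'A']
    print(transfer(by_similarity, params))  # [400.0, 0.0, 100.0, 2.0]
```

In this toy example the basin closest in space (C) is also the least similar in descriptor space, which shows how the two rules can select different donors; it is an illustration, not a result of the study.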
With respect, we find the reviewer's feedback superficial. None of the points raised by the reviewer point to misinterpretations in our work. The scientific value is discussed above. Given the journal's aims, “HESS encourages and supports fundamental and applied research that advances the understanding of hydrological systems…”, which our work covers, we consider the reviewer's final statement questionable.
References:
- Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
- Hastie, T., Tibshirani, R., and Friedman, J. (2009). Random forests. In: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 587-604.
Citation: https://doi.org/10.5194/egusphere-2023-1755-AC2
AC2: 'Reply on RC1', Emilio Graciliano Ferreira Mercuri, 13 Oct 2023
RC2: 'Comment on egusphere-2023-1755', Juraj Parajka, 20 Oct 2023
General comments
The study examines the performance of selected regionalisation methods for predicting runoff hydrographs and low flows in 126 catchments in the Paraná region (Brazil). The results show that regionalization based on physiographic-climatic similarity has the best performance.
In my opinion, the prediction of runoff hydrographs and low flows in subtropical and hot temperate climates is a very interesting topic that is worth publishing. However, the manuscript in its current form needs a very substantial revision to clearly demonstrate its novel scientific contribution. In its current form, it is likely more suitable for journals presenting regional studies and comparative hydrology; otherwise, the presentation and demonstration of the novel scientific and methodological contribution need to be improved. The Introduction needs a very precise formulation of what the current research gaps are and how this study goes beyond the existing literature. I would suggest highlighting the need to better understand the transfer of hydrologic models in subtropical climates, which have very distinct and specific runoff generation mechanisms (compared to most of the previous studies). The need for and context of examining low flows need to be clarified and justified. Existing studies and methods related to the prediction of low flows in ungauged basins are not presented in sufficient detail.
The manuscript has a non-traditional structure and is thus very difficult to read. The logic of the section order is not clear. I would suggest structuring the manuscript as Introduction -> Data (Study region) -> Methods -> Results -> Discussion -> Conclusions. In its current form, Section 2 could be part of the Introduction (and partly of the Methods), but I would suggest that the review be focused on the research objectives (prediction of daily runoff hydrographs and low flows). The study region description could give more precise context on the climate and main runoff generation mechanisms of the study region. A description of the seasonality of low flows could help to understand the hydrological processes in the study region. In its current form, it is not clear why it is important to examine the regionalisation of low flows in the study region.
The Methods section (Section 3) and its subsections can be improved. In its current form, the structure of the methods is confusing. For example, it includes catchment descriptors, which belong to the Data section. The regionalisation methods are a separate section, but in my opinion they are part of the methodology. A description of the model is completely missing. The structure of this part of the manuscript definitely needs to be revised and improved.
It would be interesting to interpret the results in more detail from a hydrological perspective, i.e. in the context of the dominant runoff generation processes and how these are described by the different regionalisation methods.
I completely missed the Discussion section, yet this part is essential to demonstrate the novel scientific contribution of the study. The results need to be compared and linked with previous relevant studies, which will allow readers to clearly see the novel contribution of the study.
I believe the manuscript has the potential to be turned into an interesting and novel scientific contribution. Still, the manuscript needs a very substantial revision to demonstrate the current research gaps and the novel contributions of the presented analyses.
Citation: https://doi.org/10.5194/egusphere-2023-1755-RC2
AC3: 'Reply on RC2', Emilio Graciliano Ferreira Mercuri, 10 Nov 2023
Here we present the authors' response to RC2 (https://doi.org/10.5194/egusphere-2023-1755-RC2), Juraj Parajka, 20 Oct 2023.
We are grateful for the comments and contributions made by Juraj Parajka. We agree with the points highlighted in RC2 and will incorporate the suggestions as follows:
The introduction will be improved to highlight the research gaps and to show how our study goes beyond the existing literature through the following contributions:
1. The construction of a state-of-the-art dataset for 126 watersheds in Paraná State containing, for each catchment, daily streamflow, precipitation and evapotranspiration data and 39 descriptive indices divided into 4 categories: physiographic, climatological, land use / land cover, and soil type.
2. The need to better understand regionalization techniques in subtropical climates, which have very distinct and specific runoff generation mechanisms compared with those of previous studies.
3. The classification of Paraná catchments into 6 groups based on physiographic-climatic similarity. Our results show that, although the basins are located in the same region of Brazil, there are heterogeneities related to relief and land use that are important for hydrological processes.
4. The use of machine learning methods to improve the transfer of hydrological model constants between similar basins.
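The grouping of catchments into clusters mentioned in points 3 and 4 can be illustrated with a minimal K-means sketch. This is plain Lloyd's algorithm with random initial centres, written with the standard library only; it is an assumption for illustration and may differ from the implementation used in the study (for example in initialisation, number of restarts, or descriptor scaling). The descriptors are assumed to be already standardized, and the toy data below are hypothetical; the study used k = 6 on 126 basins.

```python
import math
import random


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = None
    for _ in range(n_iter):
        # Assignment step: each basin joins its nearest centroid.
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member basins.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical standardized descriptors (e.g. slope, precipitation) for six basins.
    basins = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    labels, centroids = kmeans(basins, k=2)
    print(labels)  # the two well-separated groups receive different labels
```

Since K-means only reaches a local optimum, practical use usually runs it from several initialisations and keeps the partition with the lowest within-cluster sum of squares.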
The study of low flows and droughts is critical in the context of water availability in Brazil, where river dams and reservoirs are used for hydroelectric generation (70 % of the Brazilian energy sector), to provide drinking water for the population, to irrigate crops, and to distribute water for industrial use. The Paraná basin, a major hydroelectric producing region with 32 % (60 million people) of Brazil's population, recently experienced the most severe drought since the 1960s, compromising the water supply for 11 million people in São Paulo (Melo et al., 2016). High flows are also important in the region because of recurrent floods (Stevaux et al., 2009). The influence of climate change on the streamflow regime is unclear for Paraná state, which reinforces the need for further research. Blöschl et al. (2019) have shown that changing climate both increases and decreases European river floods; however, we have not found any such study analysing data from the south of Brazil. We agree that a description of the seasonality of low flows can help to understand the hydrological processes in the study region, but our focus was motivated by the recent droughts in the south of Brazil (Cunha et al., 2019) and their impact on the population and the economy of the country.
We agree with the suggestion to restructure the manuscript as Introduction -> Data (Study region) -> Methods -> Results -> Discussion -> Conclusions. This includes moving the catchment descriptors into the Data section, moving the regionalisation methods and the GR4J model description into the Methods section, and enlarging the Discussion with our novel contributions. We can also focus the review more on the research objectives: prediction of daily runoff hydrographs and low flows.
We thank the reviewer for the suggestion to interpret the results in more detail in the context of the dominant runoff generation processes and how these are described by the different regionalisation methods; however, we think this could be the subject of a future study by our research group.
References:
Melo, D. D. C., Scanlon, B. R., Zhang, Z., Wendland, E., & Yin, L. (2016). Reservoir storage and hydrologic responses to droughts in the Paraná River basin, south-eastern Brazil. Hydrology and Earth System Sciences, 20(11), 4673-4688.
Blöschl, G., Hall, J., Viglione, A., Perdigão, R. A., Parajka, J., Merz, B., ... & Živković, N. (2019). Changing climate both increases and decreases European river floods. Nature, 573(7772), 108-111.
Stevaux, J. C., Latrubesse, E. M., Hermann, M. L. D. P., & Aquino, S. (2009). Floods in urban areas of Brazil. Developments in Earth Surface Processes, 13, 245-266.
Cunha, A. P. M., Zeri, M., Deusdará Leal, K., Costa, L., Cuartas, L. A., Marengo, J. A., ... & Ribeiro-Neto, G. (2019). Extreme drought events over Brazil from 2011 to 2019. Atmosphere, 10(11), 642.
Citation: https://doi.org/10.5194/egusphere-2023-1755-AC3
Louise Akemi Kuana
Arlan Scortegagna Almeida
Emilio Graciliano Ferreira Mercuri
Steffen Manfred Noe
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.