Comprehensive Global Assessment of 23 Gridded Precipitation Datasets Across 16,295 Catchments Using Hydrological Modeling

Abbas, Ather; Yang, Yuan; Pan, Ming; Tramblay, Yves; Shen, Chaopeng; Ji, Haoyu; Gebrechorkos, Solomon H.; Pappenberger, Florian; Pyo, Jong Cheol; Feng, Dapeng; Huffman, George; Nguyen, Phu; Massari, Christian; Brocca, Luca; Jackson, Tan; Beck, Hylke E.

doi:10.5194/egusphere-2024-4194

Preprints

https://doi.org/10.5194/egusphere-2024-4194

20 Jan 2025


Abstract. Numerous gridded precipitation (P) datasets have been developed to address a variety of needs and challenges. However, selecting the most suitable and reliable dataset remains a challenge for users. We conducted the most comprehensive global evaluation to date of gridded (sub-)daily P datasets using hydrological modeling. A total of 23 datasets, derived from satellite, model, or gauge sources, or combinations thereof, were assessed. To evaluate their performance, we calibrated the conceptual hydrological model HBV against observed daily streamflow for 16,295 catchments (each < 10,000 km²) worldwide, using each P dataset as input. The Kling-Gupta Efficiency (KGE) was used as the performance metric and the calibration score served as a proxy for P dataset performance. Overall, MSWEP V2.8 demonstrated the highest performance (median KGE of 0.75), highlighting the value of merging P estimates from diverse data sources and applying daily gauge corrections. Among the purely satellite-based P datasets, the soil moisture- and microwave-based GPM+SM2RAIN dataset performed best (median KGE of 0.60), while the JRA-3Q reanalysis ranked highest among the purely model-based datasets (median KGE of 0.67), outperforming the widely used ERA5 reanalysis (median KGE of 0.59). Performance varied across Köppen-Geiger climate zones, with the best results in polar (E) regions (median KGE of 0.74 across datasets) and the lowest in arid (B) regions (median KGE of 0.33 across datasets). We further examined the spatial relationships between catchment attributes and KGE scores, identifying potential evaporation, air temperature, solid P fraction, and latitude as the strongest predictors of performance. Our analysis revealed significant regional differences in dataset performance and heterogeneity in P error characteristics, underscoring the critical importance of careful dataset selection for water resource management, hazard assessment, agricultural planning, and environmental monitoring.

Received: 30 Dec 2024 – Discussion started: 20 Jan 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 7741 KB)

Supplement (28536 KB)


Status: closed

RC1:
'Comment on egusphere-2024-4194', Anonymous Referee #1, 20 Feb 2025
This manuscript presents an unprecedented evaluation of 23 (sub)daily (quasi)global precipitation (P) datasets across 16,295 catchments worldwide using hydrological modeling. The 23 P datasets belong to six major families of data sources: satellite only; reanalysis only; rain gauge only; satellite and rain gauge; satellite and reanalysis; and satellite, reanalysis and rain gauge. The conceptual hydrological model HBV was used to simulate the conversion of precipitation into streamflow at the daily temporal scale. Each P dataset, along with air temperature (from MSWX) and potential evapotranspiration (computed using the Hargreaves formula), is used to drive the hydrological simulations. The modified Kling-Gupta efficiency (KGE’) is used to evaluate the performance of the simulated streamflows against daily observations, and serves as a proxy for the performance of the P datasets.
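For readers less familiar with the metric, the following is a minimal sketch of the Kling-Gupta efficiency in its original and modified (KGE’) forms. It assumes the standard published definitions (correlation r, bias ratio β, and either the standard-deviation ratio or the coefficient-of-variation ratio as the variability term). The function name `kge`, its `modified` flag, and the sample values are illustrative only, and the manuscript's exact implementation may differ.

```python
import numpy as np

def kge(sim, obs, modified=True):
    """Kling-Gupta efficiency between simulated and observed streamflow.

    modified=True  -> KGE' (variability term = ratio of coefficients of variation)
    modified=False -> original KGE (variability term = ratio of standard deviations)
    Name and flag are illustrative, not taken from the manuscript.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = ~(np.isnan(sim) | np.isnan(obs))  # keep only days where both values exist
    sim, obs = sim[ok], obs[ok]

    r = np.corrcoef(sim, obs)[0, 1]        # linear correlation
    beta = sim.mean() / obs.mean()         # bias ratio
    if modified:
        var_ratio = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    else:
        var_ratio = sim.std() / obs.std()
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (var_ratio - 1) ** 2)

# Hypothetical daily streamflow values (mm/d)
obs = np.array([1.0, 3.0, 2.0, 8.0, 4.0])
kge_perfect = kge(obs, obs)                          # identical series
kge_doubled = kge(2 * obs, obs)                      # pure multiplicative bias, KGE'
kge_doubled_orig = kge(2 * obs, obs, modified=False) # same bias, original KGE
```

A simulation that doubles every observed value is penalised only through the bias term in KGE’, whereas the original KGE penalises it through both the bias and the variability terms. This is one reason the two variants should not be used interchangeably.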
This manuscript addresses an important topic for the hydrometeorological community. The manuscript is well written, concise and clear, with updated references. Unfortunately, the manuscript lacks a clear scientific question or hypothesis to be tested, and the Methodology section does not provide enough scientific detail to fully understand what was done and how, which prevents adequate reproducibility of the results. In addition, some conclusions are speculative and are not supported by the results included in the manuscript. Finally, some references are not used in the text and others contain minor errors. To summarise, the manuscript in its current form does not represent a substantial contribution to the global hydrometeorological community, but all the problems mentioned could be addressed by the authors during the review process. The following lines describe the major and minor problems detected in the manuscript.
Major comments:

MC1. The motivation for the article is not well developed. The manuscript does a really good job of pointing out the limitations of previous evaluations of P datasets. However, what is the ultimate purpose of this comprehensive evaluation of P datasets on a global scale? Is it just to provide some numbers on a global scale, or is it to test a hypothesis or answer a scientific question, or to provide recommendations for the selection of P products for specific applications or specific geographic regions? If so, the hypothesis, the scientific question or the ultimate purpose of the manuscript should be explicitly stated.

MC2. Usage of the outdated CMORPH-RAW (Joyce et al., 2004) and the unknown CMORPH-RT (Xie et al., 2017) instead of the new bias-corrected CMORPH-CDR v.1 (Xie et al., 2017, 2018). The manuscript states that the old CMORPH-RAW and CMORPH-RT are available from 2019 onwards (which seriously limits the hydrological modeling runs), while the newest version of CMORPH, termed CMORPH-CDR, is available from 1998 onwards (not from 2019 onwards). Moreover, it is not clear which product the CMORPH-RT used in this study is, given that Xie et al. (2017) describe CMORPH-CDR version 1, which is available from 1998 and not from 2019. Therefore, I request the authors to remove the outdated CMORPH-RAW (version 0) and the unknown CMORPH-RT and to use the relatively new bias-corrected CMORPH-CDR version 1, which is available from 1998 and is described by Xie et al. (2017) and Xie et al. (2018).

Xie, P., Joyce, R., Wu, S., Yoo, S.-H., Yarosh, Y., Sun, F., and Lin, R.: Reprocessed, Bias-Corrected CMORPH Global High-Resolution Precipitation Estimates from 1998, Journal of Hydrometeorology, 18, 1617–1641, doi:10.1175/JHM-D-16-0168.1, 2017.

Xie, P., Joyce, R., Wu, S., Yoo, S., Yarosh, Y., Sun, F., Lin, R., and NOAA CDR Program: NOAA Climate Data Record (CDR) of CPC Morphing Technique (CMORPH) High Resolution Global Precipitation Estimates, Version 1, doi:10.25921/W9VA-Q159, URL https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00948, 2018.

MC3. Use of the under-revision PERSIANN-CCS-CDR (Sadeghi et al., 2021). This paper uses PERSIANN-CCS-CDR (Sadeghi et al., 2021) as one of the 23 P datasets to be evaluated. However, the website https://chrsdata.eng.uci.edu/ clearly states that “PERSIANN-CCS-CDR is currently under revision and unavailable for download”. Therefore, I request the authors to remove PERSIANN-CCS-CDR from this study, or to clarify the data version used and indicate whether the chosen version is problematic or not.

MC4. Catchment selection. To ensure the suitability of the catchments used in the analyses, five selection criteria were applied in the manuscript to the 34,768 streamflow stations that passed the duplication check. However, the following two decisions are entirely subjective and require more detailed explanation (in the manuscript) by the authors: i) discarding streamflow stations where both the station location and the corresponding catchment centroid were within 5 km of those of another station (how does the spatial resolution of the individual P products influence this criterion?); ii) requiring more than 10 non-consecutive events (how does the duration of each selected event affect this criterion? Are 11 non-consecutive days with Q >= 5 mm d-1 sufficient to ensure a robust calibration of a hydrological model?).

MC5. Use of an unknown version of the HBV hydrological model. The manuscript does not contain a description about the version of the HBV hydrological model used in all analyses. L107 indicates that the HBV-light software described by Seibert and Vis (2012) was the software version used in this study. However, it seems unlikely that a Windows-based version of HBV was selected to simulate 16,295 catchments worldwide. I request the authors to provide details of the version of HBV used in this study. In the event that the authors use their own version of HBV, I request them to provide a link to the source code of the model in the “Code Availability” section requested by HESS (https://www.hydrology-and-earth-system-sciences.net/submission.html#templates).

MC6. Use of catchment-mean P time series to drive the hydrological model (L89-90, L114-115). The use of catchment-mean P time series to drive the hydrological model HBV could lead to important problems in the representation of observed streamflows in catchments with mixed hydrological regimes (i.e. snow-dominated or snow-influenced hydrological regimes), which should be reflected in low KGE values. Therefore, I request the authors to provide, in the supplementary material, five to seven example catchments where HBV is able to reproduce their mixed hydrological regime when driven by catchment-mean P time series (I request not only the KGE values and the daily time series of observed Q compared to simulated Q, but also a comparison of the mean monthly streamflows). If the model is not able to acceptably reproduce the daily and mean monthly observed streamflows of catchments with mixed hydrological regimes, I suggest that the authors implement different elevation bands in these catchments. A publicly available open-source version of an HBV-like hydrological model can be found at: https://cran.r-project.org/package=TUWmodel, which allows the use of up to 10 elevation bands in each catchment.

MC7. Using the Hargreaves (1994) equation to calculate potential evapotranspiration (PET) to drive the hydrological model. I request the authors to justify this choice, given that Oudin 2005a proposed a different temperature-based PET model after evaluating 27 potential evapotranspiration models in terms of streamflow simulation efficiency in a large sample of 308 catchments in France, Australia and the United States.
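To make the temperature-based approach under discussion concrete, here is a minimal sketch of the widely used Hargreaves-Samani form of the equation, with extraterrestrial radiation computed following the FAO-56 guidelines. Whether this matches the Hargreaves (1994) variant applied in the manuscript is an assumption; the function names and the example inputs are hypothetical.

```python
import math

def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 d-1), FAO-56 formulation."""
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * day_of_year / 365 - 1.39)  # solar declination (rad)
    # Clamp the argument so polar day and polar night do not raise a math domain error
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)                                                 # sunset hour angle (rad)
    return (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def hargreaves_pet(tmin, tmax, lat_deg, day_of_year):
    """Potential evapotranspiration (mm d-1) from daily min/max air temperature (deg C)."""
    tmean = 0.5 * (tmin + tmax)
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)  # MJ m-2 d-1 -> mm d-1
    pet = 0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(max(tmax - tmin, 0.0))
    return max(pet, 0.0)  # no negative PET in very cold conditions

# Hypothetical example: 20 deg S on 3 September (day 246)
ra = extraterrestrial_radiation(-20.0, 246)
pet = hargreaves_pet(tmin=14.0, tmax=26.0, lat_deg=-20.0, day_of_year=246)
```

The only meteorological inputs are daily minimum and maximum temperature, which is what makes this family of formulas attractive for global studies and also why the choice among temperature-based alternatives deserves justification.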

MC8. Range used for the calibration of the PCORR parameter of HBV. Table 2 shows that the PCORR parameter is used as a multiplier to mitigate the systematic underestimation of P that is characteristic of some P products, and therefore a range of [1, 2] is used in the optimisation of this parameter. This decision could lead to low KGE values in arid or hyper-arid catchments (see Table 3), where some P datasets overestimate the true (and unknown) P amount. Therefore, I request the authors to extend this range to [0.5, 2] so that the calibration procedure can compensate not only for an underestimation of P but also for an overestimation of it.
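The effect of the lower bound can be illustrated with a toy calculation. The sketch below assumes a purely multiplicative mean bias and ignores that, in a real calibration, PCORR is tuned against streamflow together with the other model parameters (which may partly compensate through evaporation). The function names and the precipitation totals are hypothetical.

```python
def best_pcorr(p_dataset_mean, p_true_mean, bounds):
    """Multiplier that best removes a mean bias, restricted to the calibration bounds."""
    lower, upper = bounds
    ideal = p_true_mean / p_dataset_mean
    return min(max(ideal, lower), upper)

def residual_bias(p_dataset_mean, p_true_mean, bounds):
    """Relative bias in mean P that remains after applying the bounded multiplier."""
    pcorr = best_pcorr(p_dataset_mean, p_true_mean, bounds)
    return pcorr * p_dataset_mean / p_true_mean - 1.0

# Hypothetical arid catchment: dataset gives 300 mm/yr, true value is 210 mm/yr
bias_narrow = residual_bias(300.0, 210.0, bounds=(1.0, 2.0))  # multiplier stuck at 1.0
bias_wide = residual_bias(300.0, 210.0, bounds=(0.5, 2.0))    # multiplier can drop to 0.7
```

With the range [1, 2] the roughly 43 % overestimation in this example cannot be reduced at all, while the range [0.5, 2] removes it entirely.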

MC9. Use of an unknown version of the (µ+λ) evolutionary algorithm used to calibrate the HBV hydrological model. The manuscript does not contain a description of the version of the (µ+λ) evolutionary algorithm used to calibrate the HBV hydrological model. From L124, the reader can infer that the DEAP Python software was used to calibrate the HBV model. However, I request the authors to clarify the name and version of the software used to implement the (µ+λ) evolutionary algorithm and to describe how this algorithm was coupled to the (unknown) version of the HBV hydrological model (see MC5). Finally, I request the authors to describe whether they can ensure that the (µ+λ) evolutionary algorithm has converged to a stable KGE value after 1200 model runs (L125) or not.
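For orientation, a (µ+λ) strategy can be sketched in a few lines of plain Python. This is a generic, mutation-only illustration and not the DEAP-based setup that the manuscript may have used. The population sizes, mutation scale, and the toy objective standing in for a KGE score are all assumptions. The default settings use 24 + 24 × 49 = 1200 objective evaluations, the budget mentioned above, although how the authors split that budget is not known.

```python
import random

def mu_plus_lambda(fitness, bounds, mu=24, lam=24, generations=49, sigma=0.1, seed=0):
    """Maximise `fitness` over box-bounded parameters with a (mu + lambda) strategy."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def mutate(parent):
        # Gaussian perturbation scaled by each parameter's range, clipped to the bounds
        return [min(max(x + rng.gauss(0.0, sigma * (hi - lo)), lo), hi)
                for x, (lo, hi) in zip(parent, bounds)]

    population = []
    for _ in range(mu):
        ind = random_individual()
        population.append((fitness(ind), ind))

    history = []  # best score after each generation, useful for a convergence check
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            child = mutate(rng.choice(population)[1])
            offspring.append((fitness(child), child))
        # "Plus" selection: parents and offspring compete, the best mu survive
        population = sorted(population + offspring, key=lambda t: t[0], reverse=True)[:mu]
        history.append(population[0][0])

    best_score, best_params = population[0]
    return best_params, best_score, history

# Toy objective standing in for a KGE score: maximum of 1 at the hypothetical optimum
target = [0.3, 1.4]
def toy_score(params):
    return 1.0 - sum((p - t) ** 2 for p, t in zip(params, target)) ** 0.5

best_params, best_score, history = mu_plus_lambda(toy_score, bounds=[(0.0, 1.0), (1.0, 2.0)])
```

Because parents survive alongside offspring, the best score can never decrease from one generation to the next, so plotting `history` for a sample of catchments is a simple way to check whether the score has plateaued within the evaluation budget.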

MC10. Selection of temporal period used for the calibration of the individual catchments. It is not clear from the manuscript whether the period used to calibrate the HBV hydrological model with each P dataset was the same or whether it depended on the data availability of the respective P product. I request the authors to clarify this situation in the manuscript. In the case that the temporal period used for the calibration of each catchment depends on the data availability of each P product, and therefore, it was not the same for all the P products used as forcing in each catchment, I request the authors to use the same temporal period for the calibration of all P products in each catchment, to ensure a fair comparison of the performance of different P datasets in a given catchment. Of course, the temporal periods may be different from one catchment to another, but for the same catchment the same temporal period should be used to calibrate the HBV model with all P datasets.
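The common-period requirement amounts to intersecting the coverage of the streamflow record with that of every P dataset for a given catchment. A minimal sketch follows, in which the dataset names and dates are hypothetical placeholders rather than values from the manuscript:

```python
from datetime import date

def common_period(coverage):
    """Overlap of all (start, end) periods, or None if they do not overlap."""
    start = max(s for s, _ in coverage.values())
    end = min(e for _, e in coverage.values())
    return (start, end) if start <= end else None

# Hypothetical coverage for one catchment
coverage = {
    "streamflow": (date(1995, 1, 1), date(2018, 12, 31)),
    "dataset_A": (date(1979, 1, 1), date(2020, 12, 31)),
    "dataset_B": (date(2000, 6, 1), date(2023, 12, 31)),
    "dataset_C": (date(2007, 1, 1), date(2015, 12, 31)),
}
period = common_period(coverage)
```

The trade-off is that a single short-record dataset shrinks the calibration window for all others, so it may be preferable to form the common period per group of datasets with similar coverage.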

MC11. Based on the boxplots summarising the performance of each of the 23 P datasets used in this study, it is quite surprising that the CPC Unified dataset, which is based solely on rain gauge information and has the coarsest spatial resolution of all P datasets (0.5°), ranked second among all datasets. I request the authors to add a paragraph suggesting possible reasons for this unexpected behaviour.

MC12. To provide an initial assessment of the ability of all 23 P datasets used in this study to reproduce the mean annual precipitation at a given location, I request the authors to create a new figure with the mean annual precipitation for 2007-2015 (the longest period for which all datasets have data, after removing the two CMORPH products described in MC2), computed as the average of the mean annual values obtained for each of the 23 P datasets for that period (P_avg). In addition, I request the authors to prepare 23 new figures showing the difference between the mean annual precipitation of each P dataset for 2007-2015 (P_i) and P_avg, i.e., P_i - P_avg. All the figures requested in this comment should be included in the supplementary material only; they will make it possible to identify major problems in the representation of mean annual values of a given P dataset in specific regions of the world.
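The requested quantities can be computed directly once each dataset has been aggregated to annual totals on a common grid. The sketch below assumes arrays of shape (years, lat, lon) covering 2007-2015; the function name, dataset names, and synthetic values are placeholders.

```python
import numpy as np

def mean_annual_anomalies(annual_p):
    """Return the multi-dataset mean map (P_avg) and each dataset's departure (P_i - P_avg).

    annual_p maps a dataset name to an array of annual totals (mm/yr)
    with shape (years, lat, lon) on a common grid and period.
    """
    means = {name: np.nanmean(arr, axis=0) for name, arr in annual_p.items()}
    p_avg = np.nanmean(np.stack(list(means.values())), axis=0)
    anomalies = {name: m - p_avg for name, m in means.items()}
    return p_avg, anomalies

# Synthetic example: 9 years (2007-2015) on a 2 x 3 grid
base = np.full((9, 2, 3), 1000.0)
annual_p = {"dataset_A": 0.9 * base, "dataset_B": base, "dataset_C": 1.1 * base}
p_avg, anomalies = mean_annual_anomalies(annual_p)
```

Note that P_avg is not a reference truth: a dataset can deviate strongly from it because the dataset is wrong or because most of the others share a common error.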

MC13. To facilitate the “generalizability of their findings” (L50, L57) for readers from different countries, I request the authors to add a new figure to the main body of the manuscript: a map showing, in different colours, the KGE values obtained in each catchment. This figure will allow us to identify the spatial distribution of the high and low performance of each P dataset in the simulation of daily streamflows. This new figure will make it possible to support several statements in the “Results and Discussion” section that are currently not supported by any figure in the manuscript.

MC14. To further facilitate the “generalizability of their findings” (L50, L57) to readers from the same country but from catchments with different hydrological regimes, I strongly suggest (and do not request) that the authors make an extra effort and classify the hydrological regime of each of the 16,295 catchments (e.g., pluvial, glacial, snow-dominated, snow-influenced, tropical). This would allow readers to use the results of the article to select one or more P datasets for analysing specific case studies in their own countries. If this suggestion cannot be addressed by the authors, I request them to insert three new columns in Table 3: low solid P fraction, medium solid P fraction and high solid P fraction, where the thresholds to distinguish between low, medium and high values of solid P fraction should be proposed by the authors based on their knowledge and the values of the solid P fraction of all 16,295 catchments.

MC15a. Poor performance of HBV in arid climates. Although the manuscript does not explicitly mention this, it can be inferred that the authors assume that the performance of HBV is likely to be poor in arid climates (L226), because “P in arid regions tends to be brief and intense, making it challenging to detect and model accurately (Beck et al., 2017b; Sun et al., 2018; El Kenawy et al., 2019; Beck et al., 2019a)” (L227-228). However, Seibert and Bergström (2022) mention in their review that HBV is routinely used to model the impacts of climate change on water resources around the world, including regions as arid as the Nile (Booij et al., 2011); therefore, aridity per se should not be taken as a reason for poor performance of the HBV model.

Booij, M. J., Tollenaar, D., van Beek, E., and Kwadijk, J. C. J.: Simulating impacts of climate change on river discharges in the Nile basin, Phys. Chem. Earth, 36, 696–709, https://doi.org/10.1016/j.pce.2011.07.042, 2011.

MC15b. Definition of the aridity index. In the main text of the manuscript, arid regions are associated with values of the aridity index greater than 1 (L250-251, L266). However, this association is inconsistent with the definition of the aridity index in Table B1 of Appendix B, where the aridity index is defined as the ratio between mean annual P and potential evapotranspiration, and therefore values greater than 1 would indicate wet rather than dry catchments. Please clarify this discrepancy.
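The two conventions in circulation give opposite readings of the value 1, which is the source of the discrepancy. A minimal illustration follows, with hypothetical function names and catchment values:

```python
def aridity_index_p_over_pet(mean_annual_p, mean_annual_pet):
    """P / PET convention (as Table B1 is described above): values below 1 are water-limited."""
    return mean_annual_p / mean_annual_pet

def aridity_index_pet_over_p(mean_annual_p, mean_annual_pet):
    """PET / P convention: values above 1 are water-limited."""
    return mean_annual_pet / mean_annual_p

# Hypothetical arid catchment: P = 200 mm/yr, PET = 1600 mm/yr
ai_table = aridity_index_p_over_pet(200.0, 1600.0)  # 0.125
ai_text = aridity_index_pet_over_p(200.0, 1600.0)   # 8.0
```

Only under the PET / P convention does "greater than 1" correspond to arid conditions, so the text and Table B1 need to state the same ratio.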

MC16. Efficiency of the filter used to select the study's catchments. In Section 3.2 (Regional performance differences) the authors mention aridity, groundwater use and/or anthropogenic water use as possible explanations for the low performance obtained for several P products in Australia, India, South Korea and Africa. Does this mean that the five criteria used in Section 2.2 to “ensure the suitability of the catchments for the present analysis” (L87) did not work as expected? I request the authors to add a discussion of why the five criteria previously mentioned were not sufficient to filter out catchments that were not suitable for the present analysis. I also request the authors to consider whether it is necessary to add one or more criteria that would allow the presence of irrigation, hydrograph regulation and/or major consumptive water use to be detected, in order to screen out catchments that will not provide reliable results from the analysis. I suggest that the authors analyse the criteria used by the Reference Observatory of Basins for INternational hydrological climate change detection (ROBIN; Kumar et al., 2024) to ensure that the streamflows observed in each selected catchment are free from anthropogenic influences.

Kumar, A., Hannaford, J., Turner, S., Barker, L. J., Dixon, H., Griffin, A., Suman, G., and Armitage, R.: Global trend and drought analysis of near-natural river flows: The ROBIN Initiative, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17249, https://doi.org/10.5194/egusphere-egu24-17249, 2024.

MC17. Make the observed streamflow dataset publicly accessible. HESS requests that authors follow its data policy (https://www.hydrology-and-earth-system-sciences.net/submission.html), which includes a statement on how the underlying research data can be accessed. If the data are not publicly accessible, a detailed explanation of why this is the case is required (e.g. applicable laws, university and research institution policies, funder terms, privacy, intellectual property and licensing agreements, and the ethical context of the research). In addition, the HESS data policy requires unrestricted access to all data and materials underlying reported findings for which ethical or legal constraints do not apply. It is true that a URL or reference to the data source of the streamflow data used in this study is provided in Table A1. However, a researcher wishing to reproduce the results of this study will never be certain that the data downloaded from each URL correspond exactly to the original 43,627 stations used in this study. Furthermore, in the hypothetical situation of having downloaded exactly the same 43,627 stations that were originally used in this study, it would not be possible to ensure that applying the five criteria presented in Section 2.2 for filtering out stations would result in exactly the 16,295 stations finally analysed in this study. Therefore, in practice, it would not be possible for a researcher to reproduce the results of this study. The entire scientific community will thank the authors of this study for providing public access to the daily streamflow data, the catchment boundaries and the location of the outlet of each catchment in order to improve this dataset for future analyses on a global scale.

Minor comments:

In all the manuscript, I ask the authors to use the word “reanalysis” instead of “model” when referring to atmospheric models of the global climate (e.g., ERA5, JRA-3Q), to avoid confusion with the HBV hydrological model used in this study.

Provide the full name of all the abbreviations used in the manuscript the first instance they appear, as specified in the “English guidelines and house standards” of HESS (https://www.hydrology-and-earth-system-sciences.net/submission.html). This is particularly important for all the precipitation products, which can not be assumed to be known by the wider scientific community. In addition, please provide a reference for each P dataset the first time they appear in the text.

Because CAMELS is a catchment dataset specifically developed for U.S., I request the authors to use CAMELS only for the US datasets, while when referring to CAMELS-like datasets developed for other countries, the individual names of the datasets should be used (e.g., CAMELS-GB, CAMELS-CL) or a generic name different from “CAMELS”.

To avoid possible ambiguities, use always in the text “streamflow” instead of “flow”. Also, when using “runoff” instead of “streamflow”, specify how runoff was obtained.

L20-21. Provide a reference for the crucial role that the spatio-temporal distribution of P plays in water resources assessment.

Table 1. Correct the reference provided for IMERG-Final V7, because Huffman et al. (2019) makes reference to version 6 and not to version 7.

Table 1. In the column “Temp. Cov.”, please explain the meaning of “NRT” in the caption of the table, and remove that term (assumed to mean “near real-time”) for all the products whose time latency is larger than 1 day.

Table 1. Provide a “Time Latency” value for all the products lacking such information.

Table 1, Table 3, Figure 2. Please check whether IMERG-Early V7 was used in this study or not, because L72 mentions only IMERG-Early V6 and not IMERG-Early V7.

L69. It mentions that “The datasets fall into two main categories”. However, in L149 it is mentioned that “Among the six main categories of P datasets”, which is consistent with the six categories used in Table 1 (column ‘Data Source’) and Table 3 (column ‘Dataset Type’). I ask the authors to keep six categories in all the manuscript, using ‘Dataset Type’ as a consistent denomination name and using “S, R (reanalysis), G, S+G, S+R, S+R+G” as possible values for this denomination name (instead of “S; R (reanalysis); G; S,G; S,R; S,R,G” as used in Table 1).

CAMELS-like instead of CAMELS.

L82. Change “and websites” to “or websites”, because Table A1 provides either a reference or a URL but not both.

L103-104. Provide the catchments areas corresponding to the 2.5 and 97.5 percentiles as well.

Figure 1. Explain in the caption what is specifically shown in panels a) and b) of this figure.

Table 2. Please add a new column “units” to specify the measurement units of each HBV parameter.

L122-124. Provide more details about the statement: “Model initialization was done by running the model with 10 years of prior P data, if available. If 10 years of prior P data were not available, the model was run multiple times using the available P data until a total of more than 10 years was accumulated”. In particular, clarify how running multiple times the HBV model allow to compensate the lack of P data.

L130. Remove GDAS from the examples of P datasets with short record, because its data start in 2001, in contrast to the two CMORPH versions which data starts in 2019.

L144. Explain what do you mean by “γ reflects the shape of P probability distribution”.

L158. In the sentence “Specifically, gauge data enhance performance in …” do you mean something like “Specifically, bias correction using gauge data enhance performance in ….”?

L165-166. Please provide a reference that support the statement about the climatological rain gauge adjustment in IMERG-Late V7. This is requested because to the best of my knowledge the document “IMERG_V07_ReleaseNotes_final_230713.pdf”, only mentions “Applied climatological adjustment to the Final Run for Early and Late Runs”.

L174-175. Provide a discussion about the poor performance of PDIR-now in UK, Denmark and Italy.

L179. Provide a reference for GDAS.

L202. Could you be more specific with the sentence “the importance of improving coverage in data sparse regions due to data sharing limitations” ?

L203. Where can we see the “comparison of PCORR parameter values obtained after calibration using different P datasets” ?

L209. How is it possible to obtain negative values of the PCORR parameter if the range specified for this parameter in Table 2 was [1, 2]?

Figure 2. Add to the caption of this figure the meaning of the horizontal black line shown in each boxplot.

L237-L241. To avoid confusion, please use the same attribute names used in Figure 4 and Appendix B (e.g., use “Mean PET” instead of “low Mean PET”).

L240. Develop more the idea “…, as frontal P is prevalent under these conditions”.

L243. Please introduce the concept “Rain Gauge Density map” before using it here.

L272. Correct “JRA-3”

L275. Explain the meaning of TOVS-to-ATOVS.

L277-280. Where can we see the low performance obtained by PDIR-Now in Italy and Denmark, as well as the low performance obtained by JRA-3Q in Thailand?

L288. To improve the clarity of the text, please change “bias-adjustment techniques” by “bias-adjustment techniques of P datasets”.

L310-312. Can you provide any number to support the statement “our approach may slightly overestimate the relative performance of gauge-based and model-based datasets compared to satellite-only datasets"?

L313. Remove GDAS from the examples of P datasets with short record, because its data start in 2001, in contrast to the two CMORPH versions which data starts in 2019.

L317-321. I suggest to move these lines into a new section termed “Future work”.

L331. Given that GPM+SM2RAIN performed best among all the satellite-only P datasets, and considering that the developers of that product are among the authors of this work, can you provide some description of the reasons that prevent updating this product at least once a year?

L334. Stating that MSWEP is a “gauge-based” dataset gives the wrong idea that this product is only based on rain gauge information. I suggest to be more specific here and specify that this product uses information from rain gauges, among other sources.

L339-340. The statement “while arid regions exhibited overall poor performance, with model-based datasets slightly outperforming others” is not correct, because Table 3 shows that IMERG-Final V7, GPCP v3.2 and CPC Unified outperformed reanalysis datasets in arid regions. Please correct.

In the sections “Results and Discussion” and “Conclusions” please provide some analysis of the performance of the P datasets in mountainous regions, which is of utmost interest for the wider hydrological community.

In the Section “Conclusions” please mention something about the catchment attributes that would allow to predict -to some extent- a good performance of the P datasets, which is of utmost interest for the wider hydrological community.

L359. NOAA is written twice. Correct.

L373. Change the capital “O” used in “Observed”.

L377. Mention in the text where the radiation and humidity data are used in this work.

Table A1. Please separate the “Data source” column into two different columns: “Institution name” and “Country”, to have better information about the data source used for the observed streamflow data.

Table B1. Indicate the measurement unit used for the attribute “Rain gauge density”.

Table B1. Incorrect citation to Legates and Bogart (2009). Please correct.

Table B1. Considering the existence of the attribute “Permafrost fraction”, why the attribute “Glacier fraction” was not included in the analysis?

L388-394. Please provide the correct acknowledgment to each one of the P datasets used in this study, as requested by each data source provider.

L399. There is an incorrect character in the reference. Correct it.

L503-508. This reference is repeated twice. Correct it.

L612-615. This reference is repeated twice. Correct it.

L631. Correct the error in the URL.
Citation: https://doi.org/10.5194/egusphere-2024-4194-RC1
- AC3: 'Reply on RC1', Ather Abbas, 21 Jul 2025
  
  Thank you for your comments. Please find our responses in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4194-AC3
RC2:
'Comment on egusphere-2024-4194', Anonymous Referee #2, 04 Apr 2025

Review of EGUsphere-2024-4194
Title
Comprehensive Global Assessment of 23 Gridded Precipitation Datasets Across 16,295 Catchments Using Hydrological Modeling
By: Ather Abbas, Yuan Yang, Ming Pan, Yves Tramblay, Chaopeng Shen, Haoyu Ji, Solomon H. Gebrechorkos, Florian Pappenberger, Jong Cheol Pyo, Dapeng Feng, George Huffman, Phu Nguyen, Christian Massari, Luca Brocca, Tan Jackson, & Hylke E. Beck

This manuscript provides an extensive evaluation of the hydrological performance of 23 gridded precipitation (P) datasets by calibrating a hydrological model over 16,295 catchments across the globe. The 23 P datasets are chosen based on their availability at (sub)-daily scale and their (quasi)-global coverage. Among them, 1 dataset is gauge-based only, 3 are reanalysis-based only, 12 are satellite-based only, 3 combine gauge and satellite data, 2 combine reanalysis and satellite data, and 2 combine gauge, reanalysis and satellite data. A conceptual hydrological model (HBV) is used to simulate daily streamflow and is calibrated with each P dataset using the evolutionary algorithm. The Kling-Gupta Efficiency (KGE) is used to assess the hydrological performance of each P dataset across 16,295 catchments.

General Comments:
This study tackles a crucial issue in hydrological modeling, which is becoming increasingly important for many potential users who often lack guidance and confidence when navigating the wide range of products available for addressing specific hydrological problems across different geographical regions. The manuscript is well-written and well-structured; however, several issues need to be addressed to strengthen the robustness of the findings and conclusions.
The inclusion criteria stated in Section 2.2 (L80-104) are questionable because of their subjective nature and arbitrary setting. To ensure sufficient data for calibration, the number of events (defined as runoff > 5 mm d^-1) must exceed 10 non-consecutively (L96-97). However, some fixed values were set by the authors without a clear explanation of the rationale behind their selection. Would it be more appropriate to use a percentile of runoff instead of a fixed value, to adapt the criterion to catchments with different hydrological regimes? Similarly, to filter out catchments with erroneous streamflow and catchment boundary data, the authors set the mean annual runoff to be ≥ 5 and < 5000 mm yr^-1 (L98-99). However, the range of mean annual runoff values can vary significantly across different climatic zones, with arid regions ranging from 0 to 100 mm yr^-1, while tropical regions can vary from 800 to over 2000 mm yr^-1. It would be appreciated if the authors could provide further explanation and justification for their inclusion criteria.
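The difference between a fixed and a percentile-based event threshold can be sketched as follows. The sketch assumes that an "event" is a run of consecutive days with runoff above the threshold, which is one possible reading of the criterion and not necessarily the authors' definition. The function name and the runoff series are hypothetical.

```python
import numpy as np

def count_events(runoff, threshold):
    """Number of separate runs of consecutive days with runoff above the threshold."""
    above = np.asarray(runoff, dtype=float) > threshold
    # An event starts on a day above the threshold whose previous day was not
    previous = np.concatenate(([False], above[:-1]))
    return int((above & ~previous).sum())

# Hypothetical daily runoff (mm/d)
runoff = np.array([0.2, 6.0, 7.5, 0.5, 0.1, 5.5, 0.3, 0.0, 9.0, 8.0, 0.4])
n_fixed = count_events(runoff, 5.0)                            # fixed 5 mm/d threshold
n_percentile = count_events(runoff, np.percentile(runoff, 90)) # catchment-specific threshold
```

A percentile threshold adapts to each catchment's flow regime, but it also guarantees that every catchment has some days above it, so it would need to be combined with a minimum record length to remain a meaningful filter.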
The hydrological performance of P datasets with higher spatial resolution might be compromised when using catchment-mean P to drive the hydrological model, as the more detailed spatial information from these P datasets is lost. It is somewhat surprising to see that CPC Unified ranks second, given its coarse spatial resolution (0.5°). In addition, it is quite interesting that JRA-3Q performs better than its higher spatial resolution counterparts (ERA5 and GDAS). It is suspected that using catchment-mean P might mask the advantages of higher spatial resolution, leading to the conclusion that “higher spatial resolution does not guarantee better performance, especially when data is aggregated at the catchment scale” (L153-154). This may hold true for catchments dominated by a single climate or with relatively uniform topography, where spatial variability in precipitation has less influence. However, in mountainous, snow-dominated, or mixed-climatic catchments, the hydrological response cannot be adequately captured without detailed spatial P information. As a result, the true value of higher spatial resolution datasets may be underestimated, potentially biasing the selection of P datasets for hydrological modelling.
The use of the PCORR parameter to mitigate systematic biases in P datasets during calibration may present challenges because its range of 1 to 2 allows adjustment only for P underestimation. Correcting underestimation without addressing P overestimation could disproportionately affect datasets prone to overestimation, potentially skewing the performance evaluation. For instance, datasets like PDIR-Now and JRA-3Q, which experience overestimation, have low median KGE scores in some streamflow data sources (L277-280). It would be appreciated if the authors could provide a more comprehensive explanation and justification for their focus on mitigating only P underestimation.

Specific Comments:
L57-60: It would be appreciated if the authors could provide some basic information about the new datasets in the Gridded P Datasets section 2.1.
L121-124: It is unclear how the model was initialized when 10 years of prior P data were not available. Did the authors simply concatenate the same available P data n times to achieve the desired length? Or did they use a rainfall generator to produce stochastic P data? Please clarify and justify the use of “multiple times using the available P data until a total of more than 10 years was accumulated”.
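For reference, the concatenation reading of that sentence could look like the Python sketch below. This is only one possible interpretation; the function name and the 3650-day requirement are assumptions, and the manuscript does not confirm that this is what was done.

```python
def spinup_forcing(p_available, days_required=3650):
    """Repeat the available P record until it covers more than days_required days.

    Assumed interpretation: the same record is cycled, so no new
    information is added; the model states are merely given time to settle.
    """
    if not p_available:
        raise ValueError("no precipitation data available for spin-up")
    repeats = days_required // len(p_available) + 1
    return p_available * repeats


# Hypothetical example: only 4 years (1461 days) of prior P are available.
four_years = [1.0] * 1461
forcing = spinup_forcing(four_years)
print(len(forcing))  # 3 repetitions, 4383 days
```

If the authors instead used a stochastic rainfall generator, the spin-up states would differ, which is why the clarification matters.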
L124-126: It would be appreciated if the authors could provide more information and description about the evolutionary algorithm.
L128-132: For a particular catchment, the full period of overlapping streamflow and P data could be different because of the differences in the temporal availability of the P datasets. In this regard, will such differences also cause instability in the performance score?
L170-176: Is the poor performance of PDIR-Now due to the inability of PCORR to adjust for overestimation in the P dataset?
L310-312: Could the authors elaborate further on how the alignment of streamflow stations with meteorological networks might favour gauge-based and reanalysis-based P datasets over satellite-only P datasets?
Remarks:
L54: typo “result n biased conclusions”
L72: should it be “IMERG-Early V7” instead of “IMERG-Early V6”?
L275: please explain the “TOVS-to-ATOVS transition”. Thank you.

Citation: https://doi.org/10.5194/egusphere-2024-4194-RC2
- AC2: 'Reply on RC2', Ather Abbas, 21 Jul 2025
  
  Thank you for your comments. Please find our responses in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4194-AC2
RC3:
'Comment on egusphere-2024-4194', Oscar Manuel Baez Villanueva, 09 Apr 2025

This article presents a comprehensive evaluation of the daily performance of 23 global precipitation (P) datasets through a hydrological modeling experiment using the HBV model across 16,295 catchments worldwide. The manuscript is very well-written and clear, and the study addresses a topic of interest to the scientific community. It contributes valuable insights into the suitability of different P datasets for hydrological applications at the global scale and fits well within the scope of the journal. Overall, this is a strong and useful contribution, and I congratulate the authors for the scale and depth of the analysis. I believe that considering the following points will further enhance the manuscript.
I understand the rationale behind calibrating both the snowfall gauge undercatch correction factor (SFCF) and the multiplicative bias correction factor (PCORR), as these systematic errors can be addressed more easily. However, I suggest lowering the minimum bound of these parameters (e.g., to 0.6) to avoid favouring datasets that tend to underestimate P. In addition, it would be helpful to compare the results of the current calibration approach with a scenario where SFCF and PCORR are both fixed at 1.0. This comparison could shed light on the overall performance of the P datasets, specifically on (i) which datasets tend to systematically over- or underestimate and where, and (ii) the relative importance of these biases when these products are used for hydrological modelling purposes.
The HBV model is applied in a lumped configuration, considering catchment-averaged forcing data. While this is understandable for a global-scale study, it would be very interesting to explore whether a semi-distributed configuration that accounts more explicitly for P gradients related to elevation could provide additional insights, particularly in mountainous or topographically complex regions.
It would also be helpful to include a short statement in the Limitations Section acknowledging the assumption of constant land cover over the analysis period. Land cover changes can influence hydrological responses and might introduce some uncertainties in the model performance, especially over multi-decadal periods.
Since PCORR and SFCF are calibrated, I also recommend including a few sentences explaining which types of biases are captured by the beta component of the KGE. This would clarify the references throughout the manuscript to over- and underestimation of datasets, which, in part, could be related to biases of different magnitudes for specific P intensities and to the skill of the products in accurately detecting P events.
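As a reminder of what the beta component measures, here is a minimal Python sketch of the KGE decomposition. It assumes the commonly used modified formulation (correlation r, bias ratio beta as the ratio of means, variability ratio gamma as the ratio of coefficients of variation); the manuscript may define the terms slightly differently, and the streamflow values are invented.

```python
from math import sqrt
from statistics import mean, pstdev


def kge_prime(sim, obs):
    """Return (KGE', r, beta, gamma) for simulated and observed streamflow."""
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = mean((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs))
    r = cov / (sd_s * sd_o)               # timing and dynamics
    beta = mu_s / mu_o                    # overall volume bias only
    gamma = (sd_s / mu_s) / (sd_o / mu_o) # relative variability
    kge = 1.0 - sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma


# Hypothetical case: simulation is the observation inflated uniformly by 20 %.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.2 * q for q in obs]
kge, r, beta, gamma = kge_prime(sim, obs)
print(round(kge, 3), round(r, 3), round(beta, 3), round(gamma, 3))
```

In this example only beta departs from 1, so the whole KGE penalty comes from the mean volume bias. Intensity-dependent biases or missed events would instead show up in r and gamma, which is the distinction the comment asks the authors to spell out.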
In Section 2.2, the authors mention that streamflow records of selected catchments must span more than three years. Could the authors clarify if these years must be consecutive? Similarly, the rationale for requiring more than 10 non-consecutive P events is not fully explained. How was this threshold determined?
Additional minor suggestions:
Table 1: In the “Temporal resolution” column please use either “30 min.” or “30 min” consistently.
L156: It would be helpful to report the median KGE for MSWEP V2.8 here for easier comparison with other products.
L189: Likewise, indicate the median KGE for CHIRPS V2.0.
Table 3: IMERG-Final V7 also performs best over tropical regions and should be marked in bold, as is done for MSWEP V2.8.
Throughout the manuscript, “evaporation” and “evapotranspiration” are used interchangeably. Please consider clarifying and using these terms consistently.

Citation: https://doi.org/10.5194/egusphere-2024-4194-RC3
- AC1: 'Reply on RC3', Ather Abbas, 21 Jul 2025
  
  Thank you for your comments. Please find our responses in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4194-AC1

Status: closed

RC1:
'Comment on egusphere-2024-4194', Anonymous Referee #1, 20 Feb 2025
This manuscript presents an unprecedented evaluation of 23 (sub)daily (quasi)global precipitation (P) datasets across 16,295 catchments worldwide using hydrological modeling. The 23 P datasets belong to six major families of data sources: satellite only; reanalysis only; rain gauge only; satellite and rain gauge; satellite and reanalysis; and satellite, reanalysis and rain gauge. The conceptual hydrological model HBV was used to simulate the conversion of precipitation into streamflow at the daily temporal scale. Each P dataset, along with air temperature (from MSWX) and potential evapotranspiration (computed using the Hargreaves formula), is used to drive the hydrological simulations. The modified Kling-Gupta efficiency (KGE’) is used to evaluate the performance of the simulated streamflows against daily observations, and serves as a proxy for the performance of the P datasets.
This manuscript addresses an important topic for the hydrometeorological community. The manuscript is well written, concise and clear, with updated references. Unfortunately, the manuscript lacks a clear scientific question or hypothesis to be tested, and the Methodology section does not provide enough scientific detail to fully understand what was done and how, which prevents adequate reproducibility of the results. In addition, some conclusions are speculative and are not supported by the results included in the manuscript. Finally, some references are not used in the text and others contain minor errors. To summarise, the manuscript in its current form does not represent a substantial contribution to the global hydrometeorological community, but all the problems mentioned could be addressed by the authors during the review process. The following lines describe the major and minor problems detected in the manuscript.
Major comments:

MC1. The motivation for the article is not well developed. The manuscript does a really good job of pointing out the limitations of previous evaluations of P datasets. However, what is the ultimate purpose of this comprehensive evaluation of P datasets on a global scale? Is it just to provide some numbers on a global scale, or is it to test a hypothesis or answer a scientific question, or to provide recommendations for the selection of P products for specific applications or specific geographic regions? If so, the hypothesis, the scientific question or the ultimate purpose of the manuscript should be explicitly stated.

MC2. Usage of the outdated CMORPH-RAW (Joyce et al., 2004) and the unknown CMORPH-RT (Xie et al., 2017) instead of the new bias-corrected CMORPH-CDR v.1 (Xie et al., 2017, 2018). The manuscript mentions that the old CMORPH-RAW and CMORPH-RT are available from 2019 onwards (which seriously limits the hydrological modeling runs), while the newest version of CMORPH, termed CMORPH-CDR, is available from 1998 onwards (not from 2019 onwards). Moreover, it is not clear which product the CMORPH-RT used in this study is, given that Xie et al. (2017) describe CMORPH-CDR version 1, which is available since 1998 and not from 2019. Therefore, I request the authors to remove the outdated CMORPH-RAW (version 0) and the unknown CMORPH-RT and to use the relatively new bias-corrected CMORPH-CDR version 1, which is available since 1998 and is described by Xie et al. (2017) and Xie et al. (2018).

Xie, P., Joyce, R., Wu, S., Yoo, S.-H., Yarosh, Y., Sun, F., and Lin, R.: Reprocessed, Bias-Corrected CMORPH Global High-Resolution Precipitation Estimates from 1998, Journal of Hydrometeorology, 18, 1617–1641, doi:10.1175/JHM-D-16-0168.1, 2017.

Xie, P., Joyce, R., Wu, S., Yoo, S., Yarosh, Y., Sun, F., Lin, R., and NOAA CDR Program: NOAA Climate Data Record (CDR) of CPC Morphing Technique (CMORPH) High Resolution Global Precipitation Estimates, Version 1, doi:10.25921/W9VA-Q159, URL https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00948, 2018.

MC3. Use of the under-revision PERSIANN-CCS-CDR (Sadeghi et al., 2021). This paper uses PERSIANN-CCS-CDR (Sadeghi et al., 2021) as one of the 23 P datasets to be evaluated. However, the website https://chrsdata.eng.uci.edu/ clearly states that “PERSIANN-CCS-CDR is currently under revision and unavailable for download”. Therefore, I request the authors to remove the use of PERSIANN-CCS-CDR from this study or clarify the data version used in this study and indicate whether the chosen version is problematic or not.

MC4. Catchment selection. To ensure the suitability of the catchments used in the analyses, five selection criteria were applied in the manuscript to the 34,768 streamflow stations that passed the duplication check. However, the following two decisions are entirely subjective and require more detailed explanation (in the manuscript) by the authors: i) discarding streamflow stations where both the station location and the corresponding catchment centroid were within 5 km of those of another station (how does the spatial resolution of the individual P products influence this criterion?); ii) the number of events had to be greater than 10 non-consecutive (how does the duration of each selected event affect this criterion?; are 11 non-consecutive days with Q >= 5 mm d-1 sufficient to ensure a robust calibration of a hydrological model?)

MC5. Use of an unknown version of the HBV hydrological model. The manuscript does not contain a description about the version of the HBV hydrological model used in all analyses. L107 indicates that the HBV-light software described by Seibert and Vis (2012) was the software version used in this study. However, it seems unlikely that a Windows-based version of HBV was selected to simulate 16,295 catchments worldwide. I request the authors to provide details of the version of HBV used in this study. In the event that the authors use their own version of HBV, I request them to provide a link to the source code of the model in the “Code Availability” section requested by HESS (https://www.hydrology-and-earth-system-sciences.net/submission.html#templates).

MC6. Use of catchment-mean P time series to drive the hydrological model (L89-90, L114-115). The use of catchment mean P time series to drive the hydrological model HBV could lead to important problems in the representation of observed streamflows in catchments with mixed hydrological regimes (i.e. snow-dominated or snow-influenced hydrological regimes), which should be reflected in low KGE values. Therefore, I request the authors to provide -in the supplementary material- five to seven example catchments where the HBV is able to reproduce their mixed hydrological regime by using catchment-mean P time series to drive the hydrological model (I request not only the presentation of the KGE values and the daily time series of the observed Q compared to the simulated Q, but also a comparison of the mean monthly streamflows). If the model is not able to acceptably reproduce the daily and mean monthly observed streamflows of catchments with mixed hydrological regimes, I suggest the authors to implement different elevation bands in these catchments. A publicly available open-source version of an HBV-like hydrological model can be found at: https://cran.r-project.org/package=TUWmodel, which allows the use of up to 10 elevation bands in each catchment.

MC7. Using the Hargreaves (1994) equation to calculate potential evapotranspiration (PET) to drive the hydrological model. I request the authors to justify this choice after knowing that Oudin 2005a proposed a different temperature-based PET model after evaluating 27 potential evapotranspiration models in terms of streamflow simulation efficiency in a large sample of 308 catchments in France, Australia and the United States.

MC8. Range used for the calibration of the PCORR parameter of HBV. Table 2 shows that the PCORR parameter is used as a multiplier to mitigate the systematic underestimation of P characteristics of some P products, and therefore a range of [1, 2] is used in the optimisation of this parameter. This decision could lead to low KGE values in arid or hyper-arid catchments (see Table 3), where some P datasets overestimate the true (and unknown) P amount. Therefore, I request the authors to extend this range to [0.5, 2] so that the calibration procedure can compensate not only for an underestimation of P but also for an overestimation of it.
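The effect of the lower bound can be illustrated with a deliberately simplified Python sketch. It isolates the mean bias only and ignores that other HBV parameters may partly compensate during calibration; the function names and the numbers are hypothetical.

```python
def best_pcorr(p_dataset_mean, p_true_mean, lower, upper):
    """Multiplier closest to the ideal bias correction within [lower, upper]."""
    ideal = p_true_mean / p_dataset_mean
    return min(max(ideal, lower), upper)


def residual_bias(p_dataset_mean, p_true_mean, pcorr):
    """Ratio of corrected dataset mean to the true mean (1.0 means unbiased)."""
    return pcorr * p_dataset_mean / p_true_mean


# Hypothetical arid catchment: the dataset overestimates P by 40 %.
p_true, p_data = 100.0, 140.0

narrow = best_pcorr(p_data, p_true, 1.0, 2.0)  # range as in Table 2
wide = best_pcorr(p_data, p_true, 0.5, 2.0)    # range requested in MC8

print(residual_bias(p_data, p_true, narrow))  # bias remains at 1.4
print(residual_bias(p_data, p_true, wide))    # bias removed
```

With the [1, 2] range the multiplier is pinned at its lower bound and the overestimation passes into the model unchanged, while the [0.5, 2] range lets the calibration remove it.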

MC9. Use of an unknown version of the (µ+λ) evolutionary algorithm used to calibrate the HBV hydrological model. The manuscript does not contain a description of the version of the (µ+λ) evolutionary algorithm used to calibrate the HBV hydrological model. From L124, the reader can infer that the DEAP Python software was used to calibrate the HBV model. However, I request the authors to clarify the name and version of the software used to implement the (µ+λ) evolutionary algorithm and to describe how this algorithm was coupled to the (unknown) version of the HBV hydrological model (see MC5). Finally, I request the authors to describe whether they can ensure that the (µ+λ) evolutionary algorithm has converged to a stable KGE value after 1200 model runs (L125) or not.
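For readers unfamiliar with the scheme, a generic (µ+λ) evolutionary strategy can be sketched in plain Python as below. This is not the authors' implementation and does not use DEAP; the population sizes, mutation scale, and the toy objective standing in for a KGE-returning model run are all assumptions.

```python
import random


def mu_plus_lambda(objective, bounds, mu=10, lam=20, generations=30,
                   sigma=0.1, seed=0):
    """Maximise objective with a basic (mu + lambda) evolution strategy."""
    rng = random.Random(seed)

    def clip(x, lo, hi):
        return min(max(x, lo), hi)

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(mu)]
    history = []  # best score after each generation, for a convergence check
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(population)
            # Gaussian mutation scaled by the width of each parameter range
            child = [clip(x + rng.gauss(0.0, sigma * (hi - lo)), lo, hi)
                     for x, (lo, hi) in zip(parent, bounds)]
            offspring.append(child)
        # "plus" selection: parents and offspring compete, the best mu survive
        # (a real calibration would cache scores instead of re-evaluating)
        population = sorted(population + offspring, key=objective,
                            reverse=True)[:mu]
        history.append(objective(population[0]))
    return population[0], history


def toy_objective(params):
    """Hypothetical stand-in for a model run returning a score to maximise."""
    x, y = params
    return -((x - 1.3) ** 2 + (y - 0.4) ** 2)


best, history = mu_plus_lambda(toy_objective, [(0.0, 2.0), (0.0, 1.0)])
print(best, history[-1])
```

Because parents survive into the next generation, the best score never decreases, so plotting `history` against the number of model runs is a simple way to show whether the score has plateaued within the evaluation budget, which is the evidence this comment asks for.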

MC10. Selection of temporal period used for the calibration of the individual catchments. It is not clear from the manuscript whether the period used to calibrate the HBV hydrological model with each P dataset was the same or whether it depended on the data availability of the respective P product. I request the authors to clarify this situation in the manuscript. In the case that the temporal period used for the calibration of each catchment depends on the data availability of each P product, and therefore, it was not the same for all the P products used as forcing in each catchment, I request the authors to use the same temporal period for the calibration of all P products in each catchment, to ensure a fair comparison of the performance of different P datasets in a given catchment. Of course, the temporal periods may be different from one catchment to another, but for the same catchment the same temporal period should be used to calibrate the HBV model with all P datasets.
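The per-catchment common period requested here amounts to intersecting the coverage of the streamflow record with that of every P dataset. A minimal Python sketch follows; the function name and all dates are hypothetical.

```python
from datetime import date


def common_period(periods):
    """Intersect (start, end) date ranges; return None if they do not overlap."""
    start = max(s for s, _ in periods)
    end = min(e for _, e in periods)
    return (start, end) if start <= end else None


# Hypothetical coverage for one catchment: streamflow record plus two P datasets.
coverage = [
    (date(1980, 1, 1), date(2020, 12, 31)),  # observed streamflow
    (date(2001, 1, 1), date(2023, 12, 31)),  # P dataset A
    (date(1990, 1, 1), date(2015, 12, 31)),  # P dataset B
]
print(common_period(coverage))
```

A practical drawback is that a single short-record dataset shrinks the common period for all others, so the authors may prefer to report both the full-period and the common-period scores.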

MC11. Based on the boxplots summarising the performance of each of the 23 P datasets used in this study, it is quite surprising that the CPC Unified dataset, which is based solely on rain gauge information and has the coarsest spatial resolution of all P datasets (0.5°), ranked second among all datasets. I request the authors to add a paragraph suggesting possible reasons for this unexpected behaviour.

MC12. To provide an initial assessment of the ability of all 23 P datasets used in this study to reproduce the mean annual precipitation at a given location, I request the authors to create a new figure with the mean annual precipitation for 2007-2015 (the longest period for which all datasets have data, after removing the two CMORPH products described in MC2), computed as the average of the mean annual values obtained for each of the 23 P datasets for that period (P_avg). In addition, I request the authors to prepare 23 new figures showing the difference between the mean annual precipitation of each P dataset for 2007-2015 (P_i) and P_avg, i.e., P_i - P_avg. All the figures requested in this comment should be included in the supplementary material only, and they will allow to identify major problems in the representation of mean annual values of a given P dataset in some specific regions of the world.
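The quantities requested in this comment are simple to compute. The Python sketch below does so for a single location; in practice the same operation would be applied to every grid cell. The function name, dataset labels, and values are hypothetical.

```python
from statistics import mean


def deviations_from_ensemble_mean(mean_annual_p):
    """Return P_avg and the difference P_i - P_avg for each dataset."""
    p_avg = mean(mean_annual_p.values())
    return p_avg, {name: p - p_avg for name, p in mean_annual_p.items()}


# Hypothetical mean annual P (mm/yr) at one location for three datasets.
mean_annual_p = {"dataset_a": 900.0, "dataset_b": 1000.0, "dataset_c": 1100.0}
p_avg, diffs = deviations_from_ensemble_mean(mean_annual_p)
print(p_avg, diffs)
```

Note that P_avg is only a reference for spotting outliers among the datasets, not an estimate of the true precipitation.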

MC13. To facilitate the “generalizability of their findings” (L50, L57) for readers from different countries, I request the authors to add a new figure to the main body of the manuscript: a map showing, in different colours, the KGE values obtained in each catchment. This figure will allow us to identify the spatial distribution of the high and low performance of each P dataset in the simulation of daily streamflows. This new figure will make it possible to support several statements in the “Results and Discussion” section that are currently not supported by any figure in the manuscript.

MC14. To facilitate even more the “generalizability of their findings” (L50, L57) to readers from the same country but from catchments with different hydrological regimes, I strongly suggest (and do not request) the authors to make an extra effort and classify the hydrological regimes of each of the 16,295 catchments (e.g., pluvial, glacial, snow-dominated, snow-influenced, tropical). This would allow readers to use the results of the article to select one or more P datasets to use for analysing specific case studies in their own countries. If this suggestion cannot be addressed by the authors, I request them to insert three new columns in Table 3: low solid P fraction, medium solid P fraction and high solid P fraction, where the thresholds to distinguish between low, medium and high values of solid P fraction should be proposed by the authors based on their knowledge and the values of the solid P fraction of all the 16,295 catchments.

MC15a. Poor performance of HBV in arid climates. Although the manuscript does not explicitly mention this, it can be inferred that the authors assume that the performance of HBV is likely to be poor in arid climates (L226), because “P in arid regions tends to be brief and intense, making it challenging to detect and model accurately (Beck et al., 2017b; Sun et al., 2018; El Kenawy et al., 2019; Beck et al., 2019a)” (L227-228). However, Seibert and Bergström (2022) mention in their review that the HBV is routinely used to model the impacts of climate change on water resources around the world, including regions as arid as the Nile (Booij et al., 2011) and, therefore, aridity per se should not be a reason to explain a poor performance of the HBV model.

Booij, M. J., Tollenaar, D., van Beek, E., and Kwadijk, J. C. J.: Simulating impacts of climate change on river discharges in the Nile basin, Phys. Chem. Earth, 36, 696–709, https://doi.org/10.1016/j.pce.2011.07.042, 2011.

MC15b. Definition of the aridity index. In the main text of the manuscript, arid regions are associated with values of the aridity index greater than 1 (L250-251, L266). However, this association is inconsistent with the definition of the aridity index in Table B1 of Appendix B, where the aridity index is defined as the ratio between mean annual P and potential evapotranspiration, and therefore values greater than 1 would indicate wet rather than dry catchments. Please clarify this discrepancy.
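The discrepancy is easy to see numerically. The Python sketch below evaluates both conventions for an invented arid catchment; the function names and values are hypothetical.

```python
def aridity_p_over_pet(p, pet):
    """Aridity index as defined in Table B1: mean annual P divided by PET."""
    return p / pet


def aridity_pet_over_p(p, pet):
    """Inverse convention, under which values above 1 indicate dry conditions."""
    return pet / p


# Hypothetical arid catchment: P = 200 mm/yr, PET = 1600 mm/yr.
p, pet = 200.0, 1600.0
print(aridity_p_over_pet(p, pet))  # 0.125, i.e. below 1 for a dry catchment
print(aridity_pet_over_p(p, pet))  # 8.0, i.e. above 1 for a dry catchment
```

The statements at L250-251 and L266 are only consistent with the second convention, so either the text or the definition in Table B1 needs to change.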

MC16. Efficiency of the filter used to select the study's catchments. In Section 3.2 (Regional performance differences) the authors mention aridity, groundwater use and/or anthropogenic water use as possible explanations for the low performance obtained for several P products in Australia, India, South Korea and Africa. Does this mean that the five criteria used in Section 2.2 to “ensure the suitability of the catchments for the present analysis” (L87) did not work as expected? I request the authors to add a discussion of why the five criteria previously mentioned were not sufficient to filter out catchments that were not suitable for the present analysis. I also request the authors to consider whether it is necessary to add one or more criteria that would allow the presence of irrigation, hydrograph regulation and/or major consumptive water use to be detected, in order to screen out catchments that will not provide reliable results from the analysis. I suggest the authors analyse the criteria used by the Reference Observatory of Basins for INternational hydrological climate change detection (ROBIN; Kumar et al., 2024) to ensure that the streamflows observed in each selected catchment are free from anthropogenic influences.

Kumar, A., Hannaford, J., Turner, S., Barker, L. J., Dixon, H., Griffin, A., Suman, G., and Armitage, R.: Global trend and drought analysis of near-natural river flows: The ROBIN Initiative, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17249, https://doi.org/10.5194/egusphere-egu24-17249, 2024.

MC17. Make the observed streamflow dataset publicly accessible. HESS requests the authors to follow its data policy (https://www.hydrology-and-earth-system-sciences.net/submission.html), which includes a statement on how the underlying research data can be accessed. If the data are not publicly accessible, a detailed explanation of why this is the case is required (e.g. applicable laws, university and research institution policies, funder terms, privacy, intellectual property and licensing agreements, and the ethical context of the research). In addition, the HESS data policy requires the provision of unrestricted access to all data and materials underlying reported findings for which ethical or legal constraints do not apply. It is true that a URL or reference to the data source of the streamflow data used in this study is provided in Table A1. However, a researcher wishing to reproduce the results of this study will never be certain that the data downloaded from each URL corresponds exactly to the original 43,627 stations used in this study. Furthermore, in the hypothetical situation of having downloaded exactly the same 43,627 stations that were originally used in this study, it would not be possible to ensure that applying the five criteria, presented in Section 2.2 for filtering out stations, would result in exactly the 16,295 stations finally analysed in this study. Therefore, in practice, it would not be possible for a researcher to reproduce the results of this study. The entire scientific community will thank the authors of this study for providing public access to the daily streamflow data, the catchment boundaries and the location of the outlet of each catchment in order to improve this dataset for future analyses on a global scale.

Minor comments:

In all the manuscript, I ask the authors to use the word “reanalysis” instead of “model” when referring to atmospheric models of the global climate (e.g., ERA5, JRA-3Q), to avoid confusion with the HBV hydrological model used in this study.

Provide the full name of all the abbreviations used in the manuscript the first instance they appear, as specified in the “English guidelines and house standards” of HESS (https://www.hydrology-and-earth-system-sciences.net/submission.html). This is particularly important for all the precipitation products, which can not be assumed to be known by the wider scientific community. In addition, please provide a reference for each P dataset the first time they appear in the text.

Because CAMELS is a catchment dataset specifically developed for U.S., I request the authors to use CAMELS only for the US datasets, while when referring to CAMELS-like datasets developed for other countries, the individual names of the datasets should be used (e.g., CAMELS-GB, CAMELS-CL) or a generic name different from “CAMELS”.

To avoid possible ambiguities, use always in the text “streamflow” instead of “flow”. Also, when using “runoff” instead of “streamflow”, specify how runoff was obtained.

L20-21. Provide a reference for the crucial role that the spatio-temporal distribution of P plays in water resources assessment.

Table 1. Correct the reference provided for IMERG-Final V7, because Huffman et al. (2019) makes reference to version 6 and not to version 7.

Table 1. In the column “Temp. Cov.”, please explain the meaning of “NRT” in the caption of the table, and remove that term (assumed to mean “near real-time”) for all the products which time latency is larger than 1 day.

Table 1. Provide a “Time Latency” value for all the products lacking such information.

Table 1, Table 3, Figure 2. Please check whether IMERG-Early V7 was used in this study or not, because L72 mentions only IMERG-Early V6 and not IMERG-Early V7.

L69. It mentions that “The datasets fall into two main categories”. However, in L149 it is mentioned that “Among the six main categories of P datasets”, which is consistent with the six categories used in Table 1 (column ‘Data Source’) and Table 3 (column ‘Dataset Type’). I ask the authors to keep six categories in all the manuscript, using ‘Dataset Type’ as a consistent denomination name and using “S, R (reanalysis), G, S+G, S+R, S+R+G” as possible values for this denomination name (instead of “S; R (reanalysis); G; S,G; S,R; S,R,G” as used in Table 1).

CAMELS-like instead of CAMELS.

L82. Change “and websites” by “or websites”, because Table A1 provide either a reference or a URL but not both.

L103-104. Provide the catchments areas corresponding to the 2.5 and 97.5 percentiles as well.

Figure 1. Explain in the caption what is specifically shown in panels a) and b) of this figure.

Table 2. Please add a new column “units” to specify the measurement units of each HBV parameter.

L122-124. Provide more details about the statement: “Model initialization was done by running the model with 10 years of prior P data, if available. If 10 years of prior P data were not available, the model was run multiple times using the available P data until a total of more than 10 years was accumulated”. In particular, clarify how running multiple times the HBV model allow to compensate the lack of P data.

L130. Remove GDAS from the examples of P datasets with short record, because its data start in 2001, in contrast to the two CMORPH versions which data starts in 2019.

L144. Explain what you mean by “γ reflects the shape of P probability distribution”.

L158. In the sentence “Specifically, gauge data enhance performance in …” do you mean something like “Specifically, bias correction using gauge data enhance performance in ….”?

L165-166. Please provide a reference that support the statement about the climatological rain gauge adjustment in IMERG-Late V7. This is requested because to the best of my knowledge the document “IMERG_V07_ReleaseNotes_final_230713.pdf”, only mentions “Applied climatological adjustment to the Final Run for Early and Late Runs”.

L174-175. Provide a discussion about the poor performance of PDIR-now in UK, Denmark and Italy.

L179. Provide a reference for GDAS.

L202. Could you be more specific with the sentence “the importance of improving coverage in data sparse regions due to data sharing limitations” ?

L203. Where can we see the “comparison of PCORR parameter values obtained after calibration using different P datasets” ?

L209. How is it possible to obtain negative values of the PCORR parameter if the range specified for this parameter in Table 2 was [1, 2]?

Figure 2. Add to the caption of this figure the meaning of the horizontal black line shown in each boxplot.

L237-L241. To avoid confusion, please use the same attribute names used in Figure 4 and Appendix B (e.g., use “Mean PET” instead of “low Mean PET”).

L240. Develop more the idea “…, as frontal P is prevalent under these conditions”.

L243. Please introduce the concept “Rain Gauge Density map” before using it here.

L272. Correct “JRA-3”

L275. Explain the meaning of TOVS-to-ATOVS.

L277-280. Where can we see the low performance obtained by PDIR-Now in Italy and Denmark, as well as the low performance obtained by JRA-3Q in Thailand?

L288. To improve the clarity of the text, please change “bias-adjustment techniques” by “bias-adjustment techniques of P datasets”.

L310-312. Can you provide any number to support the statement “our approach may slightly overestimate the relative performance of gauge-based and model-based datasets compared to satellite-only datasets"?

L313. Remove GDAS from the examples of P datasets with short record, because its data start in 2001, in contrast to the two CMORPH versions which data starts in 2019.

L317-321. I suggest to move these lines into a new section termed “Future work”.

L331. Given that GPM+SM2RAIN performed best among all the satellite-only P datasets, and considering that the developers of that product are among the authors of this work, can you provide some description of the reasons that prevent updating this product at least once a year?

L334. Stating that MSWEP is a “gauge-based” dataset gives the wrong idea that this product is only based on rain gauge information. I suggest to be more specific here and specify that this product uses information from rain gauges, among other sources.

L339-340. The statement “while arid regions exhibited overall poor performance, with model-based datasets slightly outperforming others” is not correct, because Table 3 shows that IMERG-Final V7, GPCP v3.2 and CPC Unified outperformed reanalysis datasets in arid regions. Please correct.

In the sections “Results and Discussion” and “Conclusions” please provide some analysis of the performance of the P datasets in mountainous regions, which is of utmost interest for the wider hydrological community.

In the Section “Conclusions” please mention the catchment attributes that would allow one to predict, to some extent, good performance of the P datasets, which is of utmost interest for the wider hydrological community.

L359. NOAA is written twice. Correct.

L373. Change the capital “O” used in “Observed”.

L377. Mention in the text where the radiation and humidity data are used in this work.

Table A1. Please separate the “Data source” column into two different columns: “Institution name” and “Country”, to have better information about the data source used for the observed streamflow data.

Table B1. Indicate the measurement unit used for the attribute “Rain gauge density”.

Table B1. Incorrect citation to Legates and Bogart (2009). Please correct.

Table B1. Considering the existence of the attribute “Permafrost fraction”, why was the attribute “Glacier fraction” not included in the analysis?

L388-394. Please provide the correct acknowledgment to each one of the P datasets used in this study, as requested by each data source provider.

L399. There is an incorrect character in the reference. Correct it.

L503-508. This reference is duplicated. Correct it.

L612-615. This reference is duplicated. Correct it.

L631. Correct the error in the URL.
Citation: https://doi.org/10.5194/egusphere-2024-4194-RC1
- AC3: 'Reply on RC1', Ather Abbas, 21 Jul 2025
  
  Thank you for your comments. Please find our responses in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4194-AC3
RC2:
'Comment on egusphere-2024-4194', Anonymous Referee #2, 04 Apr 2025

Review of EGUsphere-2024-4194
Title
Comprehensive Global Assessment of 23 Gridded Precipitation Datasets Across 16,295 Catchments Using Hydrological Modeling
By: Ather Abbas, Yuan Yang, Ming Pan, Yves Tramblay, Chaopeng Shen, Haoyu Ji, Solomon H. Gebrechorkos, Florian Pappenberger, Jong Cheol Pyo, Dapeng Feng, George Huffman, Phu Nguyen, Christian Massari, Luca Brocca, Tan Jackson, & Hylke E. Beck

This manuscript provides an extensive evaluation of the hydrological performance of 23 gridded precipitation (P) datasets by calibrating a hydrological model over 16,295 catchments across the globe. The 23 P datasets are chosen based on their availability at (sub)-daily scale and their (quasi)-global coverage. Among them, 1 dataset is gauge-based only, 3 are reanalysis-based only, 12 are satellite-based only, 3 combine gauge and satellite data, 2 combine reanalysis and satellite data, and 2 combine gauge, reanalysis and satellite data. A conceptual hydrological model (HBV) is used to simulate daily streamflow and is calibrated with each P dataset using the evolutionary algorithm. The Kling-Gupta Efficiency (KGE) is used to assess the hydrological performance of each P dataset across 16,295 catchments.

General Comments:
This study tackles a crucial issue in hydrological modeling, which is becoming increasingly important for many potential users who often lack guidance and confidence when navigating the wide range of products available for addressing specific hydrological problems across different geographical regions. The manuscript is well-written and well-structured; however, several issues need to be addressed to strengthen the robustness of the findings and conclusions.
The inclusion criteria stated in Section 2.2 (L80-104) are questionable because of their subjective nature and arbitrary setting. To ensure sufficient data for calibration, the number of events (defined as runoff > 5 mm d^-1) must exceed 10 non-consecutively (L96-97). However, some fixed values were set by the authors without providing a clear explanation of their rationale behind their selection. Would it be more appropriate to use a percentile of runoff instead of a fixed value for adaptation across catchments with different hydrological regimes? Similarly, to filter out catchments with erroneous streamflow and catchment boundary data, the authors set the mean annual runoff to be ≥ 5 and < 5000 mm yr^-1 (L98-99). However, the range of mean annual runoff values can vary significantly across different climatic zones, with arid regions ranging from 0 to 100 mm yr^-1, while tropical regions can vary from 800 to over 2000 mm yr^-1. It would be appreciated if the authors could provide further explanation and justification for their inclusion criteria.
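The difference between a fixed and a percentile-based event threshold can be illustrated with a minimal sketch. It assumes that an "event" is a run of one or more consecutive days above the threshold (one possible reading of "non-consecutive"); the function name `count_events`, the synthetic runoff values, and the choice of the 90th percentile are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def count_events(runoff, threshold):
    """Count runs of consecutive days with runoff above the threshold."""
    above = np.asarray(runoff, dtype=float) > threshold
    # An event starts on a day above the threshold whose previous day is not.
    previous = np.concatenate(([False], above[:-1]))
    return int((above & ~previous).sum())

# Synthetic daily runoff (mm/d) for a humid catchment and a scaled-down arid one.
humid = np.array([0.1, 0.2, 6.0, 7.0, 0.3, 0.1, 5.5, 0.2, 0.1, 0.4])
arid = 0.1 * humid

humid_fixed = count_events(humid, 5.0)                    # fixed 5 mm/d threshold
arid_fixed = count_events(arid, 5.0)                      # same fixed threshold
arid_pct = count_events(arid, np.percentile(arid, 90))    # catchment-specific threshold
```

In this toy case the fixed threshold finds two events in the humid series and none in the arid one, whereas the percentile threshold still identifies the largest arid peak.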
The hydrological performance of P datasets with higher spatial resolution might be compromised when using catchment-mean P to drive the hydrological model, as the more detailed spatial information from these P datasets is lost. It is somewhat surprising to see that CPC Unified ranks second, given its coarse spatial resolution (0.5°). In addition, it is quite interesting that JRA-3Q performs better than its higher spatial resolution counterparts (ERA5 and GDAS). It is suspected that using catchment-mean P might mask the advantages of higher spatial resolution, leading to the conclusion that “higher spatial resolution does not guarantee better performance, especially when data is aggregated at the catchment scale” (L153-154). This may hold true for catchments dominated by a single climate or with relatively uniform topography, where spatial variability in precipitation has less influence. However, in mountainous, snow-dominated, or mixed-climatic catchments, the hydrological response cannot be adequately captured without detailed spatial P information. As a result, the true value of higher spatial resolution datasets may be underestimated, potentially biasing the selection of P datasets for hydrological modelling.
The use of the PCORR parameter to mitigate systematic biases in P datasets during calibration may present challenges because, with its range set to 1 to 2, it adjusts only for P underestimation. Correcting underestimation without addressing P overestimation could disproportionately affect datasets prone to overestimation, potentially skewing performance evaluations. For instance, datasets like PDIR-Now and JRA-3Q, which experience overestimation, have low median KGE scores in some streamflow data sources (L277-280). It would be appreciated if the authors could provide a more comprehensive explanation and justification for their focus on mitigating only P underestimation.
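The asymmetry described above can be sketched numerically. The example below is a simplification under stated assumptions: it treats PCORR as a single multiplicative factor on mean P and compares it against a hypothetical reference P, whereas the actual calibration is against streamflow and other HBV parameters may partly compensate. The function `best_feasible_factor` and all data values are illustrative only.

```python
import numpy as np

def best_feasible_factor(p_dataset, p_reference, lower, upper):
    """Return the ideal mean-matching factor and its value clipped to the bounds."""
    ideal = np.mean(p_reference) / np.mean(p_dataset)
    return float(ideal), float(np.clip(ideal, lower, upper))

p_ref = np.array([0.0, 5.0, 10.0, 0.0, 20.0])  # hypothetical "true" daily P (mm/d)
p_over = 1.25 * p_ref                           # dataset overestimating by 25 %
p_under = 0.8 * p_ref                           # dataset underestimating by 20 %

# Bounds of [1, 2] as discussed in the comment above.
ideal_over, used_over = best_feasible_factor(p_over, p_ref, 1.0, 2.0)
ideal_under, used_under = best_feasible_factor(p_under, p_ref, 1.0, 2.0)

# Ratio of corrected dataset mean to reference mean (1.0 means no residual bias).
residual_bias_over = used_over * p_over.mean() / p_ref.mean()
residual_bias_under = used_under * p_under.mean() / p_ref.mean()
```

Under these assumptions the underestimating dataset is fully corrected, while the overestimating one would need a factor of 0.8, is held at the lower bound of 1, and keeps its 25 % bias.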

Specific Comments:
L57-60: It would be appreciated if the authors could provide some basic information about the new datasets in the Gridded P Datasets section 2.1.
L121-124: It is unclear how the model was initialized when 10 years of prior P data were not available. Did the authors just concatenate the same available P data n times to achieve the desired length? Or did the authors use a rainfall generator to produce stochastic P data? Please clarify and justify the use of “multiple times using the available P data until a total of more than 10 years was accumulated”.
L124-126: It would be appreciated if the authors could provide more information and description about the evolutionary algorithm.
L128-132: For a particular catchment, the full period of overlapping streamflow and P data could be different because of the differences in the temporal availability of the P datasets. In this regard, will such differences also cause instability in the performance score?
L170-176: Is the poor performance of PDIR-Now due to the inability of PCORR to adjust for overestimation in the P dataset?
L310-312: Could the authors elaborate further how the alignment of streamflow stations with meteorological network might favour gauge-based and reanalysis-based P datasets over satellite-only P datasets?
Remarks:
L54: typo “result n biased conclusions”
L72: should it be “IMERG-Early V7” instead of “IMERG-Early V6”?
L275: please explain “TOVS-to-ATOVS transition”. Thank you.

Citation: https://doi.org/10.5194/egusphere-2024-4194-RC2
- AC2: 'Reply on RC2', Ather Abbas, 21 Jul 2025
  
  Thank you for your comments. Please find our responses in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4194-AC2
RC3:
'Comment on egusphere-2024-4194', Oscar Manuel Baez Villanueva, 09 Apr 2025

This article presents a comprehensive evaluation of the daily performance of 23 global precipitation (P) datasets through a hydrological modeling experiment using the HBV model across 16,295 catchments worldwide. The manuscript is very well-written and clear, and the study addresses a topic of interest to the scientific community. It contributes valuable insights into the suitability of different P datasets for hydrological applications at the global scale and fits well within the scope of the journal. Overall, this is a strong and useful contribution, and I congratulate the authors for the scale and depth of the analysis. I believe that considering the following points will further enhance the manuscript.
I understand the rationale behind calibrating both the snowfall gauge undercatch correction factor (SFCF) and the multiplicative bias correction factor (PCORR), as these systematic errors can be addressed more easily. However, I suggest lowering the minimum bound of these parameters (e.g., to 0.6) to avoid favouring datasets that tend to underestimate P. In addition, it would be helpful to compare the results of the current calibration approach with a scenario where SFCF and PCORR are both fixed at 1.0. This comparison could shed light on the overall performance of P datasets, specifically on (i) which datasets tend to systematically over- or underestimate and where, and (ii) the relative importance of these biases when these products are used for hydrological modelling purposes.
The HBV model is applied in a lumped configuration, considering catchment-averaged forcing data. While this is understandable for a global-scale study, it would be very interesting to explore whether a semi-distributed configuration that accounts more explicitly for P gradients related to elevation could provide additional insights, particularly in mountainous or topographically complex regions.
It would also be helpful to include a short statement in the Limitations Section acknowledging the assumption of constant land cover over the analysis period. Land cover changes can influence hydrological responses and might introduce some uncertainties in the model performance, especially over multi-decadal periods.
Since PCORR and SFCF are calibrated, I also recommend including a few sentences explaining which types of biases are captured by the beta component of the KGE. This would clarify the references throughout the manuscript to over- and underestimation of datasets, which in part, could be related to biases of different magnitudes for specific P intensities, and the skill of the products to accurately detect P events.
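For reference, the role of the beta component can be shown with a minimal sketch that assumes the original 2009 formulation of the KGE (correlation r, variability ratio alpha as the ratio of standard deviations, and bias ratio beta as the ratio of means); the manuscript may use a different variant. The function name `kge_components` and the synthetic series are illustrative only.

```python
import numpy as np

def kge_components(sim, obs):
    """Return (KGE, r, alpha, beta) assuming the 2009 KGE formulation."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]     # linear correlation (timing and dynamics)
    alpha = sim.std() / obs.std()       # variability ratio
    beta = sim.mean() / obs.mean()      # bias ratio of the long-term means
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    return kge, r, alpha, beta

obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])   # synthetic observed streamflow
sim = 1.2 * obs                              # simulation with a 20 % multiplicative bias
kge, r, alpha, beta = kge_components(sim, obs)
```

In this formulation beta reflects only the ratio of mean simulated to mean observed flow, so it does not distinguish intensity-dependent biases or missed events; note also that a purely multiplicative error, as in the toy series, shifts alpha as well as beta.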
In Section 2.2, the authors mention that streamflow records of selected catchments must span more than three years. Could the authors clarify if these years must be consecutive? Similarly, the rationale for requiring more than 10 non-consecutive P events is not fully explained. How was this threshold determined?
Additional minor suggestions:
Table 1: In the “Temporal resolution” column please use either “30 min.” or “30 min” consistently.
L156: It would be helpful to report the median KGE for MSWEP V2.8 here for easier comparison with other products.
L189: Likewise, indicate the median KGE for CHIRPS V2.0.
Table 3: IMERG-Final V7 also performs best over tropical regions and should be marked in bold, as is done for MSWEP V2.8.
Throughout the manuscript, “evaporation” and “evapotranspiration” are used interchangeably. Please consider clarifying and using these terms consistently.

Citation: https://doi.org/10.5194/egusphere-2024-4194-RC3
- AC1: 'Reply on RC3', Ather Abbas, 21 Jul 2025
  
  Thank you for your comments. Please find our responses in the attached PDF file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-4194-AC1

Supplement

https://doi.org/10.5194/egusphere-2024-4194-supplement

Viewed

Total article views: 3,186 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
2,286	844	56	3,186	229	55	81


Views and downloads (calculated since 20 Jan 2025)

Month	HTML	PDF	XML	Total
Jan 2025	222	78	6	306
Feb 2025	143	35	2	180
Mar 2025	82	22	1	105
Apr 2025	125	21	3	149
May 2025	66	16	1	83
Jun 2025	119	32	9	160
Jul 2025	138	23	8	169
Aug 2025	186	21	0	207
Sep 2025	484	26	2	512
Oct 2025	124	30	2	156
Nov 2025	121	39	2	162
Dec 2025	106	100	6	212
Jan 2026	113	92	8	213
Feb 2026	83	109	2	194
Mar 2026	151	176	4	331
Apr 2026	23	24	0	47


Viewed (geographical distribution)

Total article views: 3,093 (including HTML, PDF, and XML) Thereof 3,093 with geography defined and 0 with unknown origin.


Latest update: 11 Apr 2026

Short summary

Our study evaluated 23 precipitation datasets using a hydrological model at global scale to assess their suitability and accuracy. We found that MSWEP V2.8 excels due to its ability to integrate data from multiple sources, while others, such as IMERG and JRA-3Q, demonstrated strong regional performances. This research assists in selecting the appropriate dataset for applications in water resource management, hazard assessment, agriculture, and environmental monitoring.

