Estimating global precipitation fields from rain gauge observations using local ensemble data assimilation

Muto, Yuka; Kotsuki, Shunji

doi:https://doi.org/10.5194/egusphere-2024-960

Preprints

23 Apr 2024

Abstract. It is crucial to improve global precipitation estimates for a better understanding of water-related disasters and water resources. This study proposes a new methodology to interpolate global precipitation fields from ground rain gauge observations using the algorithm of the local ensemble transform Kalman filter (LETKF), in which the first guess and its error covariance are developed from the ERA5 reanalysis precipitation data of the European Centre for Medium-Range Weather Forecasts (ECMWF). For the estimation for each date, climatological ensembles are constructed using the ERA5 data from the 10 years before and after that date and are then used to obtain the first guess and its error covariance. Additionally, the global rain gauge observations provided by the National Oceanic and Atmospheric Administration Climate Prediction Center (NOAA CPC) are used as observation inputs to the LETKF algorithm.

In general, our estimates agree better with independent rain gauge observations than the existing precipitation estimates of the NOAA CPC do. Because our estimation uses the same rain gauge observations as the NOAA CPC product, this indicates that the proposed estimation method is superior to that of the NOAA CPC (i.e., Optimal Interpolation). Our proposed method takes advantage of constructing a physically guaranteed first guess and its error variance from reanalysis data for interpolating precipitation fields. Furthermore, the method of this study is shown to be particularly beneficial for mountainous or rain-gauge-sparse regions.
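The construction of the climatological ensemble described in the abstract can be illustrated with a minimal Python sketch. It only lists the calendar dates whose ERA5 fields would become ensemble members. The function name, the inclusion of the target year itself, and the window of 7 days on either side of the calendar day (a "surrounding 7 days" window is mentioned in the referee discussion on this page) are assumptions made for illustration, not details confirmed by the abstract.

```python
from datetime import date, timedelta


def climatological_dates(target, years=10, window_days=7):
    """Return candidate ensemble-member dates around `target`.

    Assumptions (hypothetical, for illustration only): members are taken
    from `years` years before and after the target year, the target year
    is included, and each year contributes the days within `window_days`
    of the same calendar day.
    """
    members = []
    for year_offset in range(-years, years + 1):
        year = target.year + year_offset
        try:
            centre = target.replace(year=year)
        except ValueError:
            # 29 February does not exist in a non-leap year; fall back to 28 February.
            centre = date(year, 2, 28)
        for day_offset in range(-window_days, window_days + 1):
            members.append(centre + timedelta(days=day_offset))
    return members


if __name__ == "__main__":
    dates = climatological_dates(date(1988, 11, 15))
    print(len(dates), min(dates), max(dates))
```

Under these assumptions each target date yields 21 years times 15 days of candidate members; the first guess and its error covariance would then be the sample mean and sample covariance of the corresponding fields.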

Received: 29 Mar 2024 – Discussion started: 23 Apr 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

17 Dec 2024

Estimating global precipitation fields by interpolating rain gauge observations using the local ensemble transform Kalman filter and reanalysis precipitation

Yuka Muto and Shunji Kotsuki

Hydrol. Earth Syst. Sci., 28, 5401–5417, https://doi.org/10.5194/hess-28-5401-2024, 2024

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-960', Anonymous Referee #1, 14 May 2024

Comments on “Estimating global precipitation fields from rain gauge observations using local ensemble data assimilation” by Muto and Kotsuki
This paper interpolates global precipitation data using the LETKF algorithm with ERA5 reanalysis data. In the daily comparison with APHRODITE and the monthly comparison with GPCC, the rank correlation coefficient and mean absolute difference showed improvements over the CPC_est product. Their global distributions also showed better agreement in some mountainous areas. The paper therefore concludes that the LETKF method is advantageous for interpolating precipitation fields at the global scale.

Demonstrating the performance of LETKF with reanalysis data in reconstructing global-scale precipitation data is beneficial for assessments of the hydrological cycle, and publication of the results in HESS is in principle encouraged.

Although I am not well versed in the LETKF, I was interested in the distribution (areas) of the better plots in Fig. 8, which did not always coincide with topography. However, I think there are several fundamental problems in how the authors' arguments are derived, and I would like to suggest a fundamental revision.
General comments

1. The quality of the product may depend not only on the LETKF method but also on the daily ERA5 data, which incorporate a wide range of satellite-based observations assimilated with a physical model. I wonder whether the use of ERA5 is critical, or whether LETKF would also perform well with other reanalysis data, such as NCEP or JRA. The case studies in Fig. 2 and Fig. 5 were used to confirm the improvements; however, why were they a single day/month from old periods such as 1988 and 1985? Were they chosen on purpose to avoid recent improvements in ERA5 quality? Did you try other cases in different seasons to check for the same tendency? Please show the representativeness of the case studies, and discuss how much of your improvement relies on ERA5 quality.
2. The reasons for the improvement in this paper are statistical. The statistics change depending on the samples, which are derived from the areas and periods. However, the treatment of target areas or periods changes between chapters, or is not clearly explained. It looks as though the samples may have been prepared at the authors' subjective convenience. Consistent data processing, such as the same area with the same duration, is required for the daily and monthly analyses.
3. Most gauge observations are made in valleys or basins where people live, even in mountainous regions, and interpolation of such gauge-based networks can hardly provide unique signals at high elevations. The same holds for gauge-sparse areas, because interpolation cannot produce information for areas with no data. APHRODITE is subject to the same condition. By contrast, the numerical model providing the reanalysis data is expected to reproduce precipitation (not by interpolation). Direct measurements by satellite-based radar, such as TRMM or GPM-PR, are also expected to provide these signals; however, gauge-adjusted microwave satellite products (such as GSMaP-Gauge) intentionally filter out the important remote signals. If the paper wishes to claim that the new product is beneficial for mountainous or rain-gauge-sparse regions, please show comparisons with gauge data located in high mountains or remote areas not included in the APHRODITE or CPC networks. Also, you focused on specific areas, such as the Himalayas, the Zagros Mountains, Southeast Asia and the central part of Africa, but these do not represent all “mountainous or rain-gauge-sparse regions”. I would advise removing the sentences on the benefits/improvements of the new data in “mountainous or rain-gauge-sparse regions” from the abstract and conclusion. Alternatively, you should state that the “algorithm worked better especially in the Himalayas, Zagros Mountains, Southeast Asia and central part of Africa”, with adequate reasons.
4. The composition of the chapters needs to be revised. In Section 1, the literature review needs to identify the issues to be addressed, and clear objectives should follow. If “estimation” is your objective, as in the title, you need to specify not only the target periods/areas but also the purpose. As there are already many precipitation products (Sun et al., 2018), you may want to demonstrate the efficiency of LETKF with ERA5 data. In that case, it is better to modify the title and to add to the conclusion a physical explanation of why the LETKF could reduce the biases, rather than only showing statistics. In that sense, the analysis procedure explained in the Fig. 1 caption should be described more carefully in the main text. The discussion in Chapter 4 needs to address the challenges described in Chapter 5. In addition, important results, such as the performance of LETKF, need more emphasis in the conclusion.
Specific comments
Title: Better to mention appealing terms, such as LETKF, improvements, assessments, etc.

Abstract: Better to mention the reason why the LETKF could improve the product, based on the comparison with CPC_est.

L56 Please raise the issues that previous studies did not achieve. Then describe why you need a new method, and for which areas/periods your estimation is targeted.

L57 Clear objectives are missing. You need to set them according to the key conclusions (Chapter 5).

L66 Readers cannot understand why Fig. 1 appears suddenly without explanation. Move Fig. 1 to Section 2.1.2.

L68-74 The Fig. 1 caption includes study methods that should be written in the main text.

L76 You need to explain why CPC_est is the target of comparison.

L77 Does “daily” mean 24 hours from 00 UTC? Were the original daily CPC data not based on local time?

L78 The CPC archive is not limited to the US. Please clarify the target areas/periods of your estimation here. The maps in Fig. 1 include North/South America and Australia, but you omitted them later.

L80 You did not estimate the grids without gauge sites, and then masked those grids in the following maps. If multiple gauge stations existed in a pixel, did you assume them to be at the same location at the 0.5-degree scale?

L85-88 I cannot understand “,, over land, where rain gauge observation are available”. How did you adjust the 0.5-degree CPC_est to the 0.25-degree ERA5 data? The same applies to the expression “converted” at L166.

L100 Why was (2) classified at 1 mm/d? Is no precipitation (0 mm/d) always log(2)?

L103-104 Why “the data of the 10 years before and after the date” and “surrounding 7 days for ,,”? Again, the target study period is not clear, so I cannot understand why you chose to do so.

L105 It would be better to divide Fig. 1 in two; the lower part should be cited here as Fig. 1b.

L106 Section 2.2.1 is about the comparison for the case study day. Did you perform the comparison only on the case day or over multiple years? Readers cannot understand the details of the evaluation method.

L115 Is formula (5) your original?

L117 Is the “observation site” the location of a CPC observation site used to make CPC_est? You mentioned that the location of the gauge is set at a pixel (L80), so the meaning of d (distance) is not clear. The meaning of “analysis grid point” is also unclear. Is this the ERA5 grid? Please also revise the English of the sentence.

L124 “author’s preliminary experiments” needs additional explanation or a citation. Some constants, such as 1000 km or 10, may have a meaning that depends on the study target.

L151 Why do Fig. 2a and 2b cover different areas? The target areas of your estimation are still not clear. Are you interested in Asia on a daily basis and the global scale on a monthly basis? It would be better to unify the map (and analysis) areas. As precipitation intensity distributions depend on the climate (area), the subsequent statistics (such as those shown in Figs. 4 and 6) may change depending on the target areas.

L144-149 APHRODITE and GPCC are used under different concepts. The former is very dense and is used in a hydrometeorological sense, while the latter is long and is used to evaluate historical climate change. The daily biases are evaluated on a local time basis, and the monthly biases are evaluated by sub-grid-scale spatial averages. Such background should be referred to in Section 1. Then, please clarify which time scales you want to “estimate”.

L148 APHRODITE and GPCC may include data from the GTS, so they are not “independent”.

L149 I do not think there are “dense rain gauge” networks in any MA regions. Again, are you interested in estimation for monsoon Asia? Gauge observations are much denser in UA, Europe, Japan, etc. (Fig. 2b). Why did you not avoid those areas? Also, over which period was the comparison with APHRODITE made?

L165 Again, you assumed the gauge location to be at the center of the pixel (0.25- or 0.5-degree grid), but considered the distance (d) between the grid point and the observation site (gauge location) at L114, which is confusing. Orographic effects are discussed in the later chapters; does this assumption (gauge location = pixel center) not affect your interpretation?

L172 “to be biased”: which kind of biases? And why did you choose the rank correlation coefficient? Do you want to improve the identification of extreme events rather than the absolute amount?

L183-186, L200-203 These parts should be explained earlier.

L142 “2.2 Validation”: this chapter should be in “3. Result” of your analysis.

L185 “APHRODITE < 0.5 mm/d is excluded”. Your statistics exclude non-rain days. Please state this clearly in advance. This is not the matter of rage accuracy.

L215 Why did you choose an old post-monsoon date in both hemispheres (15 Nov 1988), which may also miss heavy precipitation events? As you evaluate the difference using the rank correlation (L174) and would like to discuss orographic enhancement (Fig. 7), the day should be in summer. Why is the legend exponential without color?

L211 I could not understand “broader precipitation areas”. Where are the Himalayas and the Zagros Mountains? Please indicate them on the map.

L220 The sample for Fig. 4 is not clear. Is it from the one-day distribution in Fig. 3 or from a certain period? Why did you limit the area to that in Fig. 2a? You discuss the signals in Africa later on (Fig. 9).

L224 Although the correlation coefficient is the highest, is it significant? Please show the statistical significance.

L234 In which area was the monthly comparison in Figs. 5 and 6 made, MA or the global scale? If it is global, North/South America is included, so why does it differ from the comparison area for APHRODITE?

L257 Many kinds of orographic dynamics affect precipitation systems (Houze, 2012). Why do you assume that the first guess could take the orographic effect into account? Please explain this somewhere in the paper.

L260 Why did you choose 7 Jun 1985 in MA? The date is old and differs from that in Fig. 3. Was the feature of “reproducing the orographic changes in precipitation” also confirmed on other days?

L261 Monsoon rain along the Himalayas dominates at night (e.g. Sugimoto et al., 2021). So, do you mean that your algorithm works for nocturnal rain? Orographic ascent-type precipitation along the Ghats mountain range was reproduced in both products (Fig. 7e and f). Is this consistent with your idea? Please explain the consistency if you would like to state that “LETKF succeeded in reproducing orographic changes in precipitation”.

L270 Is this June 7th or 27th? This map also differs from MA (Fig. 2a). Your comparison changes areas/periods according to your interests. I hope the analysis can be done over the same areas, because your results depend on statistical evidence.

L276 A single case is not sufficient to state “as the Himalayas in general”.

L277-278 You need to mention the sample period for Fig. 8. I could not understand how one pixel could have more than 1800 samples.

L279 Please show how statistical significance was determined if you insist on “significantly”.

L299 I cannot see any grey pixels.

L282 Please explain the meaning of “samples”. Are these months, and if so, for which period?

L285 Here you mention “temporal MAD”, but formula (14) defines the spatial MAD. Please explain the difference.

L286 Figure 9d-f covers Africa. Do you also want to estimate precipitation in Africa? Please explain the reason for extending the area.

L287 Does “methods is beneficial for those areas in general” mean that your method works especially in the Himalayas and Zagros Mountains, or in mountainous areas in general? Why “in general”?

L288 “gauge stations are especially sparse, such as South-east Asia and central part of Africa”: such crude descriptions should be avoided. Where is the central part of Africa? There are dense gauge networks even in Asian countries.

After L253 Chapters 4 and 5 must be revised carefully after the revisions made in response to the comments above. The exclusion of North America, Australia and the Arabian Peninsula is explained only at the end of the conclusion; this is very strange. You need to mention the target areas at the beginning, with reasons.
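
The distinction between a spatial and a temporal mean absolute difference (MAD) raised in the comment on L285 can be illustrated with a short Python sketch. The manuscript's formula (14) is not reproduced on this page, so the definitions, function names and data layout below (`est[t][i]` for time step `t` and grid point `i`) are assumptions for illustration only.

```python
def spatial_mad(est, ref, t):
    """Mean absolute difference over all grid points at one time step t.

    Assumed layout (hypothetical): est[t][i] and ref[t][i] hold the
    estimate and the reference at time step t and grid point i.
    """
    diffs = [abs(e - r) for e, r in zip(est[t], ref[t])]
    return sum(diffs) / len(diffs)


def temporal_mad(est, ref, i):
    """Mean absolute difference over all time steps at one grid point i."""
    diffs = [abs(est_t[i] - ref_t[i]) for est_t, ref_t in zip(est, ref)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Two time steps, two grid points (made-up numbers, mm/d).
    est = [[1.0, 2.0], [3.0, 6.0]]
    ref = [[0.0, 2.0], [1.0, 1.0]]
    print(spatial_mad(est, ref, t=0), temporal_mad(est, ref, i=1))
```

In this reading, a spatial MAD yields one value per time step (suited to a time series or a scatter plot), whereas a temporal MAD yields one value per grid point (suited to a map).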

Citation: https://doi.org/10.5194/egusphere-2024-960-RC1
- AC1: 'Reply on RC1', Yuka Muto, 17 Jul 2024
  
  We are very grateful to the referee for the careful review and for the valuable and constructive comments and suggestions, which we have generally accepted. Our point-by-point responses are provided in the attached file. The supplemental PDF file may also be useful for checking the revisions against the corresponding comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC1
- AC2: 'Reply on RC1', Yuka Muto, 17 Jul 2024
  
  Please find the supplemental PDF attached to this reply.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC2
RC2:
'Comment on egusphere-2024-960', Anonymous Referee #2, 20 May 2024

Precipitation is the most significant and dynamic variable linked to atmospheric circulation in climate and weather studies and a vital component of the water cycle. Accurate precipitation estimates are important not only for the study of climate trends and variability, but also for the management of water resources and for weather, climate, and hydrological forecasting. This study uses the LETKF algorithm to estimate global daily precipitation by integrating rain gauge data and the ERA5 reanalysis precipitation dataset. Specifically, the ERA5 precipitation dataset provides the initial guess and its error covariance, and the NOAA CPC global rain gauge observation data update the prior estimate to obtain the analyzed precipitation. In comparisons with existing precipitation datasets (i.e., the NOAA CPC, APHRODITE and GPCC products), the analyzed precipitation shows superior accuracy, particularly in mountainous and rain-gauge-sparse regions.
The manuscript is well written; however, the reviewer raises the major concerns below, which need to be addressed before acceptance. Please also refer to the attachment for specific comments.
1. The title should be revised to mention the ERA5 dataset, given its large contribution to the improved estimate, if the reviewer understands the authors correctly. Accordingly, the reviewer wonders whether the proposed method can also improve the precipitation field for recent periods and, furthermore, other fields (e.g., soil moisture) from the ERA5 datasets. Could the authors provide a brief discussion of the applicability of this method in the “Discussion” section?
2. In the second paragraph of the “Abstract”, could the authors add the results of the comparisons with the APHRODITE and GPCC products to support the claim that the method of this study is particularly beneficial for mountainous or rain-gauge-sparse regions?
3. In Section 2.1.2 on the LETKF, the authors mention that the parameterization of the observation error covariance is based on preliminary sensitivity experiments. Could the authors briefly introduce these experiments? The corresponding results can be placed in the supplementary materials. Additionally, the reviewer suggests plotting the spatial and temporal error distributions (possibly in the supplementary materials), so that readers can better understand the observation error and evaluate the improved estimate.
4. Clarification is needed in Section 2.1.2 on whether parameter values such as 10 days, 7 days, 2*sqrt(10/3), 1000 km and 10 are optimal for this case or are generic, widely used values. It would be valuable to discuss the sensitivity of the data assimilation results to variations in these parameters in the “Discussion” section.
5. In Section 2.1.2, could the authors cite the source of Equation (5) and give a more detailed explanation? The reviewer suggests including figures depicting the temporal and spatial distribution of the localization function L(d) in the supplementary materials for reference.
6. The authors state that the orographic effects considered in ERA5 result in the superior performance of the analyzed precipitation in mountainous regions. It would be better if the authors added a short description of the interpolation methods and ancillary data (especially whether elevation data are included) used in the CPC, GPCC and APHRODITE products, so that readers can gain insight into this statement.
7. The reviewer proposes integrating the “Discussion” section into the “Results” section, as it shows the results of comparisons between LETKF_est and the existing datasets. Furthermore, the content of the third paragraph of the “Conclusions” section could be discussed in more detail in the “Discussion” section.
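
The localization function L(d) and the constant 2*sqrt(10/3) queried in the comments above can be illustrated with a minimal Python sketch. It assumes the Gaussian weight with a cutoff at 2*sqrt(10/3) times the localization scale that is common in LETKF implementations, applied by inflating the observation error variance with distance. The manuscript's Equation (5) is not reproduced on this page, so the functional form, the function names and the 300 km scale used in the example are all assumptions.

```python
import math


def localization_weight(d_km, sigma_km):
    """Gaussian localization weight with a hard cutoff (assumed form).

    The weight is exp(-d^2 / (2 sigma^2)) inside the cutoff radius
    2 * sqrt(10/3) * sigma and exactly zero outside it.
    """
    cutoff_km = 2.0 * math.sqrt(10.0 / 3.0) * sigma_km
    if d_km >= cutoff_km:
        return 0.0
    return math.exp(-(d_km ** 2) / (2.0 * sigma_km ** 2))


def localized_obs_error_variance(r, d_km, sigma_km):
    """Inflate the observation error variance r by 1 / L(d).

    Outside the cutoff the variance is set to infinity, so the
    observation has no influence and no division by zero occurs.
    """
    weight = localization_weight(d_km, sigma_km)
    return math.inf if weight == 0.0 else r / weight


if __name__ == "__main__":
    sigma_km = 300.0  # hypothetical localization scale, not from the manuscript
    for d_km in (0.0, 300.0, 900.0, 1200.0):
        print(d_km, localization_weight(d_km, sigma_km),
              localized_obs_error_variance(1.0, d_km, sigma_km))
```

With this form the weight has already decayed to about exp(-20/3), roughly 0.001, at the cutoff radius, which is one reason such a cutoff is used: it truncates the Gaussian where its contribution is negligible.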

Citation: https://doi.org/10.5194/egusphere-2024-960-RC2
- AC3: 'Reply on RC2', Yuka Muto, 17 Jul 2024
  
  We are very grateful to the referee for the careful review and for the valuable and constructive comments and suggestions, which we have generally accepted. Our point-by-point responses are provided in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC3
- AC4: 'Reply on RC2', Yuka Muto, 17 Jul 2024
  
  Please also find the supplemental PDF file attached to this reply, which may also be useful for checking the revisions against the corresponding comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC4
RC3:
'Comment on egusphere-2024-960', Anonymous Referee #3, 07 Jun 2024

Review of
Estimating global precipitation fields from rain gauge observations using local ensemble data assimilation
by Yuka Muto and Shunji Kotsuki

Overall description and assessment
The Authors present a manuscript addressing a challenging task of producing global fields of daily precipitation over an extended historical period. The paper explores a new idea, namely an implementation of a well-established data assimilation methodology, LETKF, for the generation of those global gridded precipitation fields based on sparse in-situ observations.
The motivation behind the presented study is clearly explained, the methodology correctly referenced, the results are illustrated and analysed. Comparisons to the existing NOAA CPC estimates obtained using OI show superior performance of the results obtained by the Authors, especially in the mountainous and data sparse regions. Possibilities for further improvements which could be addressed in future studies are outlined.
The paper is concise and well-organised. The plots are neat, they nicely summarise the results and support conclusions.
I recommend the manuscript for publication as soon as the minor issues listed below have been addressed.

Minor comments
A. In the introduction, possibly in line 26, right after the first sentence of this section I propose to insert an additional one emphasizing the importance of global gridded precipitation fields for validation of data assimilation (for example data assimilation of space borne lightning observations would lead to forecast precipitation fields that one may want to compare with gridded precipitation fields) as well as climate studies.
B. Line 52 in the introduction: It is not clear to me how EnDA is used to obtain climatological covariances. Based on caption in Fig.1 would you say that your covariances are climatological but date-specific. Maybe 'daily climatological covariances'?
C. Line 115: I am puzzled by sqrt(10/3) in the formula expressing L(d)? Is there a particular reason for such a choice? I mean, I would understand 3 sigma, or possibly also 2 sigma, but I am puzzled by the value of the constant the Authors used. There is essentially nothing wrong with such a choice if it serves the purpose but I have been wondering if there was a justification for it.
D. Section 2.2.2. Could the Authors elaborate on how the Kendall's coefficient has been computed? What criteria do you use to rank the precipitation fields in the 3 analysed data sets and in the APHRODITE_gauge? When you talk about concordant/discordant correlations do you mean their sign?
E. Line 212: regarding Zagros Mountains, looking at your plots, it seems to me that you do not really have rainfields reconstructed in this particular region, which is between the Caspian Sea and the Persian Gulf. I think that the coloured region on your map is in Kazakhstan, Uzbekistan and Turkmenistan (Turan Depression?). I also think that your differences with respect to GPCC are more significant over South America. By the way, Zagros Mountains are correctly identified in Fig.9
F. Section 3, line 228 in the caption of Fig. 4: What does the ratio really represent? I mean what do the bins refer to?
G. Figs 5 and 6: I do not want to add to your work but since GPCC is a reference, it would be better to name the plots CPC_est vs. GPCC and LETKF_est vs. GPCC. In Fig. 8 I would also rather say CPC_est versus APHRODITE_gauge and LETKF_est versus APHRODITE_gauge. Not sure if you need to replot the figures for that and how difficult it is for you at this stage.
H. Line 250, 251: Figure 6 legend and caption. There is an inconsistency between the legend: dark-red circles represent low latitude and the caption: dark-red circles represent mid- and high-latitude regions. In addition, please also check if the statements in lines 236-241 are correct
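
Comment D asks how Kendall's coefficient is computed and what concordant and discordant mean. A minimal Python sketch of the tau-a variant is given below for orientation; the manuscript's exact variant and its handling of ties are not stated on this page, so the choice of tau-a and the function name are assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    A pair of samples is concordant when x and y order the two samples
    the same way, discordant when they order them oppositely, and
    counts as neither when either variable is tied.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Made-up values: an estimate and a reference at four grid points.
    print(kendall_tau_a([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
```

Because the statistic depends only on the ordering of the values, it is insensitive to a monotonic bias in precipitation amount. Precipitation data contain many tied values (for example, dry grid points), so a tie-corrected variant such as tau-b may be preferable in practice.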

Editorial remarks
line 9: 'understanding of' better than 'understanding on'
line 13: estimation for each date
line 26: 'they are' better than 'it is'
line 36: maybe 'important' or 'valuable' instead of 'demanding'; or, maybe, 'in demand'
line 53: 'NWP-based precipitation records'; is the word 'record' employed here best as it implies observational information; maybe 'NWP-based precipitation fields' would work better?
line 59: 'in comparison' better than 'with comparison'
line 84: 'Forecasting' instead of 'Forecast'
line 94: 'covariances' or 'covariance matrices' rather than covariance; I also think the Authors want to say 'scalars'
line 111: I think you mean ensemble members
line 112: better: 'requires localization'
line 116: be careful how you phrase it because the way it is written now implies dividing by zero outside of the radius, which is probably what you are effectively doing; but it would be better to state that outside of the 'impact' radius there is no influence from the observation by effectively setting r to infinity; or something of that sort
line 119: better say: $d^{\mathrm{ini}}_{\max}$ in km?
line 119: ' ... followed by setting ...'
line 161: 'on a monthly basis' better than 'in a monthly basis'
line 185: 'below' better than 'under'
line 191: is $x_{\mathrm{ref},\,i,t}$ a typo which should read $x_{\mathrm{gpcc},\,i,t}$ instead?
line 200: remove in before year
line 205: remove 'in'
line 205: better 'are illustrated' than 'is illustrated'
line 212: 'Indochinese Peninsula' or 'Indochina'
line 217: subplot (lower case)
line 255: ‘dynamically consistent’ rather than ‘dynamically guaranteed’
line 271, 272: lowercase subplot better (two occurrences)
line 275-276: I would phrase it differently as the Authors analyse one specific example of the LETKF_est in the mountainous areas, even if it is a significant one. I would rather say: 'Using the examples of the Himalayas, we investigate whether the precipitation of LETKF_est is more accurate than that of CPC_est around mountainous areas' or something of that sort
line 280: Using the same argument, I would probably skip 'in general' at the end of the sentence
line 288: maybe Southeast Asia for consistency with line 258
line 314: 'took advantage' instead of 'took the advantage' and 'dynamically consistent' instead of 'dynamically guaranteed' and 'background error covariance'
line 321: 'is known to diverge from Gaussian'
line 351: 'numerical' one word

Citation: https://doi.org/10.5194/egusphere-2024-960-RC3
- AC5: 'Reply on RC3', Yuka Muto, 17 Jul 2024
  
  We are very grateful to the referee for the careful review and for the valuable and constructive comments and suggestions, which we have generally accepted. Our point-by-point responses are provided in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC5
- AC6: 'Reply on RC3', Yuka Muto, 17 Jul 2024
  
  Please find the supplemental PDF file attached to this reply, which may also be useful for checking the revisions against the corresponding comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC6

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-960', Anonymous Referee #1, 14 May 2024

Comments to “Estimating global precipitation fields from rain gauge observations using local ensemble data assimilation” by Muro and Kotsuki
This paper conducted interpolation of global precipitation data using the algorism of LETKF with ERA5 reanalysis data. According to the daily base comparison with APHRODITE and monthly base comparison with GPCC, rank correlation coefficient and mean absolute difference showed improvements compared to the CPC_est products. Also, their global distribution showed better agreement in some mountainous areas. Therefore, the paper concluded LETKF method took advantage for interpolation of precipitation field in global scale.

Demonstrating the performance of LETKF with reanalysis data to reconstruct global scale precipitation data is beneficial for the assessments of hydrological cycle, and publication of the results in HESS are basically promoted.

Although I am not keen for the LETKF, I was interested in the distribution (areas) of better plots in Fig. 8 that were not always coincident with topography. Besides, I think that there are several fundamental problems to derive author’s arguments, and I would like to suggest fundamental revision.
General comments

1. The product quality of this paper may rely not only for the LETKF method but daily base ERA5 data that have archived various satellite base observational information with physical model to assimilate them. I wonder that usage of ERA5 is critical, or LETKF could also act better on other reanalysis data, such as NCEP or JRA. Case studies in Fig. 2 and Fig. 5 were used to confirm the improvements, however, why they were a single day/month in old ages such as 1988 and 1985 ? Are they chosen to avoid recent improvements of ERA5 quality on purpose? Did you try other cases with different seasons to derive the same tendency? Please show the representativity of case studies, and discuss that how much of your improvements were rely on ERA5 quality.
2. Reasons of the improvement in this paper is in the statistical base. The statistics changes depending on the samples derived from the areas and periods. However, the treatment of target areas or periods changes depending on the chapters, or even they are not clearly explained. It looks like that the samples may prepared as author’s subjective convenient. Consistent data process, such as the same area with same duration, is required for daily and monthly analysis.
2. Most of the gauge observation has been conducted in the valley or basin where people live even in the mountainous regions, and interpolation of those gauge-based network is hard to provide unique signals in high-elevations. It is the same situation for gauge space areas, because interpolation can not produce no data areas’s information. The APHRODITE is the same condition. Besides, numerical model providing the reanalysis data is expected to reproduce precipitations (not as interpolation). Direct measurements by satellite-based radar observation, such as TRMM or GPM-PR, are also expected to provide the signals, however, gauge adjusted micro-wave satellite products (such as GSMaP-Gauge) intentionally filter out the important remote signals. If the paper would like to insist that new products are beneficial for mountainous or rain-gauge-sparse regions, please show the results of comparisons with gauge data locating in the high-mountains or remote areas not included in APHRODITE or CPC networks. Also, you intended focusing on specific areas, such as Himalayas, Zagrous mountiains, South-east Asia and central part of Africa, but they are not “mountainous or rain-gauge-sparse regions” of all. I would like to advice to exclude the sentences of beneficial/improvements of new data in “mountainous or rain-gauge-sparse regions” from the abstract and conclusion. Or you should mention that as “algorism worked better especially in Himalayas, Zagrous mountiains, South-east Asia and central part of Africa” with adequate reasons.
3. The composition of the chapters needs to be revised. In Section 1, the review should lead to the issues to be addressed, followed by clear objectives. If “Estimation” is your objective, as in the title, you need to specify not only the target periods/areas but also the purpose. As there are already so many precipitation products (Sun et al., 2018), you may want to demonstrate the efficiency of LETKF with ERA5 data. In that case it is better to modify the title and to add, in the conclusion, a physical explanation of why the LETKF could reduce the biases, rather than only showing statistical numbers. In that sense, the analysis procedure explained in the Fig. 1 caption should be described in the main text in more detail. The discussion in Chapter 4 needs to be connected with the challenges described in Chapter 5. In addition, important results, such as the performance of LETKF, should feature more in the conclusion.
Specific comments
Title: Better to mention appealing terms, such as LETKF, improvements, or assessments.

Abstract: Better to mention the reason why the LETKF could improve the product, based on the comparison with CPC_est.

L56 Please raise the issues that previous studies did not address. Then, describe why you need a new method, and for which areas/periods your estimation is targeted.

L57 Clear objectives are missing. You need to set them in line with the key conclusions (Chapter 5).

L66 Readers cannot understand why Fig. 1 appears suddenly without explanation. Move Fig. 1 to Section 2.1.2.

L68-74 The Fig. 1 caption includes study methods that should be written in the main text.

L76 Need to explain why CPC_est is the target of comparison.

L77 Does “daily” mean 24 hours from 0 UTC? Were the original daily CPC data not in local time?

L78 The CPC archive is not limited to the US. Please clarify the target areas/periods of your estimation here. The maps in Fig. 1 include North/South America and Australia, but you omitted them later.

L80 You did not estimate the grids without gauge sites, and then masked those grids in the following maps. If multiple gauge stations existed in a pixel, did you assume them to be at the same location at the 0.5-degree scale?

L85-88 I cannot understand “,, over land, where rain gauge observation are available”. How did you match the 0.5-degree CPC_est with the 0.25-degree ERA5 data? The same applies to the expression “converted” at L166.

L100 Why was (2) classified at 1 mm/d? Is no precipitation (0 mm/d) always log(2)?

L103-104 Why “the data of the 10 years before and after the date” and “surrounding 7 days for ,,”? Again, the target period of your study is not clear, so I cannot understand why you chose to do so.

L105 It is better to divide Fig. 1 in two, and the lower part should be cited here as Fig. 1b.

L106 Section 2.2.1 is about the comparison for the case-study day. Did you perform the comparison only for the case day or for multiple years? Readers cannot understand the detailed evaluation methods.

L115 Is formula (5) your own?

L117 Is the “observation site” the location of the CPC observation site used to make CPC_est? You mentioned that the location of the gauge is set at a pixel (L80), so the meaning of d (distance) is not clear. The meaning of “analysis grid point” is also unclear. Is this the ERA5 grid? Please also revise the English of this sentence.

L124 “author’s preliminary experiments” needs additional explanation or a citation. Some constants, such as 1000 km or 10, may have a meaning depending on the study target.

L151 Why do Fig. 2a and 2b show different areas? The target areas of your estimation are still not clear. Are you interested in Asia on a daily basis and the globe on a monthly basis? It is better to unify the map (and analysis) areas. As precipitation intensity distributions depend on the climate (areas), the subsequent statistics (such as those shown in Figs. 4 and 6) may change depending on the target areas.

L144-149 APHRODITE and GPCC were developed under different concepts. The former is very dense and used for hydrometeorological purposes, and the latter is long and used to evaluate historical climate change. The daily biases are evaluated on a local-time basis, and the monthly biases are evaluated by subgrid-scale spatial averages. Such background should be mentioned in Section 1. Then, please clarify which time scales you want to “estimate”.

L148 APHRODITE and GPCC may include data from the GTS, so they are not “independent”.

L149 I do not think there is a “dense rain gauge” network in any MA region. Again, are you interested in the estimation for monsoon Asia? Gauge observations are much denser in the USA, Europe, Japan, etc. (Fig. 2b). Why did you not avoid those areas? Also, which periods are used for the comparison with APHRODITE?

L165 Again, you assumed the gauge location to be at the center of the pixel (0.25 or 0.5 degree grids), but considered the distance (d) between the grid point and the observation site (gauge location) at L114, which is confusing. In the later chapters, orographic effects are discussed; does this assumption (gauge location = center of the pixel) not affect your interpretation?

L172 “to be biased”: which kind of biases? Then, why did you choose the rank correlation coefficient? Do you want to improve the identification of extreme events rather than the absolute amounts?

L183-186, L200-203 These parts should be explained earlier.

L142 “2.2 Validation” This chapter should be in “3. Result” of your analysis.

L185 “APHRODITE < 0.5 mm/d is excluded”. Your statistics exclude non-rain days. Please state this clearly in advance. This is not a matter of gauge accuracy.

L215 Why did you choose an old post-monsoon month for both hemispheres (15 Nov 1988), which may also miss heavy precipitation events? As you evaluate the difference by rank correlation (L174) and would like to discuss orographic enhancement (Fig. 7), the day should be in summer. Why is the legend exponential without color?

L211 I could not understand “broader precipitation areas”. Where are the Himalayas and the Zagros Mountains? Please indicate them on the map.

L220 The sample used for Fig. 4 is not clear. Is it from the one-day distribution in Fig. 3 or from a certain period? Why did you limit the area to that in Fig. 2a? You discuss the signals in Africa later on (Fig. 9).

L224 Although the correlation coefficient is highest, is it significant? Please show the statistical significance.

L234 Over which areas was the monthly comparison in Figs. 5 and 6 done, MA or the globe? If it is global, North/South America is included; why does this differ from the comparison area for APHRODITE?

L257 There are many kinds of orographic dynamics affecting precipitation systems (Houze, 2012). Please explain, somewhere in the paper, why you assume the first guess can take the orographic effect into account.

L260 Why did you choose 7 Jun 1985 in MA? The date is old and different from that of Fig. 3. Was the feature of “reproducing the orographic changes in precipitation” also confirmed on other days?

L261 Monsoon rain along the Himalayas is dominant at night (e.g. Sugimoto et al., 2021). So, do you mean that your algorithm works for nocturnal rain? Orographic ascent-type precipitation along the Ghats mountain range was reproduced in both products (Fig. 7e&f). Is this consistent with your idea? Please explain the consistency if you would like to state that “LETKF succeeded in reproducing orographic changes in precipitation”.

L270 Is this June 7th or 27th? This map also differs from MA (Fig. 2a). Your comparison changes areas/periods according to your interests. I would like to see the analysis done over the same areas, because your results depend on statistical evidence.

L276 A single case is not sufficient to state “as the Himalayas in general”.

L277-278 Need to mention the sample periods for Fig. 8. I could not understand how one pixel could get more than 1800 samples.

L279 Please show how statistical significance was assessed if you insist on “significantly”.

L299 I cannot see the grey pixels.

L282 Please explain the meaning of “samples”. Are these months, and if so, for which period?

L285 Here you mentioned “temporal MAD”, but formula (14) defined the spatial MAD. Please explain the difference.

L286 Figure 9d-f covers Africa. Do you also want to estimate the precipitation in Africa? Please explain the reason of area extension.

L287 Does “methods is beneficial for those areas in general” mean that your method works especially in the Himalayas and the Zagros Mountains, or in mountainous areas in general? Why “in general”?

L288 “gauge stations are especially sparse, such as South-east Asia and central part of Africa” Such crude descriptions should be avoided. Where is the central part of Africa? There are dense gauge networks even in Asian countries.

After L253 Chapters 4 and 5 must be revised carefully after the revisions made in response to the former comments. The exclusion of North America, Australia and the Arabian Peninsula is excused at the end of the conclusion; however, this is very strange. You need to state the target areas at the beginning, with reasons.

Citation: https://doi.org/10.5194/egusphere-2024-960-RC1
- AC1: 'Reply on RC1', Yuka Muto, 17 Jul 2024
  
  We are very grateful to the referee for her/his careful review and for kindly giving us valuable and constructive comments and suggestions, which we have generally accepted. We provide our point-by-point responses in the attached file. The supplemental PDF file would also be useful for checking the revisions and their corresponding comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC1
- AC2: 'Reply on RC1', Yuka Muto, 17 Jul 2024
  
  Please find the supplemental PDF attached to this reply.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC2
RC2:
'Comment on egusphere-2024-960', Anonymous Referee #2, 20 May 2024

Precipitation is the most significant and dynamic variable linked to atmospheric circulation in climate and weather studies and a vital component of the water cycle. Accurate estimates of precipitation are important not only for the study of climate trends and variability, but also for the management of water resources and for weather, climate, and hydrological forecasting. This study uses the LETKF algorithm to estimate global daily precipitation by integrating rain gauge data and the ERA5 reanalysis precipitation dataset. Specifically, the ERA5 precipitation dataset provides the initial guess and its error covariance, and the NOAA CPC global rain gauge observation data update the prior estimate to obtain the analyzed precipitation. In comparisons with existing precipitation datasets (i.e., the NOAA CPC, APHRODITE and GPCC products), the analyzed precipitation shows superior accuracy, particularly in mountainous and rain-gauge-sparse regions.
The manuscript is well written, however, there are major concerns below raised by the reviewer that necessitate addressing before acceptance. Please also refer to the attachment for specific comments.
1. The title should be revised to incorporate the ERA5 dataset, given its large contribution to the improved estimate, if the reviewer understands the author correctly. Relatedly, the reviewer wonders whether the proposed method can also enhance the precipitation field for recent periods and, furthermore, other fields (e.g., soil moisture) from the ERA5 datasets. Could the author provide a brief discussion on the applicability of this method in the “discussion” section?
2. In the second paragraph of the “Abstract”, could the author add the results of the comparisons with the APHRODITE and GPCC products to support the claim that the method of this study is particularly beneficial for mountainous or rain-gauge-sparse regions?
3. In section 2.1.2 regarding LETKF, the author mentioned that the parameterization of the observation error covariance is based on preliminary sensitivity experiments. Could the author briefly introduce the experiments? The corresponding results can be placed in the supplementary materials. Additionally, the reviewer suggests making plots of the spatial and temporal error distribution (possibly in the supplementary materials), so that the reader can better understand the observation error and evaluate the improved estimate.
4. Clarification is needed in section 2.1.2 regarding whether parameter values such as 10 days, 7 days, 2*sqrt(10/3), 1000 km and 10 are optimal for this case or are generic, widely used values. It would be valuable to discuss the sensitivity of the data assimilation results to variations in these parameters in the “Discussion” section.
5. In section 2.1.2, could the author cite the source for Equation (5) and give a more detailed explanation? The reviewer suggests including figures depicting the temporal and spatial distribution of the localization function L(d) for reference in the supplementary materials.
6. The author stated that the orographic effects considered in ERA5 result in the superior performance of the analyzed precipitation in mountainous regions. It would be better if the author added a short description of the interpolation method and ancillary data (especially whether elevation data are included) used in the interpolation of the CPC, GPCC and APHRODITE products, so that the reader can gain insight into the author’s statement.
7. The reviewer proposes integrating the ‘Discussion’ section into the “Results” section, as it shows the results of comparisons between LETKF_est with the existing datasets. Furthermore, the content in the third paragraph in the “Conclusions” section could be discussed in the ‘Discussion’ section in a more detailed way.
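For readers following points 4 and 5, the role of the constant 2*sqrt(10/3) can be illustrated with a minimal sketch of a truncated Gaussian localization function. This is an illustration under assumptions, not the manuscript's Equation (5): the Gaussian form, the function names and the 500 km length scale are hypothetical. A cutoff at 2*sqrt(10/3) (about 3.65) times the length scale is a common convention in LETKF implementations, where the Gaussian weight has already fallen to roughly exp(-20/3), or about 0.0013.

```python
import math

# Cutoff expressed in units of the localization length scale (about 3.65).
CUTOFF_FACTOR = 2.0 * math.sqrt(10.0 / 3.0)


def localization_weight(d_km: float, sigma_km: float) -> float:
    """Gaussian weight L(d), set to zero beyond CUTOFF_FACTOR * sigma_km.

    Assumption: a Gaussian form with length scale sigma_km; the manuscript's
    Equation (5) may differ in detail.
    """
    if d_km >= CUTOFF_FACTOR * sigma_km:
        return 0.0
    return math.exp(-0.5 * (d_km / sigma_km) ** 2)


def localized_obs_error_variance(r: float, d_km: float, sigma_km: float) -> float:
    """Inflate the observation error variance r by 1 / L(d).

    Outside the cutoff the variance is treated as infinite, so the
    observation has no influence on the analysis at that grid point.
    """
    weight = localization_weight(d_km, sigma_km)
    return math.inf if weight == 0.0 else r / weight


# Hypothetical length scale of 500 km: the cutoff distance is about 1826 km.
sigma = 500.0
for distance in (0.0, 500.0, 1500.0, 2000.0):
    print(distance, localization_weight(distance, sigma),
          localized_obs_error_variance(1.0, distance, sigma))
```

Plotting `localization_weight` against distance for a few length scales would give the kind of reference figure requested in point 5.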

Citation: https://doi.org/10.5194/egusphere-2024-960-RC2
- AC3: 'Reply on RC2', Yuka Muto, 17 Jul 2024
  
  We are very grateful to the referee for her/his careful review and for kindly giving us valuable and constructive comments and suggestions, which we have generally accepted. We provide our point-by-point responses in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC3
- AC4: 'Reply on RC2', Yuka Muto, 17 Jul 2024
  
  Please also find the supplemental PDF file attached to this reply, which would also be useful for checking the revisions and their corresponding comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC4
RC3:
'Comment on egusphere-2024-960', Anonymous Referee #3, 07 Jun 2024

Review of
Estimating global precipitation fields from rain gauge observations using local ensemble data assimilation
by Yuka Muto and Shunji Kotsuki

Overall description and assessment
The Authors present a manuscript addressing a challenging task of producing global fields of daily precipitation over an extended historical period. The paper explores a new idea, namely an implementation of a well-established data assimilation methodology, LETKF, for the generation of those global gridded precipitation fields based on sparse in-situ observations.
The motivation behind the presented study is clearly explained, the methodology correctly referenced, the results are illustrated and analysed. Comparisons to the existing NOAA CPC estimates obtained using OI show superior performance of the results obtained by the Authors, especially in the mountainous and data sparse regions. Possibilities for further improvements which could be addressed in future studies are outlined.
The paper is concise and well-organised. The plots are neat, they nicely summarise the results and support conclusions.
I recommend the manuscript for publication as soon as the minor issues listed below have been addressed.

Minor comments
A. In the introduction, possibly in line 26, right after the first sentence of this section I propose to insert an additional one emphasizing the importance of global gridded precipitation fields for validation of data assimilation (for example data assimilation of space borne lightning observations would lead to forecast precipitation fields that one may want to compare with gridded precipitation fields) as well as climate studies.
B. Line 52 in the introduction: It is not clear to me how EnDA is used to obtain climatological covariances. Based on the caption of Fig. 1, would you say that your covariances are climatological but date-specific? Maybe 'daily climatological covariances'?
C. Line 115: I am puzzled by sqrt(10/3) in the formula expressing L(d)? Is there a particular reason for such a choice? I mean, I would understand 3 sigma, or possibly also 2 sigma, but I am puzzled by the value of the constant the Authors used. There is essentially nothing wrong with such a choice if it serves the purpose but I have been wondering if there was a justification for it.
D. Section 2.2.2. Could the Authors elaborate on how the Kendall's coefficient has been computed? What criteria do you use to rank the precipitation fields in the 3 analysed data sets and in the APHRODITE_gauge? When you talk about concordant/discordant correlations do you mean their sign?
E. Line 212: regarding the Zagros Mountains, looking at your plots, it seems to me that you do not really have rainfields reconstructed in this particular region, which is between the Caspian Sea and the Persian Gulf. I think that the coloured region on your map is in Kazakhstan, Uzbekistan and Turkmenistan (Turan Depression?). I also think that your differences with respect to GPCC are more significant over South America. By the way, the Zagros Mountains are correctly identified in Fig. 9.
F. Section 3, line 228 in the caption of Fig. 4: What does the ratio really represent? I mean what do the bins refer to?
G. Figs 5 and 6: I do not want to add to your work but since GPCC is a reference, it would be better to name the plots CPC_est vs. GPCC and LETKF_est vs. GPCC. In Fig. 8 I would also rather say CPC_est versus APHRODITE_gauge and LETKF_est versus APHRODITE_gauge. Not sure if you need to replot the figures for that and how difficult it is for you at this stage.
H. Lines 250, 251: Figure 6 legend and caption. There is an inconsistency between the legend (dark-red circles represent low latitudes) and the caption (dark-red circles represent mid- and high-latitude regions). In addition, please also check whether the statements in lines 236-241 are correct.
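
As background to comment D, the concordant/discordant terminology can be made concrete with a minimal sketch of Kendall's rank correlation. It assumes the simplest variant (tau-a, with tied pairs counted as neither concordant nor discordant); the discussion does not state which variant the Authors used, and the function name and sample values are hypothetical. A pair of samples is concordant when both datasets order the two samples the same way, and discordant when they order them oppositely.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Assumes len(x) == len(y) >= 2. Tied pairs add to neither count.
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both datasets rank the two samples the same way
        elif sign < 0:
            discordant += 1  # the datasets rank the two samples oppositely
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical daily precipitation values (mm/d) at one grid point.
estimate = [1.0, 2.0, 3.0]
reference = [1.0, 3.0, 2.0]
print(kendall_tau_a(estimate, reference))  # 2 concordant, 1 discordant -> 1/3
```

Because only the ordering of values enters the statistic, it is insensitive to a monotonic bias in the absolute amounts.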

Editorial remarks
line 9: 'understanding of' better than 'understanding on'
line 13: estimation for each date
line 26: 'they are' better than 'it is'
line 36: maybe 'important' or 'valuable' instead of 'demanding'; or, maybe, 'in demand'
line 53: 'NWP-based precipitation records'; is the word 'record' employed here best as it implies observational information; maybe 'NWP-based precipitation fields' would work better?
line 59: 'in comparison' better than 'with comparison'
line 84: 'Forecasting' instead of 'Forecast'
line 94: 'covariances' or 'covariance matrices' rather than covariance; I also think the Authors want to say 'scalars'
line 111: I think you mean ensemble members
line 112: better: 'requires localization'
line 116: be careful how you phrase it because the way it is written now implies dividing by zero outside of the radius, which is probably what you are effectively doing; but it would be better to state that outside of the 'impact' radius there is no influence from the observation by effectively setting r to infinity; or something of that sort
line 119: better say: 𝑑ini 𝑚𝑎𝑥 in km?
line 119: ' ... followed by setting ...'
line 161: 'on a monthly basis' better than 'in a monthly basis'
line 185: 'below' better than 'under'
line 191: is $x_{\mathrm{ref},i,t}$ a typo which should read $x_{\mathrm{gpcc},i,t}$ instead?
line 200: remove in before year
line 205: remove 'in'
line 205: better 'are illustrated' than 'is illustrated'
line 212: 'Indochinese Peninsula' or 'Indochina'
line 217: subplot (lower case)
line 255: ‘dynamically consistent’ rather than ‘dynamically guaranteed’
line 271, 272: lowercase subplot better (two occurrences)
line 275-276: I would phrase it differently as the Authors analyse one specific example of the LETKF_est in the mountainous areas, even if it is a significant one. I would rather say: 'Using the examples of the Himalayas, we investigate whether the precipitation of LETKF_est is more accurate than that of CPC_est around mountainous areas' or something of that sort
line 280: Using the same argument, I would probably skip 'in general' at the end of the sentence
line 288: maybe Southeast Asia for consistency with line 258
line 314: 'took advantage' instead of 'took the advantage' and 'dynamically consistent' instead of 'dynamically guaranteed' and 'background error covariance'
line 321: 'is known to diverge from Gaussian'
line 351: 'numerical' one word

Citation: https://doi.org/10.5194/egusphere-2024-960-RC3
- AC5: 'Reply on RC3', Yuka Muto, 17 Jul 2024
  
  We are very grateful to the referee for her/his careful review and for kindly giving us valuable and constructive comments and suggestions, which we have generally accepted. We provide our point-by-point responses in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC5
- AC6: 'Reply on RC3', Yuka Muto, 17 Jul 2024
  
  Please find the supplemental PDF file attached to this reply, which would also be useful to check revisions and their corresponding comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-960-AC6

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (02 Aug 2024) by Bob Su

AR by Yuka Muto on behalf of the Authors (02 Aug 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (14 Aug 2024) by Bob Su

RR by Anonymous Referee #3 (13 Sep 2024)

RR by Anonymous Referee #2 (13 Sep 2024)

ED: Publish subject to minor revisions (review by editor) (23 Sep 2024) by Bob Su

AR by Yuka Muto on behalf of the Authors (25 Sep 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (13 Oct 2024) by Bob Su

AR by Yuka Muto on behalf of the Authors (19 Oct 2024)

Journal article(s) based on this preprint

17 Dec 2024

Estimating global precipitation fields by interpolating rain gauge observations using the local ensemble transform Kalman filter and reanalysis precipitation

Yuka Muto and Shunji Kotsuki

Hydrol. Earth Syst. Sci., 28, 5401–5417, https://doi.org/10.5194/hess-28-5401-2024, 2024

Viewed

Total article views: 741 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
552	139	50	741	27	26

Views and downloads (calculated since 23 Apr 2024)

Month	HTML	PDF	XML	Total
Apr 2024	109	33	7	149
May 2024	235	38	8	281
Jun 2024	36	15	6	57
Jul 2024	65	22	18	105
Aug 2024	37	6	7	50
Sep 2024	24	8	4	36
Oct 2024	11	5	0	16
Nov 2024	30	5	0	35
Dec 2024	5	7	0	12

Viewed (geographical distribution)

Total article views: 746 (including HTML, PDF, and XML) Thereof 746 with geography defined and 0 with unknown origin.

Latest update: 17 Dec 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

It is crucial to improve global precipitation estimates for understanding water-related disasters and water resources. This study proposes a new methodology to interpolate global precipitation fields from ground rain gauge observations using ensemble data assimilation and the precipitation of a numerical weather prediction model. Our estimates agree with independent rain gauge observations better than the existing precipitation estimates, especially in mountainous or rain-gauge-sparse regions.
