the Creative Commons Attribution 4.0 License.
A neural network-based observation operator for weather radar data assimilation
Abstract. In three-dimensional variational data assimilation (3DVar) for numerical weather prediction (NWP), the observation operator H plays a central role by mapping model state variables to an observation equivalent. For weather radar, however, specifying H is particularly challenging: reflectivity is a nonlinear, microphysics-dependent diagnostic quantity that only indirectly relates to the model’s prognostic variables, making traditional parameterised radar operators complex, regime-dependent and difficult to tune.
In this study, we propose a neural-network (NN)-based observation operator for radar reflectivity and apply it within a 3DVar data assimilation (DA) framework. Using five years (2019–2023) of radar reflectivity data from the Lisca radar and 4.4 km-resolution short-range forecasts from the ALADIN model over Slovenia, we train a convolutional encoder–decoder neural network to map model temperature, humidity, horizontal wind components and surface pressure fields to radar reflectivity. Across independent test cases spanning clear-sky, stratiform and convective regimes, the NN-based operator accurately reproduces the spatial structure and intensity of observed reflectivity, relying primarily on the model state in the vicinity of the observation point. For the extreme precipitation case that caused widespread floods in Slovenia on 4 August 2023, assimilating the full radar disc reduces the domain-averaged reflectivity root-mean-square error (RMSE) from 5.99 dBZ to 3.47 dBZ and improves the alignment between the analysed and observed convective bands.
Embedded within 3DVar, the Jacobian of the NN observation operator allows radar reflectivity observations to inform model state variables, producing corresponding analysis increments. The proposed NN radar observation operator offers a flexible alternative to traditional parameterised radar operators for improving convective-storm forecasts.
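The abstract and comment ii) together imply a 19-channel input to the encoder–decoder: four upper-air variables (t, u, v, r) at four pressure levels plus three surface variables (t2m, r2m, msl). A minimal sketch of assembling that input tensor, using a hypothetical grid size and random placeholder fields (not the paper's actual data pipeline):

```python
import numpy as np

# Hypothetical grid size; the paper uses ALADIN fields over Slovenia at 4.4 km.
ny, nx = 64, 64
levels = [975, 925, 850, 800]        # pressure levels (hPa) listed in comment ii)
upper_vars = ["t", "u", "v", "r"]    # temperature, winds, relative humidity
surface_vars = ["t2m", "r2m", "msl"] # surface predictors

rng = np.random.default_rng(0)
fields = {f"{v}{p}": rng.standard_normal((ny, nx))
          for v in upper_vars for p in levels}
fields.update({v: rng.standard_normal((ny, nx)) for v in surface_vars})

# Stack into a (channels, ny, nx) tensor: 4 vars x 4 levels + 3 surface = 19 channels,
# the shape a convolutional encoder-decoder would consume.
x = np.stack([fields[f"{v}{p}"] for v in upper_vars for p in levels]
             + [fields[v] for v in surface_vars])
assert x.shape == (19, ny, nx)
```

The network then maps this stack to a single-channel reflectivity field on the same grid.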
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-77', Anonymous Referee #1, 24 Feb 2026
- RC2: 'Comment on egusphere-2026-77', Frederic Fabry, 02 Mar 2026
Summary evaluation
I was intrigued by the premise and results of the manuscript EGUsphere-2026-77v1 entitled “A NEURAL NETWORK-BASED OBSERVATION OPERATOR FOR WEATHER RADAR DATA ASSIMILATION” by Stefanelli et al. The authors have used a very unusual approach to obtain expected radar observations given model fields, a necessary first step towards radar data assimilation. Instead of trying to simulate the radar reflectivity from the model fields, they devised an approach that learned to reproduce what an actual radar observes from model fields by training a machine-learning model to do so. They then demonstrated the use of the model by showing that radar data could be used to find new model states that reduce the mismatch between simulated and observed radar data.
That stated, I believe the authors focused too much of their attention on studying the more expected results of their work while not critically analyzing the much more interesting and novel ones. The norm in observation operators H() is to use only model variables x to devise H. You chose the unusual path of using a combination of observations y and model variables x to determine H. This has a set of advantages (fewer assumptions on microphysics and scattering, natural ability to simulate many radar artifacts…) and poses additional challenges (decoupling between the real world shaping observations and the simulated world shaping model fields due to initial condition and model errors) compared to the traditional approach. And while a few of the advantages were mentioned in the introduction, none of the challenges were, and they were not reflected upon after the Introduction. And while the model was trained with 4 years of data, with 2023 being used for testing, I only found results for four radar maps. Critically analyzing the performance of your indirect approach to devising H() should have been given considerably more emphasis, as the follow-up results, and your planned future work, are, in my opinion, expected: once you have an H function, it is a trivial result that you can use 3DVar to modify x and find a new one whose H(x) is a better match to y than the original H(x). Given that the key novelty of your approach, now and in the future, rests on your ability to find a good H() using a combination of model and radar data, a better critical analysis of its performance would have been expected. Ideally, a comparison with a more traditional H() would be best, but I’d be happy with some longer-term statistics (echo coverage, biases, standard errors, etc.).
Because I am uncertain that these and other changes can be made in the time constraints associated with a conditional acceptance, I will recommend rejection of the current manuscript in its current form. But I encourage you to make the necessary modifications to your manuscript as your approach has the value of being much more original than many others.
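The reviewer's point that fitting x to y is "trivial" once H exists can be made concrete with a toy 3DVar: given any differentiable observation operator, minimising the standard cost function lowers the misfit. In the sketch below a random linear H stands in for the NN operator's Jacobian; all names, sizes, and values are illustrative, not from the manuscript.

```python
import numpy as np

n, m = 6, 3                          # toy state and observation dimensions
rng = np.random.default_rng(1)
H = rng.standard_normal((m, n))      # stand-in for the (linearised) NN operator
B_inv = np.eye(n)                    # inverse background-error covariance
R_inv = np.eye(m)                    # inverse observation-error covariance
xb = rng.standard_normal(n)          # background state
y = rng.standard_normal(m)           # observations

def J(x):
    # 3DVar cost: background term + observation term
    dxb, dy = x - xb, y - H @ x
    return 0.5 * dxb @ B_inv @ dxb + 0.5 * dy @ R_inv @ dy

def grad_J(x):
    # dJ/dx = B^-1 (x - xb) - H^T R^-1 (y - H x)
    return B_inv @ (x - xb) - H.T @ R_inv @ (y - H @ x)

xa = xb.copy()
for _ in range(2000):                # plain gradient descent on the 3DVar cost
    xa -= 0.02 * grad_J(xa)

assert J(xa) < J(xb)                 # the analysis lowers the cost below the background's
```

With the NN operator, `grad_J` would use the network's Jacobian (e.g. via automatic differentiation) in place of the fixed matrix `H.T`.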
Specific comments
i) The title does not reveal the key novelty, namely that you are devising and evaluating a “measurement-based observation operator” as opposed to the more traditional “simulation-based observation operator” (I find myself having to invent new terminology to express what you have designed; if you can find a better way to express this, please do so!). One could devise a NN-based OO by having it learn to imitate a simulation-based OO, and your current title would apply equally to such a work; but this is not what you did. It would be better if your title expressed or described your unusual approach.
ii) Methodology, 2nd sentence: I propose the following edit, if you believe it is still correct: “The NN observation operator is trained to determine the expected reflectivity averaged over the previous hour from fields of temperature (t), horizontal wind components (u and v), relative humidity (r), at four pressure levels (975 hPa, 925 hPa, 850 hPa, and 800 hPa) and three surface variables, 2m temperature (t2m), 2m relative humidity (r2m) and mean sea level pressure (msl) from the ALADIN numerical weather prediction model (hereafter ’ALADIN model outputs’ refers to the listed fields)".
iii) Last paragraph of 2.1, also related to ii: “The quality-controlled radar observations were subsequently summed into 1-hour intervals to match the temporal resolution of the ALADIN model outputs”. First, this choice of using hourly averages or sums of reflectivity is interesting per se, because you could have chosen to use the radar data closest to the hour, and the temporal resolution would still be matched. Why that choice? I personally believe it is a good idea, but I believe it needs to be articulated here. Then, a few more details are required: a) How is the sum done (in dBZ, in Z, in R)? b) Are the radar maps shown in all figures the hourly sums, the hourly averages, or something else? c) Is the 13.5-dBZ thresholding done before or after the summing or averaging? d) How do you handle the NaN resulting from that thresholding in the summing (if the thresholding is done before) and in the training of the machine?
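Question a) above matters numerically: averaging reflectivity in dBZ (log space) and in linear Z give different answers, so the convention must be stated. A small illustration with two made-up scan values (not from the manuscript):

```python
import numpy as np

dbz = np.array([20.0, 50.0])   # two illustrative scans within the hour

# Naive average in dB space: (20 + 50) / 2 = 35 dB.
mean_db = dbz.mean()

# Average in linear Z, then convert back to dB: the strong echo dominates.
mean_z_db = 10 * np.log10(np.mean(10 ** (dbz / 10)))

assert mean_db == 35.0
assert mean_z_db > mean_db     # ~47 dB: linear-Z averaging weights strong echoes far more
```

The same ambiguity applies to how thresholded (NaN) pixels enter the hourly sum, which is question d).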
iv) Second paragraph of 2.4: “All such timesteps are extracted and augmented (Aggarwal et al., 2018) by rotating both the input ALADIN fields and the corresponding radar reflectivity by 90°, 180°, and 270° around the vertical axis”. I presume you changed what was u-winds and v-winds in response to that rotation. More importantly, this rotation will hamper your radar field statistics (average, standard deviations) and make it less representative by combining points occurring at the right geographical location with three times more points occurring at another geographical location and having different averages and standard deviations, introducing location-dependent biases. Can you justify this choice? Wouldn’t a simple repetition (or weight increase in the loss function) or a minor displacement of a model pixel (4.3 km) in each direction be a better choice?
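The point about u- and v-winds in comment iv) is that rotating a map of a vector field requires rotating both the grid and the vector components. A minimal sketch for a 90° counter-clockwise rotation, using made-up 2x2 fields (assuming `np.rot90`'s counter-clockwise convention):

```python
import numpy as np

u = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy zonal wind field
v = np.array([[5.0, 6.0], [7.0, 8.0]])   # toy meridional wind field

# Rotate the grids 90 deg counter-clockwise AND rotate each wind vector by 90 deg:
# for a 90 deg CCW rotation, (u', v') = (-v, u) at the rotated grid point.
u_rot = -np.rot90(v)
v_rot = np.rot90(u)

assert u_rot[0, 0] == -6.0   # new u is minus the old v carried to the rotated position
assert v_rot[0, 0] == 2.0    # new v is the old u carried to the rotated position
```

Rotating the scalar fields alone, without this component transform, would make the augmented wind fields physically inconsistent with the rotated reflectivity.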
v) Figure A.11 (which should be A.1) could use units on the x and y axes. Kilometers? Grid points?
vi) Errors, differences, bin sizes, and standard differences in reflectivity should have units of “dB” or “dB(Z)” (better choice), but not “dBZ”. I counted 16 corrections to make on the text and figure captions, plus a few more on the figures themselves.
Citation: https://doi.org/10.5194/egusphere-2026-77-RC2
Data sets
LISCA-ALADIN HNN Marco Stefanelli https://zenodo.org/records/17880623
Model code and software
3DVar Neural Network-Based Observation Operator Marco Stefanelli https://zenodo.org/records/17898084
3DVar for Neural Network-Based Observation Operator Marco Stefanelli https://zenodo.org/records/17899025
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 154 | 0 | 3 | 157 | 0 | 0 |
Viewed (geographical distribution)
- Total: 0
- HTML: 0
- PDF: 0
- XML: 0
Review: A neural network-based observation operator for weather radar data assimilation by Stefanelli et al.
In this paper, the observation operator for radar data assimilation is replaced by a neural network that maps state variables such as temperature, wind, and relative humidity at different vertical levels to reflectivity. The machine-learning-based observation operator is coupled with a 3DVar data assimilation system, and observation impact experiments are performed in order to provide a preliminary evaluation of this observation operator.
The topic is novel and is aligned with the current trend of merging data assimilation with machine learning. However, there are aspects of the obtained results that are not very clear and that should be discussed in more detail. There are also some decisions taken in the design of the experiments and the methodology that deserve further discussion. My recommendation is that the paper should undergo major revisions before being considered for publication in GMD.
Major comments:
Minor points