Enhancing weather radar data by removing non-meteorological echoes, using neural networks trained on synthetic weather data
Abstract. Meteorological weather radars are essential for atmospheric research, weather forecasting and aviation safety, but they often detect non-meteorological echoes from scatterers such as insects, birds, and ground clutter. These non-meteorological echoes can then lead to misinterpretations in quantitative precipitation estimation and hydrometeor classification, which cause difficulties for atmospheric research and weather forecasting. This paper introduces a novel AI-based approach to identify such non-meteorological echoes in polarimetric radar data using a convolutional neural network. More specifically, we utilize a so-called U-net, which relies on large amounts of labeled radar data for training. To address the challenge of acquiring labeled radar data consisting of meteorological and non-meteorological echoes, we generate synthetic training samples by combining preprocessed winter data (meteorological echoes) with cluttered summer data (non-meteorological echoes) provided by Deutscher Wetterdienst (DWD). After training on synthetic data, evaluation of the U-net approach on operationally measured radar data shows that it outperforms the state-of-the-art DWD classification algorithm overall. This is particularly evident in the preservation of precipitation signals at the boundaries of larger weather events.
This manuscript presents an interesting idea on how to address the lack
of expert-labeled data in classification of weather radar echoes. Using
this approach, the authors train a U-net model to classify
meteorological and non-meteorological echoes. The issue being addressed
is critical to producers and users of radar data, and the authors
achieve promising results with their model. However, the small amount of
data used to train and validate the model raises questions that should be
addressed in the manuscript.
# Specific comments
1) One major concern is the small amount of data used in the model
training. It appears that the dataset contains data from 3 hours,
plus some test data from a time period that is not mentioned. Even
if the dataset contains measurements from all 17 radars in the DWD
network, the short time windows covered by the dataset raise
serious concerns about the validity of the resulting model. Preferably, the
authors should increase the dataset size. If using more data for the
model training is not possible, at least the following issues should
be addressed with sensitivity tests or discussion in text:
1. How representative are the selected time periods of the
conditions they aim to represent? E.g. for cluttered summer
measurements it is worth noting that the appearance of insect
echoes can vary depending on temperature, diurnal cycle, and
annual cycle. Similarly, how representative are the selected
winter sweeps?
2. How representative are the scaled winter images of summer
precipitation?
3. Does the selection of the sweeps require some considerations? If
one were to attempt to repeat the study using different
measurements, how should one address the dataset selection?
4. When are the "experimentally measured mixed radar images"
measured? How were the images selected?
5. As far as I can tell, Figure 5d shows differential attenuation
that is unlikely to be present in any winter measurement and
thus would not appear in your training material. How does the
model perform for this case or other similar artefacts in summer
measurements that are not present in winter?
2) Creating synthetic images:
1. The process for selecting the scaling factor seems rather
arbitrary. When comparing the images, did you compare mean
values, maximum values, or some other statistic?
2. How representative do you expect the scaling determined in the
described way to be over longer time periods? Is it impacted by
calibration differences etc?
3. Ideally, the scaling factor should be related to some physical
factor or explanation, e.g. differences between Z-R and Z-S
relations rather than subjective selection.
4. Assignment of UDR when creating the synthetic images: did you
test other approaches to assign the value, e.g. a weighted sum of
the original images? As far as I can follow, a radar gate
in the synthetic image has contributions from both the winter and
summer images in DBZH and ZDR but not in UDR. Do the resulting
UDR values in the synthetic images follow a similar distribution
(on their own and joint distributions with DBZH/ZDR) as
experimentally measured values?
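
To illustrate the kind of distribution check meant in comment 2.4, a minimal sketch follows. It computes the two-sample Kolmogorov-Smirnov statistic between UDR values from synthetic and measured gates. The function name `ks_statistic` and the sample values are hypothetical and only serve as an example; the authors may prefer a library routine or a different distance measure, and the joint distributions with DBZH/ZDR would need a separate check.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap is
    # attained at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up UDR values for illustration only (not taken from the manuscript).
udr_synthetic = [-12.0, -10.5, -9.0, -3.0, -2.5]
udr_measured = [-11.5, -10.0, -8.5, -6.0, -2.0]
print(ks_statistic(udr_synthetic, udr_measured))
```

A value near 0 would indicate that the two marginal distributions are similar, while a value near 1 would indicate that they barely overlap.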
3) There are also some concerns related to the model training and test
dataset construction:
1. There is no validation dataset. Typically, ML model training
should include a training dataset used to train the model, a
validation dataset for hyperparameter selection and monitoring
training convergence (i.e., selecting the best model outcome
among all possibilities), and a test dataset for independent
validation of the selected model. Since there is no validation
dataset, how is the training convergence monitored?
2. Given that the training and test datasets are temporally
overlapping, how was information leakage between the datasets
reduced? The test dataset being from a different radar site does
not automatically remove information leakage, as the
precipitation areas move and same precipitation could easily be
present in the measurements of multiple radars; we can also
expect that precipitation within one hour is correlated within
different areas inside Germany. Additionally, the measurement
range is said to be 150 km; do the images from multiple radars
overlap?
3. It is unclear how the winter and summer measurements used to
create a synthetic image are selected. Randomly, by matching
time from the start of the measurement period, or something else? How
does this selection impact the dataset and model skill? Do you
only combine images from a single radar site?
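
One way to reduce the leakage described in comment 3.2 is to split by calendar date instead of by radar site, so that all sweeps from one day, from every radar, land in the same subset. The sketch below is only an illustration of that idea under stated assumptions: `split_by_date` is a hypothetical helper, the timestamps are invented, and a real split may additionally need a gap of several days between subsets.

```python
from datetime import date, datetime


def split_by_date(timestamps, val_dates, test_dates):
    """Return sample indices per subset, keeping each calendar date whole."""
    val_dates = set(val_dates)
    test_dates = set(test_dates)
    if val_dates & test_dates:
        raise ValueError("validation and test dates must not overlap")
    splits = {"train": [], "val": [], "test": []}
    for index, stamp in enumerate(timestamps):
        day = stamp.date()
        if day in test_dates:
            splits["test"].append(index)
        elif day in val_dates:
            splits["val"].append(index)
        else:
            splits["train"].append(index)
    return splits


# Invented sweep times for illustration only.
stamps = [
    datetime(2021, 1, 5, 10, 0),
    datetime(2021, 1, 5, 10, 5),
    datetime(2021, 1, 12, 14, 0),
    datetime(2021, 1, 20, 8, 0),
]
splits = split_by_date(
    stamps,
    val_dates={date(2021, 1, 12)},
    test_dates={date(2021, 1, 20)},
)
print(splits)
```

Because the grouping key is the date and not the site, overlapping coverage between neighbouring radars no longer places the same precipitation system in both training and test data.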
4) Description of model training is incomplete:
1. How is the model training convergence monitored and how do you
decide if the training has converged?
2. Please list all relevant hyperparameters used in the training
(e.g. learning rate), and refer to any relevant libraries used
in the model implementation and training.
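
As an example of the kind of convergence criterion that could be reported, the sketch below applies a simple patience rule to a sequence of validation losses. This is an assumption about one common approach, not a description of what the authors did; the function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
def best_epoch_with_patience(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based stopping rule.

    stop_epoch is None if the rule never triggers within val_losses.
    """
    best_epoch = None
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch as the best one.
            best_epoch, best_loss = epoch, loss
        elif best_epoch is not None and epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            return best_epoch, epoch
    return best_epoch, None


# Invented validation losses for illustration only.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
print(best_epoch_with_patience(losses, patience=3))
```

Reporting the rule and its parameters alongside the learning rate and other hyperparameters would make the training procedure reproducible.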
5) Issues related to data visualization:
1. The colormap of ZDR measurements should be limited to show only
the interval of interest, which the authors state to be around 0
dB to 20 dB (not starting from -20 dB).
2. It would be better to show the excluded radar gates with some
color that is visible to aid the reader in interpreting the
figures
3. I'm not sure if there is a need to repeat the range ring labels
in every image; the images would be less cluttered if those were
removed, especially in the smaller images.
6) I would appreciate more specificity in the description of radar
measurements in the introduction. There is also some repetition between
the descriptions in the introduction and Section 2.1 that could be
reduced. Specific comments:
1. Lines 51-54: If talking about radar moments, it would be better
to name them, e.g. "compute so-called radar moments, such as
radar reflectivity representing the strength of the signal" etc.
2. Lines 65-66: I would interpret this to mean radar systems with
waveguide switches; how about radar systems that transmit H and V
polarizations simultaneously?
3. Lines 219-221: this seems repetitive
7) The description of the state-of-the-art method in Section 3.2 is
confusing. It would be better to order the description so that the steps
are described in the order in which they are performed. For example, the
paragraph starting on L609 should come directly after the eligible pixels
are first mentioned, and the paragraph starting on L626 should follow
after that.
8) Eq. 1: I assume $\theta$ denotes the azimuthal angle? This should be
mentioned in the text.