the Creative Commons Attribution 4.0 License.
Deep learning for non-precipitation radar echo identification: Comparative evaluation of polarimetric, spatial, and temporal information
Abstract. Accurate identification of non-precipitation echoes (NPEs) in weather radar observations requires effective use of polarimetric signatures together with spatiotemporal structure. Here we present a unified deep-learning framework to quantify the independent and synergistic contributions of model architecture, dual-polarization variables, and short-term temporal evolution to NPE identification. Using data from the Guangzhou S-band dual-polarization radar, we conduct controlled comparative experiments with two representative architectures: a pointwise multilayer perceptron (MLP) and a Transformer-based Swin U-Net that explicitly learns spatial context. We further perform ablation experiments across single- versus dual-polarization inputs and single-volume versus two-volume inputs. Results show that architecture-driven spatial-context learning is the dominant factor: Swin U-Net consistently outperforms the pointwise MLP under all input settings. On a high-confidence test subset, for example, the Critical Success Index (CSI) increases from 0.887 for the dual-polarization MLP to 0.950 for the dual-polarization Swin U-Net. Dual-polarization variables provide essential microphysical constraints and substantially improve class separability, particularly for pointwise classifiers. Incorporating two consecutive volumes further improves performance by capturing short-term echo evolution, with larger gains for the MLP than for Swin U-Net. The best-performing configuration, combining Swin U-Net with dual-polarization and two-volume inputs, achieves a CSI of 0.953 on the high-confidence test subset. Notably, the Swin U-Net using only the reflectivity factor (ZH) as input retains strong skill (CSI = 0.927), indicating that spatial-context learning can partially compensate for missing polarimetry and thus providing a practical pathway for quality control of legacy single-polarization archives.
Status: open (until 15 Jul 2026)
- RC1: 'Comment on egusphere-2026-590', Anonymous Referee #2, 28 Jun 2026
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 96 | 27 | 8 | 131 | 5 | 6 |
This preprint carries out controlled comparative experiments between a pixel-wise MLP and a Swin U-Net to disentangle how polarimetric observables, spatial context and short-term temporal radar evolution jointly shape non-precipitation echo discrimination. The research addresses a long-standing operational demand for radar data quality control and for reprocessing historical single-polarization archives. It is supported by well-designed ablation groups, case demonstrations covering multiple precipitation types, and two test sets, one with clean labels and one with flawed labels, evaluated quantitatively via CSI, POD and FAR. The core finding that spatial feature learning is the primary driver of classification accuracy is physically consistent, and the finding that the reflectivity-only Swin U-Net surpasses the dual-polarization MLP has valuable practical implications.
Even so, several substantive scientific shortcomings limit the manuscript’s depth and generalizability. The entire dataset originates from a single Guangzhou S-band radar without supporting cross-regional or cross-terrain validation; temporal modeling is confined to two consecutive radar volumes, leaving the potential of longer scan sequences unexplored; the spatial attention mechanisms of the Swin Transformer are not visualized or interpreted, leaving the model’s multi-scale feature extraction a black box; the computational overhead critical to real-time operational workflows is not quantified; the adverse effects of label noise on evaluation metrics are only illustrated qualitatively rather than decomposed quantitatively; performance metrics lump all non-precipitation echo subtypes together without separate statistics for clutter, anomalous propagation and biological echoes; and no classic U-Net segmentation baseline is included to confirm the performance gain attributable to windowed self-attention. Major revisions are required to resolve these deficiencies prior to formal consideration.
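For reference, the three verification scores named in this report follow the standard contingency-table definitions. The block below is a brief reminder, assuming that a non-precipitation echo gate is the positive class; the symbols H, M and F are notation chosen here, not taken from the manuscript.

```latex
% H: hits         (NPE gates correctly identified as NPE)
% M: misses       (NPE gates labelled as precipitation by the model)
% F: false alarms (precipitation gates labelled as NPE by the model)
\[
\mathrm{POD} = \frac{H}{H + M}, \qquad
\mathrm{FAR} = \frac{F}{H + F}, \qquad
\mathrm{CSI} = \frac{H}{H + M + F}
\]
```

Because CSI penalizes both misses and false alarms, a change in CSI alone does not reveal which error type is responsible, which is why POD and FAR should be reported alongside it.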
Major Comments:
1) All experimental data come from a single S-band radar in Guangzhou. Such a narrow observational basis risks making the study’s core conclusions case-specific and one-sided. If supplementary radar data cannot be accessed, it is recommended to partition the data by season and compute stratified verification metrics, or to discuss thoroughly how seasonal differences may alter the accuracy gap between the two model architectures.
2) Only two sequential radar scans are fed into the model to capture temporal variation. Extending the input sequences to three or four consecutive volumes in additional ablation tests would help quantify the marginal accuracy gains from a longer temporal context, while splitting the metrics by non-precipitation echo category would further reveal the value of temporal information for different artifact types.
3) Although the manuscript attributes the superior classification performance of Swin U-Net to multi-scale spatial self-attention, no attention-map visualization is provided to verify which spatial textures the model prioritizes during decision-making. Generating attention heatmaps for representative precipitation and non-precipitation cases, and contrasting the learned spatial features with conventional handcrafted texture indicators, would enhance the interpretability of the Transformer-based framework.
4) No quantitative comparison of inference speed, GPU memory consumption and total parameter count across the eight model configurations is presented, yet computational efficiency is a key concern for operational radar platforms. A supplementary summary table documenting these hardware and latency metrics is suggested, alongside preliminary trials of lightweight distilled Swin variants to balance classification accuracy and real-time processing capability.
5) Test Dataset B contains mislabeled samples to assess model robustness, yet the current analysis cannot distinguish performance degradation caused by intrinsic model defects from degradation caused by label inaccuracies. Injecting artificial label noise at varying proportions into the high-confidence Test Dataset A would enable systematic quantification of each model’s tolerance to labeling errors.
6) All non-precipitation echoes are treated as a single binary target without separate assessment of distinct artifact categories, which weakens the guidance the study can offer for operational quality control workflows. Stratifying CSI, POD and FAR values by ground clutter, anomalous propagation streaks and biological echoes would clarify which model-input combination best mitigates each interference source.
7) The paper compares only the MLP and a custom-built Swin U-Net, without benchmarking against widely adopted radar segmentation backbones. Incorporating a standard U-Net into the ablation experiments would isolate the performance improvement stemming from Swin Transformer windowed self-attention from that of a generic encoder-decoder segmentation structure.
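The analyses requested in comments 5 and 6 can be sketched as follows. This is a minimal illustration, not the authors’ pipeline: the function names, the symmetric (uniform random flip) noise model, and the integer subtype codes (0 for precipitation, positive integers for NPE subtypes) are all assumptions made here for the example.

```python
import numpy as np


def contingency_scores(truth, pred):
    """Return (CSI, POD, FAR) for boolean arrays in which True marks an NPE gate."""
    truth = np.asarray(truth, dtype=bool).ravel()
    pred = np.asarray(pred, dtype=bool).ravel()
    hits = int(np.sum(truth & pred))
    misses = int(np.sum(truth & ~pred))
    false_alarms = int(np.sum(~truth & pred))

    def ratio(num, den):
        # An undefined score (empty denominator) is reported as NaN.
        return num / den if den > 0 else float("nan")

    csi = ratio(hits, hits + misses + false_alarms)
    pod = ratio(hits, hits + misses)
    far = ratio(false_alarms, hits + false_alarms)
    return csi, pod, far


def flip_labels(labels, fraction, rng):
    """Flip a given fraction of binary labels, chosen uniformly at random.

    Assumption: symmetric noise. Real labeling errors are likely to be
    spatially clustered, so this is only a first-order stress test.
    """
    labels = np.asarray(labels, dtype=bool)
    noisy = labels.copy().ravel()
    n_flip = int(round(fraction * noisy.size))
    idx = rng.choice(noisy.size, size=n_flip, replace=False)
    noisy[idx] = ~noisy[idx]
    return noisy.reshape(labels.shape)


def scores_by_subtype(subtype, pred):
    """Score NPE detection separately for each (hypothetical) subtype code.

    subtype: integer array, 0 = precipitation, k > 0 = NPE subtype k.
    Each subtype is scored against all precipitation gates, so false
    alarms are shared across subtypes while hits and misses are not.
    """
    subtype = np.asarray(subtype).ravel()
    pred = np.asarray(pred, dtype=bool).ravel()
    results = {}
    for k in np.unique(subtype[subtype > 0]):
        mask = (subtype == 0) | (subtype == k)
        results[int(k)] = contingency_scores(subtype[mask] == k, pred[mask])
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random(10_000) < 0.3   # synthetic reference labels
    pred = clean.copy()                # stand-in for a model's output
    for fraction in (0.0, 0.01, 0.05, 0.10):
        noisy = flip_labels(clean, fraction, rng)
        print(fraction, contingency_scores(noisy, pred))
```

Scoring fixed model predictions against progressively noisier copies of Test Dataset A would give a reference curve of how much apparent skill is lost to label errors alone, against which the degradation observed on Test Dataset B could be compared.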
Minor comments:
1) Correct the distorted axis labels and misaligned longitude tick marks in all case figures, and annotate the feature channel dimensions of the unlabeled intermediate layers in Figure 1(a).
2) Fix typographical errors such as “filed” for “field”, trim repetitive boilerplate language across figure captions, and briefly define the drop-path regularization term for readers with limited deep learning background.
3) Round all CSI, POD and FAR values in Tables 2 and 3 to three decimal places for consistent numerical precision.