This work is distributed under the Creative Commons Attribution 4.0 License.
Comprehensive Inter-comparison of Generative AI Models for Super-Resolution Precipitation Downscaling Across Hydroclimatic Regimes
Abstract. High-resolution precipitation information is essential for hydrologic modeling, flood forecasting, and climate-risk assessment, yet global weather and climate models operate at spatial resolutions too coarse to resolve storm structure, intermittency, and extremes. Deep-learning-based statistical downscaling provides a computationally efficient alternative to dynamical downscaling, but deterministic convolutional neural networks often yield overly smooth predictions and underestimate fine-scale variability and extreme events. Generative deep-learning models, including generative adversarial networks and diffusion models, offer a promising alternative by enabling stochastic downscaling and explicit representation of uncertainty. This study presents a systematic, hydrologically oriented comparison of three representative deep-learning frameworks for precipitation super-resolution: a convolutional U-NET, a conditional Wasserstein GAN (WGAN), and a conditional denoising diffusion probabilistic model (DDPM). Using a perfect-model experimental design based on ERA5-Land precipitation over distinct hydroclimatic regions of the United States, we evaluate performance under 8-times (8×) and 16-times (16×) downscaling tasks within a unified training and evaluation framework. Models are evaluated using diagnostics that examine precipitation distributions, wet–dry occurrence, extremes, spatial structure, storm morphology, mass consistency, ensemble variability, and computational cost. All three models preserve aggregate rainfall mass despite the absence of explicit physical constraints. Differences arise primarily at fine spatial scales and in the representation of extremes, spatial dependence, and uncertainty. U-NET provides stable and computationally efficient predictions but smooths small-scale variability. WGAN improves fine-scale structure and heavy-tail behavior at the expense of increased noise. 
The DDPM yields physically coherent ensemble members and an explicit representation of uncertainty, at a substantially higher computational cost.
Status: open (until 24 Apr 2026)
- RC1: 'Comment on egusphere-2026-861', Anonymous Referee #1, 09 Mar 2026
- CEC1: 'Comment on egusphere-2026-861 - No compliance with the policy of the journal', Juan Antonio Añel, 26 Mar 2026
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
Your manuscript does not contain a "Code Availability" section providing all the code used for your work. I am sorry to have to be so outspoken, but we cannot accept this; it is forbidden by our policy, and your manuscript should never have been accepted for Discussions given such lack of compliance with the policy of the journal. Our policy clearly states that all the code and data necessary to replicate a manuscript must be published openly and freely to anyone before submission.
Additionally, you do not provide the training data used for your work (you simply cite a paper and a website for ERA5, which is not a trusted repository for long-term archival and which we cannot accept), nor the output files resulting from it.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Therefore, you have to reply to this comment as soon as possible with the information for the repositories (link and permanent identifier, e.g. DOI; please also check our policy for the characteristics of the accepted repositories) containing all the code and data that you use to produce and replicate your manuscript. We cannot have manuscripts under discussion that do not comply with our policy.
Please reply to this comment with the new 'Code and Data Availability' section, which must also be included in any new version of your manuscript to cite the new repository locations, with the corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2026-861-CEC1
AC1: 'Reply on CEC1', Shivam Singh, 26 Mar 2026
We sincerely thank the Editor for pointing out this important issue. We apologize for this oversight and fully acknowledge the importance of ensuring reproducibility and long-term accessibility of the code, data, and outputs associated with our manuscript.
We are currently preparing a complete public archive of the materials required to reproduce the study, including:
- the code used for model training, evaluation, and figure generation,
- the processed data and/or reproducible preprocessing workflow used in the study, and
- the relevant output files used in the analysis and manuscript figures.
These materials are being organized in a suitable public repository with long-term archival support and a permanent identifier (DOI), in accordance with the GMD Code and Data Policy. We will update this discussion as soon as the repository deposition is finalized and will also revise the manuscript accordingly to include a complete Code and Data Availability section and the corresponding references.
Citation: https://doi.org/10.5194/egusphere-2026-861-AC1
AC2: 'Reply on CEC1', Shivam Singh, 02 Apr 2026
Dear Editor,
Thank you for your comment and for highlighting the importance of openly sharing code and data to ensure reproducibility and compliance with the journal’s policy. We have now made the materials required for reproducibility publicly available and have updated the manuscript accordingly. Specifically:
- the full code repository used for data downloading, preprocessing, model training, inference, and evaluation is publicly available on GitHub and archived on Zenodo;
- the processed dataset splits and selected trained model weights used for the main experiments and figure generation have been archived separately on Zenodo;
- the revised manuscript will include updated Code availability and Data availability sections reflecting these resources.
The updated statements are provided below for your reference:
Code availability
All scripts used for data downloading, preprocessing, model training, inference, and evaluation are openly available in the public GitHub repository: https://github.com/shivamsinghhada/precipitation-downscaling. The exact version of the code used in this study is permanently archived on Zenodo at: https://doi.org/10.5281/zenodo.19297906.
Data availability
The raw ERA5-Land precipitation data used in this study are publicly available from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) (Muñoz Sabater, 2019). ERA5-Land provides global land-surface variables at approximately 9 km spatial resolution and hourly temporal resolution and can be accessed at: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land?tab=download.
The processed dataset splits and selected trained model weights used for the main experiments and figure generation are archived separately on Zenodo at: https://doi.org/10.5281/zenodo.19324377. These archived materials include the processed 8× and 16× downscaling datasets, selected trained U-Net, WGAN, and DDPM model weights, and the associated inference scripts required to reproduce the main analyses presented in this manuscript.
Raw ERA5-Land data are not redistributed here because they are already publicly available from ECMWF / CDS. Instead, the full preprocessing workflow required to reproduce the derived datasets is provided in the archived code repository.
Sincerely,
Shivam Singh
(on behalf of all co-authors)
Citation: https://doi.org/10.5194/egusphere-2026-861-AC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 232 | 135 | 20 | 387 | 465 | 11 | 24 |
Review for GMD
This manuscript presents a timely and meaningful intercomparison of three widely used generative AI models for precipitation super-resolution downscaling. The study addresses an important problem at the interface of atmospheric science and machine learning, and the effort to compare multiple model classes within a unified framework is valuable. At the same time, several aspects of the manuscript require substantial clarification and strengthening before the conclusions can be fully supported. Addressing these issues would improve the scientific rigor and impact of the paper.
Major Comments:
1. The authors state that the 10-member ensemble is generated from 10 independently trained models initialized with different random seeds. This procedure primarily reflects epistemic uncertainty associated with parameter estimation and training variability. However, the central theoretical advantage of conditional generative models is that for a given low-resolution input, they can generate a distribution of plausible high-resolution outputs through stochastic sampling. At present, the manuscript uses one prediction from each independently trained model and interprets the resulting spread as ensemble uncertainty, which is not equivalent to sampling the conditional output distribution of a single trained generative model. The authors must separate these two uncertainty sources explicitly. In addition to the current analysis, they should report results from repeated stochastic sampling using a single trained model, preferably the best-performing checkpoint, and compare that spread with the spread arising from different training seeds. This distinction is essential for a correct interpretation of the ensemble results.
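For concreteness, the distinction can be sketched as follows (illustrative numpy code only, not the authors' implementation; the sampler functions below are hypothetical stand-ins for trained generative models):

```python
import numpy as np

def ensemble_spreads(samplers, x, n_draws=10, rng=None):
    """Contrast two ensemble constructions for a conditional generative model:
    (a) one stochastic draw from each of several independently trained models
        (seed ensemble, mixing training variability with sampling noise), and
    (b) repeated stochastic draws from a single trained model
        (sampling the conditional output distribution)."""
    rng = rng or np.random.default_rng(0)
    seed_members = np.stack([s(x, rng) for s in samplers])
    sampling_members = np.stack([samplers[0](x, rng) for _ in range(n_draws)])
    return seed_members.std(axis=0), sampling_members.std(axis=0)

# Hypothetical stand-ins for trained models: each "seed" carries its own bias
# (training variability) plus per-draw noise (conditional stochasticity).
def make_sampler(bias):
    return lambda x, rng: x + bias + 0.1 * rng.standard_normal(x.shape)

samplers = [make_sampler(b) for b in (0.0, 0.5, -0.5)]
x = np.zeros((4, 4))
seed_spread, sampling_spread = ensemble_spreads(samplers, x, n_draws=50)
```

In this toy setup the seed-ensemble spread is dominated by the between-model biases, while the single-model spread reflects only conditional sampling noise; reporting the two separately is exactly the decomposition requested above.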
2. The manuscript refers in several places to ERA5-Land as "observation". This terminology is incorrect: ERA5-Land is a reanalysis-based product, not a direct observational dataset. Since the study does not use in situ station observations, radar, satellite retrievals, or soundings as reference truth, the manuscript should consistently refer to ERA5-Land as a reanalysis or reanalysis-based target, not as observation. The following paper provides a detailed discussion of this distinction:
https://doi.org/10.1175/BAMS-D-14-00226.1
3. The use of min-max normalization may help stabilize training, but it raises an important concern for precipitation, especially for extreme events. Min-max scaling bounds the normalized target by the range seen in the training data, which may hinder robust extrapolation to unprecedented values. This issue is especially relevant for climate-related downscaling and extreme precipitation, where out-of-sample events may exceed the historical training maximum. This issue may also be relevant to the behavior shown in Figure S11, where DDPM with T=100 approaches the upper bound and cannot grow freely. The authors should discuss this limitation explicitly and test at least one alternative normalization strategy such as quantile normalization or z-score normalization over wet pixels only, reporting whether the normalization choice materially changes the extreme-value results.
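To illustrate the concern (a toy numpy sketch, not the authors' pipeline; the 1 mm/day wet threshold and the sample values are assumptions):

```python
import numpy as np

# Toy training sample (mm/day).
train = np.array([0.0, 0.0, 2.0, 5.0, 20.0, 80.0])
test_extreme = 120.0                      # exceeds the training maximum

# Min-max scaling: bounded by the training range, so an unseen extreme
# maps outside [0, 1], a region the network never saw during training.
mn, mx = train.min(), train.max()
def minmax(y):
    return (y - mn) / (mx - mn)

# Wet-pixel z-score: statistics over wet pixels (>= 1 mm/day) only;
# unbounded above, leaving room for out-of-sample extremes.
wet = train[train >= 1.0]
mu, sd = wet.mean(), wet.std()
def zscore(y):
    return (y - mu) / sd

val_minmax = minmax(test_extreme)         # 1.5, above the training bound of 1
val_zscore = zscore(test_extreme)
```

The sketch shows only the extrapolation geometry; whether the bound actually matters for the trained models is precisely what the requested sensitivity test would establish.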
4. Precipitation is not a typical continuous variable like temperature, pressure, or geopotential height. It is sparse, intermittent, highly skewed, and often better represented by zero-inflated or Tweedie-like distributions. For this reason, the loss-function choice deserves much more discussion than it currently receives. The authors should discuss why their selected losses are appropriate for precipitation specifically, and whether distribution-aware losses could improve tail behavior and wet-day occurrence. Recent studies suggest that distributional losses can be beneficial for precipitation prediction and downscaling. At minimum, this should be discussed more clearly. Ideally, the authors would include a sensitivity test or ablation experiment.
https://arxiv.org/html/2509.08369
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2024GL111828
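As one concrete example of a distribution-aware candidate, the Tweedie unit deviance for 1 < p < 2 handles exact zeros and heavy tails naturally (illustrative sketch only, not a recommendation of a specific implementation):

```python
import numpy as np

def tweedie_deviance(y, mu, p=1.5):
    """Mean Tweedie unit deviance for 1 < p < 2 (compound Poisson-gamma),
    a candidate distribution-aware loss for zero-inflated, skewed rainfall.
    Exact zeros pose no problem: for y = 0 the deviance reduces to
    2 * mu**(2 - p) / (2 - p)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    dev = 2.0 * (np.power(y, 2.0 - p) / ((1.0 - p) * (2.0 - p))
                 - y * np.power(mu, 1.0 - p) / (1.0 - p)
                 + np.power(mu, 2.0 - p) / (2.0 - p))
    return dev.mean()
```

The deviance vanishes when the prediction matches the target and penalizes rain predicted on dry pixels, unlike a plain MSE, which treats the zero-inflated part of the distribution no differently from the bulk.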
5. Figure 5 appears to compare model predictions against an upsampled low-resolution field rather than the native high-resolution ERA5-Land target, given the clustering of identical reference values on the x-axis. If so, the comparison is not appropriate and the figure needs to be redone using the actual high-resolution target field. If this interpretation is incorrect, the authors should clarify exactly how the reference field in Figure 5 was constructed.
6. The spatial lag analysis in Fig. 6 is not the most informative way to evaluate scale-dependent structure for precipitation super-resolution. A spatial power spectrum would be more standard and more physically interpretable. The authors should add a spectral analysis to the main paper. The current spatial lag figure could be moved to the supplement.
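A minimal radially averaged spectrum can be computed as follows (illustrative numpy sketch, not the authors' code):

```python
import numpy as np

def radial_power_spectrum(field):
    """Radially averaged spatial power spectrum of a 2-D field: bin the
    squared FFT amplitudes by integer radial wavenumber. An energy deficit
    at high wavenumbers relative to the target indicates over-smoothing."""
    ny, nx = field.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ky = np.fft.fftshift(np.fft.fftfreq(ny)) * ny
    kx = np.fft.fftshift(np.fft.fftfreq(nx)) * nx
    kr = np.hypot(*np.meshgrid(ky, kx, indexing="ij")).round().astype(int)
    sums = np.bincount(kr.ravel(), weights=power.ravel())
    counts = np.bincount(kr.ravel())
    return sums / np.maximum(counts, 1)

# Sanity check: a constant field has all its energy at wavenumber zero.
spec = radial_power_spectrum(np.ones((16, 16)))
```

Plotting such spectra for the target and each model on log-log axes would show directly at which scales the U-Net loses variance and whether the generative models restore it.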
7. All three models condition only on coarse-resolution precipitation. Precipitation is not a self-contained variable. For instance, topography is a key control on high-resolution precipitation structure, especially in regions where orographic effects are important. The manuscript does not sufficiently discuss the implications of omitting terrain height or other static geographic information as conditioning variables. The smoothness seen in the deterministic baseline may partly reflect the lack of physically informative conditioning, rather than only the architecture itself. This point is also relevant for the generative models. One of the strengths of conditional DDPM is the flexibility with which conditioning information can be incorporated, including modulation-based conditioning (such as FiLM, as used here, or AdaGN). The authors should discuss more directly whether including terrain or other physically meaningful covariates could materially change the conclusions.
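For reference, FiLM itself is a lightweight mechanism, which makes adding static covariates straightforward (schematic numpy sketch; in practice the mapping from terrain height to the modulation parameters would be a small learned network, not the fixed values used here):

```python
import numpy as np

def film(features, gammas, betas):
    """Feature-wise linear modulation (FiLM): per-channel scale and shift,
    with gammas/betas predicted from conditioning inputs. In a downscaling
    network those inputs could include static covariates such as terrain
    height. features: (C, H, W); gammas, betas: (C,)."""
    return gammas[:, None, None] * features + betas[:, None, None]

# Toy illustration: channel 0 passes through, channel 1 is scaled and shifted.
feats = np.ones((2, 3, 3))
out = film(feats, gammas=np.array([1.0, 2.0]), betas=np.array([0.0, 1.0]))
```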
8. A plain U-Net trained with a pointwise loss is well known to produce overly smooth outputs in super-resolution tasks, so the observed contrast between the U-Net and the generative models may partly reflect the baseline choice rather than an inherent limitation of deterministic approaches. The paper should justify this baseline more carefully or include at least one stronger deterministic baseline, such as a sub-pixel convolution (PixelShuffle-based) architecture.
https://arxiv.org/pdf/1609.05158
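For reference, the depth-to-space rearrangement at the core of sub-pixel convolution is simple (numpy sketch of the same operation as torch.nn.PixelShuffle):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space rearrangement used by sub-pixel convolution: a
    convolution emits C*r*r channels at coarse resolution, which are
    rearranged into a (C, H*r, W*r) field, so upsampling is done with
    learned filters rather than fixed interpolation."""
    crr, h, w = x.shape
    c = crr // (r * r)
    return (x.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))

x = np.arange(4.0).reshape(4, 1, 1)   # 4 channels at a single coarse pixel
y = pixel_shuffle(x, 2)               # -> one channel on a 2x2 fine grid
```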
9. SSIM is a perceptual image metric designed for natural-image comparison based on luminance and contrast, and its physical meaning for sparse, intermittent precipitation fields is limited. SSIM should not be emphasized as a primary result and may be moved to the supplement. The Q-Q diagnostics currently in Figure S7 are more physically meaningful for a heavy-tailed intermittent variable and should be promoted to the main text.
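A wet-pixel Q-Q diagnostic is inexpensive to compute (illustrative sketch; the 1 mm/day wet threshold and quantile range are assumptions, not the authors' choices):

```python
import numpy as np

def wet_qq(pred, ref, wet_thresh=1.0, probs=None):
    """Q-Q points over wet pixels only: empirical quantiles of predicted
    versus reference precipitation, emphasizing the heavy tail rather
    than perceptual similarity."""
    probs = np.linspace(0.5, 0.99, 50) if probs is None else probs
    qp = np.quantile(pred[pred >= wet_thresh], probs)
    qr = np.quantile(ref[ref >= wet_thresh], probs)
    return qr, qp

ref = np.array([0.0, 2.0, 3.0, 10.0, 50.0])
qr, qp = wet_qq(ref, ref)   # identical inputs -> points on the 1:1 line
```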
10. Because the low-resolution inputs are constructed using a block-averaging operator, the mass conservation result in Section 5.2.2 is partly guaranteed by the experimental design and is less informative than the manuscript implies. This section should be shortened and some of the discussion moved to the supplement.
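To see why, note that block averaging preserves the domain mean by construction (toy numpy sketch; the gamma-distributed field is a stand-in for rainfall, not the study's data):

```python
import numpy as np

def block_average(field, k):
    """Coarsen an (H, W) field by averaging non-overlapping k x k blocks,
    i.e. the type of operator used to construct the low-resolution inputs."""
    h, w = field.shape
    return field.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

rng = np.random.default_rng(0)
hr = rng.gamma(0.5, 5.0, size=(32, 32))   # toy "high-resolution" rainfall
lr = block_average(hr, 8)                 # 8x coarsening

# The domain mean is preserved exactly by construction, so any prediction
# whose block means match the input field conserves mass automatically.
assert np.isclose(lr.mean(), hr.mean())
```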
11. The manuscript compares a U-Net against generative models, but it does not include a simple interpolation baseline. That omission weakens the benchmark. At minimum, the authors should include one standard interpolation baseline so that readers can assess whether the deterministic neural model actually adds value beyond trivial reconstruction.
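Such a baseline is trivial to add (numpy sketch of a nearest-neighbour baseline; a bilinear or bicubic variant would be a stronger choice):

```python
import numpy as np

def nearest_upsample(lr, k):
    """Trivial nearest-neighbour baseline: repeat each coarse pixel over a
    k x k block. A learned model should clearly beat this (and ideally a
    bilinear or bicubic baseline as well) to demonstrate added value."""
    return np.kron(lr, np.ones((k, k)))

lr = np.array([[1.0, 2.0],
               [3.0, 4.0]])
hr_baseline = nearest_upsample(lr, 4)   # shape (8, 8), block means preserved
```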
12. Precipitation has strong temporal autocorrelation because it is tied to evolving synoptic and mesoscale systems. Wet-spell duration, dry-spell duration, and multi-day persistence are among the most hydrologically relevant properties of any downscaled product. A model may match daily spatial structure while still failing to reproduce realistic persistence across time. The manuscript does not evaluate this sufficiently. The authors should either include temporal diagnostics that directly assess persistence behavior or clearly state that the current evaluation is not enough to establish hydrological usefulness.
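Such diagnostics are straightforward to compute; for example, spell durations from a daily series (illustrative sketch, assuming a 1 mm/day wet threshold):

```python
import numpy as np
from itertools import groupby

def spell_lengths(daily, wet_thresh=1.0):
    """Wet- and dry-spell durations from a daily precipitation series:
    run lengths of consecutive wet (>= wet_thresh mm/day) and dry days,
    a basic persistence diagnostic for any downscaled product."""
    wet = np.asarray(daily) >= wet_thresh
    runs = [(is_wet, sum(1 for _ in group)) for is_wet, group in groupby(wet)]
    wet_spells = [n for is_wet, n in runs if is_wet]
    dry_spells = [n for is_wet, n in runs if not is_wet]
    return wet_spells, dry_spells

series = [0, 5, 7, 0, 0, 2, 0]                    # mm/day
wet_spells, dry_spells = spell_lengths(series)    # [2, 1] and [1, 2, 1]
```

Comparing the distributions of these spell lengths between the downscaled output and the reanalysis target would directly address the persistence question raised above.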
Some minor issues:
1. Figures 3 and S3 should state the timestamp or time period shown. The figure captions should clearly indicate which date or sample is being plotted.
2. At line 126, the manuscript refers to one region as the "Pacific Northwest". Based on the domain shown, this terminology appears inaccurate, since Utah and western Nevada are not usually considered part of the Pacific Northwest. It would be better to use "Northwest" unless the domain is redefined.
3. The manuscript states that the models are trained on the Central Plains and Northwest, while validation uses the Central Plains plus a subset of the Northeast, and the remaining Northeast samples are used for independent testing. This is understandable, but the exact fractions or sample counts should be stated explicitly in the main text.
4. The manuscript sets daily precipitation below 1 mm per day to zero and excludes days with fewer than 1 percent wet pixels. These choices may be reasonable, but the authors should report how many samples are removed by region and season and briefly discuss the potential impact on light-rain statistics and wet-day occurrence.
5. The U-Net training description omits weight decay and the Adam beta values; since the WGAN uses non-default beta1=0.0 and beta2=0.9, any deviation from defaults for other models must also be explicitly reported. The DDPM section does not specify the optimizer, initial learning rate, or weight decay.