This work is distributed under the Creative Commons Attribution 4.0 License.
Conditional diffusion models for downscaling & bias correction of Earth system model precipitation
Abstract. Climate change exacerbates extreme weather events like heavy rainfall and flooding. As these events cause severe socioeconomic damage, accurate high-resolution simulation of precipitation is imperative. However, existing Earth System Models (ESMs) struggle to resolve small-scale dynamics and suffer from biases. Traditional statistical bias correction and downscaling methods fall short in improving spatial structure, while recent deep learning methods lack controllability and suffer from unstable training. Here, we propose a machine learning framework for simultaneous bias correction and downscaling. We train a generative diffusion model purely on observational data. We map observational and ESM data to a shared embedding space, where both are unbiased towards each other and train a conditional diffusion model to reverse the mapping. Our method can correct any ESM field, as the training is independent of the ESM. Our approach ensures statistical fidelity, preserves large-scale spatial patterns and outperforms existing methods, especially regarding extreme events.
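To make the pipeline described in the abstract concrete, the following is a minimal sketch of the two preprocessing ideas it mentions: empirical quantile mapping of the ESM towards the observations, and noise injection so that both datasets land in a shared embedding from which a conditional diffusion model can regenerate small scales. This is an illustration only, not the authors' code: the array shapes, the gamma placeholders, the log1p transform and the use of plain white noise (the paper selects a scale-selective noise level from power spectra) are all assumptions.

```python
import numpy as np

def quantile_map(esm, esm_ref, obs_ref, n_quantiles=1000):
    """Empirical quantile mapping: re-map ESM values so that their
    distribution over a reference period matches the observations."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    esm_q = np.quantile(esm_ref, q)          # ESM quantiles (reference period)
    obs_q = np.quantile(obs_ref, q)          # observed quantiles (reference period)
    return np.interp(esm, esm_q, obs_q)      # value with the same rank in the obs distribution

def embed(field, noise_std, rng):
    """Toy shared embedding: add noise so that small-scale detail is destroyed
    and ESM- and observation-derived inputs become statistically alike."""
    return field + rng.normal(0.0, noise_std, size=field.shape)

rng = np.random.default_rng(0)
obs = rng.gamma(0.5, 2.0, size=(365, 64, 64))   # placeholder daily precipitation (mm/day)
esm = rng.gamma(0.4, 2.5, size=(365, 64, 64))   # placeholder, deliberately biased

esm_qm = quantile_map(esm, esm_ref=esm, obs_ref=obs)
z_obs = embed(np.log1p(obs), noise_std=0.3, rng=rng)     # training input (paired with obs target)
z_esm = embed(np.log1p(esm_qm), noise_std=0.3, rng=rng)  # inference input for the trained model
```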
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2025-2646 - No compliance with the policy of the journal', Juan Antonio Añel, 24 Jul 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html. In your manuscript you do not provide repositories for the code and data used in your study. You simply provide a number of links; in the case of the Zenodo repository for the code, the link is empty, and for the data the links only point to main webpages that do not contain the specific data/variables used in your work. Also, you do not provide a repository with the output data. Therefore, the current situation with your manuscript is irregular, and it should not have been accepted in Discussions or for peer review because of the above-mentioned issues. Please publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
Also, you must include a modified 'Code and Data Availability' section in any potentially revised manuscript, containing the information on the new repositories.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2646-CEC1
AC1: 'Reply on CEC1', Michael Aich, 30 Jul 2025
Dear Editor,
We apologize for the missing code and data uploads. We have now made both public:
Code: https://github.com/aim56009/ESM_cdifffusion_downscaling/
Input data: https://doi.org/10.5281/zenodo.16610901
Model data: https://doi.org/10.5281/zenodo.16610050
Output data: https://doi.org/10.5281/zenodo.14849653
We will also adjust our 'Code and Data Availability' section accordingly.
Best regards,
Michael Aich
Citation: https://doi.org/10.5194/egusphere-2025-2646-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 30 Jul 2025
Dear authors,
Unfortunately, again, your reply fails to comply with our policy. I would ask you to read it carefully before replying again with something that does not comply with it.
You have provided the code hosted on a git site. Git sites are not acceptable for scientific publication. Please store your code in one of the repositories that are acceptable according to our policy.
Additionally, your implementation relies on external software, namely a number of libraries. In this case, to ensure the replicability of your work, please clarify the version numbers of the libraries you have used and the version of the Python interpreter.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2646-CEC2
AC2: 'Reply on CEC2', Michael Aich, 31 Jul 2025
Dear Editor,
We apologize for providing the wrong code link; the code is available on Zenodo at https://doi.org/10.5281/zenodo.16629039. We also added the version numbers of the libraries required to run the code (requirements.txt).
Thank you for the correction, and best regards,
Michael Aich
Citation: https://doi.org/10.5194/egusphere-2025-2646-AC2
RC1: 'Comment on egusphere-2025-2646', Anonymous Referee #1, 02 Sep 2025
This manuscript presents a conditional diffusion model framework for simultaneous bias correction and downscaling of Earth System Model (ESM) precipitation fields. The novelty lies in training the model exclusively on observational data by mapping both ESM and observations into a shared embedding space, where quantile mapping and noise injection help align distributions. The conditional diffusion model then reconstructs small-scale precipitation structures while preserving large-scale ESM patterns. The authors evaluate the method using ERA5 as the observational reference and GFDL-ESM4 as the test ESM, showing improvements over bilinear interpolation with quantile mapping and comparisons with other diffusion approaches. They also highlight strengths in representing extremes, ensemble spread, and future climate scenario preservation.
Major Issues
1. The experimental setup is confined to one ESM (GFDL-ESM4) and one reanalysis dataset (ERA5) over a single continental region (South America). While the framework claims generality to “any ESM,” the evidence is narrow. Without testing multiple models or regions, it is unclear whether the embedding and conditional framework is robust to diverse ESM biases and precipitation regimes. Furthermore, the noise-scale hyperparameter, chosen at the spectral intersection, is dataset-specific and may require fine-tuning across contexts, raising concerns about general applicability.
2. The choice of benchmark (bilinear upsampling followed by quantile mapping, QM) is somewhat too weak given the recent literature. QM is indeed the statistical baseline, but the field has seen GAN-based approaches (cycleGANs, conditional GANs), CNN-based super-resolution, and unconditional consistency models that have been applied to similar downscaling tasks. Although the authors briefly compare with Hess et al. (2025) and an EDM model, the evaluation remains limited and not systematic. A stronger study would include comparisons against multiple state-of-the-art baselines (GAN, VAE, transformer- or CNN-based super-resolution methods) under consistent experimental conditions.
3. The manuscript does not clearly articulate how the proposed model differs from existing diffusion-based downscaling and bias correction efforts. For example, Wan et al. (2024) combined diffusion with optimal transport, while EDM (Karras et al., 2022) provides another diffusion benchmark. The authors claim advantages in efficiency and data efficiency, but the conceptual distinction between their conditional embedding approach and these prior diffusion frameworks is not fully elaborated. Is the main novelty the embedding trick with QM + noise to align distributions? Or is it the conditional supervision on observational embeddings? This needs further discussion.
4. While the results on extremes (R95p, Rx1Day) and SSP5-8.5 trends are promising, the metrics are limited. Extreme event validation could be broadened with tail-focused skill scores, quantile-specific errors, or return-level analyses (a computation sketch for two standard indices follows this list). For future scenarios, the manuscript shows preservation of mean and trend, but it remains unclear whether the method could distort physical consistency (e.g., covariance with other variables, conservation constraints). Since diffusion models are inherently stochastic, an evaluation of physical realism constraints would be useful.
5. A central claim is that the proposed method is independent of the chosen ESM because the diffusion model is trained only on observations. However, in practice, the embedding transformation g requires quantile mapping of ESMs, which is itself model-dependent. Thus, some degree of ESM-specific adjustment is unavoidable. The manuscript should acknowledge this limitation and discuss how sensitive results are to the chosen reference period, quantile mapping scheme, and observational dataset.
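For reference, the two indices named in point 4, R95p and Rx1Day, follow the standard ETCCDI definitions: the annual precipitation total on days exceeding the wet-day (>= 1 mm/day) 95th percentile of a base period, and the annual maximum one-day precipitation. The sketch below shows how such tail-focused diagnostics can be computed from daily data; the random values and array shapes are placeholders, not results from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
daily_pr = rng.gamma(0.5, 4.0, size=(30, 365))   # placeholder: 30 years x 365 days of precip (mm/day)

wet_days = daily_pr[daily_pr >= 1.0]             # wet-day threshold of 1 mm/day
p95 = np.percentile(wet_days, 95)                # wet-day 95th percentile over the base period

r95p = np.array([year[year > p95].sum() for year in daily_pr])  # very-wet-day total per year (R95p)
rx1day = daily_pr.max(axis=1)                                   # annual maximum 1-day precipitation (Rx1Day)
```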
Recommendation
The manuscript introduces a promising and technically creative approach that leverages conditional diffusion for a challenging problem in climate modeling. However, the current version has limitations in experimental breadth, benchmark rigor, and clarity of novelty relative to existing diffusion approaches. I recommend major revision before publication. The authors should expand the benchmark comparison, better articulate how their method diverges from and improves upon existing diffusion-based methods, and provide more robust multi-model/multi-region evaluations to strengthen the claim of general applicability.
Citation: https://doi.org/10.5194/egusphere-2025-2646-RC1
RC2: 'Comment on egusphere-2025-2646', Anonymous Referee #2, 11 Sep 2025
This manuscript proposes a framework to downscale GFDL-ESM4 using a conditional diffusion model trained only on ERA5. The authors align the train/test distributions by (i) applying quantile mapping (QM) to the ESM data to remove large-scale biases and (ii) adding carefully chosen noise so that both ERA5 and GFDL are projected into a shared embedding on which the conditional diffusion model is trained and applied.
What I like
- Focusing on precipitation is well motivated; it remains one of the hardest fields for ML downscaling and bias correction.
- Within the scope of their data, the authors conduct a relatively deep analysis and explore key hyperparameters. The SI is useful for additional insights.
- The idea to select a noise (cutoff) scale via the PSD relationship between ERA5 and GFDL seems new; the model then matches small-scale (high-wavenumber) PSD to ERA5 while preserving large-scale ESM information. This design appears effective based on PSDs, trend preservation, and overall fidelity.
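As a minimal sketch of that crossover idea: compute radially averaged power spectra of an observed and a quantile-mapped ESM field, and take the first wavenumber at which the two spectra intersect. The random stand-in fields, the integer radial binning and the crossing criterion below are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def radial_psd(field):
    """Radially averaged power spectral density of a 2-D field."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ny, nx = field.shape
    ky, kx = np.indices((ny, nx))
    k = np.hypot(kx - nx // 2, ky - ny // 2).astype(int)        # integer radial wavenumber
    counts = np.bincount(k.ravel())
    psd = np.bincount(k.ravel(), weights=power.ravel()) / np.maximum(counts, 1)
    return psd[: min(ny, nx) // 2]

rng = np.random.default_rng(2)
obs_field = rng.gamma(0.5, 2.0, size=(64, 64))   # stand-in for an ERA5 snapshot
esm_field = rng.gamma(0.5, 3.0, size=(64, 64))   # stand-in for a QM-corrected GFDL snapshot

psd_obs, psd_esm = radial_psd(obs_field), radial_psd(esm_field)
sign = np.sign(np.log(psd_obs[1:]) - np.log(psd_esm[1:]))        # skip the k=0 (mean) component
crossings = np.where(np.diff(sign) != 0)[0]
k_cut = crossings[0] + 1 if crossings.size else None             # first crossover wavenumber, if any
```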
Clarify the embedding vs. preprocessing story
- Early in the paper I read the approach as latent-space manipulation after mapping both datasets into a shared space. Later it became clear that the shared embedding is achieved via preprocessing (QM + controlled noising), and the diffusion model learns to reverse the noising conditioned on the preserved large scales.
- This sequencing is a bit confusing and contributes to statements about “no dependency on the test dataset” and “no ESM - OBS pairing” being misread.
- Concretely: this is supervised training on ERA5 and inference on preprocessed GFDL that has been mapped into the same embedding. I recommend making that pipeline explicit with a schematic and a sentence like "we preprocess ERA5 and GFDL to a shared embedding (via QM + noising up to scale s); we train on ERA5 in this embedding and apply the learned conditional reverse process to embedded GFDL at inference."
Reference line in the paper: "We map observational and ESM data to a shared embedding space, where both are unbiased towards each other and train a conditional diffusion model to reverse the mapping."
Quantile mapping and potential leakage
- Please specify exactly how and when QM is fit and applied. If QM parameters are estimated using years that later appear in validation, or worse, from the future period, there is a risk of data leakage and trend distortion.
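To illustrate the leakage concern, a leakage-safe setup fits the quantile mapping on a reference period only and applies it unchanged to later (validation or future) years. The year split, the placeholder arrays and the 500-quantile resolution below are assumptions, not details from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1980, 2015)
obs = {y: rng.gamma(0.5, 2.0, size=365) for y in years}   # placeholder daily observations (mm/day)
esm = {y: rng.gamma(0.4, 2.5, size=365) for y in years}   # placeholder daily ESM output

ref_years = [y for y in years if y <= 2000]                # QM is fitted on these years only
q = np.linspace(0.0, 1.0, 500)
esm_q = np.quantile(np.concatenate([esm[y] for y in ref_years]), q)
obs_q = np.quantile(np.concatenate([obs[y] for y in ref_years]), q)

def qm(x):
    """Apply the mapping fitted on the reference period; never refit on later years."""
    return np.interp(x, esm_q, obs_q)

esm_corrected_validation = {y: qm(esm[y]) for y in years if y > 2000}
```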
Scope: broaden temporal and regional tests
- The analysis is relatively deep but narrow in scope.
- Extend to at least one additional region with different regimes.
- Use a longer temporal validation, including seasonality coupled with temporal behavior (autocorrelations, wet/dry spell durations, event persistence), and spatial/temporal scatter comparisons between train (ERA5) and test (GFDL-embedded) across the full time span.
Choice of the cutoff scale s
- You present s as the PSD-based crossover that yields strong performance. That is reasonable.
- Compare to alternative mechanisms (e.g., providing a noise channel explicitly to a strong CNN baseline, or conditioning variants in diffusion that target high-frequency losses).
Baselines and diversity
- The test set lacks model diversity (single ESM), and the baselines are limited.
- Add at least one more ESM with different small-scale biases.
- Add strong ML baselines (e.g., diffusion/SR variants trained on down/upsampled ERA5 pairs, competitive CNN/Transformer SR models) alongside standard statistical methods.
Transformations and ablations
- Data undergo heavy transformations (log, scaling, etc.). Please include ablations on these choices and demonstrate their effects.
Figures
- Use a uniform, well-chosen color map and a clearer legend and contrast for Figure 3.
- In sample 4 (upper-left corner), precipitation patterns appear to change; please comment on whether this is intended regeneration of small-scale structure consistent with large scales, or an artifact.
Inputs, reproducibility, and generalizability
- What are the input channels to the model (precip only, or multivariate conditioning such as humidity, winds, temperature)? Please list them explicitly.
- Provide exact config files/scripts (including how s is computed from two PSDs) to ensure full reproducibility.
- Clearly mention that a new ESM will require retraining the model because of the preprocessing; this seems to be missing in the text.
I recommend major revision. The approach is promising and potentially impactful, but the current version requires broader validation, clearer methodological framing, and stronger baselines. I would be happy to review a revised manuscript if the authors choose to resubmit.
Citation: https://doi.org/10.5194/egusphere-2025-2646-RC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 1,052 | 50 | 23 | 1,125 | 20 | 17 | 27 |