the Creative Commons Attribution 4.0 License.
Representing Subgrid-Scale Cloud Effects in a Radiation Parameterization using Machine Learning: MLe-radiation v1.0
Abstract. Improvements of Machine Learning (ML)-based radiation emulators remain constrained by the underlying assumptions used to represent horizontal and vertical subgrid-scale cloud distributions, which continue to introduce substantial uncertainties. In this study, we introduce a method to represent the impact of subgrid-scale clouds by applying ML to learn processes from high-resolution model output with a horizontal grid spacing of 5 km. In global storm-resolving models, clouds begin to be explicitly resolved. Coarse-graining these high-resolution simulations to the resolution of coarser Earth System Models yields radiative heating rates that implicitly include subgrid-scale cloud effects, without assumptions about their horizontal or vertical distributions. We define the cloud radiative impact as the difference between all-sky and clear-sky radiative fluxes, and train the ML component solely on this cloud-induced contribution to heating rates. The clear-sky tendencies are still computed with a conventional physics-based radiation scheme. This hybrid design enhances generalization, since the machine-learned part addresses only subgrid-scale cloud effects, while the clear-sky component remains responsive to changes in greenhouse gas or aerosol concentrations. Applied to coarse-grained data offline, the ML-enhanced radiation scheme reduces errors by a factor of 4–10 compared with a conventional coarse-scale radiation scheme. This shows the potential of representing subgrid-scale cloud effects in radiation schemes with ML for the next generation of Earth System Models.
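The decomposition described in the abstract (physics-based clear-sky tendencies plus an ML-predicted cloud-induced contribution) can be sketched as follows. This is an illustrative toy example, not the authors' implementation; the function name, array shapes, and values are hypothetical.

```python
import numpy as np

def hybrid_heating_rate(clearsky_hr, ml_cloud_impact):
    """All-sky heating rate reconstructed as the physics-based clear-sky
    tendency plus the ML-predicted cloud-induced contribution
    (illustrative sketch of the paper's hybrid decomposition)."""
    return clearsky_hr + ml_cloud_impact

# Toy vertical profile on 4 levels, in K/d (hypothetical values)
clearsky = np.array([-1.0, -1.5, -2.0, -0.5])
cloud_impact = np.array([0.0, 0.8, -0.3, 0.0])  # zero where clouds have no effect

allsky = hybrid_heating_rate(clearsky, cloud_impact)
```

Because the ML component only supplies the cloud-induced residual, the clear-sky part can still respond to changed greenhouse-gas or aerosol concentrations through the conventional scheme.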
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-4949', Anonymous Referee #1, 24 Nov 2025
- RC2: 'Comment on egusphere-2025-4949', Anonymous Referee #2, 29 Nov 2025
Review Summary
This manuscript aims to improve the treatment of sub-grid cloud variability in coarse-scale simulations by using machine learning to predict cloud radiative effects (CRE). Specifically, the authors use the existing radiation scheme for clear-sky conditions, while enhancing all-sky calculations with CRE predictions from a machine-learning model trained on coarsened high-resolution simulations.
The idea is scientifically reasonable, and the introduction is well-written. However, I struggled to fully understand the results. The manuscript would benefit greatly from clearer and more systematic descriptions of what was exactly done in the datasets, especially in Sections 3 and 4.
Major Comments
1. Clarify the treatment of radiation in high-resolution and coarsened datasets
This is the most critical missing piece. Section 3 describes the QUBICC simulations and coarsening procedure, yet radiation is barely addressed. The only mention is that snow is considered in RTE+RRTMGP, which is relatively minor to the overall methodology. Please clarify:
- In the original QUBICC simulations (5 km):
  - Was radiation computed using RTE+RRTMGP with maximum-random overlap?
  - Or did you use an all-or-nothing cloud cover assumption? The manuscript loosely uses "high-resolution simulations," making it unclear.
- In the coarsening process: Are radiation fields simply averaged over coarse boxes? How is cloud fraction determined in the coarsened dataset? While Grundner et al. (2022) is cited for the coarsening process, a more explicit description within this paper is essential for readers to understand the implications of coarsening, especially on radiation fields and cloud fraction.
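One plausible reading of the coarsening the reviewer asks about is a simple block average of radiation fields, with cloud fraction emerging as the fraction of cloudy fine cells per coarse box. The sketch below illustrates that assumed procedure only; it is not taken from the manuscript, and the grid layout and values are hypothetical.

```python
import numpy as np

def coarse_grain(field_fine, n_fine_per_coarse):
    """Block-average a 1-D fine-grid field onto the coarse grid
    (an assumed coarsening procedure, for illustration only)."""
    return field_fine.reshape(-1, n_fine_per_coarse).mean(axis=1)

# 8 fine cells aggregated into 2 coarse cells (hypothetical data)
flux = np.array([100., 110., 90., 95., 200., 210., 190., 205.])  # W/m^2
cloudy = np.array([1, 1, 0, 0, 1, 1, 1, 0], dtype=float)  # all-or-nothing per fine cell

coarse_flux = coarse_grain(flux, 4)       # area-mean radiative flux per coarse box
coarse_cldfrac = coarse_grain(cloudy, 4)  # cloud fraction = share of cloudy fine cells
```

Under this reading, a binary (all-or-nothing) cloud field at 5 km naturally produces fractional cloud cover after coarsening, which is exactly the implication the review asks the authors to spell out.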
2. What is the benchmark for “improvement”?
It is unclear what constitutes the "truth" when evaluating improvement. It appears that the authors compare ML-enhanced all-sky radiation against that computed by the existing radiation scheme using coarse-scale input. However, since neither inherently represents truth, it is difficult to claim "improvement." Rather, it is a comparison between two imperfect approximations.
3. Section 4 is very difficult to follow
The description of datasets and experiments in Section 4 is unclear. The repeated use of "with pyRTE" is ambiguous because presumably all simulations could involve pyRTE. I could not clearly understand which datasets were being evaluated, what the reference was, and what exactly was being shown in Figures 2–4. It would be helpful to provide a table summarizing all datasets. Also, please clearly address:
- What is the ML model's performance on the coarsened high-resolution test dataset? Is Figure 4 meant to show this?
- What is the difference between the datasets used in Figures 2&3, and those described in Section 4?
Much of the confusion could be resolved by more precise wording and descriptions and by consistently naming the datasets.
Minor Comments
- Page 6: Reporting the uneven output intervals is useful. Have you investigated their potential impact on the sampling of global cloud distributions and the cloud diurnal cycle?
- Page 6: The phrase “In order to evaluate the high-resolution data” is unclear. Coarse-scale data cannot evaluate high-resolution data; please rephrase more precisely.
- Page 6: In addition to comparing variable ranges, it seems more important to analyze joint distributions, which reflect the underlying physics and correlations that the machine-learning model would try to learn.
- Page 8: The statement “which may be due to the different spread in cloud water at 1 km (Figure 2b)” appears to be an educated guess. It would be more informative to provide a more insightful explanation, potentially linking this difference to specific physical processes or parameterization choices that could be responsible.
- Page 9: The statement regarding “horizontally homogeneous input parameters” should explicitly specify the corresponding dataset, particularly in relation to the datasets described in the opening paragraph of Section 4.
- Page 9: The statement “exceeding 5 K/d” seems inconsistent with Figure 4 (SW max appears closer to 3 K/d).
- Page 10: Figure 4 caption incorrectly labels “samples with partial cloudiness (right column)”.
- Page 10: The statement that cloudier scenes have larger errors is correct but it would be better to provide deeper insight.
Citation: https://doi.org/10.5194/egusphere-2025-4949-RC2
- CEC1: 'Comment on egusphere-2025-4949 - No compliance with the policy of the journal', Juan Antonio Añel, 07 Dec 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived the code for ICON 2.6.4 on a Git site. However, Git sites are not suitable repositories for scientific publication, something that our policy clearly states. Also, you have failed to provide the input and output data used for the computations in your manuscript. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, please include the relevant primary input/output data.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4949-CEC1
Model code and software
Representing Subgrid-Scale Cloud Effects in a Radiation Parameterization using Machine Learning: MLe-radiation v1.0 Katharina Hafner https://doi.org/10.5281/zenodo.17280639
Major:
The introduction effectively argues for separating cloud radiative impacts from all-sky radiation, suggesting this approach could benefit climate change simulations by avoiding the need for retraining. However, this potential benefit is not supported by evidence in the results. To strengthen this claim, the authors should either:
I believe the model would also work with the all-sky heating rate as the target, performing as well as it does for the cloud radiative effect. In that case, the whole separation process would be unnecessary.
Minor:
“O3, ρ, T, and Tsurf are normalized using their mean values µ and standard deviation σ”: Please specify the dimension over which the mean and standard deviation are calculated. Are they computed over the whole dataset? Is there any height dependency?
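The dimension question raised here matters in practice: per-level and whole-dataset statistics give very different normalized inputs. The sketch below contrasts the two conventions on synthetic data; the variable names, shapes, and distribution are hypothetical and only illustrate the reviewer's question, not the authors' choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical temperature field: 1000 samples x 60 vertical levels
T = rng.normal(loc=250.0, scale=30.0, size=(1000, 60))

# Option A: one scalar mean/std over the whole dataset (no height dependence)
mu_all, sd_all = T.mean(), T.std()
T_norm_all = (T - mu_all) / sd_all

# Option B: height-dependent statistics, one mu/sigma per vertical level
mu_lev, sd_lev = T.mean(axis=0), T.std(axis=0)
T_norm_lev = (T - mu_lev) / sd_lev  # broadcasting applies per-level stats
```

With Option B every level is standardized individually (zero mean, unit variance per level), whereas Option A preserves the vertical structure of the field in the normalized inputs; which one was used changes what the network has to learn.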
“We discarded a few coarse-grained cells, e.g., if the surface height of the coarse-grained cell deviated by more than 0.5m from the coarse-scale surface height.” I don’t get this part? Do you mean the variance of the fine-grained cell is larger than a certain threshold?
Figure 3. The difference between coarse-scale and coarse-grained cloud impact below 1km is quite obvious for both lw and sw. Is it concerning?
Figure 4. The notation should be improved to avoid confusion. I assume the pyRTE results are meant to represent the coarse-scale radiation result, which is the baseline here, and that the ground truth is the saved output from the QUBICC simulation. It would be less confusing if you made this clear in both the text and the figure/caption.
“The second column of Figure 4 shows results for fully cloudy samples (total cloud cover of 100%). For pyRTE, the MAE peaks near 10km, exceeding 5K/d for both SW and LW.”: Is the pyRTE SW/LW MAE really larger than 5 K/d? The blue line is ~0.5 K/d for SW and ~1 K/d for LW.
“The corresponding R2 are low, with average values of 0.83 (SW) and 0.66 (LW), compared to 0.98 for the ML-enhanced scheme”. How are the averaged values computed? Weighted by mass, or a simple average over values at different levels (and how are the levels distributed)?
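The two averaging conventions the reviewer distinguishes can be made concrete as follows. This is a generic sketch of per-level R² followed by unweighted versus weighted averaging; the weights and data are hypothetical, and nothing here reproduces the manuscript's numbers.

```python
import numpy as np

def r2_per_level(y_true, y_pred):
    """Coefficient of determination computed independently at each
    vertical level (columns of samples x levels arrays)."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
y = rng.normal(size=(500, 10))                 # hypothetical truth: 500 samples x 10 levels
yhat = y + 0.1 * rng.normal(size=(500, 10))    # near-perfect prediction with small noise

r2 = r2_per_level(y, yhat)
simple_avg = r2.mean()                          # unweighted average over levels
weights = np.linspace(2.0, 1.0, 10)             # e.g. pressure-thickness weights (made up)
weighted_avg = np.average(r2, weights=weights)
```

When per-level scores differ strongly (as Figure 4 suggests they do near cloud layers), mass-weighted and simple averages can diverge noticeably, which is why the convention should be stated.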
Figure 5. The breakdown of the different regions is informative. Is it possible to make a map of bias and MAE (if you have enough samples for the 80km resolution grid or even 200km)? It would provide more information for different audiences. For example, I am curious about the quality in the Antarctica region.
Figure C1. Could you comment on the large error in the stratosphere for both pyRTE and ML?