the Creative Commons Attribution 4.0 License.
Stochastic perturbation of inputs to parametrisation schemes machine-learnt from high-resolution model variability
Abstract. Stochastic parametrisation schemes represent sources of uncertainty in atmospheric models, and several types of these schemes are in widespread use in general circulation models across a variety of temporal and spatial resolutions. We introduce a new stochastic scheme for use in global atmospheric models, which uses a machine learning model trained on high-resolution convection-permitting simulation data to estimate properties of the distribution of subgrid variability in potential temperature. This estimate then informs the profile of stochastic perturbations applied to the inputs of traditional parametrisation schemes. The scheme is tested in single-column model experiments over the tropical west Pacific and is shown to improve model performance in this case.
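As a rough illustration of the core idea described in the abstract — drawing stochastic input perturbations from an ML-estimated subgrid distribution — consider the following sketch. This is not the authors' implementation; the function name, the Gaussian noise model, and the placeholder emulator output are all assumptions for illustration only.

```python
import numpy as np

def perturb_theta(theta_profile, predicted_sigma, rng):
    """Add stochastic perturbations to a potential-temperature profile.

    theta_profile   : (n_levels,) unperturbed potential temperature [K]
    predicted_sigma : (n_levels,) ML-estimated subgrid standard deviation [K]
                      (here just a placeholder for the emulator's output)
    rng             : numpy random Generator

    Each level is perturbed independently with Gaussian noise scaled by
    the predicted subgrid spread at that level.
    """
    noise = rng.standard_normal(theta_profile.shape)
    return theta_profile + predicted_sigma * noise

# Toy example: 5 model levels, constant 0.5 K predicted spread
rng = np.random.default_rng(42)
theta = np.linspace(300.0, 320.0, 5)
sigma = np.full(5, 0.5)
theta_pert = perturb_theta(theta, sigma, rng)
```

The perturbed profile `theta_pert` would then be passed to the conventional parametrisation schemes in place of the unperturbed input.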
Status: open (until 15 May 2026)
- CEC1: 'Comment on egusphere-2025-6312', Juan Antonio Añel, 28 Mar 2026
- AC1: 'Reply on CEC1', Helena Reid, 30 Mar 2026
We have uploaded these to Zenodo. The GitHub references can be revised to use https://doi.org/10.5281/zenodo.19331887 and https://doi.org/10.5281/zenodo.19331816 for ENNUF and LFRic, respectively.
Citation: https://doi.org/10.5194/egusphere-2025-6312-AC1
- RC1: 'Comment on egusphere-2025-6312', Anonymous Referee #1, 20 Apr 2026
The authors present a novel approach that leverages machine learning (ML) to generate perturbed inputs for a suite of physical parameterizations. Using output from several limited-area model simulations, the proposed method emulates subgrid-scale variability in key thermodynamic variables. The resulting framework, referred to as PAPILLON, produces stochastic perturbations that are then applied to the inputs of conventional physical parameterizations.
The ML emulator is evaluated in a single-column model configuration against the ERA-5 dataset, based on a single test case. The results suggest that using PAPILLON to perturb the inputs leads to slightly improved performance compared to an ensemble generated using the SPT perturbation scheme.
I appreciate the originality of the proposed approach. The framework introduced here provides an interesting way to combine machine learning with existing physical parameterizations. However, the conclusions would be strengthened by the inclusion of additional test cases to assess the robustness of the results.
While the manuscript is generally well written, I found parts of it difficult to follow. In particular, the level of detail provided in some sections tends to obscure the main message. Streamlining the presentation and improving the overall structure, e.g. by introducing additional subsections and clearer signposting, would significantly enhance readability.
General comments
- The introduction contains a substantial amount of useful background information. However, I found it somewhat difficult to follow, and the main line of reasoning is not always clear. Clarifying or simplifying the introduction would strengthen the argument.
- The manuscript would benefit from substantial restructuring to improve clarity and focus. At present, the content is somewhat diffuse, with a level of detail in places that tends to obscure the main message. Streamlining the text and emphasizing the key ideas more directly would significantly improve readability.
- It is not entirely clear to me why perturbations are applied only to potential temperature. While the introduction highlights the importance of this variable for convection, it is presented more as an example than as a justification for this specific choice. The authors are encouraged to clarify this point more explicitly. It may also be more appropriate to move this discussion to the Methods section.
- Although I understand that running a large number of limited-area model simulations is computationally expensive, I wonder whether sampling variability over only one month is sufficient to ensure robustness. Some discussion of this limitation would be helpful.
- It would be useful to include a brief description in the main text of the numerical implementation in the single-column model (SCM), in particular regarding the use of ENNUF.
Specific comments
- l. 141-142: Was the model trained using randomly selected samples across all spatial domains and timesteps? To improve independence between training, validation, and test datasets, the authors might consider leaving out entire simulations (e.g. some LAM runs) or contiguous time periods.
- l. 270: How is the height of the troposphere diagnosed?
- Figure 8: When either the length scale or the time parameter is varied, what value is used for the other parameter that is held fixed?
Citation: https://doi.org/10.5194/egusphere-2025-6312-RC1
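The reviewer's suggestion on l. 141-142 — improving independence between splits by holding out entire simulations rather than randomly selected samples — could be sketched as follows. The function name, the simulation identifiers, and the data layout are hypothetical; this is only one possible way to implement a group-wise split.

```python
import numpy as np

def split_by_simulation(sim_ids, val_sims, test_sims):
    """Return train/val/test index arrays such that every sample from a
    given simulation lands in exactly one split.

    sim_ids   : per-sample simulation identifiers (e.g. LAM run names)
    val_sims  : simulations reserved for validation
    test_sims : simulations reserved for testing
    """
    sim_ids = np.asarray(sim_ids)
    held_out = list(val_sims) + list(test_sims)
    val_idx = np.flatnonzero(np.isin(sim_ids, list(val_sims)))
    test_idx = np.flatnonzero(np.isin(sim_ids, list(test_sims)))
    train_idx = np.flatnonzero(~np.isin(sim_ids, held_out))
    return train_idx, val_idx, test_idx

# Toy example: 8 samples drawn from three hypothetical LAM runs
ids = ["lam1", "lam1", "lam1", "lam2", "lam2", "lam3", "lam3", "lam3"]
train, val, test = split_by_simulation(ids, val_sims=["lam2"], test_sims=["lam3"])
```

Holding out whole runs (or contiguous time periods, as the reviewer also suggests) prevents temporally or spatially correlated samples from leaking between training and evaluation sets.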
Data sets
CRMML Cyril Morcrette https://doi.org/10.5281/zenodo.13332843
Model code and software
LFRic atmospheric model UK Met Office https://github.com/MetOffice/lfric_apps/
ENNUF machine learning translator Helena Reid, Theano Xirouchaki, Joana Rodrigues, and Cyril Morcrette https://github.com/MetOffice/ennuf
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 218 | 68 | 12 | 298 | 9 | 12 |
Dear authors,
Unfortunately, after checking your manuscript, we have found that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
In addition, you must include a modified 'Code and Data Availability' section in any revised version of the manuscript, containing the information for the new repositories.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor