Stochastic perturbation of inputs to parametrisation schemes machine-learnt from high-resolution model variability

Reid, Helena; Morcrette, Cyril Julien

doi:10.5194/egusphere-2025-6312

Preprints

https://doi.org/10.5194/egusphere-2025-6312

20 Mar 2026

Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Abstract. Stochastic parametrisation schemes represent sources of uncertainty in atmospheric models, and several types of these schemes are in widespread use in general circulation models across a variety of temporal and spatial resolutions. We introduce a new stochastic scheme for use in global atmospheric models, which uses a machine learning model trained on high-resolution convection-permitting simulation data to estimate properties of the distribution of subgrid variability in potential temperature. These estimates then inform the profile of stochastic perturbations applied to the inputs of traditional parametrisation schemes. The scheme is tested in single column model experiments over the tropical west Pacific and is shown to improve model performance in this case.

Received: 17 Dec 2025 – Discussion started: 20 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (extended)

CEC1: 'Comment on egusphere-2025-6312', Juan Antonio Añel, 28 Mar 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
In addition, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the information of the new repositories.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-6312-CEC1
- AC1: 'Reply on CEC1', Helena Reid, 30 Mar 2026
  
  We have uploaded these to Zenodo. The GitHub references can be revised to use https://doi.org/10.5281/zenodo.19331887 and https://doi.org/10.5281/zenodo.19331816 for ENNUF and LFRic respectively.
  
  Citation: https://doi.org/10.5194/egusphere-2025-6312-AC1
RC1: 'Comment on egusphere-2025-6312', Anonymous Referee #1, 20 Apr 2026
The authors present a novel approach that leverages machine learning (ML) to generate perturbed inputs for a suite of physical parameterizations. Using output from several limited-area model simulations, the proposed method emulates subgrid-scale variability in key thermodynamic variables. The resulting framework, referred to as PAPILLON, produces stochastic perturbations that are then applied to the inputs of conventional physical parameterizations.
The ML emulator is evaluated in a single-column model configuration against the ERA5 dataset, based on a single test case. The results suggest that using PAPILLON to perturb the inputs leads to slightly improved performance compared to an ensemble generated using the SPT perturbation scheme.
I appreciate the originality of the proposed approach. The framework introduced here provides an interesting way to combine machine learning with existing physical parameterizations. However, the conclusions would be strengthened by the inclusion of additional test cases to assess the robustness of the results.
While the manuscript is generally well written, I found parts of it difficult to follow. In particular, the level of detail provided in some sections tends to obscure the main message. Streamlining the presentation and improving the overall structure, e.g. by introducing additional subsections and clearer signposting, would significantly enhance readability.
General comments
The introduction contains a substantial amount of useful background information. However, I found it somewhat difficult to follow, and the main line of reasoning is not always clear. Clarifying or simplifying the introduction would strengthen the argument.

The manuscript would benefit from substantial restructuring to improve clarity and focus. At present, the content is somewhat diffuse, with a level of detail in places that tends to obscure the main message. Streamlining the text and emphasizing the key ideas more directly would significantly improve readability.

It is not entirely clear to me why perturbations are applied only to potential temperature. While the introduction highlights the importance of this variable for convection, it is presented more as an example than as a justification for this specific choice. The authors are encouraged to clarify this point more explicitly. It may also be more appropriate to move this discussion to the Methods section.

Although I understand that running a large number of limited-area model simulations is computationally expensive, I wonder whether sampling variability over only one month is sufficient to ensure robustness. Some discussion of this limitation would be helpful.

It would be useful to include a brief description in the main text of the numerical implementation in the single-column model (SCM), in particular regarding the use of ENNUF.

Specific comments
l. 141-142: Was the model trained using randomly selected samples across all spatial domains and timesteps? To improve independence between training, validation, and test datasets, the authors might consider leaving out entire simulations (e.g. some LAM runs) or contiguous time periods.

l. 270: How is the height of the troposphere diagnosed?

Figure 8: When either the length scale or the time parameter is varied, what value is used for the other parameter that is held fixed?

Citation: https://doi.org/10.5194/egusphere-2025-6312-RC1

Data sets

CRMML Cyril Morcrette https://doi.org/10.5281/zenodo.13332843

Model code and software

LFRic atmospheric model UK Met Office https://github.com/MetOffice/lfric_apps/

ENNUF machine learning translator Helena Reid, Theano Xirouchaki, Joana Rodrigues, and Cyril Morcrette https://github.com/MetOffice/ennuf

Viewed

Total article views: 941 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
691	193	57	941	51	58

Views and downloads (calculated since 20 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	559	142	51	752
Apr 2026	118	40	6	164
May 2026	14	11	0	25

Viewed (geographical distribution)

Total article views: 938 (including HTML, PDF, and XML) Thereof 938 with geography defined and 0 with unknown origin.

Latest update: 23 May 2026

Short summary

Atmospheric models used for weather and climate benefit from representing the random effects of processes that are too small to be resolved by the model. Here, very detailed simulations are used to learn about the amount of variability that would be expected in a coarser model. We then use machine learning techniques to predict that fine-scale variability and show that including these predictions improves some idealised simulations over the tropical ocean.
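
As a rough illustration of what "learning about variability expected in a coarser model" could involve, the sketch below coarse-grains a high-resolution field into tiles and computes the mean and standard deviation within each tile. It is not taken from the preprint: the function name coarse_grain, the tile size, the choice of standard deviation as the summary statistic, and the sample values are all assumptions made for illustration only.

```python
from statistics import fmean, pstdev


def coarse_grain(theta, block):
    """Split a 2-D high-resolution field into block x block tiles and
    return the per-tile mean and standard deviation.

    theta: list of equal-length rows (hypothetical potential temperature, K)
    block: fine cells per coarse cell along each axis (assumed to divide
           both grid dimensions exactly)
    """
    n_rows, n_cols = len(theta), len(theta[0])
    if n_rows % block or n_cols % block:
        raise ValueError("block must divide both grid dimensions")
    means, stds = [], []
    for i in range(0, n_rows, block):
        mean_row, std_row = [], []
        for j in range(0, n_cols, block):
            # Gather every fine-scale value that falls inside this coarse cell.
            tile = [theta[r][c]
                    for r in range(i, i + block)
                    for c in range(j, j + block)]
            mean_row.append(fmean(tile))
            std_row.append(pstdev(tile))  # population std within the tile
        means.append(mean_row)
        stds.append(std_row)
    return means, stds


# Made-up 4x4 field, coarse-grained to a 2x2 grid.
theta = [
    [300.0, 300.0, 301.0, 303.0],
    [300.0, 300.0, 303.0, 301.0],
    [299.0, 301.0, 302.0, 302.0],
    [301.0, 299.0, 302.0, 302.0],
]
means, stds = coarse_grain(theta, 2)
```

In a setup like this, the per-tile means would stand in for the coarse model state, and the per-tile spread is the kind of quantity a machine learning model could be trained to predict from that state.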
