Using a data-driven statistical model to better evaluate surface turbulent heat fluxes in weather and climate numerical models: a demonstration study

Zouzoua, Maurin; Bastin, Sophie; Chiriaco, Marjolaine; Lohou, Fabienne; Lothon, Marie; Jome, Mathilde; Mallet, Cécile; Barthes, Laurent; Canut, Guylaine

doi:https://doi.org/10.5194/egusphere-2024-568

Preprints

22 Apr 2024

Status: this preprint is open for discussion.

Abstract. This study proposes the use of a data-driven statistical model to freeze the errors due to differences in environmental forcing when evaluating the surface turbulent heat fluxes from weather and climate numerical models against observations. It takes advantage of approximately ten years of continuous measurements of near-surface sensible and latent heat fluxes (H and LE, respectively), together with ancillary parameters, at the "Météopole" supersite of the French national research infrastructure ACTRIS-FR, located in Toulouse. The statistical model consists of several multi-layer perceptrons (MLPs) with the same architecture. Thirteen variables characterizing the environmental forcing in the surface layer at an hourly time scale are used as input parameters to estimate H and LE simultaneously. The MLPs are trained on five years of observational data using 5-fold cross-validation. The remaining data are used to test the estimates under unseen conditions. A case study is performed with data from a regional climate simulation. The performance of the statistical model is within the range of state-of-the-art surface parametrization schemes on hourly and seasonal time scales. It also generalizes well, but struggles to estimate negative H and large LE. The statistical model is used to evaluate the simulated fluxes under the simulated environment, to better examine the flaws of their numerical formulation throughout the simulation. Comparing the simulated fluxes with observed fluxes and with MLP-based fluxes gives different results. According to the MLP-based fluxes in the simulated environment, the land surface scheme of this climate model tends to underestimate large sensible heat fluxes. Thus, it incorrectly partitions energy between surface heating and evaporation during late summer. Our innovative method offers a different way of evaluating simulated near-surface turbulent heat fluxes when a long period of comprehensive observations is available. It can usefully support ongoing efforts to improve surface parametrization schemes.
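
The 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes one model is trained per fold and that the final H and LE estimates are the mean over the fold models; the abstract only states that several MLPs with the same architecture are used, so the averaging step, the shuffled split, and the function names `k_fold_indices` and `ensemble_mean` are hypothetical choices made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffled split: an assumption, the source does not say how folds are built.
    random.Random(seed).shuffle(indices)
    # Distribute the samples as evenly as possible across the k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def ensemble_mean(predictions):
    """Average per-model (H, LE) estimates (averaging is an assumption)."""
    n = len(predictions)
    h = sum(p[0] for p in predictions) / n
    le = sum(p[1] for p in predictions) / n
    return h, le
```

Each `(train_idx, val_idx)` pair would be used to fit and validate one MLP; the held-out years stay outside this loop and serve only for the final test.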

Received: 01 Mar 2024 – Discussion started: 22 Apr 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (extended)

CEC1: 'Comment on egusphere-2024-568 - No Compliance with GMD's policy', Juan Antonio Añel, 12 May 2024

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have published neither the code used in your work nor the input or output data in a permanent repository, which does not comply with our policy. Therefore, you must reply to this comment as soon as possible with the requested information (both links and DOIs).
Also, you must include the modified 'Code and Data Availability' section, including the mentioned information, in any potentially reviewed version of your manuscript.
If you do not fix this problem, we will have to reject your manuscript for publication in our journal. I should note that, given this lack of compliance with our policy, your manuscript should not have been accepted in Discussions. Therefore, the current situation with your manuscript is irregular.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2024-568-CEC1
- CC1: 'Reply on CEC1', Maurin Zouzoua, 21 May 2024
  
  Hello,
  Thank you for your message. I'm working to rectify this anomaly.
  Regards,
  Maurin ZOUZOUA
  
  Citation: https://doi.org/10.5194/egusphere-2024-568-CC1
- AC1: 'Reply on CEC1', Maurin Zouzoua, 24 May 2024
  
  Dear Juan,
  Thank you for your helpful feedback. An example of a workflow to evaluate the modelled turbulent heat fluxes using our approach can be found at this link: https://doi.org/10.5281/zenodo.11261853. The latest update of Meteopole observational data is now in use. The manuscript will be updated accordingly.
  
  Citation: https://doi.org/10.5194/egusphere-2024-568-AC1
RC1: 'Comment on egusphere-2024-568', Anonymous Referee #1, 20 Jun 2024

The paper by Maurin Zouzoua et al. presents a demonstration study in which the authors propose a data-driven statistical model to better represent the turbulent heat flux exchanges in the atmospheric surface layer. The approach is intended as a potential alternative to the classical parameterization schemes that are inherently violated in weather and climate numerical models. The developed model uses approximately 10 years of near-surface sensible and latent heat fluxes over the supersite "Météopole" of the French national research infrastructure ACTRIS-FR, located in Toulouse, for training and testing.

The authors selected thirteen variables to characterize the environmental forcing at an hourly time scale to estimate these near-surface fluxes (sensible and latent heat fluxes). The 10-year observational dataset was split in half between training and testing. The suggested model struggles to predict large negative fluxes, a signature of the stable regimes with which current models also struggle.

Technically, the work is well executed. The title and the abstract are informative, succinct, and clear. The manuscript is well organized and conveyed logically and clearly. The authors took care to explain clearly the motivation behind their work, to describe all the deployed measurement tools, and to formulate the research questions that the manuscript addresses. This is instrumental for understanding the scope and content of the manuscript and makes it easy and pleasant to read.

I think some of my comments below are significant and I would like to see the responses; a number of points must be clarified before the manuscript is accepted for publication.

I recommend acceptance with minor revisions.

General comment: Some of the grammar and language should be revised.

Specific comments:

Line 25: Mention also the first important source of bias.
Line 74: In section 5

3.1:
Why not split the training datasets by stability category (stable, neutral, unstable) and by friction velocity range?
Line 157: (ii) instead of (iii)

3.2:

Coarse time resolution: half-hourly averaged data may mask physical information at smaller time scales. At what time step were the simulations run?

Lines 201-202:
"The most conventional meteorological variables, such as T and RH at 2 m agl are also available." The lowest level, as mentioned, is within 20 m (a pressure level akin to a sigma coordinate, which changes dynamically with time), so the first half-eta level is still higher than 2 m. Is any interpolation applied here?

You do mention this in lines 303-305:
"Moreover, the meteorological variables are directly taken at the first half-eta level (M=1, around 8 m agl) instead of diagnostic variables as much as possible."

But 2 m and 8 m are quite far apart, especially under stable conditions. You may need to comment on this.

Line 202:
The data are stored at a temporal resolution of 3 hours …

How did you compare the instantaneous 3-hourly snapshots with observations averaged over half-hour intervals?

Line 204:

Meanwhile, the surface data, mostly provided by ORCHIDEE, consist of time-centred mean over a 3 hours window…

Specify which surface data.
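
One way to put half-hourly observations on the same footing as a time-centred 3-hour mean is sketched below. This is only an illustration of the averaging convention under discussion, not the authors' procedure: the function name `centred_window_mean`, the use of hours since the start of the record as the time axis, and the half-open window are all assumptions.

```python
def centred_window_mean(times_h, values, centre_h, window_h=3.0):
    """Mean of the samples falling in [centre - window/2, centre + window/2).

    times_h: sample times in hours since the start of the record (assumed).
    values:  observed fluxes, with None marking missing samples.
    """
    half = window_h / 2.0
    selected = [
        v
        for t, v in zip(times_h, values)
        if centre_h - half <= t < centre_h + half and v is not None
    ]
    if not selected:
        # No valid sample in the window: return None rather than guess.
        return None
    return sum(selected) / len(selected)
```

With half-hourly data, each 3-hour window contains at most six samples; how the observation timestamps relate to their own averaging interval (start, middle, or end) would need to be checked against the data documentation.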

Lines 209-213:

Could you show in Fig. 3 the grid layout, with the geographic coordinates of the actual observation site and the 2 nearest grid cells considered in the analysis?

In this context, given the spatial resolution (dx = dy = 20 km) and the point observations collected from a tower at specific coordinates, have you done a sensitivity analysis on the effect of spatial averaging? Also, do you have any comments on the local dynamic forcings in the two grid cells considered, compared to the observations? You used surface type as the main criterion for selecting these 2 grid cells, but what about topographic effects and the local environmental dynamics (wind speed and direction, temperature, ...) when comparing outputs in these 2 grid cells to the observational dataset at that specific location?
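
The purely geometric part of this question, namely which grid cells lie closest to the tower, can be sketched with a great-circle distance. The coordinates used in any call are placeholders to be supplied by the reader, the function names `haversine_km` and `nearest_cells` are hypothetical, and the sketch ignores the surface-type criterion that the manuscript reportedly uses.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def nearest_cells(site, cells, n=2):
    """Return the n grid-cell centres closest to the site, nearest first."""
    return sorted(
        cells,
        key=lambda c: haversine_km(site[0], site[1], c[0], c[1]),
    )[:n]
```

Plotting the site together with the output of `nearest_cells` on the model grid would give the layout requested for Fig. 3.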

3.3.1:
What are the criteria for selecting the training dataset?

Is it season independent?

Have you done any sensitivity analysis on the choice of training dataset, i.e., on dataset convergence in terms of the variability of the final MLP weights and biases?

3.3.2:

Lines 298-300: Define the variable acronyms, especially in this sentence:

Eventually, 4 trigonometric temporal coordinates are added for the description of seasonal (dx, dy) and diurnal (hx, hy) cycles.
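
A common way to build such trigonometric coordinates is a cosine/sine pair per cycle, sketched below. The quoted sentence does not give the formulas, so the assignment of cosine to the x component and sine to the y component, the 365.25-day year, and the function names are assumptions made for illustration.

```python
import math


def cyclic_pair(value, period):
    """Map a periodic quantity onto the unit circle as (cos, sin)."""
    angle = 2.0 * math.pi * value / period
    return math.cos(angle), math.sin(angle)


def temporal_coordinates(day_of_year, hour_of_day):
    """Return (dx, dy, hx, hy): seasonal and diurnal cyclic coordinates.

    The cos/sin assignment and the 365.25-day period are assumptions.
    """
    dx, dy = cyclic_pair(day_of_year, 365.25)
    hx, hy = cyclic_pair(hour_of_day, 24.0)
    return dx, dy, hx, hy
```

The point of the pair is continuity: hour 23 and hour 0 end up close together on the circle, which a single linear hour variable would not provide.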

Citation: https://doi.org/10.5194/egusphere-2024-568-RC1

Viewed

Total article views: 348 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
229	94	25	348	18	16

Views and downloads (calculated since 22 Apr 2024)

Month	HTML	PDF	XML	Total
Apr 2024	91	32	8	131
May 2024	86	32	8	126
Jun 2024	32	17	6	55
Jul 2024	20	13	3	36

Viewed (geographical distribution)

Total article views: 342 (including HTML, PDF, and XML) Thereof 342 with geography defined and 0 with unknown origin.

Latest update: 26 Jul 2024

Short summary

This study proposes using a statistical model to freeze errors due to differences in environmental forcing when evaluating the surface turbulent heat fluxes from numerical simulations against observations. The statistical model is first built from observations and then applied to the simulated environment to generate the fluxes that would possibly be observed under it. This novel method offers a different way of evaluating the numerical formulation of turbulent heat fluxes when a long period of observational data is available.

