A hybrid optimal estimation and machine learning approach to predict atmospheric composition

Werner, Frank; Bowman, Kevin W.; Lee, Seungwon; Laughner, Joshua L.; Payne, Vivienne H.; McDuffie, James L.

doi:10.5194/egusphere-2025-4864

Preprints

07 Oct 2025

Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Abstract. We present a HYbrid REtrieval Framework (HYREF) that predicts subcolumn carbon monoxide (CO) concentrations from Cross-track Infrared Sounder (CrIS) observations, trained to replicate the TRopospheric Ozone and its Precursors from Earth System Sounding (TROPESS) retrievals based on optimal estimation (OE). The OE algorithm produces retrievals for only a small fraction of available CrIS observations because its physically accurate radiative transfer is computationally expensive; adding machine learning (ML) techniques enables full coverage by providing high-resolution predictions for every valid CrIS sample. Importantly, in addition to CO concentrations, TROPESS-HYREF also predicts key retrieval diagnostics, namely column averaging kernels, degrees of freedom, and retrieval errors, that are essential for meaningful comparison with other observations and models and for ingestion into data assimilation.

The new framework achieves excellent performance with correlation coefficients r > 0.99 and a bias < 0.1 % when benchmarked against an independent test set, and reproduces fine-scale spatial patterns in CO fields observed during a major wildfire over North America. A scale analysis reveals substantial variability in CO concentrations below the nominal 0.80° resolution of the TROPESS OE retrieval, which TROPESS-HYREF successfully resolves. Inference is computationally efficient, with daily global predictions completed in minutes on a single compute node. Continuous training with the operational TROPESS OE algorithm ensures that TROPESS-HYREF adapts to changes in the trends and variability of atmospheric composition. This threading of OE-derived physical information and ML-driven efficiency provides a practical pathway to high-resolution atmospheric CO monitoring with robust diagnostics.
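
As a minimal sketch of how benchmark statistics of this kind could be computed (not the authors' evaluation code), the Python snippet below derives a Pearson correlation coefficient and a mean relative bias between predicted and reference values. The function name `benchmark_metrics`, the bias definition (mean difference divided by the mean reference value), and the synthetic input data are assumptions made for illustration; the abstract does not state how the bias is defined.

```python
import numpy as np


def benchmark_metrics(predicted, reference):
    """Return (Pearson r, mean relative bias in percent).

    Assumed bias definition: 100 * mean(predicted - reference) / mean(reference).
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("predicted and reference must have the same shape")
    # Off-diagonal element of the 2x2 correlation matrix is Pearson r.
    r = np.corrcoef(predicted, reference)[0, 1]
    bias_percent = 100.0 * np.mean(predicted - reference) / np.mean(reference)
    return r, bias_percent


# Synthetic stand-in data (arbitrary units), NOT values from the preprint.
rng = np.random.default_rng(0)
reference = rng.uniform(60.0, 200.0, size=5000)
predicted = reference + rng.normal(0.0, 1.0, size=reference.size)

r, bias_percent = benchmark_metrics(predicted, reference)
print(f"r = {r:.4f}, bias = {bias_percent:.3f} %")
```

In practice the reference would be the held-out OE retrievals and the prediction the corresponding ML output for the same CrIS samples.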

Received: 01 Oct 2025 – Discussion started: 07 Oct 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 05 Jan 2026)

Short summary

We developed a hybrid machine learning-optimal estimation retrieval system that efficiently and accurately mimics operational retrieval results. Crucially, this algorithm also predicts critical diagnostic variables, including the observation operators needed for comparison with independent data and for ingestion into downstream chemical data assimilation models.
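
To illustrate why an observation operator matters for such comparisons, the sketch below applies a column averaging kernel to a model profile using the generic linear form common in retrieval theory: smoothed column = prior column + kernel · (model profile − prior profile). This is an assumption-laden illustration rather than the TROPESS product definition; the function name `apply_column_averaging_kernel`, the use of per-layer partial columns in linear space, and the example numbers are all hypothetical, and an operational product may define its kernels differently (for example in logarithmic space).

```python
import numpy as np


def apply_column_averaging_kernel(model_subcolumns, prior_subcolumns, column_ak):
    """Smooth a model profile with a column averaging kernel (generic linear form).

    All inputs are 1-D arrays over the same vertical layers; the profiles are
    assumed to be per-layer partial columns in identical units.
    """
    model = np.asarray(model_subcolumns, dtype=float)
    prior = np.asarray(prior_subcolumns, dtype=float)
    ak = np.asarray(column_ak, dtype=float)
    if not (model.shape == prior.shape == ak.shape):
        raise ValueError("model, prior, and kernel must share the same shape")
    # Prior column plus the kernel-weighted departure of the model from the prior.
    return prior.sum() + float(np.dot(ak, model - prior))


# Hypothetical three-layer example in arbitrary units.
prior = np.array([1.0, 2.0, 3.0])
model = np.array([2.0, 2.0, 2.0])
kernel = np.array([0.2, 0.8, 1.0])
smoothed_column = apply_column_averaging_kernel(model, prior, kernel)
print(smoothed_column)
```

A kernel of all ones returns the model column unchanged, while a kernel of all zeros returns the prior column, which shows how the kernel encodes where in the vertical the measurement, rather than the prior, determines the reported value.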

