Preprints
https://doi.org/10.5194/egusphere-2025-4864
07 Oct 2025
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

A hybrid optimal estimation and machine learning approach to predict atmospheric composition

Frank Werner, Kevin W. Bowman, Seungwon Lee, Joshua L. Laughner, Vivienne H. Payne, and James L. McDuffie

Abstract. We present a HYbrid REtrieval Framework (HYREF) that predicts subcolumn carbon monoxide (CO) concentrations from Cross-track Infrared Sounder (CrIS) observations, trained to replicate the TRopospheric Ozone and its Precursors from Earth System Sounding (TROPESS) retrievals based on optimal estimation (OE). Unlike the OE algorithm, which produces retrievals for only a small fraction of available CrIS observations due to expensive but physically accurate radiative transfer, the addition of machine learning (ML) techniques enables full coverage by providing high-resolution predictions for every valid CrIS sample. Importantly, in addition to CO concentrations, TROPESS-HYREF also predicts key retrieval diagnostics, namely column averaging kernels, degrees of freedom, and retrieval errors, which are essential for meaningful comparison with other observations and models and for ingestion into data assimilation.

The new framework achieves excellent performance with correlation coefficients r>0.99 and a bias <0.1% when benchmarked against an independent test set, and reproduces fine-scale spatial patterns in CO fields observed during a major wildfire over North America. A scale analysis reveals substantial variability in CO concentrations below the nominal 0.80° resolution of the TROPESS OE retrieval, which TROPESS-HYREF successfully resolves. Inference is computationally efficient, with daily global predictions completed in minutes on a single compute node. Continuous training with the operational TROPESS OE algorithm ensures that TROPESS-HYREF adapts to changes in the trends and variability of atmospheric composition. This threading of OE-derived physical information and ML-driven efficiency provides a practical pathway to high-resolution atmospheric CO monitoring with robust diagnostics.
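The benchmark figures quoted above (correlation coefficient and bias against a test set) can be illustrated with a minimal sketch. The function names and the bias definition used here (mean of the per-sample relative difference, in percent) are assumptions for illustration; this page does not state the exact definition used in the paper, and the sample values are synthetic.

```python
import math


def pearson_r(pred, ref):
    """Pearson correlation coefficient between predictions and reference values."""
    n = len(ref)
    mean_pred = sum(pred) / n
    mean_ref = sum(ref) / n
    cov = sum((p - mean_pred) * (r - mean_ref) for p, r in zip(pred, ref))
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    return cov / math.sqrt(var_pred * var_ref)


def mean_relative_bias_percent(pred, ref):
    """Mean of (pred - ref) / ref, in percent (assumed bias definition)."""
    return 100.0 * sum((p - r) / r for p, r in zip(pred, ref)) / len(ref)


# Synthetic stand-ins for OE reference values and ML predictions
# (hypothetical numbers, not taken from the preprint).
oe_reference = [80.0, 95.0, 120.0, 150.0]
ml_prediction = [80.4, 94.8, 120.3, 149.9]

print(pearson_r(ml_prediction, oe_reference))
print(mean_relative_bias_percent(ml_prediction, oe_reference))
```

In practice these statistics would be computed over a held-out test set that the model never saw during training, as the abstract describes.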

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 12 Nov 2025)

Latest update: 07 Oct 2025
Short summary
We developed a hybrid machine learning and optimal estimation retrieval system that efficiently and accurately mimics operational retrieval results. Crucially, this algorithm also predicts critical diagnostic variables, including the observation operators needed for comparison with independent data and for ingestion into downstream chemical data assimilation models.
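To show why the predicted diagnostics matter, here is a minimal sketch of how a column averaging kernel is commonly used as an observation operator when comparing a model profile with a retrieval. It uses the standard linear form, smoothed column = prior column + sum over layers of a_i * (x_model_i - x_prior_i). The function name, the layer-wise partial-column inputs, and the choice of linear (rather than logarithmic) state units are assumptions for illustration; the actual TROPESS product conventions may differ and should be taken from the product documentation.

```python
def apply_column_averaging_kernel(model_partial_cols, prior_partial_cols, column_ak):
    """Return the column a retrieval would report if the model profile were the truth.

    All three inputs are per-layer sequences of equal length:
      model_partial_cols: model partial columns per layer
      prior_partial_cols: retrieval prior partial columns per layer
      column_ak:          column averaging kernel per layer (dimensionless)
    Hypothetical helper, not an API of any TROPESS software.
    """
    if not (len(model_partial_cols) == len(prior_partial_cols) == len(column_ak)):
        raise ValueError("all inputs must have the same number of layers")
    prior_column = sum(prior_partial_cols)
    # Only the part of the model-minus-prior difference that the instrument
    # is sensitive to (weighted by the kernel) is added to the prior column.
    correction = sum(
        a * (m - p)
        for a, m, p in zip(column_ak, model_partial_cols, prior_partial_cols)
    )
    return prior_column + correction


# Synthetic three-layer example (hypothetical numbers, arbitrary units).
model = [1.2, 0.9, 0.5]
prior = [1.0, 1.0, 0.5]
ak = [0.4, 1.1, 0.9]
print(apply_column_averaging_kernel(model, prior, ak))
```

A kernel of all ones returns the plain model column, and a kernel of all zeros returns the prior column, which is why the kernel must accompany each retrieval for a fair comparison.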