A hybrid optimal estimation and machine learning approach to predict atmospheric composition
Abstract. We present a HYbrid REtrieval Framework (HYREF) that predicts subcolumn carbon monoxide (CO) concentrations from Cross-track Infrared Sounder (CrIS) observations and is trained to replicate the TRopospheric Ozone and its Precursors from Earth System Sounding (TROPESS) retrievals based on optimal estimation (OE). Because the OE algorithm relies on physically accurate but computationally expensive radiative transfer, it produces retrievals for only a small fraction of the available CrIS observations; adding machine learning (ML) techniques enables full coverage by providing high-resolution predictions for every valid CrIS sample. In addition to CO concentrations, TROPESS-HYREF predicts the key retrieval diagnostics (column averaging kernels, degrees of freedom, and retrieval errors) that are essential for meaningful comparison with other observations and models, and for ingestion into data assimilation systems.
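To illustrate why the column averaging kernel matters for model comparison, the sketch below applies the standard linear averaging-kernel relation, c_smoothed = c_a + Σ_i a_i (x_model,i − x_a,i), to a model profile so that it reflects the vertical sensitivity of the retrieval. This is a minimal sketch under stated assumptions, not the TROPESS-HYREF implementation: the function name is hypothetical, the profiles are assumed to be expressed as per-layer partial columns on the retrieval's vertical grid, and the kernel is applied in linear space (an operational product may define it in log space instead).

```python
def apply_column_averaging_kernel(x_model, x_prior, col_prior, col_ak):
    """Hypothetical helper: smooth a model profile with a column averaging kernel.

    Assumptions (illustrative only):
      - x_model, x_prior: per-layer partial columns on the retrieval grid
      - col_prior: column computed from the a priori profile
      - col_ak[i]: sensitivity of the retrieved column to layer i
    """
    if not (len(x_model) == len(x_prior) == len(col_ak)):
        raise ValueError("profiles and kernel must share the same vertical grid")
    # c_smoothed = c_a + sum_i a_i * (x_model_i - x_prior_i)
    return col_prior + sum(a * (xm - xa) for a, xm, xa in zip(col_ak, x_model, x_prior))


# Example with made-up numbers: the retrieval is insensitive to the lowest layer.
smoothed = apply_column_averaging_kernel(
    x_model=[100.0, 90.0, 80.0],
    x_prior=[95.0, 85.0, 80.0],
    col_prior=260.0,
    col_ak=[0.5, 1.0, 0.0],
)
print(smoothed)  # 267.5
```

Two limiting cases show the role of the kernel: with a kernel of all ones the function returns the full model column, and with a kernel of all zeros it returns the a priori column, i.e. the retrieval adds no information.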
The new framework achieves excellent performance, with correlation coefficients r > 0.99 and a bias below 0.1% when benchmarked against an independent test set, and it reproduces fine-scale spatial patterns in the CO fields observed during a major wildfire over North America. A scale analysis reveals substantial variability in CO concentrations below the nominal 0.80° resolution of the TROPESS OE retrieval, variability that TROPESS-HYREF successfully resolves. Inference is computationally efficient: daily global predictions are completed in minutes on a single compute node. Continuous training with the operational TROPESS OE algorithm ensures that TROPESS-HYREF adapts to changes in the trends and variability of atmospheric composition. This combination of OE-derived physical information and ML-driven efficiency provides a practical pathway to high-resolution atmospheric CO monitoring with robust diagnostics.
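The two headline benchmark statistics can be computed on paired predictions and reference OE retrievals as in the sketch below. The Pearson correlation is standard; the relative bias is assumed here to be the mean difference divided by the mean reference value, expressed in percent, which may differ from the exact definition used for TROPESS-HYREF. Function names are hypothetical.

```python
import math


def pearson_r(pred, ref):
    """Pearson correlation coefficient between predictions and reference values."""
    n = len(pred)
    mp = sum(pred) / n
    mr = sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    return cov / (sp * sr)


def relative_bias_percent(pred, ref):
    """Assumed definition: 100 * mean(pred - ref) / mean(ref)."""
    # The 1/n factors cancel, so sums can be used directly.
    return 100.0 * (sum(pred) - sum(ref)) / sum(ref)


# Example with made-up column values (arbitrary units).
ref = [100.0, 200.0, 300.0]
pred = [101.0, 202.0, 303.0]
print(pearson_r(pred, ref))             # close to 1
print(relative_bias_percent(pred, ref)) # 1.0
```

Reporting both statistics is useful because they are complementary: a prediction that is a constant multiple of the reference has r = 1 yet can still carry a sizeable relative bias, as the example shows.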