A Machine Learning Method for Estimating Atmospheric Trace Gas Concentration Baselines

Gerrand, Kirstin; Fillola, Elena; Manning, Alistair J.; Arduini, Jgor; Krummel, Paul B.; Lunder, Chris R.; Mühle, Jens; O'Doherty, Simon; Park, Sunyoung; Prinn, Ronald G.; Reimann, Stefan; Young, Dickon; Rigby, Matthew

doi:10.5194/egusphere-2025-4137

Preprints

https://doi.org/10.5194/egusphere-2025-4137

04 Sep 2025

Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Abstract. Estimates of trace gas baseline mole fractions in high-frequency atmospheric measurement records are crucial for analysing long-term changes in atmospheric composition. Baseline mole fractions are those that would be observed far from emission sources (and hence are representative of background conditions) at specific latitudes in the atmosphere. Previous methods for inferring baseline mole fractions have used statistical or meteorological approaches, or, if available, co-measured tracer species thought only to be emitted from non-baseline wind sectors. Combinations of these techniques have also been employed in some applications. Statistical methods typically fit a baseline to the observations themselves, while meteorological methods use atmospheric models of varying complexity to categorise air mass origins. In this paper, we present a novel machine learning method for estimating trace gas baseline mole fractions, which benefits from the physical basis of model-based filtering without the need for running an expensive simulator. Our approach offers the accessibility and computational cost-effectiveness of statistical models, without the associated smoothing or difficulty in identifying rapid baseline variations. By training on historical Lagrangian particle dispersion model outputs, our model learns to predict baseline mole fractions directly from meteorological fields. This advancement opens new avenues for low-latency trace gas time series data analysis, reconstruction of historical baseline trends, and improved utilisation of tracer measurement air mass classification methods.
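
The idea described in the abstract, learning a baseline classification from meteorological inputs using labels derived from a dispersion model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the three features (eastward wind, northward wind, boundary-layer height) and the labelling rule are hypothetical stand-ins for real meteorological fields and Lagrangian particle dispersion model output, and a plain logistic regression written in NumPy stands in for whatever model the paper actually uses.

```python
import numpy as np


def make_synthetic_data(n_hours=2000, seed=0):
    """Synthetic stand-in for meteorological features, labels and observations."""
    rng = np.random.default_rng(seed)
    # Hypothetical features: eastward wind u, northward wind v (m/s),
    # and boundary-layer height (m).
    u = rng.normal(0.0, 5.0, n_hours)
    v = rng.normal(0.0, 5.0, n_hours)
    blh = rng.uniform(200.0, 2000.0, n_hours)
    features = np.column_stack([u, v, blh])
    # Toy labelling rule (assumption): westerly flow (u > 2 m/s) is "baseline".
    # In a real application these labels would come from dispersion model output.
    labels = (u > 2.0).astype(float)
    # Toy mole fractions (arbitrary units): a constant background plus noise,
    # with a positive pollution enhancement during non-baseline hours.
    enhancement = rng.exponential(5.0, n_hours)
    mole_fraction = 100.0 + rng.normal(0.0, 0.5, n_hours) + (1.0 - labels) * enhancement
    return features, labels, mole_fraction


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(x, y, learning_rate=0.5, n_steps=2000):
    """Fit logistic regression weights by batch gradient descent."""
    weights = np.zeros(x.shape[1])
    bias = 0.0
    for _ in range(n_steps):
        prob = sigmoid(x @ weights + bias)
        weights -= learning_rate * (x.T @ (prob - y)) / len(y)
        bias -= learning_rate * np.mean(prob - y)
    return weights, bias


features, labels, mole_fraction = make_synthetic_data()

# Train on the earlier part of the record and evaluate on the later part.
n_train = 1500
mean = features[:n_train].mean(axis=0)
std = features[:n_train].std(axis=0)
x_train = (features[:n_train] - mean) / std
x_test = (features[n_train:] - mean) / std

weights, bias = train_logistic(x_train, labels[:n_train])

# Classify unseen hours as baseline or not, using meteorology alone.
is_baseline = sigmoid(x_test @ weights + bias) >= 0.5
accuracy = np.mean(is_baseline == labels[n_train:].astype(bool))

# A simple baseline estimate: median of the observations flagged as baseline.
mole_fraction_test = mole_fraction[n_train:]
baseline_estimate = np.median(mole_fraction_test[is_baseline])

print(f"held-out classification accuracy: {accuracy:.3f}")
print(f"baseline estimate: {baseline_estimate:.2f}")
print(f"mean of all observations: {mole_fraction_test.mean():.2f}")
```

Once such a classifier is trained, flagging new observations requires only the meteorological inputs and no further dispersion model runs, which is the source of the cost and latency advantage the abstract describes.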

Received: 26 Aug 2025 – Discussion started: 04 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 10 Dec 2025)

Supplement

https://doi.org/10.5194/egusphere-2025-4137-supplement

Viewed

Total article views: 1,790 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,731	52	7	1,790	14	25	23

Views and downloads (calculated since 04 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	1,647	39	2	1,688
Oct 2025	74	11	4	89
Nov 2025	10	2	1	13

Viewed (geographical distribution)

Total article views: 1,700 (including HTML, PDF, and XML) Thereof 1,700 with geography defined and 0 with unknown origin.

Latest update: 08 Nov 2025

Short summary

To analyse long-term trends in atmospheric trace gas concentrations, it is important to identify data points minimally affected by local pollution sources or air masses carried from other latitudes or altitudes. Traditional methods for detecting these “baselines” are computationally expensive or lack a basis in physical principles. This paper introduces a machine-learning method that uses meteorological data and offers significantly lower computational costs compared to physics-based techniques.

