the Creative Commons Attribution 4.0 License.
A Machine Learning Method for Estimating Atmospheric Trace Gas Concentration Baselines
Abstract. Estimates of trace gas baseline mole fractions in high-frequency atmospheric measurement records are crucial for analysing long-term changes in atmospheric composition. Baseline mole fractions are those that would be observed far from emission sources (and hence are representative of background conditions) at specific latitudes in the atmosphere. Previous methods for inferring baseline mole fractions have used statistical or meteorological approaches, or, if available, co-measured tracer species thought only to be emitted from non-baseline wind sectors. Combinations of these techniques have also been employed in some applications. Statistical methods typically fit a baseline to the observations themselves, while meteorological methods use atmospheric models of varying complexity to categorise air mass origins. In this paper, we present a novel machine learning method for estimating trace gas baseline mole fractions, which benefits from the physical basis of model-based filtering without the need for running an expensive simulator. Our approach offers the accessibility and computational cost-effectiveness of statistical models, without the associated smoothing or difficulty in identifying rapid baseline variations. By training on historical Lagrangian particle dispersion model outputs, our model learns to predict baseline mole fractions directly from meteorological fields. This advancement opens new avenues for low-latency trace gas time series data analysis, reconstruction of historical baseline trends, and improved utilisation of tracer measurement air mass classification methods.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4137', Anonymous Referee #1, 31 Jan 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4137/egusphere-2025-4137-RC1-supplement.pdf
Citation: https://doi.org/10.5194/egusphere-2025-4137-RC1
RC2: 'Comment on egusphere-2025-4137', Anonymous Referee #2, 09 Feb 2026
The authors make a compelling case for the ML technique they present, including its benefits over traditional methods and its worthwhile use cases: computational cost savings, a short turnaround time compared to other techniques which require complex archived meteorology, and applicability to trends further back than is possible with other techniques. They also outline the limitations of the ML technique, but I would like to see them address these limitations more fully in light of some of the results.
Overall, I recommend this for publication once my comments and the comments of the other reviewer are addressed.
General Comments
The “mole fraction in air” time series of predicted baselines often include what appear to be massive deviations from the other baseline points. This isn’t addressed in the paper, but seems to be a rather glaring showstopper for many species, e.g., CF4 from Gosan and HFC-134a from Monte Cimone, especially as this contrasts with the comment on line 279, which suggests that CF4 should have fewer false positives or negatives. I understand that the prevalence of false positives is quantified, and used as a justification that the monthly mean may be the useful component, but it seems that the expectation for the ML method output is being set rather low, and further, there isn’t sufficient discussion of what is useful and what is not. Are there species for which this is not a useful methodology, and/or locations where there are too many local sources for this to be a viable methodology?
The figures in the current Section S4.4 (and beyond – i.e., pages 10-95) should have Figure S numbers, letters to differentiate the panels, and figure captions. This would be very helpful for the reader to quickly understand the differences between the middle and bottom plots in each set of three, and also what exactly is being called out with the month YYYY markers on the monthly means plots at the bottom of each page. Further, on the monthly means plots – some of these have notations that are difficult to read and should be made readable – e.g., page 23 (CH2Cl2 from Monte Cimone), page 14 (CH2Cl2 from Kennaook/Cape Grim), page 25 (HCFC-22 from Monte Cimone), page 61 (HFC-125 from Ragged Point), etc.
Lastly, a table of contents for the SI would be really helpful for faster reference to specific locations/species for the reader.
Technical Corrections:
Lines 39-43 – “to the West/East/West/North/East” should all be “to the west/east/west/north/east”, however “Northern Hemisphere” should be capitalized, so the text should be “Southern-Hemispheric air…” and “below-Northern-Hemispheric mole fractions.”
Lines 71-72 – when citations are integrated into the text, they should be separated by commas as such: “Henne et al. (2008), Lööv et al. (2008), and Salvador et al. (2010) computed back trajectories…”
Lines 139-140 – probably better not to split up the clause, and rewrite as: “… gave a less than 10 % error in wind direction across a six-month sample period (January to June 2015) at Mace Head, Ireland.”
Lines 196-197 – there should be a space after “Eq.”: i.e., “(Eq. (1))”, and “(Eqs. (2) and (3))”
Line 221 – for consistency, “Material” should be capitalized.
Line 225 – “Gosan, South Korea”
Line 269 – “Northern and Southern Hemisphere” should be capitalized.
Figure 1 – in the caption, add a comma after “Ireland”, and it would be best to be consistent with “Gosan, South Korea”, both in the caption and in the figure. Also, spell out “January”.
Table 1 – similarly, “Gosan, South Korea”. Also, from the style guide, “Coordinates need a degree sign and a space when naming the direction (e.g. 30° N, 25° E).”
Supplement:
Figure S1 – “10-m winds”
Table S1 – from the style guide, “Spaces must be included between number and unit (e.g. 1 %, 1 m).”
Table S5 – “Most important” would perhaps be a better heading than “Most importance”. Also, for consistency, pressure in “Surface pressure” should be either always capitalized or never.
Section [S]4.3 “MLP Plots” doesn’t seem to have anything inside it, as Section 4.4 starts immediately after. Perhaps Section 4.4 should be Section 4.3.1?
Citation: https://doi.org/10.5194/egusphere-2025-4137-RC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 1,985 | 182 | 33 | 2,200 | 82 | 42 | 42 |