This work is distributed under the Creative Commons Attribution 4.0 License.
Enabling Fast Greenhouse Gas Emissions Inference from Satellites with GATES: a Graph-Neural-Network Atmospheric Transport Emulation System
Abstract. Atmospheric observation-based “inverse” greenhouse gas flux estimates are increasingly important to evaluate national inventories, with a dramatic improvement in “top-down” flux inference expected in the coming years due to the rapidly growing number of measurements from space. However, many well-established inverse modelling techniques face significant computational challenges scaling to modern satellite datasets, particularly those that rely on Lagrangian Particle Dispersion Models (LPDMs) to simulate atmospheric transport. Here, we introduce GATES (Graph-Neural-Network Atmospheric Transport Emulation System), a data-driven LPDM emulator which outputs source-receptor relationships (“footprints”) using only meteorology and surface data as inputs, approximately 1000 times faster than an LPDM. We demonstrate GATES’s skill in estimating footprints over South America and integrate it into an emissions estimation pipeline, evaluating Brazil’s methane emissions using GOSAT (Greenhouse Gases Observing Satellite) observations for 2016 and 2018 and finding emissions that are consistent in space and time with the physics-driven estimate. This work highlights the potential of machine learning-based emulators like GATES to overcome a key bottleneck in large-scale, satellite-based inverse modelling, accelerating greenhouse gas emissions estimation and enabling timely, improved evaluations of national GHG inventories.
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2025-2392 - No compliance with the policy of the journal', Juan Antonio Añel, 28 Jul 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
The main problem with your manuscript is that you do not make the assets available: for example, not publishing them and stating that they are available upon request, or pointing to an email address to get access to the NAME transport model. Also, you have stored your code and data on sites that do not comply with our policy. I must be clear here: we cannot accept this.
First, as I said, some of the assets in your manuscript are available only upon request. This is totally unacceptable according to our policy, and your manuscript should never have been accepted for Discussions given such a flagrant lack of compliance. Our policy clearly states that all the code and data necessary to replicate a manuscript must be published openly and freely, accessible to anyone, before submission.
Second, you link to GitHub for access to several assets in your work. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo, and our policy makes a clear and specific reference to it.
Also, you have stored part of the data at CEDA, which is not a suitable repository for long-term publication.
Therefore, the current situation with your manuscript is irregular. Please publish all the code, input and output data used in your manuscript in one of the appropriate repositories and reply to this comment with the relevant information (link and permanent identifier, e.g. DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
Also, you must include a modified 'Code and Data Availability' section in a potentially revised manuscript, containing the information for the new repositories.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2392-CEC1
AC1: 'Reply on CEC1', Elena Fillola, 01 Aug 2025
Dear Dr Añel,
We apologise for the issues with the availability of assets and code. We hope that the following information resolves them. The ‘Code and Data Availability’ section will be modified accordingly.
The two code repositories mentioned are now stored on Zenodo with their respective DOIs:
- elenafillo/GATES_LPDM_emulator (to train and evaluate the model): DOI: 10.5281/zenodo.16679175
- elenafillo/extract_iris_met (to prepare the meteorological inputs): DOI: 10.5281/zenodo.16634862
The meteorology data have been published to operational standards by the Met Office at CEDA, and can be accessed directly through https://data.ceda.ac.uk/badc/name_nwp/data/global, or at https://catalogue.ceda.ac.uk/uuid/2bb4d76ed2fa4fc2af3fbbca6eb80965 and https://catalogue.ceda.ac.uk/uuid/7d0fff9f59b94a3da347e3ae10bd8fc1.
A sample of the footprints, usable for testing and recreating the results, can be found at DOI 10.5281/zenodo.15497433.
Please do let us know if there are any further changes required to the data and assets.
Best wishes,
The authors
Citation: https://doi.org/10.5194/egusphere-2025-2392-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 01 Aug 2025
Dear authors,
Many thanks for your quick reply to the concerns raised. We acknowledge your effort, for example in indicating the version numbers of the software dependencies in the shared Zenodo repositories.
However, two issues remain. First, you continue to host your data at CEDA. I must insist that we cannot accept this; you must host them in a trusted repository. If you think that something prevents you from doing so, such as a too-large dataset, please let us know the size, and we will check if we can grant you an exception to our policy.
Also, it is my understanding that you do not run the NAME model, but only use the NAME outputs hosted at CEDA. If this is the case, then you do not need to include information about how to obtain access to NAME in your Code and Data Availability section. Please clarify this issue.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2392-CEC2
AC2: 'Reply on CEC2', Elena Fillola, 06 Aug 2025
Dear Dr Añel,
Thank you for your prompt reply.
We understand your concern regarding storage of the meteorological files at CEDA. As you mention, the dataset size is the main barrier to storage elsewhere: a single year of the required meteorology, as downloaded directly from CEDA, occupies 400GB. Once extracted and processed with the code we provided, the files still add up to 250GB per year. Given that in our paper we use four years of meteorology for training and testing, and that Zenodo has a maximum dataset size of 50GB, we hope that you can grant an exception to your policy, to avoid having to upload over 1TB of data to a repository.
Regarding the NAME model, we have stored all the outputs from our runs at https://doi.org/10.5281/zenodo.16748754, and therefore we are happy to remove the information on how to access NAME from our Code and Data Availability section.
Thank you again for your comments, please do let us know if anything else needs addressing.
Best wishes,
The authors
Citation: https://doi.org/10.5194/egusphere-2025-2392-AC2
RC1: 'Comment on egusphere-2025-2392', Anonymous Referee #1, 15 Aug 2025
General comments:
Fillola et al. developed a machine learning method, GATES, to simulate Lagrangian footprints. They demonstrated the efficiency of this approach, compared to traditional physics-based LPDMs, for rapidly generating footprints and performing an emission inference using satellite GHG data. With the increasing volume of GHG data, such ML-based methods could become increasingly useful in this area. Surprisingly, the input data required for these ML emulations are far smaller than those required for traditional LPDM footprint simulations, yet the emulator can still generate footprints of quality comparable to the physics-based ones. For example, the NAME simulations require 15 met variables at 50 levels at over 200 timesteps for 30 days, whereas the ML emulation needs only 12 hours of met information at 7 levels (only 3 levels for wind information). Does this mean that traditional LPDM footprints may not need to be simulated for 30 days? Could they instead be run backward in time for only 12 hours to generate footprints of acceptable quality?
It is also unclear to me how the background is handled. In traditional LPDM-based inversions, back-trajectories simulated by the LPDM are often used to estimate the background before inversion, but GATES does not emulate back-trajectories. Does this mean that GATES cannot be used alone for inversions, since it does not provide the back-trajectory estimates generally required for the background? If both GATES and NAME need to be run for inversion applications, this would not save computation time.
Specific comments:
Lines 63 - 64: satellite observations are often coarsened, regardless of whether they are used in Lagrangian-based inversions or Eulerian inversion models.
Line 79: why 2D? Many LPDM simulations have 3D footprints, with time as the third dimension.
Lines 81 – 83: this is true except for the XSTILT model
Lines 91 – 93: HYSPLIT is pretty well-known and has lots of users as well.
Lines 148 – 160: the subpanel labels seem wrong in this paragraph, as there is no f or g in Fig. 1. Also, Fig. 1 is not easy for non-ML experts to understand in terms of what exactly the model contains or how the model is formulated.
Table 1: how do you determine the 7 and 3 levels? Can you use 6 levels or 10 levels? Also, how do you determine which level to choose?
Lines 224 – 229: Validation: why can’t you tune the model hyperparameters during training? Also, why do you need to apply an additional bias correction after training, using the validation dataset? Does this mean your training did not do a good job?
Line 236: It is unclear how you treat footprints with zero values, as the log transform of 0 is negative infinity.
Section 4.3.2: what does the size mean? The number of nodes in each layer in the GNN? How do you decide the number of hidden layers you will need in the model?
Line 257: what do the learning rate and batch size mean? Can you provide more context on this?
Section 4.4.1: if the threshold is set using the validation set, does this mean you will need to retrain the model with the test set after setting grid cells with values lower than this threshold to 0?
Section 4.4.2: same as my question above, do you need to retrain the model with the test set after bias correction?
Line 280: 1000x faster after training, right?
Line 433: LPDMs, not “LDPMs”.
Citation: https://doi.org/10.5194/egusphere-2025-2392-RC1
RC2: 'Comment on egusphere-2025-2392', Anonymous Referee #2, 05 Sep 2025
Review of “Enabling Fast Greenhouse Gas Emissions Inference from Satellites with GATES: a Graph-Neural-Network Atmospheric Transport Emulation System” by Fillola et al.
This manuscript documents the development and application of a new machine-learning system that can rapidly create “footprints” for observations in our atmosphere. Footprints are typically used to find optimal surface sources or sinks of trace gases of interest, based on Bayesian minimization of the difference to observations. The authors of this work have an impressive history in this field, and have demonstrated capacity to create footprints with a Lagrangian Particle Dispersion Model (LPDM) and use them in a self-developed hierarchical inversion system before. Previous work also included a machine-learning (ML) method to quickly create footprints for surface locations, bypassing the LPDM model after training on it.
In this manuscript, the authors have redesigned their ML system now called GATES, to make it better, faster, and more versatile. And rather than just tackling surface observations they also demonstrate its use for satellite data that sample the atmospheric column. They apply their new capacity in a mirrored-inversion setup with the original LPDM footprints and their learned counterparts from GATES, following the design of an earlier similar inversion study over South America.
Overall, this is a very interesting manuscript that sketches the contours of what future satellite-driven emission estimates are going to look like. It is well-written for an audience not familiar with some of the technical details of machine learning but versed and interested in flux estimates. The authors convincingly show the potential benefits, while not overstating its capacity and interpreting their work within the limits of the current setup. Overall, I strongly recommend this paper be published, and I think the wider community will appreciate it highly. I have a number of minor suggestions for improvement, some requests for additional materials, and one major improvement I would like to see before the manuscript is accepted.
I also would like to state for the record that I am not a machine-learning specialist and I found myself unable to review the details of the graph-neural-network approach, nor judge the merits, consequences, or logic of the choices made. I have assumed for the preparation of this review that the ML-approach is fit-for-purpose.
Major improvement
To me, the vertical component of the study is quite novel and an innovation over previous work by this team. The skill to create vertical gradients of trace gases transported from the surface with GATES is something I would like to see demonstrated, especially for a complex area like this. I realize that in this application the footprints are kernel-weighted averages over height to make total column sensitivity, but this is likely to mask errors and dampen the impact of an imprecisely learned LPDM. As I started to think about this dampening due to vertical averaging, I also realized that in many places I was unsure whether I was looking at weighted footprints and simulated XCH4 enhancements, or at surface footprints and CH₄ mole fractions. An example is Figure 2, but there are other places in the text as well. Adding this information would help.
But also in Sections 5.1 and 5.2 I would really like to see the explicit vertical dimension back in your evaluations. Are footprints at 500 hPa learned just as well as those near the surface, or better? What is the skill of GATES at the typical peak sensitivity of GOSAT? If we were to construct the vertical profiles of CH₄ *before* collapsing them to an XCH4, would they differ more between the LPDM and GATES than in XCH4 units (which is what we are shown in the tables and figures, I presume)?
I suggest including in Section 5.2 the locations where aircraft data are routinely gathered (the Gatti network + Manaus), as violin plots of (LPDM-minus-GATES) CH₄ differences with altitude on the y-axis (based on prior fluxes, or on their respective posterior ones). Even better would be a Hovmöller plot with months on the x-axis and the differences on a colorbar. Such figures would allow the reader to see GATES performance over a much larger spatiotemporal domain than the provided example time series in Fig. 2, including the vertical domain.
If the authors can think of a better way to convince me that their vertical reconstruction of a methane profile is good enough, I would of course accept that too. But I would not be satisfied if GATES were only suited for column averaging, as one could then never trust the fluxes+GATES to reproduce aircraft profiles or other independent datasets we use for assessment.
Request for extra material
As a follow-up on the major request above, I found myself often looking for some more reasoning behind the evaluation metrics that are now provided. The choice of 4 locations and several months does not suggest a wide range of geographical and meteorological circumstances, as the text says. What conditions would one expect to encounter in Amazonia that could affect performance? How did you systematically assess these? A more explicit strategy would be nice to see, also in the metrics presented. I especially find the distinction between dry season and wet season conditions of interest, as both fluxes and footprints are likely to differ substantially, the latter especially in their vertical extent.
In most of the paper you furthermore show the enhancements over background, but the background itself must also be included in each XCH4 prediction. This comes from the CAMS boundary condition, transported with a footprint that traces back to the boundary of the domain. I am unsure after reading whether this BC-sensitivity was also trained and reproduced with GATES, and thus part of the challenge/difference. If so, some results and discussion of the performance would be nice. If not, it must be mentioned that this is not part of the evaluation. Thinking about it more, I would say it would be a bit unfair to ignore the BC-transport in this paper, which so nicely introduces GATES capacity in an inverse pipeline, and I would really urge the authors to include it in the effort (if not done already), and in the manuscript.
If the authors find it of interest, some extra material to show/explain the difference in posterior uncertainty of the flux would be appreciated. It seems that GATES has some 10% larger errors than when using the LPDM. I’d like to read your thoughts about this, either in the Results or Discussion section.
Finally, in addition to the Data Statement at the end, the work could benefit from adding a paragraph for a prospective user on how to leverage GATES. What would they do? What would they need? What resources can they expect to help them create their own footprints with GATES?
Minor suggestions
Individual minor remarks were left in an annotated PDF.
line 88: Since you already wrote this sentence above when explaining the general principle, I would like to have the actual number in your settings, per height. Is the number of heights fixed?
line 139: not sure what this word signifies here. What is integrated over time?
line 345: Less …data to train…or is the meteorology easier to capture? Or is the vertical structure less complex?
line 349, Table 2: I would appreciate more subsetting of the data to find metrics specific to seasons/heights/meteo situations.
line 389: True, but panel (c) does show a pattern of east-west differences of similar magnitude to the flux adjustments made in panel (b) (they even use the same color scale). If we assume that this pattern is purely due to transport, then the large blue-ish patch in western Amazonia could be significant. What is your take on this? Can you at least mention the difference and discuss it?
line 457: Perhaps it was mentioned, but what determines the availability of LPDM footprints for training? Was this set created previously based on GOSAT coverage?