the Creative Commons Attribution 4.0 License.
A Transformer-based agent model of GEOS-Chem v14.2.2 for informative prediction of PM2.5 and O3 levels to future emission scenarios: TGEOS v1.0
Abstract. Efficient and informative air quality modeling in future emission scenarios is vital for effective formulation of emission reduction policies. Traditional chemical transport models (CTMs) struggle with the computational demands required for timely predictions. While advanced response surface models (RSMs) were proposed and offered much faster estimates than CTMs, they fall short in providing comprehensive estimates of future air quality due to their simplistic and inflexible structural frameworks. Additionally, current RSMs often have difficulty simultaneously accounting for varying emission variables and the effects of regional transport, which limits their applicability and undermines prediction accuracy. In this study, an informative future air quality prediction model "TGEOS v1.0" based on the Transformer framework is developed as an efficient GEOS-Chem agent model. TGEOS is able to swiftly and accurately conduct online predictions of probability distributions for PM2.5 and O3 concentrations under future emission scenarios and capture potential extreme pollution events. The model incorporates sectoral emissions of up to 26 distinct species as well as the impacts of regional emissions and meteorology on pollutant concentrations, enhancing its versatility and predictive accuracy. The spatial and probability distributions predicted by TGEOS are in good agreement with GEOS-Chem, with correlation coefficients for PM2.5 and O3 exceeding 0.97 and 0.96, respectively. Notably, TGEOS achieves remarkable computational efficiency, executing one-year predictions in approximately 2.51 seconds. Compared with other machine learning models, TGEOS based on the Transformer framework showcases superior performance, underscoring the potential of the Transformer framework in air quality modeling.
Status: final response (author comments only)
-
CEC1: 'Comment on egusphere-2025-2186 - No compliance with the policy of the journal', Juan Antonio Añel, 22 Jun 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
In the "Code and Data Availability" statement of your manuscript, you have not included the information on the repositories that contain the data that you use to produce your manuscript, namely the GEOS-Chem output data, the training datasets (multi-scenario datasets) and the data used for the validation of your models.
Therefore, the current situation with your manuscript is irregular, as we can not accept manuscripts in Discussions that do not comply with our policy. Please, publish your data in one of the appropriate repositories according to our policy and reply as soon as possible to this comment with a modified 'Code and Data Availability' section for your manuscript, which must include the relevant information (link and handle or DOI) of the new repositories, and which you should include in a potentially reviewed manuscript.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2186-CEC1
-
AC1: 'Reply on CEC1', Jianbing Jin, 24 Jun 2025
Dear Editor,
We created a new item at https://zenodo.org/records/15717908 (doi:10.5281/zenodo.15717908) that is open to the public. It stores all the training and validation datasets from the GEOS-Chem output. We will also update this link in the manuscript in the next round of revision.
With this we hope that we have satisfied all requirements.
Jianbing Jin
Citation: https://doi.org/10.5194/egusphere-2025-2186-AC1
-
CEC2: 'Reply on AC1', Juan Antonio Añel, 24 Jun 2025
Dear authors,
Many thanks for your quick reply, and your willingness to comply with the policy of the journal. We can consider now the current version of your manuscript in compliance with the policy.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2186-CEC2
-
RC1: 'Comment on egusphere-2025-2186', Anonymous Referee #1, 27 Jun 2025
This paper describes the development of a Transformer-based model, TGEOS v1.0, which serves as a proxy for the GEOS-Chem chemical transport model to represent future air quality under different emission scenarios. The model is trained using multi-scenario emission inventories and meteorological data to estimate PM2.5 and O3 concentrations. It takes high-dimensional sectoral emissions and spatial information as input features for each grid cell, and outputs twelve statistical indicators. Substantial revisions are needed for the paper to fully realize its potential and to deliver clearer scientific contributions.
Major comments:
1. My primary concern lies in the lack of appropriate model comparisons and insufficient clarity regarding model performance. Specifically, I would like to understand why the authors chose to compare their model against Multi-Layer Perceptron (MLP) and Random Forest (RF), rather than Convolutional Neural Networks (CNNs), given that they mentioned DeepRSM (Xing et al., 2020) earlier in the text but did not include it in their evaluation. Although the authors state that a series of (hyper)parameter tuning experiments were conducted (L407), it remains unclear how these were performed. For example, using 300 trees and a maximum depth of 25 in the RF model can easily lead to overfitting. This raises the question of whether these baseline models were properly tuned, which may have contributed to their underperformance. Furthermore, while the authors report low RMSE and MAE of the TGEOS v1.0, it is unclear what constitutes “low” in this context. Additional comparisons with values reported in other relevant studies would strengthen the claims of model performance.
2. I am also concerned about the authors’ treatment and definition of RSMs. In L63–68, RSMs are introduced as a statistical method for “emission-concentration” estimates, but later DeepRSM is mentioned, and then the authors refer to machine learning being effective for “emission-concentration” modeling without relying on RSM (L111-114). It would be helpful if the authors could clarify what exactly qualifies as an RSM in this context. Does a model need to be derived from CTM outputs to be considered an RSM? And if the goal is simply to emulate CTM outputs, wouldn’t it be more appropriate to use the broader and more inclusive term “emulator”?
3. The manuscript does not sufficiently demonstrate the advantages of using the Transformer architecture, nor does it clearly describe the model’s structure. Beyond the comparative limitations mentioned in point 1, the authors do not provide any analysis of computational complexity between Transformers and CNNs. Instead, they state that CNN “may increase the demand for computational resources, especially when addressing considerable features” (Lines 208–219), without supporting this claim with the data. Additionally, the input to a Transformer is typically a sequence of tokens, yet the authors refer to them as “channels,” which may cause confusion. In image-based applications, Vision Transformers (ViTs) have shown strong performance, and it is unclear why the authors didn’t explore spatial structures through CNNs or ViTs.
Minor comments:
L81 (“to startup the model”): “Startup” is not typically used as a verb in this context.
L167-168 (“fine-tuning experiments”): I understand that the authors are referring to “fine-tuning experiments” in the context of data assimilation (Text S1), but in a machine learning/AI context, the term “fine-tuning” is typically associated with pretraining followed by fine-tuning, which may cause confusion for readers.
L175-176 (“we set the maximum value of each coefficient matrix to 2.0”): Any justifications?
L181 (“GEOS-Chem chemistry transport model”): It should be chemical transport model (based on the definition https://geoschem.github.io/overview.html).
L195 (“8 key meteorological parameters”): In Table 2, it says “(2) 9 meteorological parameters.”
L213 (“an informative prediction model”): I am not entirely sure how the authors define the term “informative,” which appears multiple times throughout the manuscript, including in the title and abstract. It would be helpful to clarify what is meant by “informative” in this context.
L215: The authors mention that the number of features is 1045 (Table 2), yet the number of input channels is reported as 1048. It would be helpful to clarify how this inconsistency is handled in practice.
L230: Figure 1 should be improved to provide more detailed information (e.g., Transformer Module).
Dataset and methodology: The manuscript does not provide basic information such as hyperparameter tuning procedures and sample size.
Results and discussions: The model evaluation primarily relies on R2 and MAE. It is recommended to include metrics that assess the model’s ability to predict extreme values, such as precision and recall for exceedance events.
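The exceedance metrics recommended in the last comment can be sketched as follows. This is a minimal illustration only, not a method from the manuscript or the referee: the function name, the 75 µg/m³ threshold, and the sample values are all hypothetical.

```python
def exceedance_precision_recall(reference, predicted, threshold):
    """Precision and recall for exceedance events (value > threshold).

    `reference` plays the role of the CTM output and `predicted` the
    emulator output; both are equal-length sequences of concentrations.
    """
    tp = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        ref_exceeds = ref > threshold
        pred_exceeds = pred > threshold
        if ref_exceeds and pred_exceeds:
            tp += 1  # exceedance correctly predicted
        elif pred_exceeds:
            fp += 1  # false alarm
        elif ref_exceeds:
            fn += 1  # missed exceedance
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical daily PM2.5 values (ug/m3); not taken from the manuscript.
ctm = [30.0, 80.0, 90.0, 40.0, 76.0, 20.0]
emulator = [35.0, 78.0, 70.0, 77.0, 80.0, 25.0]
precision, recall = exceedance_precision_recall(ctm, emulator, threshold=75.0)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Reporting both values matters because a model can reach a high R2 while still missing most exceedance days (low recall) or raising many false alarms (low precision).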
Citation: https://doi.org/10.5194/egusphere-2025-2186-RC1
- AC2: 'Reply on RC1', Jianbing Jin, 08 Sep 2025
-
RC2: 'Comment on egusphere-2025-2186', Anonymous Referee #2, 29 Jul 2025
This paper presents TGEOS v1.0, a Transformer-based surrogate model for efficient GEOS-Chem emulation, achieving rapid PM₂.₅/O₃ predictions under emission scenarios with strong correlations (>0.96) to benchmark simulations. While demonstrating computational efficiency and high accuracy, the study suffers from critical limitations: misaligned literature framing unrelated to methodology, unrealistic fixed-meteorology design preventing climate robustness validation, insufficient training-test separation, and overstated distribution reconstruction from six statistical indicators. Major revision is essential to address these foundational concerns.
Major Comments:
- The extensive critique of Response Surface Models (RSM) in Section 1 appears disconnected from the proposed Transformer-based TGEOS framework. While RSMs rely on empirical statistical approximations to reduce dimensionality, TGEOS operates as a pure deep learning emulator that directly maps high-dimensional inputs to outputs. Thus, positioning TGEOS as addressing core RSM challenges misrepresents its paradigm. The review should focus on deep learning emulator challenges and explicitly contextualize innovations against relevant works like NN-CTM (Huang et al., 2021). Crucially, benchmarking against only architecturally inferior models (RF/MLP) – rather than comparable deep learning approaches like CNN-based Deep-RSM (Xing et al., 2020) or NN-CTM – undermines claims of Transformer superiority.
- The exclusive use of 2017 MERRA-2 meteorology across all 36 emission scenarios creates critical limitations. (1) Artificial performance inflation: Model validation (Section 3) only tests emission sensitivity under identical meteorological conditions, ignoring O₃'s established sensitivity to temperature/radiation. This likely overstates accuracy for real-world applications where meteorology co-varies. (2) Unverified generalizability: No experiments challenge the model with meteorological variability (e.g., heatwaves), leaving robustness under climate fluctuations untested. (3) Neglect of emission-climate feedbacks: The abstract positions TGEOS for "future emission scenarios", yet fixed meteorology cannot capture feedbacks like emission-driven aerosol-radiation interactions affecting O₃. Given the study's policy-assessment ambitions, this design flaw is critical. Cross-meteorological sensitivity tests should quantify key indicator fluctuations to establish operational reliability.
- The dataset design introduces potential leakage between training (DPEC-SSP SSP1/4/5 + DPEC-CA + tuning) and testing (DPEC-SSP SSP2/3) sets. (1) Structural homology: All scenarios derive from the DPEC framework, sharing inherent inventory structures, sectoral mappings, and spatial patterns. (2) Meteorological invariance: Identical 2017 meteorology further constrains emission-concentration mapping diversity. (3) Unverified independence: The sole qualitative comparison (otp2030 vs. SSP2_2050) lacks quantitative validation. No analysis demonstrates statistical separability between DPEC-SSP, DPEC-CA, and tuning scenarios. Sampling only three years (2030/2040/2050) within these correlated DPEC trajectories risks distributional overlap. This undermines claims of rigorous holdout testing, especially given the high R² values (0.96+) that may reflect dataset artifacts rather than true generalizability.
- The claim that TGEOS predicts "probability distributions" is inconsistent with its methodology. (1) Temporal distribution gap: The model outputs six statistical indicators (e.g., monthly percentiles) per grid cell. However, no validation confirms these reconstruct temporal distributions at individual locations. (2) Spatial vs. temporal conflation: Section 3.3 analyzes spatial probability distributions aggregated across regions, which fundamentally differ from the temporal distributions implied by Section 3.2's grid-level statistics. This conceptual ambiguity obscures what "probability distribution" signifies in results. While the six indicators efficiently summarize central tendency and spread, presenting them as full probability distributions overstates methodological capabilities without empirical proof of distributional accuracy at the intended spatiotemporal scale.
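One way to test whether predicted percentile indicators reproduce a temporal distribution at a single grid cell is a coverage check: the fraction of reference values at or below a predicted p-th percentile should be close to p. The sketch below is a hedged illustration under that assumption; the function name, the percentile levels, and the sample series are hypothetical and do not come from the manuscript.

```python
def quantile_coverage(series, predicted_quantiles):
    """Empirical coverage of predicted quantiles.

    `series` is a reference time series at one grid cell.
    `predicted_quantiles` maps a nominal level (e.g. 0.9) to the
    predicted concentration at that level. The result maps each level
    to the fraction of reference values at or below the prediction.
    """
    n = len(series)
    return {
        level: sum(value <= q for value in series) / n
        for level, q in predicted_quantiles.items()
    }


# Hypothetical reference series (values 1..20) and predicted percentiles.
reference_series = [float(v) for v in range(1, 21)]
predicted = {0.25: 5.0, 0.5: 10.0, 0.9: 15.0}
coverage = quantile_coverage(reference_series, predicted)
for level, observed in coverage.items():
    print(f"nominal {level:.2f} -> observed {observed:.2f}")
```

In this toy case the 0.25 and 0.5 levels are well calibrated, while the nominal 0.9 level covers only 0.75 of the reference values, which would indicate an underestimated upper tail.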
Minor Comments:
- Line 8: The term "online predictions" is ambiguous and potentially misleading. The claimed probability distributions are reconstructed from six statistical indicators rather than dynamically generated in real-time.
- Line 12: The interpretation of correlation coefficients (0.97 for PM₂.₅, 0.96 for O₃) is unclear. Specify whether these values represent spatial correlation between models, probability distribution accuracy, or overall model performance metrics.
- Line 56: The 350-hour computational benchmark lacks critical context. Specify whether this duration includes: (a) Standalone nested-domain simulation (0.5°×0.625° over China); (b) Coupled global (2°×2.5°) + nested simulations (0.5°×0.625° over China); (c) Hardware specifications (CPU/GPU model, and software).
- Line 102: The term "middle-scale region" requires quantitative definition.
- Table 1: The relationship between DPEC-SSP (socioeconomic pathways) and DPEC-CA (policy scenarios) remains unexplained. Justify scenario combinations' scientific relevance to climate modeling objectives. Expand all acronyms (e.g., SSP1-5, SSP1-26-BHE control, early_peak-net_zero-clean_air control) in table/footnotes.
- Lines 167-172: Explain how MEIC-2017-based perturbations enhance generalizability for 2030-2050 predictions. Address potential biases from applying contemporary (2017) emission factors to distant future scenarios (13-33 year gap).
- Text S1 (Line 16): Correct "Fig ??" with the appropriate figure identifier. Verify all figure citations for accuracy.
- Lines 173-179: Clarify why DPEC-SSP/CA emissions were scaled using 2017 MEIC ratios rather than used directly. Define the "five sectors" referenced. Justify setting coefficient maximums to 2.0. Quantify the percentage of coefficients exceeding this threshold and discuss sensitivity to alternative values (e.g., 1.5 or 2.5). Specify what constitutes the "original inventory."
- Lines 184-185: Report the spin-up time for the global GEOS-Chem simulations providing boundary conditions.
- Lines 185-187: Elaborate how MERRA-2 meteorology was integrated into GEOS-Chem. Confirm whether meteorology was prescribed identically across all 36 emission scenarios. Discuss the limitation of using static 2017 meteorology for future scenarios, as it ignores potential meteorology-emission feedbacks and climate variability, particularly for ozone sensitivity.
- Lines 190-191: Justify the use of different emission inventories for China (MEIC) and other regions (CEDS). Address potential inconsistencies in source sectors, speciation, or spatial/temporal resolution between inventories.
- Lines 192-203: Describe GEOS-Chem inputs/outputs (emissions, meteorology, concentrations) in Section 2.1, and TGEOS training inputs/outputs in Section 2.2. Specify the spatiotemporal resolution of all TGEOS input features (emissions, meteorology) and output targets. Clarify if training is grid-cell-based. If so, justify why only the 8 nearest neighbors are sufficient to represent regional transport. List the "8 key meteorological parameters" explicitly. State that "dust components were excluded" means PM2.5 predictions exclude dust aerosols—highlight that this deviates from standard PM2.5 definitions and significantly impacts regions like Northwest China.
- Lines 231-232: The claim that "pollutants generally conform to either a standard normal or skewed distribution" lacks validation.
- Lines 239-242: Justify: (1) Why SSP1/SSP5 were excluded despite representing critical low/high-emission pathways; (2) Whether "low/high" refers to base year or future projections.
- Lines 267-268: Define the quantitative metric used to identify otp2030 as the scenario "most similar" to SSP2_2050.
- Lines 305-315: Explain why a single fixed initial condition (from the "background scenario") was used for all GEOS-Chem simulations despite varying emissions. This likely introduces errors, especially when simulated concentrations diverge significantly from initial states. Clarify why SSP3 concentrations align better with this initial state than SSP2.
- Lines 319-320: Supplement Figures with equivalent spatial maps of the original GEOS-Chem simulated seasonal indicators for direct comparison with TGEOS predictions.
- Lines 342-343: Detail the probability distribution fitting procedure: Specify the distribution type fitted to the regional data. Clarify the data used: Is the PDF based on daily concentrations across all grid cells within a region over a month? List exactly which of the 12 indicators were used as distribution parameters.
- Lines 344-345: Define the geographical boundaries for NCP, YRD, FWP, and SCB regions.
- Lines 364-367: The statement "O₃ concentrations are relatively less influenced by emissions" due to meteorology dominance is misleading. Precursor emissions (NOₓ, VOCs) critically influence O₃ formation. The core limitation is the use of identical 2017 meteorology for all scenarios, preventing assessment of emission impacts under varying meteorology.
- Line 369: The notation "a2 to d2" lacks corresponding figure identification.
- Line 370: Define "high-emission samples". Quantify how many scenarios/samples represent high emissions within the training dataset.
- Lines 372-373: The attribution of O₃ underestimation in SSP2_2050 to high precursor emissions (ALK4, ALK5, TOLU) appears inconsistent. If elevated emissions cause this systemic bias, why is a similar or stronger underestimation not observed under the even higher emissions of SSP3_2050 (Fig. S11)?
- Lines 384-385: Justify comparing extreme event probability changes to the "base scenario" (presumably 2017). Since meteorology is identical (2017), changes are solely emission-driven—state this explicitly to clarify the comparison's purpose.
- Line 385: The notation "(b1) and (c1)" lacks corresponding figure identification.
- Line 437: Provide hardware specifications for the 2.51-second/year prediction benchmark (e.g., CPU/GPU model, and software).
- Ensure consistent formatting throughout: Use subscripts for chemical species (e.g., O₃, PM₂.₅) and superscripts for statistical terms (e.g., R²). Thoroughly check all text, figures, and tables.
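The sensitivity check requested in the comment on Lines 173-179 (share of scaling coefficients above the cap, for alternative caps such as 1.5 or 2.5) could be reported along the following lines. This is a minimal sketch under stated assumptions: the function name and the coefficient values are hypothetical, and the actual coefficient matrices are not given in this discussion.

```python
def cap_exceedance_share(coefficients, cap):
    """Clip coefficients at `cap` and report how many were affected.

    Returns (capped_coefficients, percent_above_cap).
    """
    capped = [min(c, cap) for c in coefficients]
    percent_above = 100.0 * sum(c > cap for c in coefficients) / len(coefficients)
    return capped, percent_above


# Hypothetical scaling coefficients; not taken from the manuscript.
coefficients = [0.4, 0.9, 1.2, 1.6, 2.2, 3.0, 0.7, 2.6]
shares = {}
for cap in (1.5, 2.0, 2.5):
    _, shares[cap] = cap_exceedance_share(coefficients, cap)
    print(f"cap={cap}: {shares[cap]:.1f}% of coefficients exceed the cap")
```

A table of these percentages per species and sector, alongside the resulting change in total emissions, would show how strongly the choice of 2.0 shapes the training scenarios.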
Citation: https://doi.org/10.5194/egusphere-2025-2186-RC2
- AC3: 'Reply on RC2', Jianbing Jin, 08 Sep 2025
- AC4: 'Reply on RC2', Jianbing Jin, 08 Sep 2025