the Creative Commons Attribution 4.0 License.
Data-driven discovery and model reduction methods for the atmospheric effects of high altitude emissions
Abstract. Chemistry transport models play a crucial role in the evaluation of the effect of anthropogenic emissions on the atmosphere and climate, but they come with high computational costs and require specialized know-how. This renders them impractical for applications in multidisciplinary optimisation, or regulatory and operational-decision making processes where environmental effects are to be considered. Such applications require computationally efficient surrogate models of the complex chemistry transport models. Here we investigate the use of data-driven discovery and reduced-order modelling methods for this purpose. Specifically, we examine the dynamic mode decomposition (DMD) and proper orthogonal decomposition coupled with the sparse identification of non-linear dynamics (POD-SINDy). We evaluate their ability to reconstruct and forecast changes in the distribution of ozone in response to the introduction of supersonic aircraft as modelled by the GEOS-Chem chemistry transport model. Of the tested methods, we find that optimized DMD and bagging optimized DMD perform best. These methods can reconstruct and forecast full-atmospheric ozone responses for up to several years without losing stability, at smaller errors than estimates using the spatio-temporal mean of the data. On average, the optimized DMD method reduces the reconstruction error by 55.2 % and that of forecasting by 19.4 %. For the bagging optimized DMD these reductions are 40.3 % and 7.9 %, respectively. The resulting change in global ozone column, calculated from the reconstructed atmospheres, has an error smaller than 10 %. This is achieved while reducing the computational and storage requirements by several orders of magnitude, which may be a worthwhile tradeoff for some applications.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-2661', Hannes Bruder, 24 Nov 2025
- RC2: 'Comment on egusphere-2025-2661', Anonymous Referee #2, 27 Nov 2025
1) Scientific significance
The manuscript presents an application of reduced-order modelling (DMD variants and POD-SINDy) to complex chemistry transport models (GEOS-Chem).
- However, the introduction should reference Linear Inverse Modeling (LIM) to better contextualize the approach within existing DMD-based atmospheric science methods.
- The discussion must explicitly address the physical implications of modeling this system via DMD/POD-SINDy, perhaps regarding the stationarity of the time series and whether properties like physical conservation laws (e.g., ozone mass conservation) are obeyed.
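One simple way to probe the conservation question raised above is to compare a weighted total (for example, total ozone mass) between the reference fields and a surrogate reconstruction. The sketch below is illustrative only and is not taken from the manuscript: the function name and the `cell_weights` vector (for example, air mass per grid cell) are hypothetical, and the arrays are toy data.

```python
import numpy as np

def max_relative_total_error(reference, reconstruction, cell_weights):
    """Largest relative mismatch of the weighted total over all time steps.

    reference, reconstruction: arrays of shape (n_cells, n_time).
    cell_weights: array of shape (n_cells,), a hypothetical per-cell weight.
    """
    total_ref = cell_weights @ reference        # weighted total per time step
    total_rec = cell_weights @ reconstruction
    return float(np.max(np.abs(total_rec - total_ref) / np.abs(total_ref)))

# Toy data: a reconstruction that overestimates every cell by 2 %.
weights = np.array([1.0, 2.0, 3.0])
reference = np.ones((3, 4))
reconstruction = 1.02 * reference
mismatch = max_relative_total_error(reference, reconstruction, weights)
```

A mismatch that grows over the forecast window would indicate that the surrogate does not preserve the total, which is the kind of behaviour the comment asks the authors to discuss.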
2) Scientific quality
The approach is valid, but specific methodological choices lack justification.
- The authors should explain why only “low-order polynomial terms” were selected for the SINDy library (line 365).
- Additionally, were different time lags and time delay embeddings tested?
- The hyperparameter search involving “50,000 variations” appears excessive; a heuristic approach or clearer explanation of the grid search density is needed.
- Additionally, could the authors kindly justify why BOPDMD requires more data than the other DMD methods (line 520)?
- Please justify how using the derivative for modeling SINDy makes it “susceptible to noise” (line 150).
- Is it possible to constrain the eigenvalues in BOPDMD to offer yet another DMD variant?
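For readers unfamiliar with the time delay embeddings asked about above, a minimal sketch follows. It assumes a snapshot matrix of shape (n_space, n_time) and stacks time-shifted copies of it (a block Hankel matrix); the function name `delay_embed` is hypothetical and the construction is not taken from the manuscript.

```python
import numpy as np

def delay_embed(snapshots: np.ndarray, n_delays: int) -> np.ndarray:
    """Stack n_delays time-shifted copies of a (n_space, n_time) matrix.

    The result has shape (n_space * n_delays, n_time - n_delays + 1).
    """
    n_space, n_time = snapshots.shape
    n_cols = n_time - n_delays + 1
    if n_delays < 1 or n_cols < 1:
        raise ValueError("n_delays must be between 1 and n_time")
    # Block k holds the snapshots shifted k steps forward in time.
    return np.vstack([snapshots[:, k:k + n_cols] for k in range(n_delays)])

# Toy data: 2 spatial points, 6 time steps.
toy = np.arange(12).reshape(2, 6)
embedded = delay_embed(toy, n_delays=3)
```

The embedded matrix can then be passed to a DMD fit in place of the raw snapshots, with the number of delays treated as an additional hyperparameter.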
3) Scientific reproducibility
Would the authors kindly release the code used for the experiments in this paper? This is an essential step to ensuring scientific reproducibility. Some smaller comments include:
- Why is 22 degrees North selected (line 225)?
- What are the drawbacks of flattening latitude and altitude dimensions?
- Would the authors consider verifying the numerical stability of the DMD models using their eigenvalues?
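The eigenvalue-based stability check suggested in the last point can be sketched as follows. This is a minimal illustration using exact DMD on toy data, not the authors' implementation; the function names, the tolerance and the toy signals are assumptions. For a discrete-time DMD model, modes with eigenvalue magnitude above 1 grow without bound when forecast.

```python
import numpy as np

def dmd_eigenvalues(snapshots: np.ndarray, rank: int) -> np.ndarray:
    """Discrete-time exact-DMD eigenvalues of a (n_space, n_time) matrix."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced operator A_tilde = U^H Y V S^{-1}; dividing by s scales columns.
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    return np.linalg.eigvals(A_tilde)

def is_stable(eigs: np.ndarray, tol: float = 1e-8) -> bool:
    """True if no eigenvalue lies outside the unit circle (within tol)."""
    return bool(np.all(np.abs(eigs) <= 1.0 + tol))

# Toy data: a rank-2 oscillation that decays (or grows) at rate 0.1.
dt = 0.1
x = np.linspace(0.0, 3.0, 20)
t = dt * np.arange(50)

def oscillation(rate: float) -> np.ndarray:
    envelope = np.exp(rate * t)
    return (np.outer(np.sin(x), envelope * np.cos(2 * t))
            + np.outer(np.cos(x), envelope * np.sin(2 * t)))

eigs_decaying = dmd_eigenvalues(oscillation(-0.1), rank=2)
eigs_growing = dmd_eigenvalues(oscillation(+0.1), rank=2)
# Continuous-time growth rates follow from log|lambda| / dt.
growth_rates = np.log(np.abs(eigs_decaying)) / dt
```

In the continuous-time formulation used by optimized DMD variants, the equivalent condition is a non-positive real part of each eigenvalue.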
4) Presentation quality
The work is well presented and organized. There are only some small comments.
- Notation requires correction; Equation 1 should use boldface for vectors and arguably represent a model rather than just data structure.
- The naming convention “OptDMD-C” is confusing (it implies control rather than constraints). Perhaps use “C-OptDMD”?
- Would the authors kindly be consistent in their use of the sign (-) (e.g., 60% vs -40% and -10%) (line 270)?
Citation: https://doi.org/10.5194/egusphere-2025-2661-RC2
- CEC1: 'Comment on egusphere-2025-2661 - No compliance with the policy of the journal', Juan Antonio Añel, 07 Dec 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have not shared your code and part of the data for your submitted manuscript. In the Code and Data Availability section you state "The data and code supporting this work are publicly available on a 4TU.Researchdata repository, to be minted upon acceptance of this manuscript." However, this is something we do not accept. Our policy is very clear that all the code and data must be made public, without restrictions on access, before a manuscript is submitted. Therefore, the current situation with your manuscript is irregular. Please publish your code and data without restrictions and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2661-CEC1
- AC1: 'Reply on CEC1', Irene Dedoussi, 10 Dec 2025
Thank you for your comment. Please see attached.
- CEC2: 'Reply on AC1', Juan Antonio Añel, 10 Dec 2025
Dear authors,
Many thanks for addressing this issue. The lack of availability of the pySINDy library and the changes produced by using a new version of it are two of the main reasons why the code and data policy exists and why it is needed to avoid compromising the replicability of manuscripts submitted to and published by GMD. It is great to hear that you have detected the problem and worked to solve it.
We can now consider the current version of your manuscript to be in compliance with the policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-2661-CEC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 193 | 61 | 25 | 279 | 16 | 15 |
### General comments ###
The paper analyses different methods for developing surrogate models of the effect of different supersonic aviation scenarios on the global ozone concentration, based on computationally heavy chemistry transport model runs. The core idea of the paper is very relevant: using surrogate models for atmospheric modelling to integrate environmental effects into processes where extensive computations are infeasible, such as iterative design optimizations. Testing DMD- and POD-SINDy-based approaches on existing datasets for the exemplary case of ozone concentration changes can support scientific progress in the field and help fellow researchers in designing future studies and applying the examined approaches, even though the choice and suitability of DMD and POD-SINDy for the intended application seem questionable.
In general, the paper gives a good overview of the methods used and describes and discusses the results in a well-structured, ordered and clear way. In some places it focuses on subparts of the study, leaving other parts of similar importance behind or without a summary. In addition, some of the assumptions and basic settings chosen lack fully convincing explanations.
### Specific comments ###
The last point becomes clearest in the choice of the study setup: a hyperparameter search in which the median-performing model is picked as the “result”.
A hyperparameter search is clearly necessary because best-practice settings are lacking, as mentioned in Appendix A. Accordingly, page 24 recommends such a search for new, similar studies with another dataset. This raises the question of why this study uses only the median model of the hyperparameter search as a representative result, if new studies should also run a hyperparameter search and could apparently choose optimal hyperparameters from it. Therefore, the models with optimal hyperparameters, at least for the reconstruction data, should be taken as the reference or be discussed as well.
As an explanation for choosing the median model, the paper says on page 18 that prospective users will not have access to future data to assess the forecasting performance and will therefore be unable to find the optimal forecast model. It is correct that in many cases there will be no reference data to validate against, but to my understanding the forecasting quality can still be assessed just as in this study, by excluding e.g. the last one and a half years of the existing data (analogous to a train/test split in ML). The settings of the model with the best forecasting performance could then be used to train a new model on the full dataset, and I would expect a comparable forecasting quality for the first one and a half future years. And even if no split into a train/test dataset is possible due to a lack of data, a hyperparameter search on the reconstruction data could still lead to a far better model than just the median. I would find it interesting to look at such options. Nevertheless, the comparison of the different approaches based on the median model still gives a good overview of their performance and characteristics, even though it does not show their full potential.
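The hold-out procedure described above can be sketched as follows: withhold the last part of the time series, pick the hyperparameter with the lowest forecast error on it, and refit on the full dataset. This is a minimal sketch using a plain projected DMD with the rank as the only hyperparameter; the function names and toy data are assumptions and do not reproduce the study's optimized DMD setup.

```python
import numpy as np

def fit_dmd(snapshots, rank):
    """Fit a rank-truncated DMD to a real (n_space, n_time) snapshot matrix."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.T @ Y @ Vh.T / s      # reduced one-step operator
    return U, A_tilde

def forecast(model, last_snapshot, n_steps):
    """Advance the reduced state n_steps and lift back to full space."""
    U, A_tilde = model
    z = U.T @ last_snapshot
    columns = []
    for _ in range(n_steps):
        z = A_tilde @ z
        columns.append(U @ z)
    return np.column_stack(columns)

def select_rank(snapshots, n_holdout, candidate_ranks):
    """Pick the rank with the lowest relative forecast error on held-out data."""
    train, test = snapshots[:, :-n_holdout], snapshots[:, -n_holdout:]
    errors = {}
    for rank in candidate_ranks:
        prediction = forecast(fit_dmd(train, rank), train[:, -1], n_holdout)
        errors[rank] = np.linalg.norm(prediction - test) / np.linalg.norm(test)
    return min(errors, key=errors.get), errors

# Toy data: a decaying rank-2 oscillation, 80 time steps.
x = np.linspace(0.0, 3.0, 20)
t = 0.1 * np.arange(80)
toy = (np.outer(np.sin(x), np.exp(-0.1 * t) * np.cos(2 * t))
       + np.outer(np.cos(x), np.exp(-0.1 * t) * np.sin(2 * t)))

best_rank, errors = select_rank(toy, n_holdout=20, candidate_ranks=[1, 2])
final_model = fit_dmd(toy, best_rank)   # refit on the full dataset
```

The same loop applies to any other hyperparameter (for example, the fitting dataset length), as long as the held-out window is never used during fitting.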
You name usability in applications like MDO as the goal of this study. In my understanding, the models you created and the methods you use show very limited usability in this field, because for MDO the surrogate model needs to make predictions based on changed inputs (e.g. changed propulsion emissions, changed aircraft geometry, changed routes). The models you created can reconstruct data from chemistry transport models and produce extrapolations/forecasts based on the data, but they offer no way to feed the mentioned changes into the model. Therefore, it seems questionable whether the goal of this study can be reached with the chosen methods.
Section 4.2 focuses on the TAC204 dataset. It does not become clear why TAC204 in particular is chosen, as the global scenarios are assumed to be more complex and therefore the more critical ones to look at. The findings from the other datasets are neither mentioned nor summarized, leaving open the question of whether their results are mostly the same as for TAC204.
The first part of the results section (4.1) focuses on the interpretation of the surrogate models with their different spatial modes. Here, a more detailed qualitative look at the individual atmospheric dynamics that are covered, and also those that are not, would be great in my eyes, even though it is mentioned that the combination of effects makes a direct attribution challenging. But, for example, understanding which dynamics are not covered could help in assessing the capabilities of DMD and POD-SINDy for future studies (just a few examples). A summary of these interpretations could also be included in the discussion section.
According to the introduction, the goal of the study is the creation and examination of models that can be used for MDO, etc. The conclusion is missing the step back to the application of the models and the implications arising from it. Are models of the quality you found usable for MDO, or is the quality most likely insufficient, or sufficient only for the main, largest effects?
In the conclusion section, the quality of the models is mostly evaluated based on the reconstruction results but not the forecast, which is at least equally relevant for many applications. Please include forecast results here as well.
I see two points as interesting for future studies, and I am interested in what you think about them and whether you are planning to continue your research in that direction.
The first is whether you think some kind of informed DMD could lead to reasonable improvements, because one could feed physics and additional information into the models. The second you already mentioned in your discussion (page 18): you wrote that you only use linear and quadratic terms for POD-SINDy and that better results might be achieved with further terms. I want to emphasize this point, as the complex dynamics of the atmosphere might not be easily captured by first- and second-order polynomials. I would be very interested in seeing the results with more complex terms.
### Technical corrections ###
- p.1, l.12: “… DMD method reduces …”: Reduces compared to what?
- p.2, l.27: “… to integrate their environmental …”: The reference of “their” is not fully clear.
- p.3, l.64: “… 8 Tg …”: 8 Tg of what - CO2/NOx/Total?
- p.5, Figure 2: Unit of z-axis missing.
- p.6, l. 122: “… spatial and temporal coefficients …”: In the sentence before you wrote spatial mode and temporal coefficient. Please be precise and consistent in the naming.
- p.6, l.125: “… to be forecast, …”: I think it should be “forecasted” here.
- p.6, l.144: “... sequential grid search …”: I did not find the parameters for the grid search in the paper. It might be helpful to see which parameters and which options you used.
- p.7, l.158: The acronym DMD is introduced a second time.
- p.9, Figure 3 – caption: “… OptDMD-C methods (left) …”: Should be “method” only; “… shown as as pairs …”: Delete one as.
- p.14, Table 2 - caption: Introduce acronyms R & F.
- p.16, l.318 & l.328: In both lines, one for reconstruction, one for forecast, 55.1% are better than mean. Is this maybe a copy and paste error?
- p.19, l.399, l.403: “… small errors …”, “… good performance …”: What does small/good mean here? Please be specific.
- p.19, l.404: “… suitable …”: Please concretize, suitable for what and why.
- p.23, l.509: “… no clear best practice with regards to these parameters, …”: Please include reference/source.
- p.25 Figure A2 – caption: “… over the DMD rank, …”: Copy and paste error, should be “over the fitting dataset length”.
- Figure A1, A2, A3: Why do you not include the results from all datasets?
- p.27, Table B1 - caption: Introduce acronyms R & F.
- p.29, Figure C2: Marker for reconstruction is circle in figure, but cross in legend.