Applications of Machine Learning and Artificial Intelligence in Tropospheric Ozone Research
Abstract. Machine learning (ML) is transforming atmospheric chemistry, offering powerful tools to address challenges in tropospheric ozone research, a critical area for climate resilience and public health. As in adjacent fields, ML approaches complement existing research by learning patterns from ever-increasing volumes of atmospheric and environmental data relevant to ozone. We highlight the rapid progress made in the field since Phase 1 of the Tropospheric Ozone Assessment Report, focussing particularly on the most active areas of research, namely short-term ozone forecasting, emulation of atmospheric chemistry and the use of remote sensing for ozone estimation. Despite these advances, many challenges remain, including data quality, the lack of established benchmarks, and limited model generalisation and explainability. This review provides a comprehensive synthesis of recent advancements, highlights critical challenges, and proposes actionable pathways to further advance ML applications in ozone research. Achieving this potential will require close collaborations across atmospheric chemistry, ML and computational science, aimed at addressing key challenges such as the development of global benchmark datasets and robust, explainable models.
Competing interests: Some authors are involved in editorial work for Copernicus Journals, and ORC is an editor for the TOAR2 special issue.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3739', Anonymous Referee #1, 02 Apr 2025
General comments
This is a comprehensive, well-written review by experts in the field, and it clearly deserves to be published in GMD. It will be useful for the customary purposes of a review paper (e.g. giving active participants in the reviewed field and in closely related fields entry points into the literature, providing nice summary graphics, and discussing methodologies and challenges that will underlie future research). As someone working on ML applications in a related geophysical field, I find that most of the broader themes in the text (e.g. heterogeneous datasets, end-to-end prediction, issues and challenges having to do with learning wide ranges of space and time scales and lots of correlated predictors, long-term emulator drift, explainability and PINNs, effective benchmarks and intercomparisons, foundation models) would apply just as well to my own area of research.
One thing I look for in a review article is to highlight some crisp, intellectually exciting problems that could launch a new student or postdoc into career-launching research directions. One could glean inspirations from the ‘Future Outlook’ subsections and Section 5 on ‘Challenges and Limitations’ and ‘Future Directions’, but the issues raised there mostly involve large coordinated efforts with a heavy software engineering focus. One could argue that such efforts are the primary path to further progress in ML for tropospheric ozone and related chemistry, but are there also relevant conceptual questions you’d like to highlight that are more accessible to academic researchers?
Specific comments
L181: Reference formatting
L199: Delete ‘so’
L243: What is an ‘NMB’?
L386: What is ‘MDA8’?
Citation: https://doi.org/10.5194/egusphere-2024-3739-RC1
RC2: 'Comment on egusphere-2024-3739', Brian Henn, 06 May 2025
Review of Hickman et al., "Applications of Machine Learning and Artificial Intelligence in Tropospheric Ozone Research", submitted to GMD.
Brian Henn, Ai2 climate modeling, Seattle, WA
General Comments
The authors provide a perspective piece on the use of machine learning/artificial intelligence (ML/AI) to help with various aspects of prediction and process understanding surrounding atmospheric ozone (O3) concentrations. The authors discuss the chemical processes and scales that control O3 concentrations and then they lay out the various reasons why O3 concentrations are important for human health and other impacts (primarily in the troposphere), and discuss the challenges in the current state of observing and predicting those concentrations. They then focus on three aspects of ML/AI application towards O3 prediction: 1) making short-term predictions for specific ground locations related to health hazard thresholds and operational forecasting programs, 2) predicting O3 within regional and global models, which is currently possible though computationally demanding via chemical transport modules within atmospheric dynamic models, and 3) improving how remotely-sensed information about O3 and its related chemical species can be incorporated into production datasets and forecasts. For each of these sections they discuss recent research and challenges around ML/AI. They end with broad identification of unsolved challenges and possible paths forward for making better use of ML/AI in this field.
Overall, I found the paper to be very comprehensive and well written, and it was useful for me as someone who is not an expert on O3, but who is generally familiar with the challenges of using ML/AI in physical modeling, to see where progress has been made in this particular field. I think that some of the authors’ recommendations, such as the need for a benchmark dataset that the O3 modeling community can agree upon to drive progress in ML/AI forecasts, will be helpful to organize progress.
I found that the framing of the paper is at times unclear, as many of the issues brought up are not specific to O3 modeling, but are instead general to ML/AI and/or its application to physical modeling. This meant that it was often unclear whether the challenges being discussed really were specific to O3 and if not, whether they really are the most important items to mention in this context. Relatedly, the manuscript often repeats general issues in several locations, making it overly long.
Regarding specific sections:
- Section 3.2: I was a bit unclear on the difference between 3.2.1 (ML emulation and reduced order modeling) and 3.2.2 (ML models implemented within global CTMs). Are the studies in the former section essentially doing “offline” ML emulation of model output datasets? And the latter section is “online”? For example, “reduced order modeling” could seem to apply to either situation.
- Section 3.3: One challenge that I did not see discussed in the paper, and which would be appropriate to mention here, is how to marry the current success in ML/AI weather and climate model emulation with chemical transport modeling. Thus far ML/AI emulators have largely excluded CTMs from their scope. The current framework for training these emulators is also generally non-extensible: for example, it is not possible to add even a conservative tracer to an ML framework that has already been trained on atmospheric dynamics, even though conceptually the ML should be able to "know" about tracer advection. Instead, the entire training must be rerun with the tracer among the predicted variables (a minimal sketch of this issue follows after this list). Do the authors see any approaches for bridging this gap? It could dramatically speed up the inclusion of species like O3 in the predictions made by ML/AI emulators.
- Section 3.3: It is also worth noting that the state of the art in ML/AI weather forecasting has moved heavily towards probabilistic/diffusion-based architectures that have the inherent ability to produce sharp forecasts and uncertainty estimates. How does this impact the potential for O3 forecasts?
- Sections 5 and 6: I found that these sections repeated points made earlier rather heavily and could have been more concise. I am also curious whether the authors are willing to offer opinions regarding which of the recommendations they find most likely to produce success for the modeling community in the short and medium term. For example, the manuscript lists many possible paths forward, but are they all equally important? It seems likely that a benchmark dataset could spur advances in ML/AI skill for O3, and the authors list this as the first suggestion. Are the other suggestions equally likely to produce success, or are they more general suggestions for good practices or things that would be "nice to have"? For example, while XAI and PINNs offer conceptual and process understanding benefits, they lack a track record of success in ML/AI forecasts as compared to deep learning black box approaches. How do the authors feel these goals should be prioritized?
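To make the extensibility point above concrete, here is a minimal, hypothetical PyTorch-style sketch; all class and variable names are illustrative and not taken from any particular emulator. It shows why a tracer cannot simply be appended to a network that has already been trained: the set of predicted variables is baked into the architecture, so the weights coupling a new channel to the dynamical state do not exist.

```python
# Hypothetical sketch of the extensibility problem: a toy one-step emulator
# whose number of predicted fields is fixed at construction time.
import torch
import torch.nn as nn

class DynamicsEmulator(nn.Module):
    """Toy emulator mapping an atmospheric state to the next time step."""
    def __init__(self, n_channels: int, hidden: int = 64):
        super().__init__()
        # The number of predicted variables is baked into the first and
        # last layers when the model is built.
        self.net = nn.Sequential(
            nn.Conv2d(n_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, n_channels, kernel_size=3, padding=1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Residual one-step prediction: next state = current state + update.
        return state + self.net(state)

# An emulator trained on, say, 6 dynamical fields ...
model = DynamicsEmulator(n_channels=6)
x = torch.randn(1, 6, 32, 64)           # (batch, fields, lat, lon)
print(model(x).shape)                    # torch.Size([1, 6, 32, 64])

# ... cannot simply be handed a 7th field (e.g. a conservative O3 tracer):
# the input/output layers have the wrong shape, and the weights for the new
# channel were never learned, so the model must be rebuilt and retrained
# with the tracer among the predicted variables.
x_with_tracer = torch.randn(1, 7, 32, 64)
try:
    model(x_with_tracer)
except RuntimeError as err:
    print("Shape mismatch:", err)
```

The shape error is only the surface symptom; the deeper issue is that the learned couplings between a new tracer and the existing dynamical state simply do not exist in the trained weights, which is why the training is currently rerun from scratch when a species is added.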
Specific Comments
(I didn't have time to do a line-by-line read, but noticed a couple of items.)
- Figure 2: Some explanation of the meaning of the arrows and colors in the lower panel would be helpful.
- TOAR is not properly introduced on first reference, for those not familiar with the acronym
Citation: https://doi.org/10.5194/egusphere-2024-3739-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 634 | 226 | 23 | 883 | 6 | 10 |