This work is distributed under the Creative Commons Attribution 4.0 License.
Moving beyond post-hoc XAI: Lessons learned from dynamical climate modeling
Abstract. AI models are criticized as being black boxes, potentially subjecting climate science to greater uncertainty. Explainable artificial intelligence (XAI) has been proposed to probe AI models and increase trust. In this Perspective, we suggest that, in addition to using XAI methods, AI researchers in climate science can learn from past successes in the development of physics-based dynamical climate models. Dynamical models are complex but have gained trust because their successes and failures can be attributed to specific components or sub-models, such as when model bias is explained by pointing to a particular parameterization. We propose three types of understanding as a basis to evaluate trust in dynamical and AI models alike: (1) instrumental understanding, which is obtained when a model has passed a functional test; (2) statistical understanding, which is obtained when researchers can make sense of the modeling results using statistical techniques to identify input-output relationships; and (3) component-level understanding, which refers to modelers’ ability to point to specific model components or parts in the model architecture as the culprit for erratic model behaviors or as the crucial reason why the model functions well. We demonstrate how component-level understanding has been sought and achieved via climate model intercomparison projects over the past several decades. Such component-level understanding routinely leads to model improvements and may also serve as a template for thinking about AI-driven climate science. Currently, XAI methods can help explain the behaviors of AI models by focusing on the mapping between input and output, thereby increasing the statistical understanding of AI models. Yet, to further increase our understanding of AI models, we will have to build AI models that have interpretable components amenable to component-level understanding. We give examples from the AI climate science literature to highlight some recent, albeit limited, successes in achieving component-level understanding and thereby explaining model behavior. The merit of such interpretable AI models is that they serve as a stronger basis for trust in climate modeling and, by extension, downstream uses of climate model data.
Status: final response (author comments only)
- CC1: 'Cross-validation, Symbolic Regression, Pareto include', Paul PUKITE, 16 Feb 2024
- RC1: 'Comment on egusphere-2023-2969', Julie Jebeile, 06 Jun 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2023-2969/egusphere-2023-2969-RC1-supplement.pdf
- RC2: 'Comment on egusphere-2023-2969', Imme Ebert-Uphoff, 12 Jun 2024
Summary of paper:
This manuscript emphasizes the need to develop more interpretable AI models for climate applications, with emphasis on AI models that provide component-level understanding. It points out that (post-hoc) XAI methods that are applied after an AI model is built are not the way to go to achieve that goal. Instead, AI models should be built a priori to allow for component-level understanding. Comparisons are drawn to numerical climate models, which tend to be built in components, making them easier to debug and interpret.
Comments:
I agree with the overall intent of the paper to push toward more interpretable AI models, rather than relying on XAI methods. While I agree with this intent, to me this seems to be a well-known goal and thus I do not see significant new contributions in this manuscript. Let me explain, section by section.
Section 1 argues that relying on applying XAI methods after a model has been built has many drawbacks, and that instead one should build models that are interpretable (what they call component-level understanding) from the start. However, this point has already been made many times. For example, the highly cited paper by Rudin (2019) (which is also cited in this manuscript) is entitled “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”, and makes this point very clearly: whenever possible, build interpretable models rather than relying on applying XAI methods after a model has been built. In the context of weather and climate, the argument for interpretable models has been made many times, too; see for example:
- Yang, R., Hu, J., Li, Z., Mu, J., Yu, T., Xia, J., Li, X., Dasgupta, A. and Xiong, H., 2024. Interpretable Machine Learning for Weather and Climate Prediction: A Survey. arXiv preprint arXiv:2403.18864.
- Nhu, A.N. and Xie, Y., 2023, November. Towards Inherently Interpretable Deep Learning for Accelerating Scientific Discoveries in Climate Science. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (pp. 1-2).
- Hilburn, K.A., 2023. Understanding spatial context in convolutional neural networks using explainable methods: Application to interpretable GREMLIN. Artificial Intelligence for the Earth Systems, 2(3), p.220093.
Section 2 outlines shortcomings and limitations of XAI methods. It mainly cites a few papers that have studied this topic. I did not find anything new here.
Section 3 states that traditional (numerical) climate models tend to be based on components, which makes it easier to attribute problems to specific components of the model, and that the modularity of these components should be followed by AI models. Firstly, as one of the other reviewers already pointed out, the complex interactions of components in climate models can make it very difficult to attribute problems to individual components, so that reasoning does not always work for traditional climate models either. Secondly, traditional climate models are built in a modular structure because that is the only way humans can build such a complex system: one module at a time. Sure, that has other advantages as well, such as higher interpretability, but it was not the main reason. In contrast, modern AI tools are not naturally built on modularity, so it takes considerable effort to try to enforce modularity, especially for very complex tasks. Thus, in essence, I agree that it would be nice for AI models to be modular, but it might not always be possible.
Section 4 provides examples of three papers that are supposed to show how “component-level understanding” can be achieved for AI. How exactly is “component-level understanding” defined? Does it mean that we need to understand ONE component of the AI model? Or ALL components? If it’s just one component – which it seems to be in several examples – then how is this different from physics-guided machine learning, which is an entire field? See for example:
- Willard, J., Jia, X., Xu, S., Steinbach, M. and Kumar, V., 2020. Integrating physics-based modeling with machine learning: A survey. arXiv preprint arXiv:2003.04919, 1(1), pp.1-34.
Also, there are many, many other examples from climate science that could have been cited, and I did not find any new ideas here.
Section 5 recommends striving for component-level understanding. It’s a good idea to strive for more modular AI architectures whenever possible, but I did not learn anything new here about how that could be achieved. I would also argue that we should focus on the more general topic of achieving interpretability, whether that is achieved through modularity or other means, such as feature engineering and/or symbolic regression.
Review summary: While I agree with the general idea that we should strive to make AI models more interpretable, unfortunately, I do not see any convincing new ideas here.
Citation: https://doi.org/10.5194/egusphere-2023-2969-RC2
- RC3: 'Comment on egusphere-2023-2969', Yumin Liu, 29 Jun 2024
Summary:
The authors hold the opinion that artificial intelligence (AI) models should gain trust in the climate science community, just as physics-based dynamical climate models do.
They propose three types of understanding as a basis to evaluate trust in dynamical and AI models alike: instrumental understanding, statistical understanding, and component-level understanding. Instrumental understanding is defined as knowing that a model performed well (or not), or knowing its error rate on a given test. Statistical understanding is defined as being able to offer a reason why we should trust a given machine learning model by appealing to input-output mappings that can be retrieved with statistical techniques. Component-level understanding refers to being able to point to specific model components or parts of the model architecture as the cause of erratic model behaviors or as the crucial reason why the model functions well. The authors further argue that current explainable artificial intelligence (XAI) methods only help increase statistical understanding and are hence not sufficient. They argue that component-level understanding is essential for models to gain trust and propose that AI models have interpretable components amenable to component-level understanding. The authors then give examples to support the arguments that XAI methods only provide statistical understanding, that dynamical climate models provide component-level understanding, and finally that AI models can (and should) have component-level understanding as well.
Overall comment:
The paper is clear and easy to understand, with good writing. It tries to address ML/AI model explainability, which is a very important topic in climate science (and in other areas), and argues that we should improve the explainability of AI models. However, I don’t think this paper provides a comprehensive or innovative approach to achieving this goal. It seems to me that this paper proposes a concept that already exists in common practice in the community. The key argument of the paper is to advocate for ‘component-level understanding’, which is essentially finding out which part of the model is not working and tweaking or adjusting that part until it works. It is quite common for researchers to have some intuition or expectation about the functionality of each component of the model architecture when they design an ML/AI model (including models applied to climate). Therefore the model will naturally afford some ‘component-level’ understanding, albeit with components that are sometimes not explicitly decoupled. If the authors are arguing for more explicitly decoupled or independent components in the model, I think they may need more concrete examples from ML/AI models to illustrate the concept, beyond the current examples in the paper.
Section comments:
In the following sections of the paper the authors use examples to illustrate the three types of understanding. In section two, the authors explain how an XAI method uses saliency maps in convolutional neural networks to examine the input/output mapping and achieve statistical understanding, but they argue that XAI methods have the limitation of not being able to distinguish between correlation and causation.
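For readers unfamiliar with the technique, a minimal sketch of a gradient-based (“vanilla”) saliency map is given below. This is an illustrative assumption, not the manuscript’s exact method: it assumes PyTorch, and `model` and `x` are hypothetical placeholders for any trained CNN and a single input field.

```python
import torch

def vanilla_gradient_saliency(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return |d score / d input| for one sample, a simple gradient saliency map.

    Assumed shapes: x is [1, channels, lat, lon]; model maps it to a prediction.
    """
    model.eval()
    x = x.clone().detach().requires_grad_(True)   # track gradients w.r.t. the input
    score = model(x).max()                        # scalar output whose sensitivity we probe
    score.backward()                              # autograd fills x.grad
    return x.grad.abs().squeeze(0)                # per-grid-point sensitivity magnitudes
```

Such a map highlights the input grid points to which the network’s output is most sensitive, which is a statement about the learned input-output mapping (statistical understanding) rather than about physical causation.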
In section three the authors use other examples to argue that dynamical models, on the other hand, have component-level understanding. The examples include fixing errors in the Atmospheric Model Intercomparison Project by identifying and correcting ocean heat transport, and two more examples in which updating parameterizations helps improve model performance and achieve component-level understanding. The example in this section is from 30 years ago, and climate science has advanced a lot since then; it would be better to provide a more recent example.
In section four the authors give three examples to claim that AI models can achieve component-level understanding, either through intentional model-architecture design or by finding interpretable model components. However, these examples are not persuasive enough to support the claims, especially example three, which is in fact an ablation study, a mechanism that XAI methods can also make use of.
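To make the distinction concrete, the ablation logic referred to here can be sketched as follows. This is a generic illustration, not the paper’s experiment: `build_model`, `train`, `evaluate`, and the component names are hypothetical placeholders supplied by the user.

```python
from typing import Callable, Dict, Iterable, Optional

def ablation_study(
    build_model: Callable[[Optional[str]], object],  # builds the model, optionally with one component disabled
    train: Callable[[object], object],               # trains and returns the fitted model
    evaluate: Callable[[object], float],             # returns a skill score on held-out data
    components: Iterable[str],
) -> Dict[str, float]:
    """Retrain with each named component disabled and report the drop in skill
    relative to the full model (positive values mean the component helped)."""
    baseline = evaluate(train(build_model(None)))    # full model, nothing disabled
    return {
        name: baseline - evaluate(train(build_model(name)))
        for name in components
    }
```

The resulting scores attribute performance changes to whole model components rather than to individual inputs; as noted above, however, post-hoc XAI workflows can employ the same mechanism, so ablation alone does not distinguish the proposed component-level understanding from existing practice.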
In section five the authors further advocate for component-level understanding and argue that XAI methods can be complementary to it.
Throughout the paper, the examples are explained only briefly in plain text, without detailed information or rigorous numbers to support the arguments. There are only two figures in the paper, and they do not help much in explaining the content of the text.
Citation: https://doi.org/10.5194/egusphere-2023-2969-RC3
- AC1: 'Comment on egusphere-2023-2969', Ryan O'Loughlin, 07 Sep 2024
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
606 | 157 | 53 | 816 | 33 | 24 | 22