Moving beyond post-hoc XAI: Lessons learned from dynamical climate modeling
Abstract. AI models are criticized as being black boxes, potentially subjecting climate science to greater uncertainty. Explainable artificial intelligence (XAI) has been proposed to probe AI models and increase trust. In this Perspective, we suggest that, in addition to using XAI methods, AI researchers in climate science can learn from past successes in the development of physics-based dynamical climate models. Dynamical models are complex but have gained trust because their successes and failures can be attributed to specific components or sub-models, such as when model bias is explained by pointing to a particular parameterization. We propose three types of understanding as a basis to evaluate trust in dynamical and AI models alike: (1) instrumental understanding, which is obtained when a model has passed a functional test; (2) statistical understanding, which is obtained when researchers can make sense of the modeling results using statistical techniques to identify input-output relationships; and (3) component-level understanding, which refers to modelers’ ability to point to specific model components or parts in the model architecture as the culprit for erratic model behaviors or as the crucial reason why the model functions well. We demonstrate how component-level understanding has been sought and achieved via climate model intercomparison projects over the past several decades. Such component-level understanding routinely leads to model improvements and may also serve as a template for thinking about AI-driven climate science. Currently, XAI methods can help explain the behaviors of AI models by focusing on the mapping between input and output, thereby increasing the statistical understanding of AI models. Yet, to further increase our understanding of AI models, we will have to build AI models that have interpretable components amenable to component-level understanding. We give examples from the AI climate science literature to highlight some recent, albeit limited, successes in achieving component-level understanding and thereby explaining model behavior. The merit of such interpretable AI models is that they serve as a stronger basis for trust in climate modeling and, by extension, downstream uses of climate model data.
Status: closed
- CC1: 'Cross-validation, Symbolic Regression, Pareto include', Paul PUKITE, 16 Feb 2024
- RC1: 'Comment on egusphere-2023-2969', Julie Jebeile, 06 Jun 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2023-2969/egusphere-2023-2969-RC1-supplement.pdf
- RC2: 'Comment on egusphere-2023-2969', Imme Ebert-Uphoff, 12 Jun 2024
Summary of paper:
This manuscript emphasizes the need to develop more interpretable AI models for climate applications, with emphasis on AI models that provide component-level understanding. It points out that (post-hoc) XAI methods that are applied after an AI model is built are not the way to go to achieve that goal. Instead, AI models should be built a priori to allow for component-level understanding. Comparisons are drawn to numerical climate models, which tend to be built in components, making them easier to debug and interpret.
Comments:
I agree with the overall intent of the paper to push toward more interpretable AI models, rather than relying on XAI methods. While I agree with this intent, to me this seems to be a well-known goal and thus I do not see significant new contributions in this manuscript. Let me explain this section by section.
Section 1 argues that relying on applying XAI methods after a model has been built has many drawbacks, and that instead one should build models that are interpretable (what they call component-level understanding) from the start. However, this point has already been made many times. For example, the highly cited paper by Rudin (2019) (which is also cited in this manuscript) is entitled “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”, and makes this point very clearly: whenever possible, build interpretable models rather than relying on applying XAI methods after a model has been built. In the context of weather and climate the argument for interpretable models has been made many times, too, see for example:
- Yang, R., Hu, J., Li, Z., Mu, J., Yu, T., Xia, J., Li, X., Dasgupta, A. and Xiong, H., 2024. Interpretable Machine Learning for Weather and Climate Prediction: A Survey. arXiv preprint arXiv:2403.18864.
- Nhu, A.N. and Xie, Y., 2023, November. Towards Inherently Interpretable Deep Learning for Accelerating Scientific Discoveries in Climate Science. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (pp. 1-2).
- Hilburn, K.A., 2023. Understanding spatial context in convolutional neural networks using explainable methods: Application to interpretable GREMLIN. Artificial Intelligence for the Earth Systems, 2(3), p.220093.
Section 2 outlines shortcomings and limitations of XAI methods. It mainly cites a few papers that have studied this topic. I did not find anything new here.
Section 3 states that traditional (numerical) climate models tend to be based on components, which makes it easier to attribute problems to specific components of the model, and that this modularity should be emulated by AI models. Firstly, as one of the other reviewers already pointed out, the complex interactions of components in climate models can make it very difficult to attribute problems to individual components, so that reasoning does not always work for traditional climate models either. Secondly, traditional climate models are built in a modular structure because that is the only way humans can build such a complex system – by building one module at a time. Sure, that has other advantages as well – such as higher interpretability – but it wasn't the main reason. In contrast, modern AI tools are not naturally built on modularity, so it takes considerable effort to enforce modularity, especially for very complex tasks. Thus, in essence, I agree that it would be nice for AI models to be modular, but it might not always be possible.
Section 4 provides examples of three papers that are supposed to show how “component-level understanding” can be achieved for AI. How exactly is “component-level understanding” defined? Does it mean that we need to understand ONE component of the AI model? Or ALL components? If it’s just one component – which it seems to be in several examples – then how is this different from physics-guided machine learning, which is an entire field? See for example:
- Willard, J., Jia, X., Xu, S., Steinbach, M. and Kumar, V., 2020. Integrating physics-based modeling with machine learning: A survey. arXiv preprint arXiv:2003.04919, 1(1), pp.1-34.
Also, there are many, many other examples from climate science that could have been cited, and I did not find any new ideas here.
Section 5 recommends striving for component-level understanding. It’s a good idea to strive for more modular AI architectures whenever possible, but I did not learn anything new here about how that could be achieved. I would also argue that we should focus on the more general topic of achieving interpretability, whether that is achieved through modularity or other means, such as feature engineering and/or symbolic regression.
Review summary: While I agree with the general idea that we should strive to make AI models more interpretable, unfortunately, I do not see any convincing new ideas here.
Citation: https://doi.org/10.5194/egusphere-2023-2969-RC2
- RC3: 'Comment on egusphere-2023-2969', Yumin Liu, 29 Jun 2024
Summary:
The authors hold the opinion that artificial intelligence (AI) models should gain trust in the climate science community, as physics-based dynamical climate models have.
They propose three types of understanding as a basis to evaluate trust in dynamical and AI models alike: instrumental understanding, statistical understanding, and component-level understanding. Instrumental understanding is defined as knowing that a model performed well (or not), or knowing its error rate on a given test. Statistical understanding is defined as being able to offer a reason why we should trust a given machine learning model by appealing to input-output mappings that can be retrieved with statistical techniques. Component-level understanding refers to being able to point to specific model components or parts of the model architecture as the cause of erratic model behaviors or as the crucial reason why the model functions well. The authors further argue that current explainable artificial intelligence (XAI) methods only help to increase statistical understanding and are hence not sufficient. They argue that component-level understanding is essential for models to gain trust, and they propose that AI models be built with interpretable components that are amenable to component-level understanding. The authors then present examples to support the arguments that XAI methods only provide statistical understanding, that dynamical climate models provide component-level understanding, and that AI models can (and should) provide component-level understanding as well.
Overall comment:
The paper is clear, well written, and easy to understand. It addresses ML/AI model explainability, which is a very important topic in climate science (and in other areas), and argues that we should improve the explainability of AI models. However, I don’t think this paper provides a comprehensive or innovative approach to achieving this goal. It seems to me that this paper proposes a concept that already exists in common practice in the community. The key argument of the paper is to advocate for ‘component-level understanding’, which is essentially finding out which part of the model is not working and tweaking or adjusting that part until it works. It is quite common for researchers to have some intuition or expectation about the functionality of each component of the model architecture when they design an ML/AI model (including models applied to climate). Therefore the model naturally has some ‘component-level’ understanding, albeit sometimes not explicitly decoupled. If the authors are arguing for more explicitly decoupled or independent model components, I think they may need more concrete examples to illustrate the concept in ML/AI models beyond the current examples in the paper.
Section comments:
In the following sections of the paper the authors use examples to illustrate the three types of understanding. In section two, the authors explain how an XAI method uses saliency maps in convolutional neural networks to examine the input/output mapping and achieve statistical understanding, but they argue that XAI methods have the limitation of not being able to distinguish between correlation and causation.
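For illustration, the saliency-map mechanism referred to above can be sketched in a few lines. This is a generic, minimal sketch assuming a trained PyTorch CNN; `model`, `x`, and `target_index` are hypothetical placeholders, not objects from the paper under review.

```python
# Minimal gradient-based saliency sketch (assumes PyTorch; names are hypothetical).
import torch

def saliency_map(model: torch.nn.Module, x: torch.Tensor, target_index: int) -> torch.Tensor:
    """Absolute gradient of one output score with respect to each input value."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the input
    output = model(x.unsqueeze(0))               # add a batch dimension
    output[0, target_index].backward()           # back-propagate the chosen output score
    return x.grad.abs()                          # per-cell sensitivity of the output
```

Because such a map only describes the sensitivity of a trained network's output to its inputs, it supports the statistical kind of understanding discussed in the paper rather than component-level understanding.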
In section three, the authors use other examples to argue that dynamical models, on the other hand, afford component-level understanding. The examples include fixing errors in the Atmospheric Model Intercomparison Project by identifying and correcting ocean heat transport, plus two further examples in which updated parameterizations improved model performance and yielded component-level understanding. The example in this section is from 30 years ago, and climate science has advanced a lot since then; it would be better to provide a more recent example.
In section four, the authors give three examples to claim that AI models can achieve component-level understanding either through intentional model architecture design or by finding interpretable model components. However, these examples are not persuasive enough to support the claims, especially example three, which is in fact an ablation study, a mechanism that XAI methods can also make use of.
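For illustration, an ablation study in this sense trains and evaluates the same model with and without one component and attributes the change in skill to that component. Below is a minimal, self-contained sketch on synthetic data, assuming scikit-learn; it shows only the generic mechanism and is not the example from the paper.

```python
# Generic ablation sketch: remove one pipeline component and compare test error.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=500)  # mildly nonlinear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = make_pipeline(PolynomialFeatures(degree=2), Ridge()).fit(X_tr, y_tr)
ablated = make_pipeline(Ridge()).fit(X_tr, y_tr)  # same pipeline with one component removed

for name, model in [("full", full), ("ablated", ablated)]:
    print(name, mean_squared_error(y_te, model.predict(X_te)))
# A clear loss of skill after removal attributes that part of the performance
# to the ablated component.
```

The same comparison logic applies to removing a branch of a neural architecture; as noted above, this mechanism is also available to post-hoc analyses.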
In section five, the authors further advocate for component-level understanding and argue that XAI methods can be complementary to it.
Throughout the paper, the examples are explained only briefly in plain text, without detailed information or rigorous numbers to support the arguments. There are only two figures in the paper, and they do not help much in explaining the content of the text.
Citation: https://doi.org/10.5194/egusphere-2023-2969-RC3
- AC1: 'Comment on egusphere-2023-2969', Ryan O'Loughlin, 07 Sep 2024
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 614 | 164 | 53 | 831 | 33 | 24 | 22 |