Causal deep learning models for studying the Earth system: soil moisture-precipitation coupling in ERA5 data across Europe
Abstract. The Earth system is a complex non-linear dynamical system. Despite decades of research, many processes and relations between Earth system variables are still poorly understood. Current approaches for studying relations in the Earth system may be broadly divided into approaches based on numerical simulations and statistical approaches. However, current approaches have several inherent limitations, for example high computational costs, reliance on the correct representation of relations in numerical models, strong assumptions related to linearity or locality, and the fallacy of equating correlation with causality.
Here, we propose a novel methodology combining deep learning (DL) and principles of causality research in an attempt to overcome these limitations. The methodology combines the recent idea of training and analyzing DL models to gain new scientific insights into the relations between input and target variables with a theorem from causality research. This theorem states that a statistical model may learn the causal impact of an input variable on a target variable if suitable additional input variables are included. As an illustrative example, we apply the methodology to study soil moisture-precipitation coupling in ERA5 climate reanalysis data across Europe. We demonstrate that, harnessing the great power and flexibility of DL models, the proposed methodology may yield new scientific insights into complex, nonlinear and non-local coupling mechanisms in the Earth system.
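The role of "suitable additional input variables" can be illustrated on synthetic data. The following is a minimal sketch, not the authors' method: it assumes a linear system with a single hypothetical confounder `z`, uses ordinary least squares in place of a DL model, and all variable names and coefficients are made up for illustration.

```python
import numpy as np

def fit_slope_on_x(x, y, controls=None):
    """Least-squares coefficient of x when regressing y on x, an intercept,
    and any optional control variables."""
    columns = [x, np.ones_like(x)]
    if controls is not None:
        columns.extend(controls)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

rng = np.random.default_rng(0)
n = 50_000
true_effect = 0.5  # assumed causal effect of x on y (illustrative value)

z = rng.normal(size=n)                               # hypothetical confounder
x = 2.0 * z + rng.normal(size=n)                     # x is driven by z
y = true_effect * x + 3.0 * z + rng.normal(size=n)   # y is driven by x and z

# Without z the slope mixes the causal effect with the confounding path;
# in expectation it equals 0.5 + 3 * cov(x, z) / var(x) = 0.5 + 6 / 5 = 1.7.
naive_slope = fit_slope_on_x(x, y)

# With z included as an additional input, the slope recovers the assumed effect.
adjusted_slope = fit_slope_on_x(x, y, controls=[z])

print(f"naive: {naive_slope:.3f}, adjusted: {adjusted_slope:.3f}")
```

In this toy setting the adjustment works only because the confounder is known and observed, which is exactly the assumption that has to be justified in a real application.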
Tobias Tesch et al.
Model code and software
Causal deep learning models for studying the Earth system: soil moisture-precipitation coupling in ERA5 data across Europe - Software Code https://doi.org/10.5281/zenodo.6385040
This paper was intended to "propose a novel methodology combining deep learning (DL) and principles of causality research". However, I do not believe it does so. It reiterates a standard theorem from causal modelling that describes a causally sufficient set for some node X of a probabilistic graphical model. The authors then claim to choose such a set carefully. If it were possible to do so a priori, there would be no confounding and no need for the causality formalism. After this set is chosen, the interpolation of the joint probability distribution with a neural network follows standard practice. Since the mathematical formalism of causality is not really used, the methodology cannot justify publication. Moreover, since "An extensive discussion of our results on soil moisture-precipitation coupling in terms of physical processes (e.g. Seneviratne et al., 2010; Santanello et al., 2018) and a comparison with results from other studies (e.g. Seneviratne et al., 2010; Taylor et al., 2012; Guillod et al., 2015; Tuttle and Salvucci, 2016; Imamovic et al., 2017) are postponed to a second paper", no new physical results are presented. I therefore recommend that the paper be rejected and that the authors submit a paper that includes the new physical insights.
In the paper itself, some claims could be better supported by evidence. The authors claim that simulations are always more expensive than their deep learning scheme, but no data are provided. Simulations at what resolution? Is the cost of DNN training included? More nuance here would be helpful. Derivatives calculated from the DNN solution are used to quantify sensitivities and errors, but how accurate are these estimates? On page 17, the authors state that "In our example, the null hypothesis was rejected at a confidence level of 99 %". However, it is later stated that only two samples were taken. This seems misleading at best. Clarification of what is meant by the 99 % confidence level in this case would be very helpful.
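One narrow part of the question about derivative accuracy can be checked mechanically: whether the derivative of the fitted function is computed correctly. The sketch below compares a chain-rule gradient with a central finite difference on a tiny network. The network, its random weights, and all function names are hypothetical and unrelated to the paper's model. The check says nothing about whether the fitted function reflects the true sensitivity, which is the harder question raised above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-hidden-layer network: 3 inputs, 8 tanh units, scalar output.
W1 = rng.normal(size=(8, 3))
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)

def model(x):
    """Scalar network output for a single input vector x."""
    return float(w2 @ np.tanh(W1 @ x + b1))

def analytic_gradient(x):
    """Gradient of the output with respect to x via the chain rule."""
    hidden = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - hidden**2))

def finite_difference_gradient(x, eps=1e-5):
    """Central-difference approximation of the same gradient."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (model(x + step) - model(x - step)) / (2.0 * eps)
    return grad

x0 = rng.normal(size=3)
max_abs_diff = np.max(
    np.abs(analytic_gradient(x0) - finite_difference_gradient(x0))
)
print(f"max abs difference between gradients: {max_abs_diff:.2e}")
```

Agreement between the two gradients only rules out implementation errors in the derivative. Assessing the accuracy of the sensitivity itself would need a reference, for example synthetic data with a known sensitivity.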