https://doi.org/10.5194/egusphere-2025-4650
05 Nov 2025
Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Cause-effect discovery in Hydrometeorological Systems: Evaluation of Causal Discovery methods

Vivek Kumar Yadav, Murray Peel, Keirnan Fowler, Dongryeol Ryu, and Bramha Dutt Vishwakarma

Abstract. Identifying the driver(s) of a process or phenomenon is central to understanding and predicting its future state. In complex hydrometeorological systems, a process can have multiple drivers that are dynamically coupled to the system across timescales, so a robust method for identifying drivers is imperative. In the hydrological sciences, methods such as multivariate regression and, more recently, Big Data machine-learning approaches rely on finding a co-relation between variables rather than identifying cause-effect relations. This study evaluates cause-effect discovery (Causal Discovery, or CD) algorithms in hydrometeorological systems. Although earlier studies have made important contributions to exploring CD methods, they have primarily focused on bivariate methods in simple synthetic environments. Here, we evaluate four theoretically distinct multivariate CD algorithms: (i) TCDF, (ii) VARLiNGAM, (iii) PCMCI+, and (iv) DYNOTEARS. We evaluate these algorithms within a large, complex simulated environment of the Global Land Data Assimilation System (GLDAS), where the drivers (the reference truth) are known perfectly. We compare the drivers identified by the CD methods against this reference truth and contrast the results with those of a widely used method of co-relation identification, Pearson’s Correlation Coefficient (PCC). The results show that CD methods identify fewer false drivers than PCC across a range of Köppen-Geiger climate types. For example, PCC fails to distinguish true drivers from the instantaneous and lagged cross-correlations typically present in hydrometeorological systems, whereas CD methods eliminate a higher number of false instantaneous and lagged drivers. Thus, although PCC identifies the highest number of true drivers, it also identifies a high number of false drivers. Overall, CD methods perform similarly to or better than PCC, with PCMCI+ and DYNOTEARS performing best.
Further, we test whether time-series prediction models perform better when predictors are limited to those identified as causal by CD methods. Evaluation of surface soil moisture predictions during drought shows that CD-based models outperform PCC-based models and are more parsimonious. Thus, we demonstrate the effectiveness of using causal discovery to eliminate spurious relations and obtain a robust set of drivers for prediction and process understanding across different climate conditions. This study provides an overview of CD methods and demonstrates and tests their efficacy in studying cause-effect relations in hydrometeorological systems. By exposing their capabilities and differences in a simulated environment, we hope to encourage their use in the real world and move beyond co-relation.
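The contrast drawn in the abstract between lagged cross-correlation and cause-effect relations can be illustrated with a toy example. The sketch below is not one of the four evaluated algorithms and does not use GLDAS data; it is a minimal illustration under assumed settings. The variables `z`, `x`, `y`, the coefficients 0.8 and 0.3, and the helper names are all hypothetical. A common driver `z` influences `x` at lag 1 and `y` at lag 2, so lagged `x` correlates strongly with `y` although `x` does not drive `y`. Conditioning on the common driver through a partial correlation, the kind of conditional-independence check that constraint-based CD methods build on, removes the spurious link.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5


def simulate(n=5000, seed=0):
    """Toy system (assumed, not from the paper): z drives x at lag 1
    and y at lag 2; there is no link from x to y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.0] * n
    y = [0.0] * n
    for t in range(2, n):
        x[t] = 0.8 * z[t - 1] + 0.3 * rng.gauss(0, 1)
        y[t] = 0.8 * z[t - 2] + 0.3 * rng.gauss(0, 1)
    return z, x, y


z, x, y = simulate()

# Align the series so that index i corresponds to y_t, x_{t-1}, z_{t-2}.
y_t = y[3:]
x_lag1 = x[2:-1]
z_lag2 = z[1:-2]

# Lagged Pearson correlation flags x_{t-1} as a candidate driver of y_t
# (in theory 0.64 / 0.73 for the coefficients assumed above).
raw = pearson(x_lag1, y_t)

# Conditioning on the true common driver removes the spurious link.
partial = partial_corr(x_lag1, y_t, z_lag2)

print(f"lagged Pearson corr(x[t-1], y[t])      = {raw:.3f}")
print(f"partial corr(x[t-1], y[t] | z[t-2])    = {partial:.3f}")
```

In this toy system a correlation-based screen would keep `x` as a predictor of `y`, while the conditional test would discard it, which is the kind of false-driver elimination the abstract attributes to CD methods.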

Competing interests: Keirnan Fowler is a member of the editorial board of the journal Hydrology and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 17 Dec 2025)

Latest update: 05 Nov 2025
Short summary
Identifying drivers is crucial for process understanding and prediction. In hydrometeorological systems, many variables are closely related, and common methods often rely on correlation. We describe theoretically distinct methods of discovering cause-effect relations from data and evaluate them in a large simulated environment. Results show that finding cause-effect relations provides a parsimonious picture and helps to obtain robust predictions, especially under changing environmental conditions.