https://doi.org/10.5194/egusphere-2025-3006
26 Aug 2025
Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

A data-driven method for identifying climate drivers of agricultural yield failure from daily weather data

Lily-belle Sweet, Christoph Müller, Jonas Jägermeyr, and Jakob Zscheischler

Abstract. Climate-related impacts, such as agricultural yield failure, often occur in response to a range of specific weather conditions taking place across different time periods, such as during the growing season. Identifying which weather conditions and timings are most strongly associated with a certain impact is difficult because of the overwhelming number of possible predictor combinations from different aggregation periods. Here we address this challenge and introduce a method for identifying a small number of climate drivers of an impact from high-resolution meteorological data. Based on the principle that causal drivers should generalize across different environments, our proposed two-stage approach systematically generates, tests, and discards candidate features using machine learning and then generates a set of robust drivers. We evaluate the method using simulated US maize yield data from two process-based global gridded crop models and rigorous out-of-sample testing (using approximately 30 years of early 20th-century climate and yield data for training and over 70 years of subsequent data for testing). The climate drivers identified align with crop model mechanisms and consistently use only the weather variables that are taken as input by the respective models. Logistic regression models using ten drivers as predictors show strong predictive performance on the held-out test period even under shifting climatic conditions, achieving correlations of 0.70–0.85 between predicted and true annual proportions of grid cells experiencing yield failure. This approach circumvents the limitations of post-hoc interpretability in black-box machine learning models, allowing researchers to use parsimonious statistical models to explore relationships between climate and impacts, while still harnessing the predictive power of high-resolution, multivariate weather data. 
We demonstrate this method in the context of agricultural yield failure, but it is also applicable to other climate-related impacts such as forest die-off, wildfires, landslides, or flooding.
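As a rough illustration of the "generate, test, and discard" idea described in the abstract, the following sketch builds candidate features by averaging one daily weather variable over different time windows, then keeps only candidates whose association with yield failure has a consistent sign and a minimum strength in every environment. This is not the authors' algorithm: the function names, the correlation-based screening rule, the threshold `min_abs_corr`, and the synthetic data are all assumptions made for illustration, and the logistic regression step mentioned in the abstract is omitted.

```python
import numpy as np

def window_means(daily, lengths=(10, 30), step=10):
    """Candidate features: mean of a daily variable over sliding windows.

    daily: array of shape (n_samples, n_days).
    Returns a dict mapping (start_day, window_length) -> (n_samples,) array.
    """
    n_days = daily.shape[1]
    feats = {}
    for length in lengths:
        for start in range(0, n_days - length + 1, step):
            feats[(start, length)] = daily[:, start:start + length].mean(axis=1)
    return feats

def robust_candidates(feats, failure, env, min_abs_corr=0.2):
    """Keep candidates whose correlation with failure has the same sign and
    magnitude >= min_abs_corr in every environment (hypothetical rule)."""
    keep = []
    for key, x in feats.items():
        corrs = [np.corrcoef(x[env == e], failure[env == e])[0, 1]
                 for e in np.unique(env)]
        same_sign = all(c > 0 for c in corrs) or all(c < 0 for c in corrs)
        if same_sign and min(abs(c) for c in corrs) >= min_abs_corr:
            keep.append(key)
    return keep

# Synthetic example (assumed data): failure is driven by heat on days 60-89.
rng = np.random.default_rng(0)
n, n_days = 600, 120
temp = rng.normal(25.0, 3.0, size=(n, n_days))   # daily temperature
env = rng.integers(0, 3, size=n)                 # three "environments"
heat = temp[:, 60:90].mean(axis=1)
failure = (heat + rng.normal(0.0, 0.3, n) > 25.5).astype(float)

feats = window_means(temp)
selected = robust_candidates(feats, failure, env)
print(sorted(selected))
```

In this toy setting, the windows that overlap the true driving period should survive the screening, while windows covering unrelated parts of the season should be discarded. A real application would combine several weather variables and use a stricter out-of-sample test, such as training on an early period and testing on later years as the abstract describes.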

Competing interests: One of the (co-)authors, Christoph Müller, is a member of the editorial board of Geoscientific Model Development.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 21 Oct 2025)


Short summary
This study presents a method to identify climate drivers of an impact, such as agricultural yield failure, from high-resolution weather data. The approach systematically generates, selects and combines predictors that generalise across different environments. Tested on crop model simulations, the identified drivers are used to create parsimonious models that achieve high predictive performance over long time horizons, offering a more interpretable alternative to black-box models.