the Creative Commons Attribution 4.0 License.
TOAD v1.0: A Python Framework for Detecting Abrupt Shifts and Coherent Spatial Domains in Earth-System Data
Abstract. Large-scale, nonlinear, abrupt, and potentially irreversible transitions in major Earth-system components are becoming increasingly likely under human pressures, with far-reaching consequences for ecosystems, climate stability, and human societies. Yet detecting and comparing such transitions across Earth System Model ensembles remains fragmented and inconsistent, hindering systematic assessment of tipping-point risks.
Here we present the first release of the Tipping and Other Abrupt events Detector (TOAD v1.0), an open-source, user-oriented Python framework for detecting abrupt changes in gridded Earth-system data. TOAD implements a modular three-stage pipeline consisting of 1) grid-level abrupt shift detection, 2) spatio-temporal clustering of co-occurring changes, and 3) consensus synthesis, which identifies statistically robust regions across ensemble members, variables, models, or methodological configurations and quantifies agreement in transition timing. The framework addresses key practical challenges of large-scale spatio-temporal clustering on geographic grids and provides diagnostic statistics and visualisation tools. Detection, clustering, and synthesis algorithms can be flexibly exchanged, supporting systematic method comparison and extensibility. TOAD functions as a data-introspection tool that reveals potentially tipping-relevant dynamics across spatial and temporal scales for subsequent, process-based analysis.
We apply TOAD to a synthetic benchmark, domain models of the Antarctic Ice Sheet and the global terrestrial biosphere, and a global Earth System Model ensemble of the North Atlantic Subpolar Gyre. Together, these demonstrations illustrate TOAD's applicability across diverse systems and establish a structured foundation for investigating where and when potentially tipping-relevant changes occur and for quantifying associated uncertainties, supporting coordinated assessment efforts such as the Tipping Points Modelling Intercomparison Project (TIPMIP).
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-356', Anonymous Referee #1, 26 Feb 2026
- RC2: 'Comment on egusphere-2026-356', Anonymous Referee #2, 25 Mar 2026
This paper presents TOAD v1.0, an open-source Python framework for detecting abrupt shifts in gridded Earth-system data. The work is well-motivated, clearly written, and addresses a genuine gap in the tipping-point research community: the absence of a modular, reproducible, and scalable pipeline for detecting and comparing abrupt changes across heterogeneous model ensembles. The timing is particularly relevant given the forthcoming TIPMIP intercomparison effort. The four examples span a useful range of Earth-system components, providing proof-of-concept for the framework's versatility. The authors are transparent about TOAD's limitations and scope. However, these strengths are currently offset by methodological and performance assessment gaps that would need to be addressed before the paper is suitable for publication in GMD.
As it stands, I have concerns about whether the paper fits within the scope of GMD. I could see it fitting within the new methods category, but such contributions require demonstrating equivalent or superior performance relative to existing approaches. The examples, while illustrative, do not constitute a rigorous evaluation of TOAD's performance. Additionally, some methodological choices warrant justification: the detection method and choices of parameters are not sufficiently justified, and the sensitivity of results to these parameters remains untested. A number of more specific concerns are raised below.
Specific comments:
Method validation: The authors are upfront that the examples are proof-of-concept rather than comprehensive analyses, which is appropriate for a methods paper. However, beyond the synthetic case (Section 3.1), there is no quantitative evaluation of detection accuracy or false positive/negative rates under controlled conditions. The synthetic test is limited to a simple two-region sigmoid shift on a white-noise background, with a large magnitude change, which is far simpler than real Earth-system data. The authors mention a forthcoming benchmark suite (Röhrich et al., forthcoming) but this does not substitute for at least some basic performance assessment in the present paper. I recommend that the authors expand the synthetic data to include more realistic scenarios (e.g., autocorrelation, background trends, multiple shifts).
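To illustrate the kind of test data meant here, the following minimal Python sketch generates a single time series that combines AR(1) noise, a linear background trend, and several sigmoid shifts. It is not taken from TOAD or the manuscript: the function name `synthetic_series` and all parameter values are hypothetical choices for illustration only.

```python
import math
import random


def synthetic_series(n=500, shifts=((150, 2.0), (350, -1.5)), width=5.0,
                     phi=0.7, sigma=0.3, trend=0.002, seed=0):
    """Return (series, signal) for one hypothetical grid cell.

    signal = linear trend + one sigmoid step per (time, magnitude) in shifts
    series = signal + AR(1) noise with coefficient phi and innovation std sigma
    """
    rng = random.Random(seed)  # local generator keeps the output reproducible
    noise = 0.0
    series, signal = [], []
    for t in range(n):
        # Deterministic part: background trend plus the sigmoid shifts.
        s = trend * t
        for t0, magnitude in shifts:
            # Logistic sigmoid written with tanh to avoid overflow.
            s += magnitude * 0.5 * (1.0 + math.tanh((t - t0) / (2.0 * width)))
        # AR(1) noise: x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
        noise = phi * noise + rng.gauss(0.0, sigma)
        signal.append(s)
        series.append(s + noise)
    return series, signal


series, signal = synthetic_series()
```

Because the noise-free `signal` is returned alongside `series`, the true shift times and magnitudes are known, so detections can be scored as hits, misses, or false alarms under controlled conditions.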
Choice of default detection algorithm: The choice of ASDETECT as the default shift detection method is surprising given the rich literature on changepoint detection methods rooted in time series analysis. Methods such as PELT (Killick et al., 2012), EnvCpt (Beaulieu and Killick, 2018), strucchange (Zeileis et al., 2002) or the BFAST framework (Verbesselt et al., 2010) just to name a few, offer statistically principled approaches to changepoint detection with well-characterised false positive/negative rates. While the authors justify the choice of ASDETECT on the grounds of speed, robustness to noise, and not requiring a reference time series, these advantages are not systematically demonstrated relative to existing detection methods. Given that TOAD is explicitly designed for use in high-stakes intercomparison efforts such as TIPMIP, the choice of default detection algorithm deserves stronger justification. The authors should either provide a comparative evaluation of ASDETECT against at least one theoretically grounded alternative, or more explicitly acknowledge that the choice of detection method is a consequential one that users should carefully consider for their specific application — particularly since the modular architecture already supports the substitution of alternative detection methods.
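For orientation, the simplest changepoint baseline is a least-squares scan for a single change in mean; the same squared-error segment cost can also be used inside penalised multiple-changepoint searches. The Python sketch below is a generic illustration of that baseline. It is not PELT, not ASDETECT, and not part of TOAD, and the names `best_single_changepoint` and `min_size` are hypothetical.

```python
def best_single_changepoint(x, min_size=5):
    """Find the split of x into two constant-mean segments with least squared error.

    Returns (k, gain): the second segment starts at index k, and gain is the
    reduction in squared error relative to fitting a single mean.
    """
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")

    # Prefix sums let each segment's squared error be evaluated in O(1).
    csum, csum2 = [0.0], [0.0]
    for v in x:
        csum.append(csum[-1] + v)
        csum2.append(csum2[-1] + v * v)

    def sse(i, j):
        # Sum of squared deviations from the mean over x[i:j].
        s = csum[j] - csum[i]
        return (csum2[j] - csum2[i]) - s * s / (j - i)

    total = sse(0, n)
    best_k, best_cost = None, float("inf")
    for k in range(min_size, n - min_size + 1):
        cost = sse(0, k) + sse(k, n)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, total - best_cost
```

Running a baseline of this kind and the default detector on the same synthetic and model series would give a first, concrete basis for the comparison requested above.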
Parameter sensitivity: TOAD's outputs are acknowledged to be sensitive to shift thresholds, consensus thresholds, temporal weighting γ, and clustering parameters. The paper provides limited quantitative guidance on how to choose these parameters in practice beyond the recommended threshold of |dts| ≥ 0.5 for ASDETECT. For instance, how should γ be chosen for different systems? The demonstrations use values of γ ranging from 1.0 to 3.0 across the four cases with limited justification. A more systematic discussion — perhaps a sensitivity analysis for one of the demonstration cases — would significantly improve the practical utility of the paper and help users avoid arbitrary parameter choices.
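A sensitivity analysis of the kind suggested here could start very simply, by sweeping the shift threshold and reporting how many grid cells are flagged at each value. The sketch below assumes a flat list of per-cell detection statistics; `flagged_fraction` is a hypothetical helper, and the example values are made up rather than TOAD output.

```python
def flagged_fraction(stats, thresholds):
    """Fraction of grid cells with |statistic| >= threshold, for each threshold."""
    if not stats:
        raise ValueError("stats must not be empty")
    return {
        thr: sum(abs(s) >= thr for s in stats) / len(stats)
        for thr in thresholds
    }


# Hypothetical per-cell detection statistics (illustrative values only).
example_stats = [0.1, -0.6, 0.5, 0.9, -0.3]
for thr, frac in flagged_fraction(example_stats, [0.3, 0.5, 0.7]).items():
    print(f"threshold {thr:.1f}: {frac:.0%} of cells flagged")
```

A flat response around the recommended threshold would suggest a robust choice, whereas a steep response would indicate that the threshold needs case-specific justification. The same pattern extends to γ and the clustering parameters.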
Temporal information loss in consensus clustering: The authors acknowledge that collapsing the temporal dimension before computing co-association means consensus clusters identify spatial agreement without requiring temporal synchronisation. It means that two grid cells could be assigned to the same consensus cluster even if their detected shifts occur decades apart, potentially merging physically unrelated transitions. The authors note this in Section 2.4 but the practical consequences are not illustrated. It would be helpful to show, for at least one demonstration case, the distribution of shift times within consensus clusters to illustrate when this is or is not a concern.
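The diagnostic requested here needs only the consensus cluster label and the detected shift time of each grid cell. The following Python sketch assumes both are available as flat sequences and that the label -1 marks unclustered cells. These conventions, and the name `shift_time_spread`, are assumptions for illustration and do not describe TOAD's actual data structures.

```python
from collections import defaultdict
from statistics import median


def shift_time_spread(labels, shift_times, noise_label=-1):
    """Summarise detected shift times within each cluster.

    labels: cluster id per grid cell (noise_label marks unclustered cells)
    shift_times: detected shift time per grid cell, in the same order
    """
    by_cluster = defaultdict(list)
    for label, t in zip(labels, shift_times):
        if label != noise_label:
            by_cluster[label].append(t)

    # A large range within one cluster flags possibly unrelated transitions.
    return {
        label: {
            "n_cells": len(times),
            "median": median(times),
            "range": max(times) - min(times),
        }
        for label, times in by_cluster.items()
    }
```

A cluster whose shift times span many decades would be a candidate for the merging of physically unrelated transitions described above, whereas a narrow range would indicate that the temporal collapse is harmless for that cluster.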
Comparison with existing approaches: The paper situates TOAD relative to edge-detection approaches (Bathiany et al., 2020; Terpstra et al., 2025) but there is no direct comparison of TOAD's output with these approaches on the same dataset. Even a qualitative comparison for one of the demonstrations would strengthen the paper's claims about TOAD's added value.
Technical comments:
- Line 65: "the default method in TOAD v1.0 being ASDETECT" — consider rephrasing as "the default method in TOAD v1.0 is ASDETECT"
- Line 305: "the data is restricted" — should be "the data are restricted."
- Line 365: "oscillatory or quasi-periodic or chaotic systems" — the repeated "or" is awkward; consider "oscillatory, quasi-periodic, or chaotic systems."
References:
Beaulieu, C. and Killick, R. (2018) Distinguishing trends and shifts from memory in climate data. Journal of Climate, 31(23), 9519–9543. https://doi.org/10.1175/JCLI-D-17-0863.1
Killick, R., Fearnhead, P. and Eckley, I.A. (2012) Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598. https://doi.org/10.1080/01621459.2012.737745
Verbesselt, J., Hyndman, R., Zeileis, A. and Culvenor, D. (2010) Phenological change detection while accounting for abrupt and gradual trends in satellite image time series. Remote Sensing of Environment, 114(12), 2970–2980. https://doi.org/10.1016/j.rse.2010.08.003
Zeileis, A., Leisch, F., Hornik, K. and Kleiber, C. (2002) strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7(2), 1–38. https://doi.org/10.18637/jss.v007.i02
Citation: https://doi.org/10.5194/egusphere-2026-356-RC2
The paper by Harteg et al., “A Python framework for detecting abrupt shifts and coherent spatial domains in Earth-System data”, offers a software package that combines previously published techniques. I have several concerns about this manuscript and its software.
Since there is no research novelty in the paper, I think it is more suitable for a software journal, such as The Journal of Open Source Software:
https://joss.theoj.org/
If the authors would like to target the geophysical community specifically, the choice of experiments should be different. There should be an extensive demonstration of performance on artificial data, with a clear explanation of the difference between generic tipping and abrupt change points. In particular, abrupt changes cannot be detected using early warning indicators, which means that the proposed TOAD package should be compared with equivalent change point detection packages on the same datasets. See the R package by Killick,
https://cran.r-project.org/web/packages/changepoint/index.html
the package by James et al.,
https://cran.r-project.org/web/packages/ecp/index.html
the spectral change point package in Python,
https://github.com/Lucew/changepoynt
and others.
New software is a welcome addition when it performs as well as or better than existing packages, and the paper should demonstrate this on the same datasets.
The authors consider several modelled datasets. Can they add a 2D change point analysis of an observed dataset? In particular, it would be interesting to know how the technique performs on the short observed datasets that are currently available.
The blocks “how technique works” should be presented as pseudocode.
Section 3.2 should be called “Antarctic ice sheet model data”. In this section, it is not clear how the model years were generated. Similarly, amend the title of Section 3.3 to state clearly that it is a modelled example.
In Section 3.3, the number of parameters is rather large, so tuning the detection of changes may become rather arbitrary. What tool could be introduced to optimise the choice of parameter values?
In the caption of Figure 5, the authors mention that only negative shifts were considered. If positive shifts are excluded, it would be interesting to know how many there were, since how the model performs in both directions is indicative of its accuracy.
I did not attempt to install and run the package due to lack of time, but I had a look at the GitHub repository. I note that there are other TOAD packages on GitHub (and this TOAD package is not easy to find, as it is not indexed by Google):
https://github.com/batrachianai/toad
https://github.com/gianwario/TOAD
https://github.com/amphibian-dev/toad
I am not sure if the acronym is good, especially as it misses the keyword “event”. Reconsider the acronym?