https://doi.org/10.5194/egusphere-2026-2975
23 Jun 2026
Status: this preprint is open for discussion and under review for Nonlinear Processes in Geophysics (NPG).

Encoding-dependent verdicts and H_1 miscalibration in ensemble persistent homology of cyclone trajectories

Rongzhen Dai

Abstract. Ensemble topological data analysis (TDA) on multi-storm cyclone trajectories has been proposed as a tool for detecting coherent perturbations, such as those associated with extreme geomagnetic events, that may not register in any single-storm intensity time series. We construct three principled longitude encodings for the cyclone point cloud (linear modular; unit sphere; and a cylinder with linear latitude and longitude on a circle) and apply the same ensemble-TDA pipeline to the same data: all storms whose lifetime overlaps the ±15-day peak window of the Halloween 2003, St Patrick’s Day 2015, or Gannon 2024 geomagnetic events (25 event storms; 1,020 calendar-matched controls). Three findings emerge. First, the D1 pool H_1 permutation p-value depends on the encoding in a way that flips the qualitative verdict: perm-p_H1 = 0.130 (linear), 0.009 (unit-sphere), 0.214 (cylinder), on identical event and control sets. Second, a 49-placebo calibration returns an H_0 false-positive rate close to the nominal 5 % in all three encodings (8.2 %, 4.1 %, 6.1 %), but an H_1 false-positive rate above nominal in all three (8.2 %, 10.2 %, 14.3 %). The excess is directionally consistent across encodings but not individually significant at n = 49 (the cylinder rate sits at the Wilson 95 % upper bound); it corresponds to an H_1 inflation factor of 1.6×–2.9× regardless of encoding. Third, this asymmetry between dimensions, with H_0 calibrated and H_1 inflated, recurs in stratified attribution across two basin cells and three single-event cells, and in subsample-size sensitivity tests at N ∈ {400, 600, 800, 1000}. We interpret the pattern as observational evidence that ensemble TDA on cyclone-trajectory data carries an intrinsic H_1 inflation that cannot be cured by the choice of longitude encoding alone. The solar-perturbation hypothesis cannot be tested through this pipeline until an encoding-invariant, H_1-calibrated protocol is in place.
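To illustrate why the longitude encoding can matter at all, the following is a minimal sketch of three encodings of the kind named above. It is not the preprint's implementation: the function names, the wrap interval [-180, 180) for the linear encoding, the use of radians for the cylinder's latitude axis, and the unit radius are all assumptions made for illustration. The sketch only shows that two track points on either side of the antimeridian are far apart under a linear encoding and close under the two periodic ones.

```python
import math

def encode_linear(lat_deg, lon_deg):
    """Linear modular encoding (assumed form): wrap longitude into
    [-180, 180) and treat it as a flat axis, in degrees."""
    lon_wrapped = (lon_deg + 180.0) % 360.0 - 180.0
    return (lat_deg, lon_wrapped)

def encode_sphere(lat_deg, lon_deg):
    """Unit-sphere encoding: Cartesian (x, y, z) on a sphere of radius 1."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def encode_cylinder(lat_deg, lon_deg):
    """Cylinder encoding (assumed scaling): latitude kept linear, in
    radians, with longitude mapped onto a unit circle."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (lat, math.cos(lon), math.sin(lon))

def pair_distance(encode, p, q):
    """Euclidean distance between two (lat, lon) points after encoding."""
    return math.dist(encode(*p), encode(*q))

# Two hypothetical track points one degree of longitude apart,
# straddling the antimeridian at 20 degrees north.
a = (20.0, 179.5)
b = (20.0, -179.5)

d_linear = pair_distance(encode_linear, a, b)      # seam splits the pair
d_sphere = pair_distance(encode_sphere, a, b)      # chord on the sphere
d_cylinder = pair_distance(encode_cylinder, a, b)  # chord on the circle

print(f"linear: {d_linear:.3f}  sphere: {d_sphere:.5f}  cylinder: {d_cylinder:.5f}")
```

Because persistent homology is computed from pairwise distances in the point cloud, a seam of this kind changes which loops can form, which is one plausible route by which H_1 results could differ between encodings.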
We frame the contribution as a methodological cautionary tale: pooled point-cloud TDA on geospatial trajectories with periodic coordinates is more fragile than its formal stability theorems suggest. We propose a minimum protocol of multi-encoding agreement testing plus dimension-resolved placebo calibration as a precondition for any positive ensemble-TDA claim in this class of problems.
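The dimension-resolved placebo calibration can be checked arithmetically. The sketch below is an illustration under stated assumptions, not the authors' code: the false-positive counts are inferred from the reported percentages at n = 49 (for example 7/49 ≈ 14.3 %), and the Wilson 95 % band is taken around the nominal 5 % rate, which is one reading of the abstract's "Wilson 95 % upper bound". All names are hypothetical.

```python
import math

NOMINAL = 0.05   # nominal false-positive rate
N_PLACEBO = 49   # number of placebo runs reported in the abstract

# Counts inferred from the reported percentages (assumption):
# 4/49 = 8.2 %, 2/49 = 4.1 %, 3/49 = 6.1 %, 5/49 = 10.2 %, 7/49 = 14.3 %
H0_HITS = {"linear": 4, "unit-sphere": 2, "cylinder": 3}
H1_HITS = {"linear": 4, "unit-sphere": 5, "cylinder": 7}

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed over n trials."""
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def calibration_table(hits, n=N_PLACEBO, nominal=NOMINAL):
    """Per-encoding rate, inflation over nominal, and whether the rate
    falls outside the Wilson band around the nominal rate."""
    lo, hi = wilson_interval(nominal, n)
    table = {}
    for name, k in hits.items():
        rate = k / n
        table[name] = {
            "rate": rate,
            "inflation": rate / nominal,
            "outside_band": not (lo <= rate <= hi),
        }
    return table

band_lo, band_hi = wilson_interval(NOMINAL, N_PLACEBO)
h0_table = calibration_table(H0_HITS)
h1_table = calibration_table(H1_HITS)

for label, table in (("H_0", h0_table), ("H_1", h1_table)):
    for name, row in table.items():
        print(f"{label} {name}: rate={row['rate']:.3f} "
              f"inflation={row['inflation']:.2f} outside={row['outside_band']}")
```

Under these assumptions every rate stays inside the band, with the cylinder H_1 rate just below its upper edge, which matches the abstract's statement that the excess is directionally consistent but not individually significant at n = 49.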


Status: open (until 18 Aug 2026)

Latest update: 23 Jun 2026
Short summary
We tested whether persistent homology reliably detects signals in hurricane track data. When three valid coordinate systems are applied to the same storms, the method's conclusions flip between "signal" and "no signal." A calibration test confirmed systematic false-positive inflation. We propose a validation checklist to be completed before this method is trusted for hurricane analysis.