Encoding-dependent verdicts and H_1 miscalibration in ensemble persistent homology of cyclone trajectories
Abstract. Ensemble topological data analysis (TDA) on multi-storm cyclone trajectories has been proposed as a tool for detecting coherent perturbations, such as those associated with extreme geomagnetic events, that may not register in any single-storm intensity time series. We construct three principled longitude encodings for the cyclone point cloud (linear modular, unit-sphere, and a cylinder with linear latitude and circular longitude) and apply the same ensemble-TDA pipeline to the same data: all storms whose lifetime overlaps the ±15-day peak window of Halloween 2003, St Patrick’s Day 2015, or Gannon 2024 (25 event storms; 1,020 calendar-matched controls). Three findings emerge. First, the D1 pool H_1 permutation p-value depends on the encoding in a way that flips the qualitative verdict: perm-p_H1 = 0.130 (linear), 0.009 (unit-sphere), 0.214 (cylinder), on identical event and control sets. Second, a 49-placebo calibration returns an H_0 false-positive rate close to the nominal 5 % in all three encodings (8.2 %, 4.1 %, 6.1 %), but an H_1 false-positive rate above nominal in all three (8.2 %, 10.2 %, 14.3 %), that is, inflated by a factor of 1.6×–2.9× regardless of encoding choice. The excess is directionally consistent across encodings but not individually significant at n = 49; the cylinder rate sits at the Wilson 95 % upper bound. Third, this dimension asymmetry between H_0 (calibrated) and H_1 (inflated) recurs in stratified attribution across two basin cells and three single-event cells, and in subsample-size sensitivity tests at N ∈ {400, 600, 800, 1000}. We interpret the pattern as observational evidence that ensemble TDA on cyclone-trajectory data carries an intrinsic H_1 inflation that cannot be cured by the choice of longitude encoding alone. The solar-perturbation hypothesis cannot be tested through this pipeline until an encoding-invariant, H_1-calibrated protocol is in place.
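The calibration arithmetic quoted above can be checked with a short Python sketch. It is illustrative only and rests on two assumptions that are not stated in the text: the placebo counts are back-computed from the reported percentages at n = 49, and "Wilson 95 % upper bound" is read as the upper edge of the Wilson score interval around the nominal 5 % rate. The function and variable names are hypothetical.

```python
import math

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p over n trials."""
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

N_PLACEBO = 49
NOMINAL = 0.05
ENCODINGS = ["linear", "unit-sphere", "cylinder"]

# Reported false-positive rates in percent, in the order of ENCODINGS.
reported = {
    "H_0": [8.2, 4.1, 6.1],
    "H_1": [8.2, 10.2, 14.3],
}

# Assumed reading: band around the nominal rate at n = 49.
band_lo, band_hi = wilson_interval(NOMINAL, N_PLACEBO)
print(f"Wilson 95 % band around nominal: [{band_lo:.3f}, {band_hi:.3f}]")

for dim, rates in reported.items():
    for name, pct in zip(ENCODINGS, rates):
        k = round(pct / 100 * N_PLACEBO)  # back-computed count (assumption)
        rate = k / N_PLACEBO
        print(f"{dim} {name}: {k}/{N_PLACEBO} = {rate:.3f}, "
              f"ratio to nominal = {rate / NOMINAL:.2f}, "
              f"inside band = {band_lo <= rate <= band_hi}")
```

Under these assumptions the H_1 ratios to nominal span roughly 1.6 to 2.9, and the cylinder rate falls just inside the upper edge of the band, which is consistent with the description of the excess as not individually significant.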
We frame the contribution as a methodological cautionary tale: pooled point-cloud TDA on geospatial trajectories with periodic coordinates is more fragile than its formal stability theorems suggest. We propose a minimum protocol of multi-encoding agreement testing plus dimension-resolved placebo calibration as a precondition for any positive ensemble-TDA claim in this class of problems.
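As an illustration of why the encoding matters and of what a multi-encoding agreement test could look like, the following Python sketch gives one plausible form of the three encodings and a simple agreement check. The coordinate conventions (degrees in, latitude in radians on the cylinder axis, no rescaling), the seam placed at 0°/360°, and all function names are assumptions; the text above does not specify them.

```python
import math

def encode_linear(lat_deg, lon_deg):
    """Linear modular: latitude as-is, longitude wrapped into [0, 360)."""
    return (lat_deg, lon_deg % 360.0)

def encode_sphere(lat_deg, lon_deg):
    """Unit-sphere: Cartesian coordinates on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def encode_cylinder(lat_deg, lon_deg):
    """Cylinder: linear latitude axis plus longitude on a unit circle."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (lat, math.cos(lon), math.sin(lon))

def encodings_agree(p_values, alpha=0.05):
    """True if every encoding gives the same verdict at level alpha."""
    verdicts = {p < alpha for p in p_values.values()}
    return len(verdicts) == 1

# Two points one degree apart in longitude, straddling the assumed seam.
a, b = (20.0, 359.5), (20.0, 0.5)
for enc in (encode_linear, encode_sphere, encode_cylinder):
    print(enc.__name__, math.dist(enc(*a), enc(*b)))

# H_1 permutation p-values as reported in the abstract.
reported_p = {"linear": 0.130, "unit-sphere": 0.009, "cylinder": 0.214}
print("encodings agree:", encodings_agree(reported_p))
```

The two seam-straddling points are far apart under the linear encoding and close under the other two, which is one way pairwise distances, and hence persistence diagrams, can differ between encodings. With the reported p-values the agreement check fails, matching the verdict flip described above.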