the Creative Commons Attribution 4.0 License.
Survival analysis for droplet-freezing data: Kaplan–Meier confidence intervals and log-rank tests
Abstract. Droplet-freezing assays underpin immersion-mode ice-nucleation research, yet approaches to uncertainty quantification for fraction-frozen curves and derived active-site densities (ns(T)) are inconsistent. Furthermore, there is currently no rigorous method for testing the significance of differences between fraction-frozen curves. To address these issues, we recast droplet-freezing measurements as survival data and apply analysis techniques typically used in medical statistics. Using the Kaplan–Meier estimator, we derive nonparametric confidence intervals for droplet fraction frozen and ns(T) without binning or model assumptions, matching Monte-Carlo and studentized-bootstrapped intervals on a literature volcanic ash ice nucleation dataset. Confidence intervals calculated for simulated datasets show that precision improves with sample size and with steeper fraction-frozen curves. Adapting the log-rank test, we introduce a method for comparing fraction-frozen curves and demonstrate its application to literature and simulated droplet-freezing datasets. We recommend reporting Kaplan–Meier confidence intervals on droplet-freezing datasets and employing the log-rank test when comparing droplet fraction-frozen curves.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-680', Anonymous Referee #1, 10 Apr 2026
- RC2: 'Comment on egusphere-2026-680', Anonymous Referee #2, 14 Apr 2026
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-680/egusphere-2026-680-RC2-supplement.pdf
- RC3: 'Comment on egusphere-2026-680', Anonymous Referee #3, 04 May 2026
Summary
The authors recast droplet-freezing data as right-censored survival data, use the Kaplan-Meier estimator to put non-parametric confidence intervals on f(T) and n_s(T), and adapt the log-rank test to compare two fraction-frozen curves. They demonstrate the methods on Fahy et al. (2022), Daily et al. (2022), and on a set of Gaussian-drawn simulations.
I think the basic idea is correct and overdue. Treating droplet freezing as a survival problem is a natural fit, and reaching for an off-the-shelf, well-tested estimator like Kaplan-Meier is preferable to the binning- and Monte-Carlo-based methods currently in use. The fact that KM bands match the studentized bootstrap of Fahy et al. on the volcanic ash data (Fig. 1) is a nice piece of cross-validation.
What concerns me is mostly methodological. The quantitative claims, especially the "around 50 droplets" heuristic and the Type I error labelled in Fig. 4(b), rest on a small number of illustrative simulations drawn from a single Gaussian distribution. The proportional-hazards assumption gets one sentence of acknowledgement and no further treatment, even though it's central to the log-rank test. And the paper does not demonstrate that the 95% KM intervals cover the truth 95% of the time. These gaps should be straightforward to fill with the simulation code the authors presumably already have on hand.
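For readers less familiar with the recasting summarised above, the following is a minimal sketch of a Kaplan-Meier estimate of fraction frozen, with cooling playing the role of time and droplets that never freeze treated as right-censored. It is not the authors' implementation: the function name `km_fraction_frozen`, the plain (linear) Greenwood interval, and the example temperatures are all assumptions made for illustration, and the manuscript may use a different interval construction.

```python
import numpy as np

def km_fraction_frozen(freeze_temps, frozen, z=1.96):
    """Kaplan-Meier fraction frozen f(T) with pointwise Greenwood limits.

    freeze_temps: temperature at which each droplet froze, or the coldest
                  temperature reached if it never froze (right-censored).
    frozen:       True where the droplet froze, False where censored.
    Returns rows of (T, f, lower, upper), ordered from warm to cold.
    Hypothetical helper for illustration only.
    """
    t = -np.asarray(freeze_temps, dtype=float)  # cooling acts as "time"
    d = np.asarray(frozen, dtype=bool)
    surv = 1.0        # running estimate of the unfrozen fraction S(T)
    greenwood = 0.0   # running Greenwood variance sum
    rows = []
    for ti in np.unique(t[d]):                  # distinct freezing events
        at_risk = np.sum(t >= ti)               # droplets still liquid
        events = np.sum((t == ti) & d)          # droplets freezing here
        surv *= 1.0 - events / at_risk
        if at_risk > events:
            greenwood += events / (at_risk * (at_risk - events))
        # When every remaining droplet freezes, surv is 0 and the standard
        # error below is 0, so the interval collapses at f = 1. This sketch
        # does not apply any carry-forward patch at that point.
        se = surv * np.sqrt(greenwood)
        f = 1.0 - surv
        rows.append((-ti, f, max(f - z * se, 0.0), min(f + z * se, 1.0)))
    return np.array(rows)

# Illustrative data: four droplets freeze, one is still liquid at -20 C.
example = km_fraction_frozen([-10, -12, -12, -15, -20],
                             [True, True, True, True, False])
```

Because the estimate only changes at observed freezing events, no temperature binning is needed, which is the property the summary above highlights.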
Major comments
- Proportional hazards. The log-rank test really only delivers full power when hazards are proportional, i.e., when the J·A ratio between two samples is constant in T. The manuscript notes this but only flags the case where curves literally cross. In practice the hazard ratio can vary smoothly with T even when the curves don't cross, for example, when two chemically different nucleators share the same A but have different g(T). PH is really only plausible when the two samples differ in the n_s(T) prefactor and share the same hazard shape, which is rarely strictly true. It would be worth saying so.
- Coverage check. The recommendation to report KM bands would be a lot more compelling with an empirical coverage study. If the answer is consistently near 95% the recommendation is on solid ground; if it isn't in some regime, that is itself important to know.
- Type I and power for the log-rank test. Figure 4(b) shows one example of a false positive and labels it "Type I error from sampling variability." A single example doesn't tell anyone what the real false-positive rate is.
- Only Gaussian simulations. Table 1 shows that all the simulated datasets are Gaussian. Real fraction-frozen curves routinely are not: they are often long-tailed at the warm end and sometimes bimodal. The "~50 droplets" heuristic in the conclusions should either be tested against some non-Gaussian shapes or qualified as distribution-dependent.
- Quantitative comparison with Fahy et al. Figure 1 shows that the KM bands look similar to the studentized bootstrap. That's good as far as it goes, but if the authors are recommending KM in preference to the existing method then "looks similar" isn't quite enough. A quantitative comparison, say band width at matched temperatures or coverage against a known ground truth, would be straightforward to add and would fall out of the coverage study above.
- The f(T) = 1 patch. Carrying the previous interval forward at the last point is pragmatic but it's an arbitrary choice and warrants either some justification or at least a sensitivity check. Does the choice change any of the conclusions on the ash or birch-pollen data? If not, please say so.
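The Type I point above could be checked with a simulation along the following lines. This is a sketch under stated assumptions, not the authors' code: `logrank_p` and `type1_rate` are hypothetical names, the two-sample log-rank statistic is the standard one referred to a chi-squared distribution with one degree of freedom, and the Gaussian parameters (mean -20 °C, standard deviation 2 °C) are purely illustrative rather than taken from Table 1.

```python
import math
import numpy as np

def logrank_p(t1, d1, t2, d2):
    """Two-sample log-rank p-value (chi-squared, 1 degree of freedom).

    t1, t2: "times" (here, minus the freezing temperature) per droplet.
    d1, d2: True where the droplet froze, False where right-censored.
    Hypothetical helper for illustration only.
    """
    t = np.concatenate([t1, t2]).astype(float)
    d = np.concatenate([d1, d2]).astype(bool)
    in_g1 = np.concatenate([np.ones(len(t1), bool), np.zeros(len(t2), bool)])
    obs_minus_exp = 0.0
    var = 0.0
    for ti in np.unique(t[d]):
        at_risk = t >= ti
        n = at_risk.sum()                 # droplets at risk, both samples
        n1 = (at_risk & in_g1).sum()      # at risk in sample 1
        events = (t == ti) & d
        m = events.sum()                  # freezing events at this step
        m1 = (events & in_g1).sum()       # of which in sample 1
        obs_minus_exp += m1 - m * n1 / n
        if n > 1:
            var += m * (n1 / n) * (1 - n1 / n) * (n - m) / (n - 1)
    if var == 0:
        return 1.0
    stat = obs_minus_exp ** 2 / var
    # Survival function of chi-squared with 1 d.o.f. is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

def type1_rate(n_droplets=50, n_trials=400, alpha=0.05, seed=0):
    """Fraction of null simulations in which the test rejects at alpha."""
    rng = np.random.default_rng(seed)
    all_frozen = np.ones(n_droplets, dtype=bool)   # no censoring assumed
    rejections = 0
    for _ in range(n_trials):
        # Both samples come from the same assumed Gaussian, so every
        # rejection is a false positive.
        a = -rng.normal(-20.0, 2.0, n_droplets)
        b = -rng.normal(-20.0, 2.0, n_droplets)
        if logrank_p(a, all_frozen, b, all_frozen) < alpha:
            rejections += 1
    return rejections / n_trials
```

Running `type1_rate` over a grid of droplet counts, and with non-Gaussian or censored draws substituted for the Gaussian ones, would give the false-positive rates requested above; drawing the two samples from different distributions turns the same loop into a power study.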
Closing
I think this is a useful methodological contribution and well within scope for the journal. The framing is sound and I expect it to get picked up quickly once it's out. The calibration and assumption-testing points above would, in my view, change a good paper into a reference one. I look forward to the revision.
Citation: https://doi.org/10.5194/egusphere-2026-680-RC3
Interactive computing environment
KM-nucleation-stats, Tom Whale, https://github.com/TFWhale/KM-nucleation-stats
In “Survival analysis for droplet-freezing data: Kaplan-Meier confidence intervals and log-rank tests,” Whale et al. present an approach to quantify uncertainty in droplet-freezing data and test differences in frozen fraction curves based on the non-parametric Kaplan-Meier survival function estimator. By applying this method, they are able to derive confidence intervals for frozen fraction curves and cumulative ice nucleation activity spectra. They also adapt the log-rank test to droplet freezing data to test the hypothesis that two frozen fraction curves are identical. They then demonstrate both methods on selected literature and simulated datasets. The method of calculating confidence intervals they present is rigorous, easy to use, and addresses a persistent problem of a lack of standardization in ice nucleation statistics. However, I have much more significant concerns about the use of the log-rank test for ice nucleation data. Specifically, the hypothesis being tested by the log-rank test is limited in the context of ice nucleation, where in many samples there may be multiple different populations of ice active species that dominate at different temperatures in the frozen fraction spectrum. The requirement of proportional hazards is also a concern, as many frozen fraction curves in the literature do not meet this assumption, further limiting the usefulness of this test. Given that it can also only be applied to frozen fraction curves (and so generally cannot be used to compare across instruments or across samples at different concentrations), I am not convinced of the broad applicability that the authors imply when recommending that this statistical test be used. When combined with the fact that the method of calculating confidence intervals has been previously published (Kinney et al., 2024), I am not convinced that this manuscript in its current form constitutes a sufficiently novel and significant contribution to the field.
I encourage the authors to consider whether refinements mentioned in the manuscript that address the shortcomings of the log-rank test or other approaches to statistical testing in the context of survival analysis might be more appropriate and useful to a wider range of applications. Specific comments and suggestions are below.
Major comments:
Minor comments:
References
Cumming, G.: Inference by eye: Reading the overlap of independent confidence intervals, Stat. Med., 28, 205–220, https://doi.org/10.1002/sim.3471, 2009.
Fahy, W. D., Shalizi, C. R., and Sullivan, R. C.: A universally applicable method of calculating confidence bands for ice nucleation spectra derived from droplet freezing experiments, Atmos. Meas. Tech., 15, 6819–6836, https://doi.org/10.5194/amt-15-6819-2022, 2022.
Jahl, L. G., Brubaker, T. A., Polen, M. J., Jahn, L. G., Cain, K. P., Bowers, B. B., Fahy, W. D., Graves, S., and Sullivan, R. C.: Atmospheric aging enhances the ice nucleation ability of biomass-burning aerosol, Sci. Adv., 7, eabd3440, https://doi.org/10.1126/sciadv.abd3440, 2021.
Kinney, N. L. H., Hepburn, C. A., Gibson, M. I., Ballesteros, D., and Whale, T. F.: High interspecific variability in ice nucleation activity suggests pollen ice nucleators are incidental, Biogeosciences, 21, 3201–3214, https://doi.org/10.5194/bg-21-3201-2024, 2024.
Losey, D. J., Sihvonen, S. K., Veghte, D. P., Chong, E., and Freedman, M. A.: Acidic processing of fly ash: chemical characterization, morphology, and immersion freezing, Environ. Sci. Process. Impacts, 20, 1581–1592, https://doi.org/10.1039/c8em00319j, 2018.
Sachs, M. C., Brand, A., and Gabriel, E. E.: Confidence bands in survival analysis, Br. J. Cancer, 127, 1636–1641, https://doi.org/10.1038/s41416-022-01920-5, 2022.
Whale, T. F., Holden, M. A., Wilson, T. W., O’Sullivan, D., and Murray, B. J.: The enhancement and suppression of immersion mode heterogeneous ice-nucleation by solutes, Chem. Sci., 9, 4142–4151, https://doi.org/10.1039/C7SC05421A, 2018.