The Stippled Gridpoints are Statistically Significant: (Mis)uses of False Discovery Rate Correction for Geospatial Data

Schutte, Michael Konrad; Olivetti, Leonardo; Pons, Flavio Maria Emanuele; Messori, Gabriele

doi:10.5194/egusphere-2026-2203

Preprints

https://doi.org/10.5194/egusphere-2026-2203


21 May 2026


Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).


Abstract. Peer-reviewed articles in the geosciences routinely assess statistical significance in spatially distributed data. Statistical significance is often assessed independently at each grid point, while formal adjustment for multiple testing is applied less consistently. Although several approaches to account for multiple testing exist, their application to geoscience data is not always straightforward, as these data often exhibit spatially coherent signals.

In this work, we revisit multiple-testing correction in the context of spatially structured datasets. We first highlight how neglecting multiple-testing correction can substantially inflate the number of false positives. We further show that the global false discovery rate (FDR) approach, proposed in the literature for application in the geosciences, can yield counterintuitive and potentially misleading results when applied to spatially coherent signals. To illustrate the latter point, we provide an example based on near-surface air temperature composites following sudden stratospheric warmings. We show that when anomalies are spatially coherent, restricting the spatial domain can increase the FDR-adjusted significance threshold. Consequently, the same underlying field can appear more statistically significant solely due to domain selection, despite unchanged data. We explain this behavior in terms of the rank-based structure of the FDR procedure and discuss its implications for spatial inference and uncertainty quantification in the geosciences.

Building on these insights, we outline practical recommendations for transparent and robust significance assessment in geoscientific applications. These include clearly documenting multiple-testing corrections when adjusted pointwise significance is shown, interpreting adjusted thresholds cautiously, and considering spatially aware alternatives such as regional or cluster-based inference when appropriate.

Overall, our results highlight both the need to account for multiple testing and the potential issues with a naïve application and interpretation of the FDR correction. We hope that our work may contribute to more robust statistical testing in the geosciences.
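The domain dependence described in the abstract can be reproduced with a minimal sketch, assuming the standard Benjamini-Hochberg step-up rule and an FDR level of 0.10. The p-values below are constructed for illustration and are not the preprint's temperature composites; the function name `bh_threshold`, the field size, and the shape of the "signal" p-values are all hypothetical choices, and spatial correlation between tests is not modelled.

```python
import numpy as np


def bh_threshold(pvals, alpha_fdr=0.10):
    """Benjamini-Hochberg p-value threshold; 0.0 means nothing is rejected."""
    p = np.sort(np.asarray(pvals, dtype=float).ravel())
    n = p.size
    # Step-up rule: find the largest i with p_(i) <= alpha_fdr * i / n.
    under_line = p <= alpha_fdr * np.arange(1, n + 1) / n
    if not under_line.any():
        return 0.0
    return float(p[np.nonzero(under_line)[0][-1]])


# Constructed (not observed) p-values, chosen only to make the effect visible.
n_null, n_signal = 9_600, 400
# True nulls: evenly spaced quantiles of a uniform distribution on (0, 1).
p_null = (np.arange(1, n_null + 1) - 0.5) / n_null
# Coherent signal region: p-values concentrated near zero, all below 0.04.
p_signal = 0.04 * ((np.arange(1, n_signal + 1) - 0.5) / n_signal) ** 3

# No correction: a fixed share of the true nulls passes the local 0.05 test.
uncorrected_false_positives = int((p_null <= 0.05).sum())

# Same signal p-values, two domains: the full field and the signal region only.
thr_full = bh_threshold(np.concatenate([p_null, p_signal]))
thr_region = bh_threshold(p_signal)

print("false positives without correction:", uncorrected_false_positives)
print("BH threshold, full domain:      ", thr_full)
print("BH threshold, restricted domain:", thr_region)
print("signal points stippled (full):      ", int((p_signal <= thr_full).sum()))
print("signal points stippled (restricted):", int((p_signal <= thr_region).sum()))
```

In this constructed case the restricted domain yields a larger threshold than the full domain, so every point in the signal region passes, while only a subset passes when the same p-values are tested together with the surrounding nulls. The effect follows from the rank term i / n in the step-up rule: removing mostly null grid points shrinks n faster than it shrinks the ranks of the small p-values.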

Received: 17 Apr 2026 – Discussion started: 21 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: open (until 17 Aug 2026)


RC1: 'Comment on egusphere-2026-2203', Anonymous Referee #1, 15 Jun 2026

This paper points out potential problems associated with use of the FDR procedure in the context of spatially correlated hypothesis tests. The authors show that the rule of thumb recommended in the 2016 Wilks paper, which was derived from a particular synthetic data setting with relatively few locally significant gridpoints, apparently behaves badly in the extreme restricted-domain example highlighted in the present paper, and so is evidently not optimal in general.
The paper is fairly short, and would be stronger if it were to include a spectrum of synthetic-data simulations aimed at quantifying optimized parameterization of the ratio of the FDR to the local test level, perhaps as a function of the proportion of local tests that are nominally significant (for example as suggested on line 196), or possibly in terms of the relationship between the domain size and the spatial autocorrelation length scale. Figure 2 seems to indicate that setting the FDR level closer to 0.05 might yield more consistent results in Figure 1d. At minimum, it would be interesting to see the counterpart of Figure 1d with equality of the FDR level and the local test level (i.e., 0.05).
A few more specific comments:
para beginning line 65. Not quite accurate: FDR is actual, not expected number of incorrectly rejected nulls. The Benjamini-Hochberg procedure limits (“controls”) the proportion of such rejections, in expectation. So alpha-FDR = 0.1 implies that, on average, no more than 10% of rejected nulls are false positives, and indeed there may be fewer than this. (The characterization on line 91 is correct).
line 110. The test setup here appears to assume implicitly that SSW events are uniformly distributed in the November-March data window, which is substantially longer than 60 days. Is this the case in the observations? Also, is there a physical justification for use of the 60-day period, external to the test data? What is the effect of temperature nonstationarity during November-March?
line 125. The choice of the small Northern European gridbox appears to have been made a posteriori, after calculation of the initial hemispheric analysis. The several papers cited, apparently to justify this choice, presumably were based on the same or substantially overlapping historical data. The possible impact on the second, spatially restricted, analysis should be discussed more fully.
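For reference, the distinction drawn in the comment on line 65 can be written out as follows. This is a sketch in standard notation, not taken from the preprint: N is the number of local tests, p_(i) the i-th smallest p-value, N_0 the number of true nulls, R the number of rejections, V the number of false rejections, and Q the realized proportion of false rejections. The bound holds for independent tests and under certain forms of positive dependence.

```latex
% Benjamini-Hochberg threshold for N local tests with sorted p-values
% p_{(1)} \le p_{(2)} \le \dots \le p_{(N)}:
p^{*}_{\mathrm{FDR}} = \max_{i = 1, \dots, N}
  \left\{ p_{(i)} \;:\; p_{(i)} \le \frac{i}{N}\, \alpha_{\mathrm{FDR}} \right\}

% Realized proportion of false rejections among all rejections:
Q = \frac{V}{\max(R, 1)}

% The procedure bounds this proportion in expectation, not in each realization:
\mathbb{E}[Q] \le \frac{N_0}{N}\, \alpha_{\mathrm{FDR}} \le \alpha_{\mathrm{FDR}}
```

With alpha_FDR = 0.1, the last line states that on average at most 10% of the rejected nulls are false positives, and fewer when N_0 < N, which matches the referee's reading.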


Citation: https://doi.org/10.5194/egusphere-2026-2203-RC1


Viewed

Total article views: 274 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
184	71	19	274	14	15


Views and downloads (calculated since 21 May 2026)

Month	HTML	PDF	XML	Total
May 2026	127	32	9	168
Jun 2026	37	11	5	53
Jul 2026	20	28	5	53

Cumulative views and downloads (calculated since 21 May 2026)

Month	HTML	PDF	XML	Total
May 2026	127	32	9	168
Jun 2026	164	43	14	221
Jul 2026	184	71	19	274

Viewed (geographical distribution)

Total article views: 248 (including HTML, PDF, and XML) Thereof 248 with geography defined and 0 with unknown origin.


Latest update: 26 Jul 2026

Short summary

When testing whether signals on maps are statistically meaningful, scientists must account for multiple testing. Otherwise, some results may appear significant purely by chance. We show that the widely used false discovery rate correction can be misleading: simply changing the geographic region analyzed can make the same data appear far more statistically significant, even though nothing has changed. We recommend transparent reporting and using region-wide averages instead of point-wise testing.

