This work is distributed under the Creative Commons Attribution 4.0 License.
Covariance-informed spatiotemporal clustering improves the detection of hazardous weather events
Abstract. Spatiotemporal clustering can be used to detect weather events in multi-dimensional datasets. This method requires that the resolution of a dataset equivalently resolves fluctuations across space and time, thereby normalizing the dataset for unbiased clustering across three dimensions. Yet few studies test whether a dataset meets this requirement, as there is no standard approach for doing so. To address this methodological gap, we present a framework to quantify the relationship between space and time using space-time separable covariance modelling. We demonstrate that, by defining a temporal resolution of interest (e.g. hours, days), the equivalent spatial resolution can be empirically derived using a space-time metric. We present an application using the unsupervised machine learning method Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to detect heat waves and severe storms across the Southeastern US from 1940 to 2023 in ECMWF Reanalysis version 5 (ERA5) data. We analyse the seasonal behaviour of the space-time metrics for precipitation and heat index before selecting representative values. We find that both the ERA5-derived daily heat index and hourly precipitation are insufficiently resolved for unbiased clustering at their native resolutions (i.e. 0.25 degrees per day for heat index and 0.25 degrees per hour for precipitation). We show that a resolution of 0.39 degrees per day (0.05 degrees per hour) prevents preferential clustering in either the spatial or temporal dimension for heat index (precipitation). We hypothesize that event identification will improve when the data are resampled according to the space-time metric. Heat wave clusters produced at the unbiased resolution were compared against the NOAA Storm Events Database from 2019 to 2023. Recall of heat waves increased from 0.92 to 0.94 using the covariance-informed resolution, demonstrating the importance of normalization prior to weather event reconstruction. Ultimately, the inclusion of temporal geostatistics leads to improved reconstruction of historical weather events and enables evaluation of their scale and variability.
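As a rough illustration of the core idea (a minimal sketch, not code from the paper), the snippet below rescales the time coordinate by a covariance-derived space-time metric before clustering with DBSCAN, so that one unit of Euclidean distance means the same thing in space and time; the metric value is the 0.39 degrees per day reported in the abstract, while the coordinates and DBSCAN parameters are purely illustrative.

```python
# Minimal sketch (not the authors' code): rescale space-time coordinates with a
# covariance-derived metric before DBSCAN so neither dimension dominates.
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical inputs: grid cells exceeding an extreme-event threshold, with
# latitude/longitude in degrees and time in days since the start of the record.
lat = np.array([33.00, 33.25, 33.25, 35.00])
lon = np.array([-84.00, -84.00, -83.75, -80.00])
t_days = np.array([0.0, 0.0, 1.0, 10.0])

# Assumed space-time metric from the covariance model: the spatial separation
# (degrees) treated as equivalent to one day of temporal separation.
st_metric_deg_per_day = 0.39  # heat index value quoted in the abstract

# Express time in "equivalent degrees" so that Euclidean distance weights space
# and time consistently (map-projection effects ignored for brevity).
coords = np.column_stack([lat, lon, t_days * st_metric_deg_per_day])
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(coords)  # eps illustrative
print(labels)  # -1 marks noise; non-negative integers are event cluster ids
```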
Status: open (until 30 Oct 2025)
- RC1: 'Comment on egusphere-2025-2870', Anonymous Referee #1, 10 Sep 2025
- EC1: 'Comment on egusphere-2025-2870', Aloïs Tilloy, 23 Sep 2025
Publisher’s note: this comment is a copy of RC2 and its content was therefore removed on 24 September 2025.
Citation: https://doi.org/10.5194/egusphere-2025-2870-EC1
- RC2: 'Comment on egusphere-2025-2870', Anonymous Referee #2, 23 Sep 2025
General Comment
The authors present a rigorous and innovative methodological framework for improving the detection of hazardous weather events using covariance-informed spatiotemporal clustering. The study addresses a critical gap in the literature: the lack of standardized approaches to test whether a dataset's spatiotemporal resolution is appropriate for unbiased clustering. The manuscript is well written, methodologically sound, and contributes meaningfully to the field of extreme weather event detection.
However, I share some concerns with reviewer 1 regarding the application of the method to reanalysis data. While the technical novelty of the covariance-informed space-time metric is clear, the current manuscript lacks information to justify how this approach improves the characterisation of spatiotemporal features of extreme events. Furthermore, the article (at least the main part) does not leverage the huge amount of work done by the authors to create clusters for the period 1940-2023 over the Southern USA.
I believe the manuscript has strong potential but would benefit from major revisions. Below, I outline specific areas for improvement.
- Overlooked methodological aspects
- The results of spatiotemporal clustering with DBSCAN are highly sensitive to the parameter set and to the thresholds used (Tilloy et al., 2022). In Section 3.2, you set thresholds for extreme events based on NOAA datasets. These thresholds are impact-relevant, but they may induce over- or under-sampling of extremes in the ERA5 data due to major differences between the underlying data in the NOAA datasets and ERA5. This may explain the poor recall for precipitation extremes.
- The choice of aggregating precipitation with a rolling sum is justifiable from an impact perspective; from a clustering perspective, however, it may result in an overestimation of temporal covariance, biasing the conclusions regarding the recommended spatial downscaling (a minimal numerical illustration follows this list).
- The lack of precipitation validation is a major limitation. The authors state that a higher-resolution dataset is needed, but they could discuss available downscaling techniques. Furthermore, could an upscaling of temporal resolution have been an option to overcome the issue?
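To make the rolling-sum concern concrete, here is a minimal numerical check (my own illustration, not code or data from the manuscript): applying a 24-hour rolling sum to temporally uncorrelated synthetic "precipitation" creates autocorrelation out to roughly the window length, which by itself would lengthen the apparent temporal covariance range.

```python
# Illustration only: a rolling sum of white-noise "precipitation" acquires
# autocorrelation up to the window length, inflating temporal covariance.
import numpy as np

rng = np.random.default_rng(0)
hourly = rng.gamma(shape=0.5, scale=2.0, size=20_000)  # uncorrelated proxy

window = 24
rolling = np.convolve(hourly, np.ones(window), mode="valid")  # 24 h rolling sum

def acf(x, lag):
    """Crude lag-autocorrelation estimate."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in (1, 12, 24, 48):
    print(f"lag {lag:>2} h: raw acf = {acf(hourly, lag):+.2f}, "
          f"rolling-sum acf = {acf(rolling, lag):+.2f}")
# Expected: the raw series is near zero at all lags, while the rolling sum is
# strongly correlated for lags shorter than the window and only decays to zero
# at and beyond 24 h.
```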
- Unclear connection to physical processes
- The discussion of seasonal variability in space-time metrics is insightful but a bit messy and strongly focused on heat waves in the current manuscript. For example, the biannual cycle in the heat index metrics and the annual cycle in the precipitation metrics could be linked to known climatological characteristics.
- The connection to physical processes can provide material supporting the robustness and usefulness of the method. I suggest creating a subsection dedicated to this topic in the discussion (it is now within the subsection on covariance modelling), clearly stating the meaning of the covariance results, and the connection to known physical processes, storm types and weather patterns.
- Under-exploitation of the long-term cluster creation: the long-term frequency of the created clusters says something about the climatology of extreme events in the region since 1940. I see some results in the supplementary material, but they seem underexploited. Simple trends could be assessed in the number of clusters, their average size, and their intensity (a sketch of such a check follows the general comments below).
- Recommendations and Practical Implications
- The conclusions (Section 6) provide clear takeaways, but the recommendations for future research are somewhat generic. To make them more impactful, the authors should:
- Explicitly tie recommendations to the literature review. For example:
- If previous studies (e.g., Tilloy et al., 2022; Liu et al., 2023) used biased clustering methods, how could the covariance-informed approach improve their results?
- Are there specific datasets or regions where this method would be most beneficial?
- Address scalability: Could this framework be applied globally? What are the computational challenges?
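As an illustration of the simple trend assessment suggested under "Under-exploitation of the long-term cluster creation" above (the cluster summary table and its column names below are hypothetical, not taken from the manuscript or its supplement):

```python
# Sketch of a simple trend check on annual cluster counts; all values and
# column names are placeholders.
import pandas as pd
from scipy.stats import kendalltau, linregress

clusters = pd.DataFrame({
    "start": pd.to_datetime(["1941-07-02", "1941-08-10", "1952-06-21",
                             "1980-07-15", "2011-08-01", "2022-07-30"]),
    "area_cells": [120, 40, 300, 210, 500, 650],
})

# Annual number of clusters (years with no clusters count as zero).
annual_counts = clusters.set_index("start").resample("YS").size()
years = annual_counts.index.year.to_numpy()
counts = annual_counts.to_numpy(dtype=float)

lin = linregress(years, counts)        # linear trend in clusters per year
tau, p_mk = kendalltau(years, counts)  # rank-based (Mann-Kendall-style) check
print(f"slope = {lin.slope:.4f} clusters/yr (p = {lin.pvalue:.2f}); "
      f"Kendall tau = {tau:.2f} (p = {p_mk:.2f})")

# The same resample pattern yields per-year average size or intensity, e.g.:
annual_area = clusters.set_index("start").resample("YS")["area_cells"].mean()
```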
Specific comments
- Abstract (Line 10-15): The phrase "few studies test whether a dataset meets this requirement" could be more precise. For example: "While spatiotemporal clustering is widely used, few studies quantitatively assess whether a dataset's resolution satisfies the normalization assumption required for unbiased clustering."
- Line 146-158 p.6: It seems that you have two different uses of the acronym BME.
- Line 207 p.9: The space-time ratio was already used in Tilloy et al., 2022. Furthermore, why do you introduce this ratio if you don't use it? (Line 378)
- Line 245 p.10: Formatting issue
- Line 254 p.11: “Southeastern populations are the most frequent hotspot”: do you mean regions?
- Line 363 p.15: Why only the last 4 years of NOAA storm events? What was different before 2019?
- Figure 6: The choice of the colour scale breaks is odd, please find a more interpretable scale.
Citation: https://doi.org/10.5194/egusphere-2025-2870-RC2
- RC1: 'Comment on egusphere-2025-2870', Anonymous Referee #1, 10 Sep 2025
General comments: The approach the authors offer, debiasing spatial and temporal covariance as a preprocessor for cluster analysis, is an interesting idea that has not been explored in atmospheric science. The idea certainly has merit, and I found no major issues with the methodology as presented. However, the application of the methods in Sections 3 and onward raised some concerns. First, the authors quantify a “heat wave”, a term typically reserved for a prolonged period of excessive heat, as any extreme heat value as measured by the heat index. This created a few issues, as I discuss below. Second, and in my opinion the most glaring issue, the authors did not explore the importance of false positives in a database they already noted is a rare-event record. Without quantifying false positive rates in some way, the authors are presenting a model that may actually have little to no skill. I discuss this further below. Finally, the authors, despite providing so much statistical detail, did not provide any measure of significance for the recall statistics in their demonstration section.
Most of the issues can be addressed through some additional thought or analyses in the case-study part of this work. Without a stronger case in that section, it is hard to see the inherent value in adding complexity to the problem with the debiasing preprocessor. If, however, the results show significant differences, the authors may have identified a method that could be helpful for cluster-analysis studies going forward.
Based on these comments, I recommend major revisions.
Major comments:
The use of the heat index as a measure of heat waves can introduce challenges if you use the heat index data outside of its intended ranges, which is likely your source of unphysical values. Did you consider using wet-bulb globe temperature (WBGT), which is used by many SEUS public entities to measure human health risk from heat? It may be a better measure than the heat index for this type of application, though it is challenging to compute. The data in ERA5 should be sufficient to obtain this measure and remove the NA issue.
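For reference, a screening-level WBGT approximation exists that needs only temperature and humidity: the simplified formula used by the Australian Bureau of Meteorology, which effectively assumes moderately high radiation and light winds. A faithful WBGT (e.g. the Liljegren model) would still require ERA5 radiation and wind fields, so the sketch below is illustrative only, not a recommendation of this particular formula.

```python
# Illustration only: simplified (Australian Bureau of Meteorology) WBGT
# approximation from 2 m temperature and relative humidity; it ignores the
# radiation and wind terms of a full WBGT calculation.
import numpy as np

def wbgt_simplified(t2m_c, rh_pct):
    """Approximate WBGT (deg C) from temperature (deg C) and relative humidity (%)."""
    # Water vapour pressure (hPa) from temperature and relative humidity.
    e = (rh_pct / 100.0) * 6.105 * np.exp(17.27 * t2m_c / (237.7 + t2m_c))
    return 0.567 * t2m_c + 0.393 * e + 3.94

print(wbgt_simplified(35.0, 60.0))  # ~37 deg C for a hot, humid afternoon
```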
Why are you extrapolating extreme heat index and precipitation values over the ocean, where such quantities are not defined? Does your method require spatially continuous data, or could you use a land-sea mask and focus on the actual CWAs of the region? It seems continuity is important, so this may be a limitation of your approach.
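If continuity is not strictly required, masking could be as simple as the following (file and variable names are illustrative; 'lsm' is the conventional name of the ERA5 land-sea mask):

```python
# Sketch: restrict ERA5 fields to land cells before thresholding/clustering.
import xarray as xr

heat = xr.open_dataset("era5_heat_index.nc")                     # hypothetical file
lsm = xr.open_dataset("era5_land_sea_mask.nc")["lsm"].squeeze()  # land fraction 0-1

# Cells that are mostly ocean become NaN and can be dropped from the point set
# passed to the clustering step.
heat_land = heat.where(lsm > 0.5)
```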
The choice to not penalize false positives (lines 355-358) is problematic. This speaks to a major issue within the Weather Service: the false alarm problem. It is critical, if this method is to be evaluated against existing methods in a fair manner, that you at least quantify the rate of false alarms (false positives) relative to your projections. That is, if your model always identified a heat wave, it would have a very high recall, but its false positives would be enormous and the model would have no skill. If you utilize confusion matrices to evaluate the performance of your clustering without accounting for false positives, this is a major limitation of your work.
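For example, the recall values in Table 2 could be paired with precision, false alarm ratio, and a summary score computed from the same confusion matrix (the counts below are placeholders, not values from the manuscript):

```python
# Complementary verification scores from a confusion matrix (placeholder counts).
tp, fp, fn = 50, 20, 4  # hypothetical hits, false alarms, misses

recall = tp / (tp + fn)                  # probability of detection
precision = tp / (tp + fp)
far = fp / (tp + fp)                     # false alarm ratio = 1 - precision
f1 = 2 * precision * recall / (precision + recall)
csi = tp / (tp + fp + fn)                # critical success index / threat score

print(f"recall={recall:.2f} precision={precision:.2f} FAR={far:.2f} "
      f"F1={f1:.2f} CSI={csi:.2f}")
```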
With all the statistical analysis provided in the text, it was surprising that Table 2, which is a key result of this paper, did not contain any measure of significance for the differences in recall. The results are so similar that I wonder whether there is a statistically significant benefit to this complex approach over traditional approaches. Demonstrating significance would strengthen your results.
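One straightforward option is a paired bootstrap over the validation events, which yields a confidence interval on the recall difference (the hit/miss vectors below are simulated for illustration, not taken from the manuscript):

```python
# Paired bootstrap for the difference in recall between two clustering setups.
import numpy as np

rng = np.random.default_rng(1)

# 1 = NOAA event detected, 0 = missed, for the same events under the native
# and covariance-informed resolutions (simulated placeholders).
hits_native = rng.binomial(1, 0.92, size=200)
hits_scaled = rng.binomial(1, 0.94, size=200)

n = hits_native.size
diffs = []
for _ in range(10_000):
    idx = rng.integers(0, n, size=n)  # resample events with replacement, paired
    diffs.append(hits_scaled[idx].mean() - hits_native[idx].mean())

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for recall difference: [{lo:+.3f}, {hi:+.3f}]")
# If the interval comfortably excludes zero, the improvement is unlikely to be
# sampling noise; with a few hundred events, a 0.02 difference may well not be.
```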
There may be value in exploring the 4-6 % of “heat waves” that were missed by your methodology, since missing a heat wave seems like something that should not really happen in practice. Why were those events missed, especially when you clearly already have a major false-positive issue per Figure 6?
Minor comments:
A heat wave is defined as “unusually warm and unusually humid weather, typically lasting two or more days” according to NOAA. However, your approach only looks at daily heat index values. How can you relate these results to an actual heat wave? Or is a better approach to state that you are measuring “extreme heat”?
The use of both α and a in denoting important variables for deriving your metric is a bit confusing. The reader has to look closely to tell which are α and which are a. If possible, consider using a different variable to represent one or the other to ease differentiating the two when discussing the equations.
There appears to be some sort of typo on lines 241-246 in the text in terms of formatting.
It is a little strange to delineate hurricanes and tropical storms on line 250. It would be better to just say “tropical cyclones”.
I may have missed it, but is the variable ε defined on line 334? If not, it should be explicitly defined in the text.