Hazard Potential of Compound Flooding from Rainfall, Storm Surge, and Groundwater in Coastal New York and Connecticut
Abstract. Compound flood events, defined here as the co-occurrence of more than one flood type, can result in flood hazard potential that is higher than if the events occurred independently. To evaluate compound flooding in a semi-urbanized coastal area, historical records dating back to 1970 are used to study the co-occurrences of high precipitation, storm surge, and shallow groundwater conditions for Long Island and the Long Island Sound vicinity across coastal New York and Connecticut. Joint return periods for coincident precipitation-surge events were computed using fitted copulas and compared to the assumption of independence as a ratio of return periods, referred to here as a return period adjustment. Results indicate distinct seasonality where compound events in the area disproportionately occur in the cold season between October and April. Return period shifts range from 1 to almost 9, demonstrating the range in precipitation-storm surge dependence across the study area. Across all 24 station triad locations, groundwater levels were elevated during times of precipitation-storm surge co-occurrence, in areas where the average depth to water is shallow (less than 20 feet or 6 m below land surface). The result is a pseudo-trivariate compound flood potential map that integrates dependence between daily precipitation-surge events and overall monthly groundwater levels into a relative compound hazard score. The location with the highest compound flood hazard score is in the south shore of Long Island, as well as locations across coastal Connecticut where groundwater levels are already near-surface during events where both heavy rainfall and high coastal storm surge occur at the same time.
- Preprint
(1381 KB) - Metadata XML
-
Supplement
(50 KB) - BibTeX
- EndNote
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-5683', Anonymous Referee #1, 22 Feb 2026
-
AC1: 'Reply on RC1', Robin Glas, 13 Apr 2026
Response to Reviewer 1
We thank Reviewer 1 for their detailed comments. Major changes include: addition of a locator inset map and study area boundary to spatial figures; clarification of the biconditional sampling time lags for CoP and CoS; addition of a Kendall tau formulation and description; clarification of the joint AND return period contour concept with a worked numerical example; and corrections to Table 2 notation and caption.
Comment 1:
Line 24 - Suggest adding units to return period shifts.
Author response: We thank the reviewer for this comment. Return period adjustments are dimensionless ratios representing the factor difference between joint return periods under independence and dependence assumptions. In response we have clarified this in the abstract.
Proposed change to manuscript :“Return period adjustments range from a factor of 1 to almost 9, demonstrating the range in precipitation-storm surge dependence across the study area.”
Comment 2:
Line 29 - The line lists the result as a "flood potential" map, while the next statement refers to a "flood hazard score". Suggest aligning the terminology for better understanding of the results, e.g., "The result is a pseudo-trivariate compound flood hazard score and a corresponding hazard map..."
Author response: We thank the reviewer for this comment and suggestion. The wording in the abstract was changed as suggested.
Proposed change to manuscript: “The result is a pseudo-trivariate compound flood hazard score and corresponding hazard map that integrates dependence between daily precipitation-surge events and overall monthly groundwater levels (as a precondition) into a relative compound hazard score.”
Comment 3:
Line 80 - Missing closing parenthesis after citation. Also applies on Line 119.
Author response: Thank you for this catch- parentheses added as suggested.
Proposed change to manuscript: (Su et al., 2022; Bosserelle et al., 2022). (Rosenzweig et al., 2024).
Comment 4:
Line 46 - It will be helpful for readers unfamiliar with the region, to add a reference to Fig 1. A similar reference will be helpful for Line 93.
Author response: We thank the reviewer for this comment. We interpret this as a request for a geographic reference to help orient readers unfamiliar with the region. In response Figure 1 has been revised to include a locator inset map showing the position of the study area within the northeastern United States. This addition, combined with the existing Figure 1 reference in the text, provides the geographic context needed for readers unfamiliar with coastal New York and Connecticut.
Comment 5:
Fig 1 - It will be helpful to mark the area of consideration in this study on the map. If the entire area of the map is considered, please add a statement to that effect in the caption.
Author response: We thank the reviewer for this comment. A study area boundary has been added to all spatial figures in the manuscript (Figures 1, 9, and 10). In addition, Figure 1 has been revised to include a locator inset map showing the position of the study area within the northeastern United States, with the study area extent indicated by an orange box. Graticule tick marks with coordinate labels have also been added to Figure 1 to provide geographic reference. The study area boundary and inset are described in the updated figure caption
Proposed change to manuscript: Figure 1. (a) Locations of precipitation, coastal water level, and groundwater observation stations used in the analysis; additional site information is provided in Table 1. Inset shows the location of the study area (orange box) within the northeastern United States. (b) Station triad locations, plotted at the nearest land point to the centroid of each triad composed of a precipitation gage, a coastal water level station, and a cluster of groundwater observation wells. Precipitation/surge midpoints are shown as the geographic midpoint between each precipitation and coastal station pair, with circular search radii indicating the area within which groundwater wells were selected for inclusion in each triad. The solid black line delineates the study area boundary in both panels. Projection World Geodetic System of 1984, Base modified from U.S. Geological Survey digital data, 1:250,000.
Comment 6:
Line 114 - I am confused about the comparison of large and moderate flooding events in terms of affected area. The terms "areal-wide", "widespread", and "larger area" are similar, making the distinction difficult to understand. Could the authors please clarify this difference between the events?
Author response: We thank the reviewer for this comment. In response we have revised the sentence to more clearly distinguish between the spatial and intensity characteristics of tropical versus extratropical storms.
Proposed change to manuscript: “Whereas these tropical storms have garnered substantial attention because of their intensity and localized impacts, it is the more moderate extratropical cyclones that have been both more frequent and affect broader stretches of coastline, resulting in recurring compound flooding across a larger portion of the study area (Booth et al., 2016; Liu et al., 2020).”
Comment 7:
Line 175 - I appreciate the authors including their verification process. It will also be helpful to include a reference for Table 2 where the RMSE and bias errors are provided.
Author response: We thank the reviewer for this comment. Table 2 is already cited in Section 4.1 at the point where the holdout analysis results are described: "The accuracy of the regression imputation process was assessed using the results from the holdout analysis, comparing observed to imputed across precipitation, coastal NTR, and groundwater datasets (Table 2).
Comment 8:
Line 176 - The term "unique paired combinations" is unclear. Does it imply that no stations were included in multiple pair combinations? Could the authors please clarify?
Author response: We thank the reviewer for this comment. In response we have clarified this sentence to more accurately describe the station pair selection procedure.
Proposed change to manuscript: “Using the complete imputed data records, 24 precipitation-coastal station pairs were selected based on proximity. Each pair was subsequently paired with groundwater observation wells to form a station triad, reflecting the three data streams contributing to each analysis location. Individual stations may contribute to more than one triad where geographic proximity warranted inclusion.”
Comment 9:
Line 178 - Could the authors clarify if they used the geodesic midpoint or Mercator midpoint?
Author response: We thank the reviewer for this comment. In response we have clarified the midpoint calculation method in the methods section.
Proposed change to manuscript: “Groundwater observation wells were selected using a cluster-based approach centered on the geographic midpoint between each precipitation/coastal station pair in decimal degrees.”
Comment 10:
Line 180 - Could the authors clarify what caused the data availability in the imputed dataset to be less than 100%? Is it because of the period of deployment of the station being incomplete over the selected duration?
Author response: We thank the reviewer for this comment. Data availability in the imputed dataset is less than 100% at some stations because imputation requires at least one correlated reference station with concurrent observations. During periods when no suitable reference station had data available -- particularly early in the record when the monitoring network was less dense -- imputation could not be performed and values remained missing. In response we have added a clarifying sentence to the methods.
Proposed change to manuscript: “It should be noted that imputed records may retain missing values where no correlated reference station had concurrent observations available, for example, during early portions of the record when monitoring network density was lower.”
Comment 11:
Line 181 - The statement "minimum observed depth to water less than" is confusing due to the double comparison. If I understand this correctly, modifying the statement to something like "at least one observation of depth to water less than" may serve the same intent and would be easier to understand.
Author response: Sentence revised as suggested.
Proposed change to manuscript: “at least one observation of depth to water less than 50 feet (15 m) below land surface (bls),”
Comment 12:
Line 184 - What was the maximum distance that had to be considered?
Author response: We thank the reviewer for this comment. No predetermined maximum search distance was established -- the radius expanded until selection criteria were met at each triad. Final search radii for all triads are reported in Table 1, with a maximum of 23 km at Triad 12.
Proposed change to manuscript: “Final search radii for each triad are listed in Table 1, with a maximum of 23 km at Triad 12.”
Comment 13:
Line 185 - Was the 10-well limit applied based on the closest distance to the midpoint, or to one of the stations in the pair?
Author response: The 10-well limit was applied based on distance to the geographic midpoint between each precipitation and coastal station pair, which served as the center of the search radius. Wells were ranked by distance to this midpoint and the closest qualifying wells were retained up to the maximum of 10. A clarifying phrase has been added to the methods.
Proposed change to manuscript: “Up to 10 wells per search radius were retained based on proximity to the precipitation-coastal station midpoint.”
Comment 14:
Line 185 - Could the authors expand on the definition of hydrologic consistency in this case?
Author response: Hydrogeologic consistency in this context refers to confirmation that selected wells within each cluster were completed in the same unconfined hydrogeologic unit, exhibited similar seasonal water level patterns, and were not anomalous relative to neighboring wells in ways that would suggest confinement, interference from pumping, or data quality issues. A clarifying sentence has been added to the methods.
Proposed change to manuscript: “Final selections were manually reviewed to confirm hydrogeologic consistency within each cluster, specifically that selected wells were completed in the same unconfined hydrogeologic unit, exhibited seasonal water level patterns consistent with neighboring wells, and were representative of ambient conditions.”
Comment 15:
Line 202 - I believe that this statement described the CoP case, but the storm surge is mentioned to be selected within three days of the precipitation event. Additionally, the next statement also appears to describe the CoP but with a plus/minus 5-day window. Can the authors clarify?
Author response: We agree that the description of the biconditional sampling procedure and associated time lags was unclear. The ±5-day lag applies to the precipitation-conditioned (CoP) dataset, where the maximum NTR is identified within 5 days of each precipitation event. The ±3-day lag applies to the surge-conditioned (CoS) dataset, where the maximum precipitation is identified within 3 days of each surge event. In response we have rewritten this paragraph to more clearly distinguish the two conditioning approaches and their respective lag windows.
Proposed change to manuscript: “To create two distinct biconditional datasets the R package Multihazard (Jane et al., 2020) was used with the Multihazard::Con_Sampling_2D_Lag function. For the precipitation-conditioned dataset (CoP), 24-hour precipitation totals exceeding the selected threshold were identified as conditioning events, and the maximum NTR occurring within a ±5-day window of each precipitation event was extracted. For the surge-conditioned dataset (CoS), NTR values exceeding the selected threshold were identified as conditioning events, and the maximum precipitation occurring within a ±3-day window of each surge event was extracted. Time lags of ±5 days for CoP and ±3 days for CoS were based on findings that average coastal storm surge events around New York have durations ranging from 1.6 to 3.3 days (Barbot et al., 2024), and that the most common duration for extreme rain events in coastal areas of the Northeast is between 2 and 5 days (Agel et al., 2015). This approach accommodates potential mismatches between peak rainfall and storm surge that still have overlapping events, as well as delays in water reaching the coast via overland flow
Comment 16:
Line 215 - I would suggest including the basic formulation and description of Kendall tau to make the manuscript accessible to a wider audience. Additionally, I would suggest including the reasoning and/or reference for the appropriateness of using Kendall Tau for this problem.
Author response: Basic Kendall Tau formula has been added with a brief description.
Proposed change to manuscript: Biconditional precipitation-surge event sets were assessed for correlation using Kendall tau for each triad. Kendall tau is a rank-based nonparametric correlation coefficient that measures the strength of monotonic dependence between two variables, defined in Equation 1 as:
1
where and are the numbers of concordant and discordant pairs respectively and n is the sample size. Kendall tau was used here because it does not assume normality of the marginal distributions, is robust to extreme values, and has a direct mathematical relationship to copula dependence structure through its connection to the copula parameter (Genest and Favre, 2007).”
Comment 17:
Line 228 - What is a feature in the dataset? It would be helpful to describe features prior to their usage here.
Author response: The term “feature” will be replaced by “variable”.
Proposed change to manuscript: To model the dependence structure in the data, each variable (precipitation and NTR) in the biconditional datasets was transformed into a uniform distribution by computing their rank-based pseudo-observations.”
Comment 18:
Line 252 - Since the absolute difference is compared, it would be helpful to understand the range of uTDC values to put them in context with the difference of 0.1. Otherwise, it is not possible to understand the scale of the difference threshold. Additionally, can the authors elaborate whether the comparison with the median was done because the simulations are expected to follow a lognormal distribution? Typically, simulated results follow the normal distribution, and the difference is then compared in terms of the standard deviation.
Author response: Regarding the range of uTDC values, empirical uTDC values at u=0.6 across the 24 triads ranged from approximately 0.10 to 0.25, such that the 0.1 difference threshold represents a meaningful departure from expected tail behavior rather than a negligible difference. This context has been added to the methods text.
Regarding the use of the median rather than the mean, bootstrap distributions of uTDC can be skewed, particularly at low sample sizes or under weak dependence, making the median a more robust measure of central tendency than the mean for this comparison. The choice of an absolute difference threshold rather than a standard deviation based criterion reflects the bounded nature of the uTDC statistic, which lies on [0,1], making standard deviation based comparisons less intuitive than absolute differences on the same scale. These justifications have been added to the methods.
Proposed change to manuscript: “The uTDC threshold of 0.1 represents a meaningful departure given that empirical uTDC values across all 24 triads ranged from approximately 0.10 to 0.25 at u=0.6. The median rather than the mean of the bootstrapped uTDC values was used as the reference because bootstrap distributions of uTDC can be skewed under weak dependence or small sample sizes, making the median a more robust measure of central tendency. An absolute difference threshold was preferred over a standard deviation based criterion because uTDC is bounded on [0,1], making absolute comparisons on the same scale more interpretable.”
Comment 19:
Eq. 2 - I am unclear about the return periods represented by contour lines, and the corresponding equation does not seem to adequately explain this. In its current form, RP appears to be a scalar, since u and v are functions, the equation is difficult to understand. Additionally, if possible, a graphical example of return period contours might help elucidate this concept further.
Author response: We thank the reviewer for this comment. We agree that it may not be clear from Equation 2 alone that the joint AND return period defines a continuous two-dimensional contour in bivariate probability space, representing infinitely many univariate return period combinations between the two variables. To address this we have added a clarifying sentence immediately after Equation 2 and expanded the description of Figure 7b in the results to reinforce the concept visually. Given the length of the manuscript we have opted not to add an additional contour plot figure, but believe these two additions together adequately convey the concept.
Proposed change to manuscript: "For a fixed return period T, Equation 3 is satisfied by a continuous set of (u,v) combinations lying on [0,1] x [0,1], defining a contour line in bivariate probability space rather than a single point estimate."
(Section 4.4 Line 485) Reworked the results paragraph to include a reference to equation 3 and the 2d contour:
“Figure 7 and Table 5 show and list, respectively, the results from the copula analysis as return period adjustments between the assumption of independence and dependence for concurrent precipitation-surge events. The weighted average return period adjustment encompasses all values across the selected univariate return period pairs, inversely weighted by the variability from the bootstrap resampling (Fig. 8a). Because higher return period adjustments generally indicate more variability between the bootstrapped samples, the weights put more emphasis on shorter univariate return period events. Figure 8b illustrates this for Triad 15, showing how the return period adjustment varies across univariate return period combinations for surge along with associated 95% confidence intervals. The curvature of this relationship reflects the two-dimensional nature of the joint AND return period contour described in Equation 2, where each point along the curve represents a different (u,v) combination satisfying the same joint exceedance probability. Weighted average adjustments ranged from 1 (independent data) to 8.51 (Triad 4, conditioned on surge, Table 5), meaning that coincident rain-surge events are more than eight times more likely to occur when taking their dependence structure into account (that is, the joint AND return period under dependence is approximately eight times shorter than under the assumption of independence, reflecting the observed historical dependence structure rather than any change in physical forcing). The average return period adjustment conditioned on precipitation was 2.99, whereas the average adjustment conditioned on surge was 4.32 across all station triads. The magnitude of period adjustments corresponds to the strength of upper tail dependence associated with the copula models as well as statistically significant correlations represented in the Kendall tau correlation coefficient (Tables 3 and 4).”
Comment 20:
Line 283 - How are Fx and Fy related to u and v, if they are? I am not able to put them in context with u and v.
Author response: We thank the reviewer for this comment. Fx and Fy in line 283 are the marginal cumulative distribution functions of precipitation and surge respectively, equivalent to u and v introduced in Equation 2. To eliminate this notational inconsistency we have replaced Fx and Fy with u and v throughout the manuscript for consistency with the equation notation.
Proposed change to manuscript: This ratio of return periods, or return period adjustment, represents the degree of dependence between the variables across each unique combination of univariate return periods (u and v), allowing the measure of dependence between the variables to be assessed in units of return period (Zscheischler and Seneviratne, 2017).
Comment 21:
Line 311 - How were the values 6 ft and 15 ft selected for hazard thresholds?
Author response: We thank the reviewer for this comment. The 6 ft and 15 ft thresholds were selected based on physical criteria related to infrastructure vulnerability. Depths less than 6 ft (1.8 m) have the potential to interfere with drainage infrastructure and natural infiltration of soils (Bosserelle et al., 2022). Depths between 6 and 15 ft (1.8 to 4.6 m) are likely to intersect with basements and other subsurface infrastructure (Conestoga-Rovers and Associates, 2007; New Jersey Department of Environmental Protection, 2021). Clarification has been added to Section 3.5 where the thresholds are first introduced.
Proposed change to manuscript: “These thresholds reflect physical infrastructure vulnerability criteria: depths less than 6 ft have the potential to interfere with drainage infrastructure and natural soil infiltration (Bosserelle et al., 2022), while depths between 6 and 15 ft are likely to intersect with basements and subsurface infrastructure (Conestoga-Rovers and Associates, 2007; New Jersey Department of Environmental Protection, 2021). These thresholds are consistent with those used in the companion groundwater emergence hazard analysis for the same study area (Masterson et al., 2025; Welk et al., 2025a), for direct comparison of hazard scores across the suite of regional hazard products.”
Comment 22:
Line 323 - What was the criterion for selecting 8 initial clusters and 11 final clusters? K-means typically has a fixed number of clusters, so was a variation on k-means used to dynamically change the number of clusters on each iteration?
Author response: To determine an appropriate value of k, we evaluated cluster quality across a range of candidate values (2 to 15) using the silhouette score. The highest silhouette scores occurred for k = 6 and k = 8, with both values producing similarly strong internal cohesion and separation. Because we needed to do additional separation anyway, we selected k = 8 as the initial clustering configuration.
K‑means itself does not dynamically adjust the number of clusters across iterations, and we did not use a variant that changes k. Instead, after establishing the eight statistically optimal clusters, we performed a post‑processing step to ensure geographic contiguity. Some of the k‑means clusters included multiple non-adjacent spatial regions; because the objective of the analysis was to identify coherent, spatially continuous compound‑flooding zones, these non‑contiguous pieces were separated into their own polygons. This resulted in 11 final clusters. This step did not alter the underlying k‑means algorithm or its optimization but simply partitioned disconnected spatial units into distinct, contiguous areas for mapping and interpretation.
We have clarified this procedure in the manuscript.
Proposed change to manuscript: To interpolate these results across the entire study area (the coasts of Long Island Sound and the south shore of Long Island), each groundwater centroid location was initially assigned to one of eight clusters using k-means clustering (MacQueen, 1967), implemented with the scikit-learn Python library (Pedregosa et al., 2011). Clustering was based on the following six features: (1) average return period adjustment conditioned on precipitation (CoP), (2) average return period adjustment conditioned on surge (CoS), (3) median minimum groundwater depth (in feet), (4) groundwater hazard score, (5) biconditional dependence score, and (6) upper tail dependence score. The number of clusters (k) was selected by evaluating silhouette scores over a range of candidate k values; the highest scores occurred for k = 6 and k = 8, with k = 8 producing groupings that aligned more clearly with our understanding of spatial gradients. Because k-means does not enforce geographic contiguity and some of the statistically derived clusters consisted of multiple disconnected spatial regions, a spatial post-processing step was applied. Non-contiguous cluster components were separated into distinct regions to ensure that each final cluster represented a coherent geographic unit. This procedure resulted in 11 final clusters, each containing between one and four stations, while preserving the underlying k-means classification structure.
Comment 23:
Line 331 - A graphical representation of the clustering, Voronoi tessellation, and the 900-m gridding would be very helpful for improving the understanding of this section.
Author response: We appreciate the reviewer’s suggestion to include a graphical representation of the clustering, Voronoi tessellation, and 900‑m gridding. The final products of these steps, including the cluster centroids and the spatial distribution of the compound hazard scores, are available online in the interactive map associated with this study(Long Island Sound Study Compound Flooding). While the Voronoi polygons themselves are not displayed explicitly, the centroids are provided, and the tessellation is simply a computationally efficient implementation of a nearest‑neighbor interpolation. Users can visualize the spatial relationships among stations, clusters, and resulting hazard estimates directly through the interactive map, which offers greater clarity and flexibility than a static figure in the manuscript.
Proposed change to manuscript: (Section 4.5, line 532) “The compound hazard scores, supporting univariate hazard layers, and 900-m gridding geometry are available for interactive exploration at the station-triad and 900-m grid scale via an online mapping application (Finkelstein et al., 2025, https://ny.water.usgs.gov/maps/compoundflooding/).”
Comment 24:
Line 357 - Similar to a previous comment, it will help with understanding if a reference to Table 2 was added while providing a comparative statement for RMSE, or move the statement to the next paragraph where the comparisons are described.
Author response: We thank the reviewer for this comment. We clarify that line 357 refers to overall imputation errors reported in Supplemental Tables S1-S3 rather than the holdout statistics in Table 2. In response we have added a reference to Tables S1-S3 at this location to direct readers to the appropriate supporting data.
Proposed change to manuscript: Errors (RMSE and bias) were higher than for coastal water levels or groundwater, reflecting the greater spatial and temporal variability of precipitation (Tables S1-S3)."
Comment 25:
Table 2 - Why are the units of bias, which is calculated as a percent, in mm, m, and ft?
Author response: We thank the reviewer for identifying this inconsistency. The bias values in Table 2 represent the mean difference between observed and imputed values in the original units of each variable rather than a percentage. The caption has been corrected to reflect this.
Proposed change to manuscript: (Table 2.) “Imputation performance statistics from a 5% holdout analysis conducted separately for wet days (precipitation (PPT) > 1 mm) and dry days (precipitation = 0 mm). Metrics include root mean square error (RMSE) and average bias between withheld and imputed values, reported in the original units of each variable (millimeters for precipitation, meters for NTR, and feet for groundwater).”
Comment 26:
Table 2 - It will be helpful to additionally provide mean and standard deviation of the parameters to be able to compare the RMSE with the absolute values.
Author response: We appreciate this suggestion. The RMSE values reported in Table 2 are small relative to the typical range of observed values in this region. For context, mean wet-day precipitation in the northeastern U.S. typically ranges from 8-15 mm, mean NTR values are on the order of 0.1-0.3 m, and groundwater depth to water table across the study area ranges from near zero to approximately 15 feet. The reported RMSE and bias values are therefore small in absolute terms and reflect good imputation performance. We will consider adding mean and standard deviation of observed values to Table 2 in the revised manuscript.
Comment 27:
Line 396 - There appears to be higher uTDC for CoP 96-21 in 6a and lower uTDC for CoS for 96-21 in 6b. Although this is difficult to fully establish from the figure, can the authors include more information in the figure, e.g., a best fit line for CoP and CoS to compare against the 45-degree diagonal, and further support their conclusion?
Author response: We thank the reviewer for this comment. We agree that the original scatter plot format of Figure 6 made it difficult to assess whether uTDC values changed systematically between the two periods. In response Figure 6 (now Figure 7) has been completely redesigned as a boxplot showing the distribution of empirical uTDC values across all 24 triads for the periods 1970-1995 and 1996-2021 at both u=0.6 and u=0.9, separately for CoP and CoS. The overlapping distributions and near-identical medians between periods across both conditioning approaches provide clear and immediate visual evidence that the dependence structure did not change systematically over the study period, supporting the stationarity assumption without requiring interpretation of scatter around a diagonal reference line.
Comment 28:
Line 412 - Can the authors please clarify the concept of return period shift further, e.g., by providing an example of return periods for individual, independent, and dependent scenarios. For example, if the precipitation event has a RP of 4 years, surge a RP of 50 years, then the combined RP in the independent case will be 200 years. Would an 8 times more likely RP for the dependence case be 25 years?
Author response: We thank the reviewer for this helpful suggestion and for providing an illustrative example. The reviewer's example is correct -- for a precipitation event with a 4-year return period and a surge event with a 50-year return period, the joint AND return period under independence is 4 × 50 = 200 years. Under dependence with a return period adjustment factor of 8, the joint AND return period would be 200 / 8 = 25 years, meaning the compound event is 8 times more likely to occur than the independence assumption would suggest. We note that this example is already partially addressed in the conclusions section where the return period adjustment concept is described with a worked numerical example (lines 536-540). In response to this comment we have added a brief numerical example to the methods section where the return period adjustment is first introduced.
Proposed change to manuscript: “To illustrate, consider a precipitation event with a univariate return period of 4 years and a coincident surge event with a univariate return period of 50 years. Under the assumption of independence the joint AND return period is the product of the two marginal return periods: 4 × 50 = 200 years. Under dependence, if the return period adjustment factor is 8, the joint AND return period is 200 / 8 = 25 years, meaning the compound event is 8 times more likely to occur than independence would predict.”
Citation: https://doi.org/10.5194/egusphere-2025-5683-AC1
-
AC1: 'Reply on RC1', Robin Glas, 13 Apr 2026
-
RC2: 'Comment on egusphere-2025-5683', Anonymous Referee #2, 02 Mar 2026
-
AC2: 'Reply on RC2', Robin Glas, 13 Apr 2026
Response to Reviewer 2
We thank Reviewer 2 for their thorough engagement. Major changes include: strengthened pseudo-trivariate framing in the abstract and introduction; addition of a new stationarity analysis with linear regression of POT exceedance magnitudes against time, reported in Supplemental Table S5; redesign of Figure 6 as a split-period uTDC boxplot; addition of a conceptual workflow diagram as Figure 3; addition of formulas for the inverse confidence interval weighting scheme; a sensitivity note on symmetric copula assumptions; and a correction to the event-based groundwater level extraction. Streamlining of discussion passages restating regional climatology was also performed.
Major Comments
Comment M1:
Groundwater Is Not Fully Integrated in a Multivariate Framework. Although described as a compound analysis of three drivers, only precipitation and surge are modeled probabilistically via copulas. Groundwater is incorporated post hoc through ordinal scoring based on shallow depth thresholds. This creates a "pseudo-trivariate" hazard metric rather than a fully joint probability framework. The manuscript should either: a) Reframe the study explicitly as a bivariate extreme analysis with groundwater preconditioning, or b) Provide stronger justification for the scoring-based integration and clarify its limitations.
Author response: We thank the reviewer for this comment. The "pseudo-trivariate" framing is already adopted throughout the manuscript, including in the abstract (line 28), where we describe the result as "a pseudo-trivariate compound flood potential map." The decision to treat groundwater as a preconditioning factor rather than a probabilistic driver was intentional, justified by the monthly temporal resolution of the groundwater record, which prevents direct integration into the daily-scale copula framework. This limitation is explicitly acknowledged in the discussion (lines 481-484). In response to this comment, we have strengthened the pseudo-trivariate framing in the abstract and introduction so readers encounter this context before reaching the methods.
Proposed change to manuscript: "The result is a pseudo-trivariate compound flood potential map that integrates dependence between daily precipitation-surge events and overall monthly groundwater levels (as a precondition) into a relative compound hazard score."
Changed text in manuscript (Introduction, line 85):
"Because monthly groundwater data preclude direct integration into the daily-scale return period framework, groundwater is treated as a preconditioning factor and incorporated through an ordinal scoring approach described in Section 3.5."
Comment M2:
Stationarity Assumption Requires Stronger Justification. The assumption of stationarity is based on limited comparisons of Kendall's tau and empirical upper tail dependence between two time periods. Given documented sea-level rise and changes in precipitation extremes in the region, a more formal test of non-stationarity (e.g., time-varying copula parameters or trend analysis in marginal extremes) would strengthen confidence in the results. At minimum, the manuscript should include a sensitivity discussion quantifying how moderate changes in dependence would affect return period shifts.
Author response: We thank the reviewer for this comment. We have added an additional stationarity check to the framework. Previously, we checked for stationarity in the full time series of each marginal variable through trend detection and detrending the full NTR time series. , We had also evaluated stationarity in the dependence structure by splitting the record and evaluating changes in the tail dependence coefficients. As an additional check in response to Reviewer 2’s comment, an additional check was conducted in the form of linear regression of the over-threshold extremes over time for both precipitation and surge under both conditioning approaches. Surge extremes showed slight decreasing trends at a subset of triads along the western south shore of Long Island and coastal Connecticut (slopes ranging from -0.003 to -0.007 m/year), and no regionally coherent increasing trend was found in precipitation extremes. The direction of the surge trend is conservative with respect to compound hazard assessment. Full slopes and p-values are provided in Supplemental Table S5.
Third, and most directly relevant to the copula framework, Figure 6 has been revised from a scatter plot to a boxplot display showing the distribution of empirical upper tail dependence coefficients across all 24 triads for the periods 1970-1995 and 1996-2021, at both u=0.6 and u=0.9. The overlapping distributions and near-identical medians between periods across both conditioning approaches provide clear visual evidence that the dependence structure did not change systematically over the study period. We note that marginal nonstationarity does not necessarily imply nonstationarity in the copula, and our split-period uTDC analysis targets the specific quantity relevant to the return period framework. The decision to retain a stationary framework is further supported by Serinaldi and Kilsby (2015), who demonstrate that stationarity uncertainty typically dominates the distribution of extremes and that nonstationary models often introduce more uncertainty than they resolve (lines 520-522).
Proposed change to manuscript: (Section 3.3):
"To further evaluate marginal stationarity, linear regression of POT exceedance magnitudes against time was performed for both precipitation and surge across all 24 triads under both conditioning approaches. Slopes and associated p-values are provided in Supplemental Table S5."
Proposed change to manuscript: (Section 4.3):
"Regression of POT exceedance magnitudes against time revealed no regionally coherent trend in precipitation extremes across the 24 triads. Surge extremes showed slight decreasing trends at a subset of triads along the western south shore of Long Island and coastal Connecticut, with slopes ranging from -0.003 to -0.007 m/year (Supplemental Table S[X]). The direction of these trends is conservative with respect to compound hazard assessment."
Proposed change to manuscript: (Section 5.1, Limitations):
"Linear regression of POT exceedance magnitudes against time revealed small trends at a subset of triads, with surge extremes showing slight decreasing trends at stations along the western south shore of Long Island and coastal Connecticut (slopes ranging from -0.003 to -0.007 m/year), and no regionally coherent increasing trend in precipitation extremes. These marginal trends are insufficient to justify a nonstationary framework, consistent with the split-period uTDC analysis in Figure 6 and the guidance of Serinaldi and Kilsby (2015)."
Proposed change to manuscript: (Figure 6 caption): (now Figure 7)
"Figure 7. Empirical upper tail dependence coefficients (uTDC) for the first (1970-1995, light grey) and second (1996-2021, dark grey) halves of the record, shown for (a) u=0.6 and (b) u=0.9. Distributions are shown separately for precipitation-conditioned (CoP) and surge-conditioned (CoS) samples across all 24 triads. Overlapping distributions and similar medians between periods support the assumption of stationarity in the dependence structure."
Comment M3:
Threshold Selection May Influence Dependence Estimates. Thresholds were selected partly to maximize Kendall's tau while maintaining 3-6 events per year. This approach may introduce bias toward stronger dependence. A sensitivity analysis using alternative thresholds (e.g., fixed 95th percentile across stations) would help demonstrate robustness.
Author response: We thank the reviewer for this comment. We acknowledge that maximizing Kendall tau during threshold selection could theoretically introduce a bias toward stronger dependence estimates. However, we note that this potential bias is substantially constrained by the requirement to maintain between 3 and 6 independent events per year, which limits the range of thresholds available for selection at each station and reduces the degrees of freedom available for optimization. This procedure follows Jane et al. (2022) directly and represents a conventional approach in the compound flood literature. The potential for threshold sensitivity to influence results is already acknowledged in the limitations section (lines 505-510). In response to this comment we have added a sentence to the limitations section explicitly noting the direction of the potential bias.
Proposed change to manuscript: (Section 5.1, Limitations, after existing threshold sentenc):
"The threshold selection procedure used here, which maximizes Kendall tau within the constraint of 3-6 events per year, may introduce a slight bias toward stronger dependence estimates relative to fixed-threshold approaches. However, the 3-6 events per year constraint limits the range of thresholds available for optimization and therefore the magnitude of any such bias."
Comment M4:
Imputation Uncertainty Is Not Propagated. Large portions of early NTR and groundwater data are imputed. While imputation performance metrics are provided, uncertainty from imputation is not carried into copula modeling or return period estimation. This may influence tail dependence estimates. The authors should clarify the potential magnitude of this impact and discuss whether extreme values are systematically dampened by regression-based imputation.
Author response: We thank the reviewer for this comment. The potential for regression-based imputation to dampen extreme values is already explicitly acknowledged in the limitations section (lines 495-500): extreme values estimated by imputation may be under-represented because of the tendency for regression to underpredict the variance of time series, citing Newman (2014). This would result in a conservative bias -- tail dependence estimates derived from imputed data may be slightly underestimated relative to a fully observed record, meaning the compound hazard scores reported here may if anything understate the true hazard at locations where imputation was most prevalent. In response to this comment we have added a sentence to the limitations section making the direction and implication of this bias more explicit.
Proposed change to manuscript: Changed text in manuscript (Section 5.1, Limitations, after existing imputation sentence around line 500):
"To the extent that imputation dampens extreme values, tail dependence estimates and return period adjustments derived from imputed portions of the record may be slightly conservative, representing a lower bound on compound hazard at locations where early NTR and groundwater records relied heavily on imputed values. Formal propagation of imputation uncertainty through the copula framework remains a direction for future work."
Comment M5:
Interpretation of Return Period Adjustments Needs Care. Statements such as compound events being "eight times more likely" should be clarified as referring to the ratio of joint AND return periods under dependence versus independence, not to changes in physical frequency due to climate forcing. The language should avoid implying causal climate change effects unless formally demonstrated.
Author response: We thank the reviewer for this comment. We note that the manuscript already defines the return period adjustment explicitly as a statistical ratio in the conclusions (lines 532-534), where it is described as representing the ratio of bivariate AND return periods under independence versus dependence assumptions. No causal climate attribution is made anywhere in the manuscript. In response to this comment we have added a parenthetical clarification at the first use of this language in the results section to ensure readers encounter this context before reaching the conclusions.
Proposed change to manuscript: (Section 4.4) “Weighted average adjustments ranged from 1 (independent data) to 8.51 (Triad 4, conditioned on surge, Table 5), meaning that coincident rain-surge events are more than eight times more likely to occur when taking their dependence structure into account (that is, the joint AND return period under dependence is approximately eight times shorter than under the assumption of independence, reflecting the observed historical dependence structure rather than any change in physical forcing).”
Comment M6:
Hazard Versus Risk Terminology. The manuscript evaluates hazard potential, not risk (no exposure or vulnerability metrics are included). The discussion and conclusions should consistently use "hazard" rather than "risk" to avoid conceptual ambiguity.
Author response: We thank the reviewer for this comment and agree that consistent use of hazard terminology is important. The manuscript title, methods, and results sections already use hazard terminology consistently. In response to this comment we have conducted a full sweep of the manuscript and replaced remaining instances of risk language with hazard or frequency language where the context refers to the physical drivers assessed in this study rather than to broader societal risk concepts.
Note: The term "risk" has been retained in two contexts where it refers specifically to the broader literature on flood risk management rather than to the findings of this study, as replacing it in those contexts would misrepresent the cited works.
Minor Comments
Comment Minor 1:
Clarify explicitly in the methods that the joint return periods are AND-type (simultaneous exceedance) to avoid confusion with OR definitions.
Author response: We thank the reviewer for this comment. The AND-type joint return period is already defined in the introduction (lines 68-71), where the OR and AND cases are explicitly distinguished and the AND scenario is identified as the focus of this study. In response to this comment we have added a clarifying sentence at the point where Equation 2 is introduced in the methods to ensure readers encounter this context within the methods section itself.
Proposed change to manuscript: (Section 3.3, before or after Equation 2):
"All joint return periods computed in this study are AND-type, corresponding to the simultaneous exceedance of both precipitation and surge thresholds, as distinct from OR-type return periods where exceedance of either variable alone is sufficient."
Comment Minor 2:
Provide a brief quantitative summary in the results of how many triads exhibit upper tail dependence and how many are independent under each conditioning approach.
Author response: We thank the reviewer for this comment. The manuscript already reports independence counts explicitly in the results (lines 387-391), noting that independence was found in 14 of 24 triads under CoP and 1 of 24 triads under CoS. In response to this comment we have added a sentence explicitly reporting upper tail dependence counts under both conditioning approaches for completeness.
Proposed change to manuscript: (Section 4.3):
"Of the triads exhibiting statistically significant dependence, upper tail dependence was present in 6 of 24 triads under CoP and 12 of 24 triads under CoS, as determined by selection of a copula family exhibiting upper tail dependence (Table 4)."
Comment Minor 3:
Include a brief sensitivity note on copula family selection — particularly the implications of using only symmetric copulas.
Author response: We thank the reviewer for this comment. The use of permutation symmetric copulas is already noted in the methods section (lines 242-244), where it is identified as an implicit assumption of the analysis. In response to this comment we have added a brief sensitivity note to the limitations section discussing the implications of this assumption.
Proposed change to manuscript: (Section 5.1, Limitations, after existing copula paragraph):
"All candidate copula families used in this study are permutation symmetric, meaning C(u,v) = C(v,u), which assumes that the dependence structure is the same regardless of which variable is conditioned upon. Asymmetric copulas, which allow for directional dependence between precipitation and surge, were not considered. If the true dependence structure is asymmetric, the symmetric copulas used here may misrepresent the joint distribution in ways that could affect return period estimates at individual triads. The biconditional sampling approach used in this study -- which produces separate CoP and CoS datasets -- partially addresses this by allowing dependence to be assessed from both conditioning directions, and differences in results between CoP and CoS provide an indirect indication of potential asymmetry in the data.
Comment Minor 4:
Improve clarity of the weighting scheme used to aggregate return period adjustments (inverse CI width); a short formula or schematic would help.
Author response: We thank the reviewer for this comment. The inverse confidence interval weighting scheme is described in prose in Section 3.4 (lines 286-300) and illustrated visually in Figure 8b. In response to this comment we have added an explicit formula to the methods to improve clarity.
Proposed change to manuscript: (Section 3.4, after existing description):
"Formally, the weight assigned to each return period adjustment was computed as:
*where is the width of the 95% confidence interval for the return period adjustment at univariate return period combination i, derived from the 500 bootstrap resamples. The weighted average return period adjustment for each triad was then computed as:*
*where is the return period adjustment at each unique combination of univariate marginal return periods. This weighting scheme assigns greater influence to return period combinations where bootstrap uncertainty is smallest, which in practice corresponds to shorter univariate return periods where sample sizes are largest (Figure 7b).
Comment Minor 5:
Ensure consistent units throughout (meters vs feet for groundwater depth).
Author response We thank the reviewer for this comment. In response we have conducted a full sweep of the manuscript to ensure groundwater depth is reported consistently in feet with metric equivalents in parentheses throughout, consistent with the units used in the groundwater hazard scoring thresholds and the USGS monitoring network from which the data were retrieved.
Comment Minor 6:
In spatial figures, consider including uncertainty ranges or classification uncertainty to prevent overinterpretation of sharp boundaries.
Author response: We thank the reviewer for this suggestion. We agree that spatial interpolation from discrete monitoring stations introduces classification uncertainty at boundaries between hazard zones. Formal propagation of bootstrap uncertainty through the spatial interpolation framework is beyond the scope of the current revision. However the methods section already includes explicit language cautioning against overinterpretation of the interpolated results (lines 339-342), and in response to this comment we have added a sentence to the caption of Figure 10 reinforcing this caveat.
Proposed change to manuscript: Figure 11 caption. (Final results figure) Breakdown of scoring assignments used to compute the compound flood hazard score. (a) Final hazard scores by triad. (b) One point assigned for biconditional dependence, defined as dependence in both rainfall-conditioned (CoP) and surge-conditioned (CoS) samples. (c) One point assigned if either CoP or CoS samples were best fit by a copula model exhibiting upper tail dependence. (d) Groundwater scores based on the median shallowest depth among selected wells per triad: 2 points for less than 6 ft (1.8 m) below land surface (bls), 1 point for 6 to 15 ft (1.8 to 4.6 m) bls, and 0 points for greater than 15 ft (4.6 m) bls. The solid black line delineates the study area boundary. Boundaries between hazard zones (a) reflect interpolation from 24 discrete station triads and should not be interpreted as precise spatial boundaries. Local variability in precipitation, coastal water levels, and groundwater depth at unsampled locations is not represented.
Comment Minor 7:
Add a concise conceptual diagram summarizing the triad-based workflow (precipitation-surge sampling, copula fitting, RP shift, groundwater scoring, spatial interpolation).
Author response: We thank the reviewer for this suggestion. A conceptual workflow diagram has been added to the manuscript as Figure 3, placed at the end of the methods section following Section 3.5. The diagram summarizes the full analytical chain from station triad design and data imputation through biconditional event sampling, copula fitting and model selection, return period adjustment, groundwater scoring, and spatial interpolation to the final compound hazard map. Three color tracks (red for precipitation, green for NTR, purple for groundwater) illustrate how each data stream flows through the analysis and where the streams converge at the final scoring step, making the pseudo-trivariate structure of the analysis visually explicit. All subsequent figure numbers have been updated accordingly.
Proposed change to manuscript: (New Figure 3, end of Section 3.5):
Figure 3. Flowchart summarizing the analytical workflow for compound flood hazard assessment. Three data streams -- daily precipitation (red), daily non-tidal residual storm surge (green), and monthly groundwater (purple) -- are processed through station triad design and imputation, biconditional event sampling, copula fitting and model selection, and return period adjustment. Groundwater is incorporated as a preconditioning factor through a separate scoring step before all components are combined into a final compound hazard score and interpolated across the study area. CoP, conditioned on precipitation; CoS, conditioned on surge; AIC, Akaike information criterion; uTDC, upper tail dependence coefficient; ARCHI, Automated Regional Correlation Analysis for Hydrologic Record Imputation (Levy and others, 2024).
Comment Minor 8:
Slightly streamline portions of the discussion that restate known regional climatology unless directly linked to quantitative findings.
Author response: We thank the reviewer for this comment. In response we have streamlined the discussion by condensing passages that restate known regional climatology without direct quantitative connection to the findings of this study, while retaining content that links regional climate context to the spatial patterns observed in the results.
Citation: https://doi.org/10.5194/egusphere-2025-5683-AC2
-
AC2: 'Reply on RC2', Robin Glas, 13 Apr 2026
Viewed
| HTML | XML | Total | Supplement | BibTeX | EndNote | |
|---|---|---|---|---|---|---|
| 295 | 185 | 21 | 501 | 27 | 19 | 22 |
- HTML: 295
- PDF: 185
- XML: 21
- Total: 501
- Supplement: 27
- BibTeX: 19
- EndNote: 22
Viewed (geographical distribution)
| Country | # | Views | % |
|---|
| Total: | 0 |
| HTML: | 0 |
| PDF: | 0 |
| XML: | 0 |
- 1
Glas et al have presented statistical analysis to demonstrate the probability of multi-flooding-events occurrence in the New York City region. Additionally they have proposed a flood score based on their statistical results to qualify the spatial flood risk in the region. This is a highly effective study with rigorous analysis and potential applications.
While the analysis in the manuscript appears to be sound, I would suggest further detailing and describing the methodologies and results for improved understanding across a wide audience. I have listed my comments below for the authors consideration: