This work is distributed under the Creative Commons Attribution 4.0 License.
Evaluating the Feasibility of Scaling the FIER Framework for Large-Scale Flood Inundation Prediction
Abstract. Floods are a recurring global threat, causing loss of life, property damage, and agricultural impacts. Accurate and timely flood inundation forecasts are crucial for effective disaster preparedness and mitigation. However, traditional flood forecasting methods often face challenges in terms of computational demands and data requirements, particularly when applied to large geographic areas. This study presents a novel approach to scaling a data-driven flood forecasting framework, Forecasting Inundation Extents using REOF (Rotated Empirical Orthogonal Function) analysis (FIER), to large geographic regions. FIER leverages historical satellite imagery and streamflow data to predict flood inundation extents without relying on complex hydrodynamic models. We demonstrate the effectiveness of applying FIER over a large geographic extent by using watershed boundaries to create individual FIER models and then mosaicking the results geographically to provide large-area flood inundation predictions. The Upper Mississippi Alluvial Plain in the United States was used as a test region. We evaluated multiple watershed buffer sizes for generating the data-driven FIER models, to reduce edge effects along watershed boundaries when mosaicking the individual FIER implementations. The FIER method using watersheds, coupled with different forecast lead times from the National Water Model operational streamflow forecasts, was used to accurately predict the extent of surface water for select flood and low-flow use cases. Our results show that the scaled FIER approach using watersheds yields higher accuracies across multiple error metrics, including the Structural Similarity Index Measure (SSIM), RMSE, and MAE. The watershed-scaling approach resulted in SSIM ranging from 0.699–0.804, RMSE from 7.15–8.60, and MAE from 1.09–1.88, compared to a baseline area with SSIM ranging from 0.643–0.693, RMSE from 8.112–11.681, and MAE from 1.969–1.989. We found that scaling FIER using a watershed approach yielded statistically significantly better performance compared to the baseline area; this is particularly true when using watershed buffer sizes of 0–10 km and when applying a post-processing correction to the FIER outputs. This approach offers a promising solution for large-scale flood forecasting, particularly in data-scarce regions or ungauged basins. Future research will focus on refining the framework to incorporate additional hydrological variables and improve the accuracy of long-range flood inundation predictions.
Competing interests: Kel N. Markert is employed by Google; the methods presented use generally available Google technologies. Daniel P. Ames is a member of the editorial board of the Environmental Modelling & Software journal. The other authors declare no conflicts of interest.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3491', Anonymous Referee #1, 26 Nov 2024
The manuscript describes research on a new approach for scaling a flood inundation prediction method to larger regions. The text is well-written and has a good overall structure, with additional information included in an appendix. The topic is relevant, as scaling flood forecasting methods to enable coverage over large areas, while maintaining the spatial and temporal resolution, lead times and accuracy required to guide decisions prior to and during a flood event, is challenging and a topic of active research across the globe. This is also the case for the US, where this research has placed its study area. Furthermore, the fact that the code/scripts used for the study are made available publicly under an open-source license and the explicit attention the research places on operational constraints and use is commendable. This can enable the research to be applied in operational practice more easily (something that is often overlooked, even when stated as a driver or goal of many manuscripts). These are all points that merit publication.
However, there are a few areas where the manuscript can (and in some cases should) be improved, as outlined below. This includes general comments on (parts of) the text, followed by detailed line-by-line comments.

General comments
The abstract gives a good and concise overview of the whole text. But a few things in the abstract could be made more clear. Sometimes it might simply be the right choice of words. For example, it is stated that FIER is a data-driven method but also a promising solution for data-scarce regions, so apparently these are different types of data. Stating specific numbers (e.g. error metrics, buffer sizes) in the abstract can sometimes help readers, but only if they can directly place them in context and gauge their value. For the error metrics it would help to know how these were calculated (e.g. what are the observations, for what regions/times). For the buffer sizes some more clarification and context would help (e.g. are these large or small compared to other tested buffers, what is the post-processing technique applied?). If this would make the abstract too long, it could also be considered to remove the specific numbers and instead explain things at a higher level.
There seems to be a mixed signal regarding different types of models. On one hand, the drawbacks of hydraulic/hydrodynamic models are listed, such as extensive parameter tuning and uncertainties in the input data. On the other hand, hydrological model output is mentioned as a good data source, while hydrological models suffer from very similar issues, at least in terms of parameter tuning and input/output uncertainties. The latter is recognized later in the text, but only as a potential explanation for issues with the described framework (more on that later). Describing this early on, and giving an explanation of why it should still be used, would show that this is recognized by the authors, and, as such, with the current setup is an inherent element of the framework.
The described method requires historical imagery and data. The authors begin their introduction with statements around climate change, population growth and urban expansion, rightly pointing to these to give relevance to the research. However, as a data-driven method, it is unclear how well the proposed method (including its post-processing correction step) could cope with such changes. This also holds for other changes directly influencing flood patterns, such as new infrastructure or (temporary) flood defenses. It would be good to make a note of this in the text, even if the implications are uncertain.
It might help readers if a better description of the study area, including its hydrology, main flood drivers, relevant structures, precipitation and streamflow inflow from outside the study domain was given. This could help readers gauge results (especially since there are some things that remain unclear from the results and associated error metrics) and judge transferability to their own domains of interest.
The research is well structured, executed and documented, including the calculation of various error metrics. However, these (and the results in general) could do with more explanation, especially on their implications. For example, it would help readers if there was some insight on what certain RMSE values imply for flood forecasting purposes. Related, what is (or could be) the benchmark for each error metric? Without these, it is hard to judge if this indeed makes the described method suitable for operational flood forecasting purposes, while this is implied in the text. The use cases (and the maps shown there) definitely help in this regard, but also have the same issue with error metrics. This is understandably a hard task, as, for example, benchmarks for flood inundation maps are probably not available. But as long as it is unknown what would be required for operational flood forecasting and decision making, statements on the suitability of the described method for those purposes feel somewhat unsubstantiated.
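To make these metrics concrete, here is a minimal sketch (not the authors' implementation; numpy and scikit-image assumed, and water fractions assumed to be in percent, 0–100) of how SSIM, RMSE, and MAE can be computed between co-registered predicted and observed water-fraction maps:

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate(predicted: np.ndarray, observed: np.ndarray) -> dict:
    """Compare two co-registered water-fraction rasters (percent, 0-100)."""
    valid = ~np.isnan(predicted) & ~np.isnan(observed)   # ignore masked pixels
    err = predicted[valid] - observed[valid]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # SSIM scores local spatial structure rather than per-pixel error
    ssim = structural_similarity(
        np.nan_to_num(observed), np.nan_to_num(predicted), data_range=100.0
    )
    return {"SSIM": float(ssim), "RMSE": rmse, "MAE": mae}
```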
The focus of the text is, understandably, very much on floods. It also includes a use case on low flows, but this is a relatively weak part of the manuscript. The study area was chosen because it is a flood-prone region, not because of its relevance for droughts. The reasoning behind the approach and the research also utilizes current challenges with flood (not drought) forecasting. Finally, as stated in some of the detailed comments on the low flows use case, the direct applicability of FIER for this use case is not clear. That being said, the fact that FIER can function well under low flow conditions, i.e. not producing bad results that would hinder its use and reduce trust in its results, speaks well for its operational potential. That is something that could be highlighted somewhere in the text as well.

Detailed comments
[Line 29] What do these specific buffer values imply? Are they relatively small or large? (see also general comments on abstract)

[Line 66] “hydrological model outputs” hasn’t been mentioned as input/use before, so the fact that these are “indeed” a “promising avenue” is unsubstantiated. This also relates to the general comment on (hydrological) model limitations and uncertainties.
[Line 96] “[…] without the complexities […] of hydrodynamic models”; that depends, as FIER itself is also rather complex. Hydrodynamic models are often offered as software packages which take away most of the underlying complexities for the user/modeler. How is this with FIER? How complex would it be for others to set this up, in comparison with often used hydrodynamic models?
[Line 96] “[…] without the […] computational needs of traditional hydrodynamic models” vs. “there is a computational challenge as the nature of developing the flood patterns requires loading data in memory for processing so applying FIER over large areas can be a challenge” (lines 81-82). It can be assumed that FIER’s computations are concentrated on its set-up / training, while the traditional models are in their execution, but it would be good to make that distinction explicit somewhere.
[Line 141] As mentioned in the general comments, it would help readers if there was more information on the study area. Some suggestions follow below.
[Line 145] What are the (in)flows of these rivers into the study area?
[Line 146] Where are these reservoirs located and what are their characteristics (e.g. how much can they buffer to reduce downstream flooding)?
[Line 147] Is that localized rainfall (and snowmelt) within the study area or coming (as runoff and/or streamflow) from upstream? Does local precipitation have a strong influence on floods within the study area or is it mainly streamflow-driven?
[Line 148] “[…] leading to increases in streamflow […]” Figure 3 (line 275) seems to indicate the opposite, i.e. a downward trend in streamflow? Is this a short-term anomaly, error in the simulations, or something else?
[Line 155] “[…] section 2.3 Experimental Design […]” Should probably be 3.3.
[Line 174] “[…] 2012-01-20 to, […]” seems like the end date is missing.
[Line 196] How were the basins selected? Not all basins intersect with the baseline area, but there also more basins on the sides that could have been included. For reproducibility and understanding of the readers, it would help to make this clear.
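For illustration, basin selection of the kind the authors later describe (HUC8 watersheds intersecting a buffered baseline) might be sketched with geopandas as follows; file names, the CRS, and the 50 km distance are assumptions:

```python
import geopandas as gpd

# Hypothetical inputs: candidate HUC8 basins and the study baseline polygon
huc8 = gpd.read_file("huc8_watersheds.gpkg").to_crs(epsg=5070)   # CONUS Albers (meters)
baseline = gpd.read_file("baseline_area.gpkg").to_crs(epsg=5070)

buffered = baseline.buffer(50_000).unary_union   # 50 km outward buffer
selected = huc8[huc8.intersects(buffered)]       # keep intersecting basins
print(f"{len(selected)} basins selected")
```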
[Line 201-202] How do these buffer sizes compare to the watershed (area) sizes? It would be interesting for others to know this when applying it to other watersheds, as there probably is some sort of relation there? Has that been tested/investigated? This is addressed in the discussion section (lines 458-462), but an indication of the above, or at least how the current values were chosen, might be helpful to readers.
[Line 202-203] Duplicate sentence, has been stated in the prior sentence already, so I’d suggest this one can be removed.
[Line 205] Why 99.9% and not 100%?
[Line 205-206] What percentage of data is kept (or removed) as part of those two thresholds, i.e. the initial 90% and later 99.9%?
[Line 205-206] And how were these cloud percentages assessed/calculated? It is known that identification of cloud cover in satellite imagery is not a trivial task due to the varying nature and characteristics of clouds. Certain clouds can also have more impact on certain bands of the imagery. There’s often a balance between over- and underestimation of clouds, implying that sometimes clouds are missed (and thus included in the imagery even if it is stated it’s 100% clear). What would be the impact of cloudy pixels present in the data when fitting FIER?
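As an illustrative sketch of a two-stage clear-sky screen like the one described (an initial 90 % threshold followed by a 99.9 % threshold), assuming per-scene percent-clear values have already been derived from a QA/cloud-mask band — the derivation itself being exactly the uncertainty raised here:

```python
import pandas as pd

# Table of scene metadata with per-scene percent-clear (illustrative values);
# how pct_clear is computed from the cloud mask is the open question above.
scenes = pd.DataFrame({
    "scene_id": ["s1", "s2", "s3", "s4"],
    "pct_clear": [99.95, 99.2, 94.5, 71.0],
})

screened = scenes[scenes["pct_clear"] >= 90.0]      # initial 90 % screen
training = screened[screened["pct_clear"] >= 99.9]  # near-clear scenes kept for fitting
print(training["scene_id"].tolist())                # -> ['s1']
```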
[Line 205-206] FIER has previously been tested with Sentinel-1 SAR imagery (https://doi.org/10.1016/j.envsoft.2023.105643), which of course has far fewer issues with clouds. Why was it decided to go with VIIRS here? This is briefly touched upon in the Discussion (lines 512-513), stating that VIIRS provides daily observations. However, there are likely quite some days without data due to cloud (filtering), see also two comments above, so are there other factors that made the decision for VIIRS?
[Line 259-261] Why 2019 and 2020? Streamflow seems to indicate a downward trend from year-to-year in the operational hydrograph (from 2018-2024, Figure 3). Low-flow periods in later years are thus more extreme. Not saying that taking the most extreme low flow is the best choice, but an explanation as to why 2019 and 2020 were chosen would help the reader understand the authors' line of thought.
[Line 267-269] What is meant with “evaluating” the return periods (vs. the previously mentioned calculation of them)? It is clear that the NWM operational product is to be used operationally, and thus also to test/validate FIER, but that was already stated previously and this seems to indicate something else.
[Line 271-272] Similar to the questions above on low flows: why 2018-2020 for high flows / floods? This is clearer for floods, as we can see from Figure 3 that the 50-year event in 2019 is the most extreme in this reach, but it’s still interesting to know why this decision was made and to make it explicit in the text.
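For context, empirical return periods like those referenced here are often estimated from annual maxima using Weibull plotting positions; a minimal sketch with illustrative values (not the study's data, which may instead use a fitted distribution):

```python
import numpy as np

annual_max = np.array([4200., 5100., 3900., 7800., 6100., 4500.])  # illustrative annual maxima
n = len(annual_max)
sorted_q = np.sort(annual_max)[::-1]    # descending, so rank 1 = largest flood
m = np.arange(1, n + 1)
T = (n + 1) / m                         # Weibull plotting-position return period (years)

for q, t in zip(sorted_q, T):
    print(f"Q = {q:6.0f} -> T ~ {t:4.1f} yr")
```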
[Line 281-282] NWM has sub-daily forecasts for a reason; this is often a requirement, or at least standard practice, for operational flood forecasting, as forecasters and decision makers need it at this level. Flood forecasting with a daily timestep is often considered inadequate, especially for shorter lead times. There can be various reasons why it was decided to go with a daily time step for FIER, but it would be good to mention these and if/how it could be done on the same time step as NWM.
[Line 282-285] Similar question as above on the ensembles; although perhaps less relevant than the sub-daily timestep, ensembles do enable estimates of uncertainty and probabilistic forecasting. While assumptions can be made about why these were averaged for this study, it would be good to make that explicit in the text.
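A minimal sketch of the two simplifications being asked about here and in the previous comment, assuming hourly ensemble forecasts in a pandas DataFrame with illustrative column names: collapse the members to an ensemble mean, then aggregate to a daily time step:

```python
import pandas as pd

idx = pd.date_range("2019-05-01", periods=48, freq="h")     # hourly valid times
forecasts = pd.DataFrame(
    {"member_1": 100.0, "member_2": 110.0, "member_3": 95.0},  # illustrative members
    index=idx,
)

ensemble_mean = forecasts.mean(axis=1)       # collapse members to one trace
daily = ensemble_mean.resample("D").mean()   # aggregate sub-daily to daily
print(daily)
```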
[Line 293-294] Is this computational performance? If so, would be good to make that explicit (and provide at least some kind of indication of what it means). If not, the sentence might be obsolete, as the following sentences describe performance well enough. Related, it is later stated (lines 465-466) that “[…] using more watersheds comes at the cost of increased computational complexity […]”, which opens up the question as to whether that is a significant increase, and if so, what the relation between choice of watersheds and computational complexity/burden is.
[Line 296-297] It might be better to split this sentence in two as the fact that “[…] mosaicked results begin to trend more closely aligned to the baseline […]” and “[…] are not that much lower than the SSIM metric at the lower buffer sizes.” are two distinct observations. One is a comparison of two different methods, the other a comparison of input parameters within one of those methods. At present, it can create confusion for the reader.
[Line 297-298] If this is indeed also true for RMSE and MAE, why not include that with the previous (split up, see comment above) sentence? However, I’m not sure it really is. Just roughly calculating from reading the graphs in Figure 4, SSIM differences for various buffer sizes are indeed small (0.02 differences for values around 0.7, i.e. roughly 2.8%), but RMSE is already a lot more than that (at over 10%; 0.8 vs. 7.8) and MAE even more (0.3 vs. 1.75). Or was this statement referring to the first part of the previous sentence? (see comment above about confusion)
[Line 304-305] “[…] noticeable in the RMSE and RRMSE […]” Indeed it is, but what about SSIM? That also shows an interesting pattern, where the original scores lower while mosaicked shows higher values after post-processing (contrary to RMSE and RRMSE, where both score worse). Might be worth mentioning as well. And explain what could be behind this.
[Line 308-310] Indeed, and it would be worthwhile to investigate this further. What could cause this? Since FIER is a data-driven method (and so is the post-processing with CDF), there must be something in the data or the method itself that can explain this? Or is there still something that is not understood?
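For reference, a generic CDF-matching (quantile-mapping) correction of the kind discussed can be sketched as below; this is an assumption about the general technique, not the authors' exact implementation, and the endpoint behavior illustrates one way large errors can appear:

```python
import numpy as np

def cdf_match(values, train_pred, train_obs, n_quantiles=101):
    """Map `values` onto the observed distribution by matching quantiles."""
    q = np.linspace(0, 100, n_quantiles)
    pred_q = np.percentile(train_pred, q)   # empirical CDF of raw FIER outputs
    obs_q = np.percentile(train_obs, q)     # empirical CDF of observations
    # np.interp clamps values outside the training range to the endpoints;
    # mismatched distribution tails are one way outliers can be distorted.
    return np.interp(values, pred_q, obs_q)
```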
[Line 348-349] Indeed, but is this compared against the post-processed baseline as well? We’ve seen earlier that error metrics of the baseline deteriorate after post-processing, while they improve for the mosaicking approach, so testing the significance between these might not be that relevant. For the baseline, one would probably decide not to use the post-processing step? A fairer comparison would thus be against the original baseline.
[Line 355-367] Related to the above comment. Yes, there indeed seems to be merit in the watershed approach. But, in the paragraphs above there was more caution and also the notion that there can be more large errors with the post-processed results. Also, if (as questioned above) the tests here are indeed comparing post-processed baseline and mosaics, the statement “[…] CDF matching further improves the accuracy […]” is based on a comparison that might be questionable or should at least be explained better.
[Line 380-381] “[…] better error metric results (SSIM, RMSE, MAE) […]” Doesn’t RMSE increase with post-processing corrections?
[Line 391-392] “[…] however […]” Nitpicking on word choice, but wasn’t that (smaller extent with larger lead time) also already the case with the other event (top row in figure), even if the difference is only observed between medium and long-range? So these are similar, while ‘however’ seems to indicate the opposite?
[Line 391-392] There seem to be more differences between the medium and long-range versions, not necessarily (or at least not only) indicating smaller flood extents with the long-range version. Certain regions, especially in the Southern (downstream?) half, seem to have more water in the long-range version (even if the tributary there is not connected then). This implies more complex behavior than simply less water overall and it might be good to reflect on that in the text.
[Line 392] “[…] suggesting more uncertainty with longer range forecasts.” Why would a smaller flood extent imply more uncertainty? Or does this follow from something else?
[Line 393-394] “[…] long-range forecast exhibits slightly higher RMSE […]” Isn’t RMSE for long-range (16.3) lower than for nowcast (16.9) and medium-range (17.3)?
[Line 396-397] “[…] even with extended lead times […]” In fact, it seems to suggest that FIER performs best with the long-range forecast, as this has the best error metrics across the board? As stated in the text, longer lead times usually have higher uncertainties, so this is an interesting pattern. It just being a ‘lucky shot’ (across the two events, as it also holds for the second case) seems unlikely (although it cannot be ruled out, two cases are not enough to come to definitive conclusions). That would imply there are certain elements, within the input data and/or within the FIER framework, that work better for the long-range forecast and it would be very interesting to identify those. At minimum, something along these lines should be reflected in the text.
[Line 398] “[…] even better […]” That is rather subjective. This implies that performance of the first event was already good, but was it? Results are much worse compared to what’s shown before (e.g. nearly all metrics off the charts for Figure 4). There has been no link with what the metrics truly imply so far (e.g. their influence on operational practice or otherwise, see also general comment on this), so whether they are indeed good is hard to judge. It also implies that it has improved for the second event, but that also depends on what’s being looked at. All metrics are worse for nowcast, but are indeed better for (especially) medium and long-range.
[Line 398-399] “[…] particularly for the long-range […]” Related to comment directly above. Performance of stated SSIM and RMSE are only slightly better (0.6676 vs. 0.6664 and 16.04 vs. 16.26, respectively).
[Line 400] “[…] capturing more frequent flood events with high accuracy […]” See comments above. With rather small differences between some of the error metrics of the 50 and 5-year return periods, and no implications on what those values truly mean, this statement feels out of place. It either needs more explanation to back this up or should be changed.
[Line 401-403] “[…] likely due to the errors in the NWM streamflow predictions […]” It would be worthwhile to assess this quantitatively. It can of course be expected that there are errors in those predictions, but how large are they and what is their influence on the “degradation in performance” of FIER? And could there be any other factors influencing this?
[Line 405, Figure 6] Including differences between observations and FIER would help the reader assess quality and gauge the calculated error metrics. Showing this in the same figure might not be feasible, but a different figure (perhaps in the Appendix / supplementary material) would be great.
[Line 405, Figure 6] It seems from the first case (top row) that there are streams / water areas which are being cut off at the basin boundary (i.e. on the Southwestern side). Looking at Figure 2, these indeed seem to drain into the main river later (further South, i.e. downstream?). This relates to the earlier question on how the basins were selected (line 196). Would this influence FIER results (and if so, how), as the flood patterns there are less related to those in the main channel (e.g. not driven by discharge used as input)?
[Line 419-420] “[…] demonstrates the lowest RMSE (8.53) and MAE (0.89) […]” And RRMSE?
[Line 420] “[…] even with extended lead times.” In fact, ‘especially’ would be a better choice of word here than ‘even’, as it performs best across all metrics and for both cases? This relates to similar comments regarding lead time for the flood cases.
[Line 425-426] Related to earlier comment on Figure 6; a calculated comparison (e.g. difference map) would help the reader to follow this statement.
[Line 426] “[…] particularly for the nowcast and medium-range forecasts.” Isn’t that counterintuitive, as the error metrics seem to imply the opposite? Looking at the figures closely also seems to indicate the opposite, although it is hard to make a definitive statement without a calculated comparison (see comment directly above). The nowcast seems to have more small isolated water bodies (or noise?), which do not seem to be present in the observations?
[Line 426-428] “[…] using the streamflow forecasts […]” Don’t all of these use those forecasts? Or is this about the distinction between nowcast and forecasts?
[Line 426-428] “[…] show some minor deviations […]” Such as? Pointing to an example would help the reader here.
[Line 426-428] “[…] inherent uncertainties […]” Similar comment as previously on the flood case; it would be worthwhile to assess this quantitatively. It can of course be expected that there are errors in those predictions, but how large are they and what is their influence here? And could there be any other factors?
[Line 429] “[…] low flow forecasting.” Does FIER also forecast flows? These have not been shown so far. If so, it could be a valuable addition. If not, it might be considered to rephrase to something akin to ‘during low flow conditions’. This also holds for the same wording in line 431.
[Line 431-433] FIER can forecast water fractions. During flood conditions these can serve as flood maps, forecasting where inundation could take place, which can definitely be useful information for decision makers. However, this might be less apparent during droughts. How would water fraction maps help “water management decisions, mitigating drought impacts, and ensuring sustainable water resource allocation”? What are the protocols or operational practice in this region? In many regions of the world these are based on streamflow, groundwater levels and/or lake/reservoir volumes, which are not an output of FIER. Outlining how the information from FIER would be used in practice would help strengthen this statement.
[Line 456-457] Figure 4 does not explicitly show that “discontinuities at watershed boundaries” or “abrupt transitions” are mitigated, it shows aggregated error metrics. We have in fact not seen an example of those discontinuities, so it might be worthwhile to include that.
[Line 464] “[…] allows for finer spatial resolution […]” Is FIER spatial resolution dependent on the area it covers (not the input data)? If so, this is new information and it would be good to include that earlier in the text.
[Line 479] “[…] calibration […]” Sorry for nitpicking again, but while FIER might not require calibration in the traditional sense, it has just been described that the “optimal watershed scale and buffer size” would “require careful evaluation”, which is not completely unlike calibration.
[Line 479] “[…] data independence […]” Hopefully the last nitpicking comment. Related to some previous comments on this; it’s a specific type of data that FIER is independent of, while it requires other data. Good to make that distinction.
[Line 481] one of “areas” or “regions” is redundant here, choose one.
[Line 485-487] Couldn’t changes in hydrologic conditions also affect the patterns learnt by FIER, thus causing results under changed conditions to be less accurate? Also, this would require a hydrological model coupled to FIER, as such conditions aren’t input for FIER itself? Finally, it’s not directly clear how such a thing could inform flood risk assessments or reservoir building/operations, but we might just have to wait for Do et al. to be published.
[Line 487] “[…] assessing the effectiveness of flood control measures […]” Won’t measures (e.g. new infrastructure, temporary defenses) directly affect the spatial extent of the flood and thus make FIER maps invalid? Related to a similar general comment.
[Line 489-490] Has it been tested if FIER can produce (accurate) maps for return periods, especially on the higher end (e.g. 100 year), that it hasn’t seen in the data yet? Because it can be those return periods which are most relevant regarding climate change and planning.
[Line 507-508] This is a very good point indeed. It might already help readers if a simple comparison with those type of models and NWM is made (e.g. spatial and temporal resolution, time step, lead times).
[Line 510-512] It might be good to mention whether this would be directly possible with the current FIER framework, or whether it would require some adjustments.
[Line 542] “[…] event-based forecasting […]” Maybe just semantics, but isn’t FIER more of a continuous operational forecasting approach, since there is no event-based spin-up or calibration involved? (which actually might have more value for operational use than a purely event-based approach)
[Line 554-555] Same comments as for lines 348-349.
[Line 568] Great that more information is included in an appendix.
[Line 575] It would help readers if the implications of positive and negative values in the RSM maps were briefly explained.
[Line 586-587] “[…] high NSE values (0.61, 0.77, and 0.63) […]” Whether a NSE value is ‘high’ can be subjective, with different research(ers) giving different classifications on NSE scores. Some indeed state values above 0.5 as ‘good’, while others reserve this for values above 0.65 (which is a threshold two of the three stated values don’t reach) or even higher. As such, this sentence could perhaps do with more careful phrasing.
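For reference, the Nash–Sutcliffe efficiency under discussion is defined as

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} (O_t - S_t)^2}{\sum_{t=1}^{T} (O_t - \bar{O})^2}
```

where O_t are observed and S_t simulated values; NSE = 1 is a perfect fit, and NSE = 0 means the model predicts no better than the observed mean.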
[Line 590, Figure A1] The scatter plots of RTPCs 01 and 02 seem to indicate a strong link between the fitted model and (a single?) high value(s) in the data. The curve of the fit also seems to suggest that this results in an RTPC value that normally belongs to much lower streamflow values. Are we reading the graphs correctly? This is not mentioned in the text at all. Has the sensitivity and influence of this been tested (perhaps in previous FIER research)? What would the implications of this be?
[Line 590, Figure A1] The scatter plots of RTPCs seem to indicate that different fitting functions might be applicable to different RTPCs. For example, the data in RTPC-01 seem to show a linear break pattern. Has this been tested (perhaps in previous FIER research)? What would the implications of this be?
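To make the fitting step concrete, here is a hedged sketch (entirely synthetic data) of regressing a single RTPC on streamflow with a simple linear model, reporting the regression NSE as in Appendix A; the synthetic data deliberately contain the kind of linear break pattern noted above:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.gamma(2.0, 500.0, size=300)                        # synthetic streamflow
rtpc = 0.001 * np.minimum(q, 800) + 0.002 * np.maximum(q - 800, 0) \
       + rng.normal(0, 0.05, 300)                          # slope break at Q = 800

a, b = np.polyfit(q, rtpc, 1)                              # simple linear fit
pred = a * q + b

# Regression NSE, as reported in Appendix A
nse = 1 - np.sum((rtpc - pred) ** 2) / np.sum((rtpc - rtpc.mean()) ** 2)
print(f"linear-fit NSE = {nse:.2f}")
```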
[Line 596-615] This seems contradictory to the main text, where we’ve seen that smaller buffer sizes yield better results. This is not mentioned here, while this warrants at least a notion, or better yet: a good explanation.
[Line 605-606] Could this be because streamflow for the study area is mainly driven by what comes in from upstream through the main river channel(s)? And that this relationship could be (more) affected by the buffer size if there was more locally generated streamflow? It might help the reader to gauge this if more information on the hydrology of the study area was provided (see earlier comments on this, e.g. line 141)
[Line 630-631] Great that this is made publicly available.
Citation: https://doi.org/10.5194/egusphere-2024-3491-RC1
AC1: 'Reply on RC1', Kel Markert, 23 Jun 2025
We sincerely thank you for your detailed and insightful review of our manuscript. Your comments are highly valuable and have helped us identify several areas where the clarity, justification, and overall impact of our work can be significantly improved. We appreciate your positive feedback on the relevance of the topic, the well-structured text, the open-source nature of our code, and the focus on operational applicability.
We have carefully considered all your general and detailed comments and have made corresponding revisions to the manuscript. Below, we address each of your points and describe the changes made.
General Comments:
- Abstract Clarity (Data-driven vs. data-scarce, metrics, buffer sizes):
- We agree that the distinction between "data-driven" and "data-scarce" could be clearer. We have revised the abstract to specify that FIER is data-driven with respect to historical satellite imagery and streamflow, but offers a solution in regions typically considered "data-scarce" for traditional hydrodynamic modeling (i.e., lacking detailed bathymetry, friction coefficients, etc.).
- Regarding metrics and buffer sizes in the abstract: We have aimed for a balance. We opted to remove the specific metric ranges, added a little more context on the significance tests used, and clarified the post-processing method. We have stated the need for the buffers in the abstract but feel that adding context on buffer sizes would not add value, so opted to keep that as is. We believe this provides the key takeaways at a high level without overly lengthening the abstract.
- Mixed Signals on Hydraulic/Hydrologic Models:
- Thank you for pointing this out. We have revised the introduction (Section 1) to acknowledge upfront that there are errors in both hydrologic and hydrodynamic modeling. Furthermore, we clarify that while FIER leverages hydrological model outputs (like NWM), these models have their own uncertainties which can influence FIER outputs. We now clarify that FIER's advantage lies in not requiring the user to develop and calibrate a complex hydrodynamic model from scratch for inundation mapping, instead utilizing existing, often operational, streamflow forecasts.
- Impact of Climate Change/Infrastructure on a Data-Driven Method:
- This is a very important point. We have added a discussion in "Caveats and Limitations" (Section 5.3) acknowledging that FIER, being data-driven, learns from historical patterns. Significant long-term changes in hydrological regimes (due to climate change) or floodplain characteristics (due to new infrastructure or defenses) that deviate substantially from the training period could reduce the accuracy of FIER predictions. We state that periodic retraining to account for such non-stationarity would be necessary.
- Study Area Description:
- We concur that a more detailed description would benefit readers. We have expanded Section 3.1 ("Study Area") to include more specifics on the hydrology of the study area, its main flood drivers (e.g., contributions from major tributaries like the Ohio, influence of snowmelt vs. rainfall), more information on infrastructure like reservoirs and levees, and the values of upstream inflows. We also added context on droughts to justify the use of the study area for low-flow predictions.
- Explanation and Benchmarking of Error Metrics:
- We acknowledge the challenge of defining absolute benchmarks for flood inundation map accuracy, as these are often application-specific and not universally established. In the revised manuscript (primarily in Section 4.1 "Statistical Analysis" and the Discussion), we emphasize that our evaluation focuses on relative performance (watershed-based approach vs. baseline, impact of buffers, effect of post-processing) and consistency across different forecast lead times. We have added text to Section 4.1 to briefly contextualize why validating water fraction maps is difficult, drawing on previous research and evaluation guidance. The manuscript now states what metrics like SSIM and RMSE/MAE signify in terms of capturing spatial patterns and water fraction intensity. While we cannot provide definitive "satisfactory thresholds," we added language noting that practitioners should determine what is good enough for their operational implementations.
- Low-Flow Use Case:
- We appreciate your perspective on the low-flow case. We agree its primary strength in this manuscript is not to position FIER as a drought forecasting tool, but rather to demonstrate its robustness in simulating surface water across conditions and its suitability for operational use. We have revised the introduction to frame the objectives as general inundation mapping for both flood and low-flow conditions. We updated the discussion (Section 5) to highlight that FIER's ability to not generate spurious flood signals during low-flow conditions, and to reasonably represent surface water extent, increases confidence in its operational reliability across a range of hydrological conditions. This is important because an operational system should ideally perform sensibly whether flows are high or low.
Detailed Comments:
- [Line 29] Buffer values in abstract: We have added context to the abstract when introducing the buffer approach. Going into detail in the abstract would make it too long, so we opted to leave that for the text.
- [Line 66] “indeed” a “promising avenue”: We have rephrased this to avoid the unsubstantiated "indeed" and to be clearer about the data inputs from these examples.
- [Line 96] FIER complexity vs. hydrodynamic models: We've clarified this. FIER avoids the complexity of hydrodynamic model parameterization and calibration. While FIER has its own setup, the provision of open-source scripts aims to lower the barrier to entry.
- [Line 96] Computational needs: We've clarified that FIER's main computational load is during the training phase (REOF analysis on historical imagery). Predictions are relatively fast.
- [Line 141] More info on study area: Addressed in General Comment 4. We have expanded Section 3.1 to specifically identify the inflow reaches and provide more details on the reservoirs and infrastructure.
- [Line 145] (In)flows of rivers: Addressed above.
- [Line 146] Reservoirs location/characteristics: Addressed above.
- [Line 147] Localized rainfall/snowmelt vs. upstream: Addressed above.
- [Line 148] Streamflow trend in Fig 3: Clarified in Section 3.1 that Yin et al. (2023) refers to broader, longer-term regional trends, and that the Mississippi River does experience droughts, as reported by recent research. Figure 3 shows a specific reach over a shorter operational period, which can exhibit different short-term variability; we also clarified this distinction in Section 3.5.
- [Line 155] Section 2.3: Thank you for identifying the error. Corrected to 3.3.
- [Line 174] VIIRS end date: Thank you for identifying the error. We have included the end date for analysis.
- [Line 196] Basin selection: Clarified that HUC8 watersheds were selected by intersecting them with the baseline area buffered by 50 km.
- [Line 201-202] Buffer sizes vs. watershed sizes: We clarified that this relationship wasn't systematically tested and acknowledged it as future work in the discussion. The chosen buffers represent a range from no buffer to a substantial one (50 km can span across smaller HUC8s).
- [Line 202-203] Duplicate sentence: Thank you for the suggestions. We removed the duplicate.
- [Line 205] Why 99.9% and not 100% clear sky: We found this to be a practical threshold to avoid discarding nearly perfect scenes due to a few isolated bad pixels, which commonly occur with the automated QAQC processing. We updated the text to be clearer on why we used this threshold.
- [Line 205-206] Percentage of data kept/removed: The percentage of data kept varies by watershed, but we now provide the range of data retained for the training and evaluation datasets.
- [Line 205-206] Cloud assessment/impact: Acknowledged that cloud masking is imperfect. We expanded on the general quality of satellite-derived water in section 5.3 and noted the issues with cloud masking.
- [Line 205-206] VIIRS vs. Sentinel-1 SAR: VIIRS was chosen for its long, consistent daily historical record, crucial for REOF pattern extraction over many years. Recent research by Rostami et al. (2024) illustrated that daily imagery is important for extracting the patterns via REOF over a long period. While SAR can penetrate clouds, the Sentinel-1 archive is shorter and its revisit less frequent, whereas VIIRS provides the consistent, frequent, large-area coverage suitable for this type of long-term analysis.
- [Line 259-261] Low flow dates 2019-2020: Added clarification in the text that these were chosen largely due to data availability. These years were selected because they included operational NWM forecasts starting in late 2018 and overlap with available VIIRS imagery, which had a substantial gap in data from 2021-01-01 to 2023-08-10.
- [Line 267-269] "Evaluating" return periods: Rephrased to clarify that we only used the return periods to determine flooding events for the operational NWM data.
- [Line 271-272] High flow dates 2018-2020: We addressed this earlier, stating the selected dates were based on data availability and the coincidence of the NWM operational and VIIRS data used.
- [Line 281-282] Daily timestep for FIER: Acknowledged that NWM is sub-daily. For this initial scaling study, daily aggregation was a simplification. We clarified that FIER could be run with sub-daily time steps and is something to consider for future work.
- [Line 282-285] Ensemble averaging: Similar to above, averaging was done as a simplification for this study. We added a sentence to clarify this.
- [Line 293-294] Performance (computational?): Clarified this refers to statistical performance metrics. Computational aspects are discussed in the discussion.
- [Line 296-297] Sentence split: Thank you for the suggestion, we agree and split for clarity.
- [Line 297-298] RMSE and MAE comparison: Re-evaluated based on Figure 4. We removed the sentence as this caused confusion.
- [Line 304-305] SSIM pattern with post-processing: We intentionally left out SSIM in this statement as the increase in SSIM shows improvement whereas an increase in RMSE/RRMSE indicates worse performance. We added this point for clarification.
- [Line 308-310] Cause of increased errors with post-processing: Hypothesized that quantile mapping can amplify outliers if distributions differ or if the underlying relationship is imperfect. We added information to clarify this, but more investigation is warranted to confirm the hypothesis.
- [Line 348-349, L355-367] Comparison with post-processed baseline / CDF matching improvement: Clarified that comparisons of post-processed mosaicked results were with the post-processed baseline. The statement about CDF matching improving accuracy refers to the mosaicked results relative to their original counterparts, and then their performance relative to the (often original, better) baseline.
- [Line 380-381] RMSE increase with post-processing (case studies): Figure 4 shows that for the 1 km buffer, corrected mosaicked RMSE (blue line at 1 km) is slightly higher than original mosaicked RMSE (green line at 1 km). The statement refers to the corrected outputs being generally better across SSIM, RMSE, and MAE compared to the baseline or their original counterparts in terms of overall utility.
- [Line 391-392] "however" for smaller extent: Thank you for the suggestion. Rephrased to highlight the similarity between the cases if the pattern is consistent.
- [Line 391-392] More water in long-range (Southern half): Acknowledged this is much more complex than simply “less water”; we updated the text to note the complexity.
- [Line 392] Smaller extent implies more uncertainty: Clarified in the previous point. The spatial differences in NWM streamflow forecasts influence the FIER outputs, hinting at the underlying forecast uncertainty.
- [Line 393-394] RMSE long-range (2019 event): Thank you for pointing out the error. We updated the text to be correct.
- [Line 396-397, L398, L398-399, L400] FIER performance with long-range / subjectivity / accuracy claims: Toned down subjective wording. Focused on observations. We acknowledge that "high accuracy" is relative.
- [Line 401-403, Line 426-428] NWM errors quantitative assessment: We agree that it would be worthwhile to assess the NWM streamflow forecasts quantitatively. However, we feel this is out of scope for the current work and noted it as future work; there are whole papers dedicated to the quantification of NWM streamflow outputs. Additionally, the nature of FIER, using multiple stream reaches as inputs into a watershed prediction, makes this a challenge with the current experimental design and the gauge data available to compare against.
- [Line 405, Figure 6] Difference maps: Good suggestion; we included Appendix B with the difference maps for reference and comparison with Figures 6 & 7.
- [Line 405, Figure 6] Streams cut off at basin boundary: FIER predictions are not directly influenced by upstream contributions; the image-based patterns are derived only from the input imagery. The streamflow predictions used to simulate water maps are influenced by upstream tributaries, but this is captured by the NWM. So, streams/water areas cut off at the watershed boundary have no influence on the FIER outputs.
- [Line 419-420] RRMSE for low flow: Correct, RRMSE is lowest for the long-range forecast in this case. We added RRMSE to the statement, as it follows the same pattern.
- [Line 420] "even" vs. "especially" for extended lead times (low flow): Thanks for the suggestion, we changed to "especially" as it performs best.
- [Line 425-426] Visual comparison (low flow): Refer to difference maps.
- [Line 426] Nowcast noise (low flow): Correct, this is counterintuitive. We removed this sentence to avoid confusion for the reader and to not have statements conflicting with the accuracy assessment.
- [Line 426-428] "streamflow forecasts" (low flow): Clarified that this statement is for all FIER outputs using the operational NWM streamflow outputs (nowcasts and forecasts)
- [Line 426-428] "minor deviations" (low flow): Pointed readers to the example of the eastern side of the study regions for the 2019-09-25 where there are small water bodies that are present or missing in the different forecasts.
- [Line 429] "low flow forecasting": Rephrased.
- [Line 431-433] FIER utility for drought: Explained how spatial water-extent maps are useful, focusing on surface water dynamics, especially with hypothetical simulations for decision making. The example provided is simulating inundation dynamics due to dam construction (Do et al., 2025).
- [Line 456-457] Figure 4 discontinuities: You are correct. We reworded the sentence.
- [Line 464] Finer spatial resolution: Clarified this means capturing finer dynamics, not pixel size.
- [Line 479] Calibration: Acknowledged similarity but highlighted difference from traditional calibration.
- [Line 479] Data independence: We added clarifying language.
- [Line 481] Redundant word: Thank you for pointing this out, we fixed the sentence.
- [Line 485-487, Line 487] Changes affecting FIER / flood control measures: Linked to the general comment on climate change. Yes, these would degrade performance if patterns change significantly, and we added a sentence to clarify this. In the context of Do et al. (2025), FIER was used to simulate the Tonle Sap floodplain where dams were inserted upstream in a major tributary system, the Sekong, Sesan and Srepok (3S) Basin in the Lower Mekong. Including dams would significantly alter the downstream streamflow into the floodplain where FIER was used to estimate the inundation extent. No control structures were directly inserted where FIER was predicting the surface water.
- [Line 489-490] Untested return periods: No, FIER has not been tested on how well the approach predicts out of the training range. We acknowledged this limitation.
- [Line 507-508] Comparison with other models (GEOGLOWS, NWM): Briefly added context about GEOGLOWS and its differences from NWM.
- [Line 510-512] Other hydro variables in FIER: No adjustments are needed for the processing framework, as it can handle arbitrary hydrologic inputs for fitting, but it would require the user to prepare the data as inputs, which is currently not implemented in the shared scripts. We added a sentence to clarify this point.
- [Line 542] "Event-based forecasting": Yes, this is semantics. We rephrased to “operational flood forecasting” to distinguish between event-based and even long-term scenario-based modeling.
- [Line 554-555] Same as L348-349: Addressed above.
- [Line 568] Appendix: Acknowledged and thank you.
- [Line 575] RSM positive/negative values: We added further explanation in the caption to identify what the colors represent.
- [Line 586-587] "High NSE": We rephrased the sentence to be less strong with considering if the NSE is high or not.
- [Line 590, Figure A1] Scatter plots/outliers/fitting functions: Acknowledged. Linear regression was for simplicity and computational efficiency, consistent with prior work. More complex approaches using deep learning are being used for FIER moving forward (see Rostami et al., 2025 and Wan et al., 2025 (https://doi.org/10.1016/j.envsoft.2025.106562). We added information to the text for discussion.
- [Line 596-615] Appendix vs. main text on buffer sizes (contradiction?): We clarified that Appendix A discusses REOF model fitting statistics (Pearson's R, NSE of regression), while the main text discusses final inundation map accuracy (SSIM, RMSE). Better REOF component fitting doesn't automatically mean better final map accuracy if other factors (like over-smoothing) come into play.
- [Line 605-606] Streamflow drivers and buffer size impact on Pearson's R: This is a good hypothesis. We do not have data to support it, but it would be interesting to investigate in future research. We updated the study area section to provide more context on inflows but decided to leave this text as is.
- [Line 630-631] Code availability: Acknowledged and thank you.
We believe these revisions address the concerns raised and substantially strengthen the manuscript. We look forward to your further feedback.
Citation: https://doi.org/10.5194/egusphere-2024-3491-AC1
RC2: 'Comment on egusphere-2024-3491', Wolfgang Wagner, 21 Mar 2025
This is an interesting study that investigates an approach to scale a method (FIER) published by Chang et al. (2020) to larger areas. The aim of the FIER method is to predict flood inundation extents using historical satellite imagery and streamflow data, without any need for complex hydrodynamic models. FIER stands for “Forecasting Inundation Extents using REOF (Rotated Empirical Orthogonal Function),” and it is a purely statistical approach without physical constraints. As a result, FIER has only been applied to specific regions so far, and there remain questions about its generalizability.
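For readers unfamiliar with REOF, here is a minimal numerical sketch of the idea (an EOF decomposition via SVD followed by a varimax rotation, with temporal coefficients recovered by projection); this is illustrative only, on synthetic data, and the published FIER implementation should be consulted for details:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a (space x modes) loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion
        grad = loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < var * (1 + tol):   # stop when the criterion plateaus
            break
        var = s.sum()
    return loadings @ R

# Synthetic stand-in for a (time x space) stack of water-fraction anomalies
rng = np.random.default_rng(1)
stack = rng.normal(size=(200, 5000))
stack -= stack.mean(axis=0)                      # remove per-pixel mean

U, S, Vt = np.linalg.svd(stack, full_matrices=False)
n_modes = 4
spatial = Vt[:n_modes].T * S[:n_modes]           # leading spatial modes (space x modes)
rotated = varimax(spatial)                       # REOF spatial patterns
rtpcs = stack @ np.linalg.pinv(rotated).T        # rotated temporal PCs (time x modes)
```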
Therefore, in this study, the authors explored the feasibility of applying FIER in a manner that creates a consistent flood forecast for a large area, specifically the Upper Mississippi Alluvial Plain. This was achieved by creating individual FIER models based on watershed boundaries and then mosaicking the results geographically to generate large-scale flood inundation predictions. Given the nature of this method, it is not really possible to understand where it works well and where not. Nonetheless, the presented results are quite convincing, showing both high and low-flow situations.
My only comment is that there is one more important caveat that should be addressed, namely the fact that the satellite-derived water fraction product can of course not be perfect. Water fraction estimates are subject to errors that depend on land cover types. In areas like dense forests or urban environments, satellites may not even be able to detect water. Therefore, this method relies on an imperfect input dataset to predict potentially imperfect and incomplete inundation areas. Please add a discussion of the uncertainties of the VIIRS water fraction product and what this implies for real-world hydrological applications.
Citation: https://doi.org/10.5194/egusphere-2024-3491-RC2
AC2: 'Reply on RC2', Kel Markert, 23 Jun 2025
We thank you for your positive feedback on our study and for raising the important caveat concerning the uncertainties in the satellite-derived surface water extents. We agree that this is a crucial aspect that warrants explicit discussion, as the quality of input data directly influences the performance and reliability of any data-driven model like FIER.
To address this, we have incorporated a more detailed discussion on these uncertainties within the "Caveats and Limitations" section (Section 5.3) of the manuscript at lines 685-688 and 689-704. Furthermore, we have expanded the "Future Work" section (Section 5.4) to underscore the importance of research into mitigating these input data uncertainties at lines 721-727.
Citation: https://doi.org/10.5194/egusphere-2024-3491-AC2