Evaluating the effects of preprocessing, method selection, and hyperparameter tuning on SAR-based flood mapping and water depth estimation
Abstract. Flood mapping and water depth estimation from Synthetic Aperture Radar (SAR) imagery are crucial for calibrating and validating hydraulic models. This study uses SAR imagery to evaluate various preprocessing (especially speckle noise reduction), flood mapping, and water depth estimation methods. The impact of the method chosen at each step, and of its hyperparameters, is studied by considering an ensemble of preprocessed images, flood maps, and water depth fields.
The evaluation is conducted for two flood events on the Garonne River (France) in 2019 and 2021, using hydrodynamic simulations and in-situ observations as reference data. Results show that the choice of speckle filtering method can significantly alter flood extent estimates, with variations of several square kilometers. The selection and tuning of flood mapping methods likewise significantly affect performance: while supervised methods outperform unsupervised ones, well-tuned unsupervised approaches (such as local thresholding or change detection) can achieve comparable results. The uncertainty compounded across the preprocessing and flood mapping steps also introduces substantial variability into the water depth field estimates.
This study highlights the importance of considering the entire processing pipeline, encompassing the preprocessing, flood mapping, and water depth estimation methods and their associated hyperparameters. Rather than relying on a single configuration, an ensemble approach that accounts for methodological uncertainty should be preferred. For flood mapping, the choice of method has the greatest influence. For water depth estimation, the most influential factors are the input flood map (i.e., the output of the flood mapping step) and the hyperparameters of the estimation methods.
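(As a reading aid for the comments below: a minimal sketch of the kind of unsupervised, threshold-based flood classification referred to in the abstract. The array names, the use of Otsu's method, and the small-object removal are my own illustrative assumptions, not the authors' configuration.)

```python
# Illustrative sketch of an unsupervised threshold-based flood map from a
# SAR backscatter image in dB. The Otsu threshold and the post-filtering
# step are assumptions for illustration, not the paper's implementation.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import remove_small_objects

def threshold_flood_map(sigma0_db: np.ndarray, min_region_px: int = 50) -> np.ndarray:
    """Classify pixels darker than a global Otsu threshold as open water."""
    t = threshold_otsu(sigma0_db)          # threshold estimated from the dB histogram
    water = sigma0_db < t                  # low backscatter -> candidate water
    return remove_small_objects(water, min_size=min_region_px)

# Toy example: ~ -20 dB over water, ~ -10 dB over land
rng = np.random.default_rng(0)
img = rng.normal(-10, 1.5, (256, 256))
img[100:150, :] = rng.normal(-20, 1.5, (50, 256))
flood_mask = threshold_flood_map(img)
print(f"Flooded fraction: {flood_mask.mean():.2%}")
```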
This is a very detailed and interesting study that compares different SAR-based flood extent and water depth estimation techniques. The study is carried out for two flood events (December 2019 and February 2021) in the floodplains of the Garonne River in France. An impressive number of simulations were carried out using different SAR preprocessing approaches, models, and model parameterizations, and assessed against high-quality hydraulic model outputs and observed watermarks. In my view, this is a much-needed and highly valuable study investigating the strengths and weaknesses of different algorithms in SAR data processing + flood mapping + water depth estimation. However, I have several major and minor comments.
MAJOR COMMENTS
MINOR COMMENTS
Section 1: What is the definition of a “hyperparameter”? What makes it different from a “normal” model parameter?
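For what it is worth, my own reading of the distinction (which the authors may or may not share) is that a hyperparameter is fixed by the user before the method runs, whereas a parameter is estimated from the data by the method itself, e.g.:

```python
# Illustration of the distinction as I read it (not necessarily the authors'
# definition): the tile size is set a priori by the analyst (hyperparameter),
# while the threshold value is estimated from the data (parameter).
import numpy as np
from skimage.filters import threshold_otsu

TILE_SIZE = 64  # hyperparameter: chosen before the method is run

def fit_tile_threshold(tile_db: np.ndarray) -> float:
    """The threshold itself is a fitted parameter, estimated from the tile histogram."""
    return threshold_otsu(tile_db)

tile = np.random.default_rng(0).normal(-15, 4, (TILE_SIZE, TILE_SIZE))
print(fit_tile_threshold(tile))
```

A clear statement along these lines early in Section 1 would help.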
Section 1: There are some recent studies that investigated the effect of different model parameters on flood mapping accuracy (e.g. studies comparing different change detection algorithms). Please review the literature and relate this work to the existing publications (also come back to this point in the discussion section).
End of section 1 / beginning of section 2: Check for repetitions
Line 86: Why are so many configurations tested for the local threshold approach (36 versus 2 / 2 / 6 configurations for the other three methods)? Does this mean that the threshold approach has advantages?
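To illustrate my point about configuration counts: with a local-threshold method, a few tuning choices multiply quickly. The knob names and values below are hypothetical, not taken from the manuscript; they only show how a 36-member grid can arise.

```python
# Hypothetical hyperparameter grid for a local-threshold method; the knobs
# and values are made up for illustration, not the ones used in the paper.
from itertools import product

tile_sizes        = [64, 128, 256]              # px
threshold_methods = ["otsu", "kittler", "quantile"]
min_region_sizes  = [10, 50, 100, 500]          # px

grid = list(product(tile_sizes, threshold_methods, min_region_sizes))
print(len(grid))  # 3 * 3 * 4 = 36 configurations
```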
Figure 3: Show the locations of the in-situ sites.
Line 163: Only in a narrow sense would I agree with this statement: “The main source of error in SAR imagery is speckle noise …”. In practice, there are many physical reasons for uncertainties in SAR-derived flood maps.
Line 248: Visually, the SAR2SAR filter does indeed look nicer than the other filters. But are there any quantitative indicators that can substantiate that SAR2SAR “outperforms the traditional methods”? How much of this filtered image is invented, and how much of it is true?
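No-reference indicators that are commonly used for this purpose include the equivalent number of looks (ENL) over a visually homogeneous area and the statistics of the ratio image original/filtered. A rough sketch (the variable names and the choice of region are placeholders; computations assume intensity, i.e. linear, data):

```python
# Two common no-reference checks for speckle filters:
# (1) equivalent number of looks (ENL) over a homogeneous region,
# (2) statistics of the ratio image original/filtered, which should resemble
#     pure speckle (mean close to 1) if no structure was removed or invented.
import numpy as np

def enl(intensity_region: np.ndarray) -> float:
    """ENL = mean^2 / variance over a homogeneous area (intensity data)."""
    return intensity_region.mean() ** 2 / intensity_region.var()

def ratio_image_stats(original: np.ndarray, filtered: np.ndarray):
    ratio = original / np.maximum(filtered, 1e-12)
    return ratio.mean(), ratio.std()

# Toy check: single-look intensity speckle over a homogeneous area has ENL ~ 1
rng = np.random.default_rng(0)
homogeneous = rng.exponential(scale=0.05, size=(200, 200))
print(enl(homogeneous))
```

A higher ENL after filtering indicates stronger smoothing over homogeneous areas, while a ratio-image mean far from 1 (or visible structure in the ratio image) would suggest that the filter has removed or “invented” radiometric information.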
Figure 6: These are the VH images, right?
Figure 7: Use the same y-axis scale to allow a direct comparison.
Line 349: How many of the Sen1Floods11 flood cases show conditions similar to those of the Garonne River floods?
Figure 9: I find the spread of the results for different algorithm / flood case combinations surprisingly low. What is the reason for this? Does this also reflect different pre-processing options?
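To make the request concrete, something like the following summary of the score spread across algorithm / flood case / pre-processing combinations would help; the column names, the CSI metric, and all values below are placeholders, not the authors' results.

```python
# Hypothetical sketch of how the spread across algorithm / flood case /
# pre-processing combinations could be summarised; all names and scores
# are placeholders, not taken from the manuscript.
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rows = []
for algo, event, prefilter in itertools.product(
        ["local_threshold", "change_detection", "cnn"],
        ["dec2019", "feb2021"],
        ["lee", "refined_lee", "sar2sar"]):
    rows.append({"algorithm": algo, "event": event, "prefilter": prefilter,
                 "csi": rng.uniform(0.6, 0.85)})  # placeholder scores
scores = pd.DataFrame(rows)

# Spread of the score across pre-processing options, per algorithm and event
spread = scores.groupby(["algorithm", "event"])["csi"].agg(["mean", "std", "min", "max"])
print(spread)
```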
Line 535: Some grasslands may cause “water-look-alike conditions”, but normally vegetation causes a loss of sensitivity of backscatter to flooding.