the Creative Commons Attribution 4.0 License.
HIDRA-D: deep-learning model for dense sea level forecasting using sparse altimetry and tide gauge data
Abstract. This paper introduces HIDRA-D, a novel deep-learning model for basin scale dense (gridded) sea level prediction from in situ tide gauge data. Accurate sea level prediction is crucial for coastal risk management, marine operations, and sustainable development. While traditional numerical ocean models are computationally expensive, especially for probabilistic forecasts over many ensemble members, HIDRA-D offers a faster, numerically cheaper, observation-driven alternative. Unlike previous HIDRA models (HIDRA1, HIDRA2 and HIDRA3) that focused on point predictions at tide gauges, HIDRA-D provides dense, two-dimensional, gridded sea level forecasts. The core innovation lies in a new algorithm that effectively leverages sparse and unevenly distributed satellite altimetry data in combination with tide gauge observations, to learn the complex basin-scale dynamics of sea level. HIDRA-D achieves this by integrating a HIDRA3 module for point predictions at tide gauges with a novel Dense decoder module, which generates low-frequency spatial components of the sea level field in the Fourier domain, whose Fourier inverse is an hourly sea level forecast over a 3-day horizon. Evaluation in the Adriatic demonstrates that HIDRA-D significantly outperforms the NEMO general circulation model, achieving a 28.0 % reduction in mean absolute error when compared to satellite sea-level anomaly (SLA) data. However, while HIDRA-D performs well in open waters, leave-one-out cross-validation at tide gauges indicates limitations in areas with complex bathymetry, such as the Neretva estuary located in a narrow bay, and in regions with sparse SLA data, like the northern Adriatic. Importantly, the model shows robustness to spatially-limited tide gauge coverage, maintaining acceptable performance even when trained using data from distant stations. This suggests its potential for broader applicability in areas with limited in situ observations.
Status: open (extended)
CEC1: 'Comment on egusphere-2025-3187', Juan Antonio Añel, 28 Jul 2025
AC1: 'Reply on CEC1', Marko Rus, 30 Jul 2025
Dear Juan A. Añel,
Thank you for your comment regarding the "Code and Data Policy" for our manuscript. We have uploaded all the required data to a permanent public repository on Zenodo. This dataset includes the observational sea-level data from the IOC Sea Level Station Monitoring Facility (for the Sobra, Stari Grad, and Vela Luka tide gauges), the sea level anomaly (SLA) measurements from the Copernicus Marine Service, and the model output data from our study (the dense_predictions.pth file).
The dataset is publicly available under the following permanent identifier (DOI): 10.5281/zenodo.16599013. We will update the "Code and data availability" section in our revised manuscript.
Best regards,
Marko Rus
Citation: https://doi.org/10.5194/egusphere-2025-3187-AC1
RC1: 'Comment on egusphere-2025-3187', Anonymous Referee #1, 07 Jan 2026
The manuscript presents an interesting study on learning from observational datasets, and the authors propose a novel approach, building on their previous work on the HIDRA family, to generate dense grid predictions. The text is generally well written; however, the model architecture is not explained as clearly as expected, and my major concerns are regarding the evaluation, explainability, and generalization aspects. How can one transfer the model to other regions, and what are the limitations?
I suggest moving Sections 2.3 and 3.1 into a dedicated dataset section, along with the description of the training dataset, to improve clarity and flow. Adding a problem definition subsection before the model architecture would be beneficial. The number of figures and tables is considerably higher than in a typical paper. I suggest reducing them in the main text and relocating additional information to an appendix to avoid distracting from the main flow of the research.
L36: This statement requires an appropriate reference. It would also be appreciated if the limitations and challenges of data-driven models, such as stability and physical fidelity, were mentioned.
L41: There are several successful models for ocean emulation and forecasting. It is unclear why the authors primarily cite their own work to demonstrate the capabilities of data-driven models in ocean applications. Also, Rus et al. is cited repeatedly in some parts of the text, which is somewhat distracting.
L57: The terminologies of sea level are mixed here. SLA and ADT are distinct terms. Some studies use these terms imprecisely; however, when SLA, ADT, MDT, and SSH are discussed together, their distinct definitions should be respected. SSH represents sea level relative to the reference ellipsoid. ADT corresponds to sea level relative to the geoid (i.e., SSH - geoidHeight) and is fundamentally different from SLA, which measures sea level relative to a given mean sea surface (i.e., SSH - MSS); and MDT = MSS - geoidHeight. Hence, it is recommended to use the appropriate terms consistently throughout the paper.
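For reference, the relations listed in this comment can be collected in one place. The symbol $N$ for the geoid height above the reference ellipsoid is introduced here for brevity and is not taken from the manuscript:

```latex
\begin{aligned}
\mathrm{ADT} &= \mathrm{SSH} - N, \\
\mathrm{SLA} &= \mathrm{SSH} - \mathrm{MSS}, \\
\mathrm{MDT} &= \mathrm{MSS} - N, \\
\mathrm{ADT} &= \mathrm{SLA} + \mathrm{MDT}.
\end{aligned}
```

The last line follows from adding the second and third: $(\mathrm{SSH} - \mathrm{MSS}) + (\mathrm{MSS} - N) = \mathrm{SSH} - N$.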
L115: Does adding MDT to SLA give ADT here? If instantaneous satellite data are required for this task, why are OTC and DAC applied?
L126: It is not clear what adjustments were applied. Were they applied only for visualization, or also for model training and evaluation?
L154-155: This sentence is not clear. Is this a challenge specific to the HIDRA models or a general challenge?
Fig.5: GT is not defined. The L(s) and their arrows are confusing and probably require an explanation in the caption. Is the HIDRA3 block frozen, fine-tuned, or jointly trained during the training of HIDRA-D?
Fig.6: How can we intuitively explain the 2D Fourier domain? The reshape block and how the data are transformed from physical space to Fourier space are not clear.
L231: If b is due to the difference between the vertical reference surfaces, it should be constant, or change only because of vertical land movement at the tide gauge locations. Is that not the case?
Table 1: Could you also add the performance of NEMO and HIDRA-D against the tide gauges over the testing period?
Fig. 7 & 8: It would be helpful to indicate the RMSE value for each panel in the bottom row.
Sec. 3.2: I suggest adding (or substituting for Fig. 7 or 8) the RMSE contour of HIDRA-D against NEMO for the lead times over the test dataset.
L323: According to Fig. 7, one can visually observe that the difference between NEMO and HIDRA-D contains processes at scales greater than \lambda_R=150km. I suggest comparing the radially averaged power spectra of NEMO and HIDRA-D to discuss the spatial scales that the model can capture.
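A minimal sketch of the suggested diagnostic, assuming a regular grid with uniform spacing `dx` and a field without gaps (land points would have to be filled or masked beforehand). The function name `radial_power_spectrum` and its parameters are illustrative and do not come from the manuscript:

```python
import numpy as np

def radial_power_spectrum(field, dx=1.0, n_bins=None):
    """Radially averaged 2D power spectrum of a gap-free gridded field.

    field : 2D array (ny, nx); dx : grid spacing (e.g. in km).
    Returns bin-centre wavenumbers (cycles per unit of dx) and mean power per bin.
    """
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    anomaly = field - field.mean()                      # remove the spatial mean
    power = np.abs(np.fft.fft2(anomaly)) ** 2 / (nx * ny)

    # Wavenumber magnitude of every Fourier coefficient, shape (ny, nx).
    kx = np.fft.fftfreq(nx, d=dx)
    ky = np.fft.fftfreq(ny, d=dx)
    k = np.hypot(*np.meshgrid(kx, ky))

    if n_bins is None:
        n_bins = min(nx, ny) // 2
    edges = np.linspace(0.0, 0.5 / dx, n_bins + 1)      # up to the Nyquist wavenumber
    idx = np.digitize(k.ravel(), edges) - 1
    valid = (idx >= 0) & (idx < n_bins)                 # drop corners beyond Nyquist

    sums = np.bincount(idx[valid], weights=power.ravel()[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    spectrum = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, spectrum
```

With `dx` in kilometres, a wavelength of 150 km corresponds to a wavenumber of 1/150 cycles per km, so applying the function to both model fields and comparing the two spectra on either side of that value would show which scales each model resolves.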
Table 3: For what lead time? As a question, is RMSE simply averaged over all tide gauges, or is it computed as RMSE=sqrt(mean(MSE_i))? The second form should be presented as the total RMSE. I suggest adding the performance of a naïve baseline model for comparison with NEMO and HIDRA-D, in which the forecast for the next time step is simply the last observed value (i.e., yp_{t+1} = y_t).
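The two aggregation choices and the naïve baseline can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code of the manuscript: all function names are hypothetical, and each station is assumed to be a gap-free series at a fixed time step.

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error of one station's forecasts."""
    diff = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_of_station_rmse(preds, obs):
    """First form: arithmetic mean of the per-station RMSE values."""
    return float(np.mean([rmse(p, o) for p, o in zip(preds, obs)]))

def total_rmse(preds, obs):
    """Second form: sqrt(mean(MSE_i)) over stations."""
    mse = [rmse(p, o) ** 2 for p, o in zip(preds, obs)]
    return float(np.sqrt(np.mean(mse)))

def persistence_forecast(series, lead=1):
    """Naive baseline: the forecast `lead` steps ahead is the last observed value.

    Returns (forecast, target) arrays aligned in time.
    """
    series = np.asarray(series, dtype=float)
    return series[:-lead], series[lead:]

# Two stations with constant errors of 1 and 3 (arbitrary units).
obs = [np.zeros(4), np.zeros(4)]
preds = [np.ones(4), 3 * np.ones(4)]
print(mean_of_station_rmse(preds, obs), total_rmse(preds, obs))
```

The two forms coincide only when every station has the same RMSE. Otherwise the second is larger, because it weights stations with large errors more heavily.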
Fig. 10, 13, 14: I suggest adding RMSE to the plots, or replacing the current metric with it, as a better index for performance assessment. I realised that the black and gray entries in the legend stand for the dark and light shades of each color, but this is not a clear way to show it. Please show, e.g., dark and light orange for NEMO (and similarly for HIDRA-D) in the legend, or simply remove the black and gray entries and explain the shading in the caption.
Fig. 11: Are the time series hourly? Has any smoothing been applied to the tide gauge data? Have you calculated the correlation between tide gauges? Neighboring tide gauges appear to behave similarly, so excluding one tide gauge is unlikely to have a significant impact on model training.
L402: The variants are not clear. Did the grid size change after reshaping, or was the feature vector dimension modified and then reshaped to 4×4 instead of 5×5?
In the model definition, it would be worth mentioning that the tensor of Fourier components is padded to produce the desired output grid (?).
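What such padding would do can be illustrated with a generic NumPy sketch. This is not the HIDRA-D implementation: the function name, the centred (fftshift) coefficient layout, and the odd block size are all assumptions made for the example.

```python
import numpy as np

def upsample_by_fourier_padding(coeffs, out_shape):
    """Embed a small block of low-frequency Fourier coefficients in a larger,
    zero-filled spectrum and invert it to obtain a smooth dense field.

    coeffs    : complex array (k_y, k_x), zero frequency at the centre
                (np.fft.fftshift ordering); odd k_y and k_x are assumed.
    out_shape : (H, W) of the dense output grid, with H >= k_y and W >= k_x.
    """
    k_y, k_x = coeffs.shape
    H, W = out_shape
    padded = np.zeros((H, W), dtype=complex)
    y0 = H // 2 - k_y // 2                  # aligns the zero-frequency entries
    x0 = W // 2 - k_x // 2
    padded[y0:y0 + k_y, x0:x0 + k_x] = coeffs
    dense = np.fft.ifft2(np.fft.ifftshift(padded))
    # Rescale so that amplitudes match those of the coarse field.
    return dense.real * (H * W) / (k_y * k_x)

# A 5x5 coarse field is interpolated onto a 10x10 grid.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(5, 5))
coeffs = np.fft.fftshift(np.fft.fft2(coarse))
dense = upsample_by_fourier_padding(coeffs, (10, 10))
print(dense.shape)
```

Because all coefficients outside the small block are zero, the output contains only the lowest spatial frequencies. This is one intuitive reading of the 2D Fourier domain: each entry of the block is the amplitude of one smooth wave pattern spanning the whole grid.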
L418: This is a strong claim, as the NEMO model is evaluated in an autoregressive (AR) mode, whereas HIDRA-D is assessed for single-step forecasting. I think these two evaluation settings are not directly comparable.
Minor comments:
Check the first line of the abstract. It's not aligned with the title regarding using satellite altimetry data.
L30: "when modeling using a numerical model", what about data-driven models? Is it only for numerical models?
L31-33: This statement is somewhat overstated. Numerical models do not always rely on ensemble modeling, as this depends on the task. Also, ensembles vary not only in parameters but also in initial conditions, boundary conditions, and forcing fields (?).
L38-41: Is it not expected that forecasting at a single point would be substantially faster than performing spatiotemporal NEMO forecasting?
L61: "As a rule, "?
L64-66: Please revisit these lines to ensure clarity.
The term “swath” is not commonly used for conventional satellite altimetry data, except for the SWOT dataset. Standard altimeters provide along-track measurements.
L222: "SLA represents the level relative to a reference geoid."?
Citation: https://doi.org/10.5194/egusphere-2025-3187-RC1
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
We have checked your Code and Data availability statement, and we cannot accept the links to the sites that you cite for part of the observational data used in your study. Namely, you have to store the data that you take from the IOC SLSMF and the Copernicus Marine Service in a suitable repository according to our policy.
Also, you do not provide a repository containing the output data produced in your work, and you must do so.
Therefore, please publish the requested data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor