the Creative Commons Attribution 4.0 License.
Using surface drifters to characterise near-surface ocean dynamics in the southern North Sea: a data-driven approach
Abstract. The large size of traditional drifters limits their ability to mimic the transport of buoyant objects at the ocean surface, which are subject to complex interactions among direct wind drag, fast-moving surface currents, and wave-induced transport. To better capture these dynamics, we track the trajectories of 12 novel, ultra-thin surface drifters deployed in the southern North Sea over 68 days. We adopt a data-driven approach to model drifter velocity using hydrodynamic and atmospheric data, applying both a linear leeway parameterisation and two machine learning models: random forest and support vector regression. Machine learning model-agnostic interpretation techniques reveal that tidal forcing predominantly drives zonal motion, whereas wind is the main driver in the meridional direction in this region. Notably, the wind exhibits a saturation effect, and its contribution to explaining the variance of the drifter velocity decreases at higher speeds. In trajectory prediction experiments, we find that machine learning models, particularly random forest, outperform linear models, with the latter achieving comparable accuracy only at short time scales. Using a hybrid approach and deriving a non-linear function of the wind from machine learning interpretable methods to include in the leeway parameterisation significantly improves the model prediction of the drifter trajectory.
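As a minimal sketch of what a linear leeway parameterisation can look like, the snippet below assumes the common single-coefficient form, drifter velocity = ocean current + alpha × wind, applied to one velocity component, and estimates alpha by ordinary least squares. The function name `fit_leeway_coefficient`, the single-coefficient form, and the synthetic numbers are illustrative assumptions, not the formulation or data used in the preprint.

```python
from typing import Sequence


def fit_leeway_coefficient(
    drifter: Sequence[float],
    current: Sequence[float],
    wind: Sequence[float],
) -> float:
    """Least-squares alpha for drifter = current + alpha * wind (one component)."""
    if not (len(drifter) == len(current) == len(wind)):
        raise ValueError("all series must have the same length")
    # Residual velocity that the ocean current alone does not explain.
    residual = [d - c for d, c in zip(drifter, current)]
    numerator = sum(w * r for w, r in zip(wind, residual))
    denominator = sum(w * w for w in wind)
    if denominator == 0.0:
        raise ValueError("wind series is identically zero")
    return numerator / denominator


# Synthetic, noise-free example (hypothetical values): 3 % windage.
wind_u = [2.0, -4.0, 6.0, 8.0]         # wind component, m/s
current_u = [0.10, -0.20, 0.30, 0.05]  # current component, m/s
drifter_u = [c + 0.03 * w for c, w in zip(current_u, wind_u)]
alpha = fit_leeway_coefficient(drifter_u, current_u, wind_u)
```

With real data the fit would be done per component on collocated drifter, current, and wind series. A saturating wind response of the kind reported in the abstract would replace the linear wind term with a non-linear function of wind speed.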
Competing interests: At least one of the (co-)authors is a member of the editorial board of Ocean Science.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-3287', Anonymous Referee #1, 29 Sep 2025
- RC2: 'Comment on egusphere-2025-3287', Anonymous Referee #2, 03 Oct 2025
This study investigates near-surface ocean dynamics in the southern North Sea using a data-driven approach based on the trajectories of 12 ultra-thin surface drifters deployed over a period of 68 days. The authors combine environmental data, such as wind, wave, and ocean current fields, with a linear leeway parameterisation and two machine learning models to predict drifter velocities. The study identifies the dominant physical drivers of drifter motion. It shows that zonal movement is largely driven by tidal currents, while meridional movement is more strongly influenced by wind. The machine learning models outperform linear models in long-term trajectory prediction, and their interpretability tools reveal important nonlinear effects, such as wind saturation. These insights are then used to improve the linear model by incorporating a nonlinear wind response. The study demonstrates how a data-driven framework, grounded in observational data and enhanced by machine learning, can deepen our understanding of surface transport mechanisms and improve predictive skill in operational oceanography.
The manuscript presents a well-designed and timely contribution that combines novel observational data with explainable machine learning techniques. The drifters used provide rare information on the very surface layer of the ocean, and the methodological approach is described in great detail and with clarity. Figures are clear and support the narrative well. A particular strength is the use of interpretability tools such as permutation feature importance and ALE plots, which link machine learning results to physical understanding. The study is highly relevant for both fundamental ocean dynamics and practical applications such as search-and-rescue, oil spill, and marine litter transport.
General Comments:
Introduction (Section 1):
The introduction emphasizes that the surface drifters used in this study represent buoyant objects better than other drifters. A justification in such detail does not seem strictly necessary. (However, if comparative measurements with other drifter types exist, they should be mentioned here.)
Specification of model data (Section 2.2):
Building on Point 1: Since the drifter observations are later compared with predictions derived from model data using machine learning methods, it is more important to clarify the characteristics of the model data. From what depths do the model values originate? Are similar/identical depths being compared between the model and the drifter? Referring to “surface” is too general.
Variable naming (Sections 2 and 3):
It would help readability if variables were mentioned more in the text, but with a coherent/consistent approach (e.g. line 98 and lines 141/142).
Quantitative description of results (Sections 4.1.1/4.1.2):
The results section, particularly when describing Figures 2 and 4, would benefit from including more numerical values. For example, in Section 4.1.1, values could illustrate the reversed roles (line 325), and it could be shown that wind is the dominant factor for the meridional component but of comparable magnitude to zonal contributions.
Residual zonal velocity (Section 4.1.2):
Why is the residual zonal velocity only described in relation to the total velocity model? Although the effect is smaller, it should be reported for completeness.
Discussion and conclusions (Sections 4 and 5):
I support Reviewer #1’s point that the discussion and conclusions are somewhat too short. Even the Random Forest model shows inaccuracies, which are small but should still be acknowledged and discussed. The model data used in Section 2.2 could also be critically assessed. Furthermore, the generalizability of the trained models to other regions should be addressed more explicitly. What aspects are transferable, and which are not? In general, the discussion would benefit from more detail as well as a stronger connection to the existing literature.
As an addition to Reviewer #1: I also do not question the validity of the analysis method. However, if possible, the authors should consider including additional drifter data, either from their own campaigns or from other studies, for example Deyle et al. (2024) (if possible, with surface measurements from 0.5 m).
Error estimation (Appendix A):
I consider the error estimation to be problematic. With an interval of 2 hours, to my understanding, only 24 position values per drifter are available, which is insufficient for a robust error analysis. Are there any other studies available? This would strengthen this part.
Consistency of text and figure (Appendix B):
The description in the text does not fully match the figure. The M4 signal is visible in the FFT/PSD as well, not only in the Morlet wavelet graph. The text indicates that intervals of 5 min, 30 min, and 3 h are considered, but the figure caption suggests just 3 h. It would also be helpful to indicate the 30 min boundary directly in the Morlet wavelet graph to support the discussion in the text.
Minor Comments:
Line 42: Specify what is meant by “uppermost” already here (as done later in line 71).
Lines 84/85: Please clarify how time differences of 2.5 minutes can occur if the measurement interval is at least 5 minutes (line 81).
Lines 121/122: The difference between the mean wave direction and the bulk wave direction should be explained more clearly.
Lines 325–327: The comparison with the linear model is not sufficiently clear; please clarify.
Line 397: Please explain why including lat/lon as predictors decreased performance.
Figure 9: The caption contains an incorrect color reference for lat/lon; please correct.
Citation: https://doi.org/10.5194/egusphere-2025-3287-RC2
Data sets
North Sea drifter trajectories 2024 Erik van Sebille https://doi.org/10.5281/zenodo.14198921
Viewed
- HTML: 1,071
- PDF: 95
- XML: 17
- Total: 1,183
- Supplement: 39
- BibTeX: 36
- EndNote: 47
In this paper the authors use different machine learning models to characterize near-surface ocean dynamics. The authors launched several undrogued surface drifters in the North Sea, released from the coast of the Netherlands, and tracked their positions with GNSS. Several variables (including variables derived from wind, ocean currents and waves) from different research products are then used as inputs to three machine learning models (linear regression, random forest and support vector machine) to predict drifter velocities. Permutation feature importance and ALE plots are then used to explain the importance of the input variables in predicting the drifter velocities.
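For readers unfamiliar with permutation feature importance, the idea can be sketched as follows: shuffle one input column at a time and record how much the prediction error grows. The snippet is a plain-Python illustration under stated assumptions. The toy model, the three synthetic features, and the use of mean squared error are hypothetical choices and are not taken from the preprint.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(predict: Callable[[Row], float], X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of predict over the rows of X."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy every row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical features: [wind, current, unrelated variable].
data_rng = random.Random(1)
X = [[data_rng.uniform(-10, 10), data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
     for _ in range(200)]


def toy_model(row: Row) -> float:
    # Stand-in for a trained regressor: velocity = current + 3 % of wind.
    return 0.03 * row[0] + row[1]


y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
```

In this toy setup the unrelated third feature gets an importance of exactly zero, because the model never reads it. The same caveat applies as for any permutation-based measure: strongly correlated inputs can share or mask importance.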
The authors claim two different results in the conclusions.
The first one is the efficacy of the proposed analysis method. The use of explainable machine learning techniques to investigate surface ocean dynamics is interesting and sufficiently novel. I have no objections to this part.
The second one is the accuracy of the proposed method in inferring drifter trajectories. This is, in my opinion, the weakest part of the paper. Although the numerical results support the authors' conclusions, the trajectory dataset is very small, consisting of twelve drifters released on the same day 250 meters apart. As can be seen from the figures in the paper, the trajectories are highly correlated, meaning that the dataset lacks the variety needed to ensure sufficient generalization. Under these conditions the risk of overfitting a model during training is very high, and this problem is neither mentioned nor addressed in the paper.
The trajectory integrated from the linear model outputs may differ so much from the others because, being a simpler model, it overfitted less than random forest and support vector regression.
I still think that integrating the trajectories using the model outputs is a reasonable benchmark, if the scope of the models is to explain the correlations between input variables and predicted drifter velocities.
In order to claim that the model is able to generalize beyond the twelve drifters presented in the paper, a test using some other drifter release (from a different starting position and in a different period) would be necessary.
I understand that drifter release is a demanding task, and I am obviously not asking the authors to plan further releases. However, in order to better understand the generalization limits, if other surface drifter trajectories are available to the authors, I suggest testing whether the trained models can reproduce them. If this is not possible, I expect these concerns to be better addressed in the conclusions.
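Where no independent deployment is available, one partial check is to hold out whole drifters rather than random samples, so that no trajectory contributes to both training and evaluation. The sketch below shows only the splitting logic; the function name `leave_one_group_out` and the drifter labels are hypothetical. Because the twelve drifters were released together and follow highly correlated paths, such a split would be a weaker test than the independent release suggested above.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# One label per velocity sample, naming the drifter it came from (hypothetical).
drifter_ids = ["d01", "d01", "d02", "d02", "d02", "d03"]
splits = list(leave_one_group_out(drifter_ids))
```

Each model would be refitted on the training indices of a split and scored on the held-out drifter. A large gap between training and held-out error would point to overfitting.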
At the very least, the model-integrated trajectories should be compared with trajectories simulated using the ocean velocities given as input to the machine learning models, using some classical integration scheme such as RK4 or RK45.
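The kind of baseline meant here can be sketched with a classical fourth-order Runge-Kutta (RK4) step that advects a particle through a prescribed velocity field. The snippet assumes flat Cartesian coordinates in metres and an analytic solid-body rotation as the velocity field. Both are illustrative assumptions. In practice the velocity function would interpolate the gridded model currents in space and time.

```python
import math
from typing import Callable, List, Tuple

# velocity(x, y, t) -> (u, v) in m/s; positions in metres, time in seconds.
Velocity = Callable[[float, float, float], Tuple[float, float]]


def rk4_step(velocity: Velocity, x: float, y: float, t: float, dt: float) -> Tuple[float, float]:
    """Advance a particle position by one RK4 step of length dt."""
    k1u, k1v = velocity(x, y, t)
    k2u, k2v = velocity(x + 0.5 * dt * k1u, y + 0.5 * dt * k1v, t + 0.5 * dt)
    k3u, k3v = velocity(x + 0.5 * dt * k2u, y + 0.5 * dt * k2v, t + 0.5 * dt)
    k4u, k4v = velocity(x + dt * k3u, y + dt * k3v, t + dt)
    x_new = x + dt * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
    y_new = y + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x_new, y_new


def integrate(velocity: Velocity, x0: float, y0: float, t0: float,
              dt: float, n_steps: int) -> List[Tuple[float, float]]:
    """Return the list of positions visited, including the start."""
    track = [(x0, y0)]
    x, y, t = x0, y0, t0
    for _ in range(n_steps):
        x, y = rk4_step(velocity, x, y, t, dt)
        t += dt
        track.append((x, y))
    return track


OMEGA = 2.0 * math.pi / 3600.0  # hypothetical: one revolution per hour


def rotation(x: float, y: float, t: float) -> Tuple[float, float]:
    # Solid-body rotation about the origin; a particle should return to its start.
    return -OMEGA * y, OMEGA * x


track = integrate(rotation, 1000.0, 0.0, 0.0, dt=10.0, n_steps=360)
x_end, y_end = track[-1]
```

The rotation test is convenient because the exact trajectory is a closed circle, so the distance between the final and initial positions measures the integration error directly.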
As a last note, even if the models are actually overfitting the data, this is not an issue for the first aim of the paper (predictor-velocity analysis), since the analysis is focused on this particular dataset and makes no claim of generalization. Some degree of overfitting might even be considered beneficial.