Using surface drifters to characterise near-surface ocean dynamics in the southern North Sea: a data-driven approach
Abstract. The large size of traditional drifters limits their ability to mimic the transport of buoyant objects at the ocean surface, which are subject to complex interactions among direct wind drag, fast-moving surface currents, and wave-induced transport. To better capture these dynamics, we track the trajectories of 12 novel, ultra-thin surface drifters deployed in the southern North Sea over 68 days. We adopt a data-driven approach to model drifter velocity using hydrodynamic and atmospheric data, applying both a linear leeway parameterisation and two machine learning models: random forest and support vector regression. Machine learning model-agnostic interpretation techniques reveal that tidal forcing predominantly drives zonal motion, whereas wind is the main driver in the meridional direction in this region. Notably, the wind exhibits a saturation effect, and its contribution to explaining the variance of the drifter velocity decreases at higher speeds. In trajectory prediction experiments, we find that machine learning models, particularly random forest, outperform linear models, with the latter achieving comparable accuracy only at short time scales. Using a hybrid approach and deriving a non-linear function of the wind from machine learning interpretable methods to include in the leeway parameterisation significantly improves the model prediction of the drifter trajectory.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Ocean Science.
In this paper, the authors use different machine learning models to characterise near-surface ocean dynamics. The authors launched several undrogued surface drifters in the North Sea, released from the coast of the Netherlands, and tracked their positions with GNSS. Several variables (including variables derived from wind, ocean currents and waves) from different research products are then used as inputs to three machine learning models (linear regression, random forest and support vector regression) to predict drifter velocities. Permutation feature importance and accumulated local effects (ALE) plots are then used to explain the importance of the input variables in predicting the drifter velocities.
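For readers unfamiliar with permutation feature importance, the idea can be sketched in a few lines: shuffle one input column at a time and record how much the prediction error grows. The sketch below is only an illustration in plain Python, not the authors' implementation. The toy data, the three feature names (wind, tide, noise), the coefficients and the function `toy_model` are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted_error = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: columns are [wind, tide, noise]; values are made up.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [0.03 * wind + 0.5 * tide for wind, tide, _ in X]


def toy_model(row):
    """Stand-in for a fitted model; it ignores the third (noise) feature."""
    wind, tide, _noise = row
    return 0.03 * wind + 0.5 * tide


importances = permutation_importance(toy_model, X, y)
```

In this toy setting the tide column receives the largest importance and the ignored noise column receives exactly zero, because shuffling it does not change any prediction.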
The authors claim two different results in the conclusions.
The first one is the efficacy of the proposed analysis method. The use of explainable machine learning techniques to investigate surface ocean dynamics is interesting and sufficiently novel. I have no objections to this part.
The second one is the accuracy of the proposed method in inferring drifter trajectories. This is, in my opinion, the weakest part of the paper. Although the numerical results support the authors' conclusions, the trajectory dataset is very small, consisting of twelve drifters released on the same day at a distance of 250 meters. As can be seen from the figures in the paper, the trajectories are highly correlated, meaning that the dataset lacks the variety needed to ensure sufficient generalization. Under these conditions the risk of overfitting a model during training is very high, and this problem is neither mentioned nor addressed in the paper.
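One common way to probe this kind of overfitting is to split the data by group rather than by individual sample, so that all samples from one drifter (or one time block) are held out together. The following is a minimal sketch in plain Python, not a description of what the authors did. The drifter labels and sample counts are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out each group once."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: 3 drifters with 4 velocity samples each.
drifter_ids = [d for d in ("D01", "D02", "D03") for _ in range(4)]
splits = list(leave_one_group_out(drifter_ids))

for held_out, train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here.
    print(held_out, len(train_idx), len(test_idx))
```

When the drifters travel close together, as they appear to here, holding out a single drifter still leaves near-identical samples in the training set. Using time blocks as the group label is an alternative that the same function supports.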
The reason why the trajectory integrated using the linear model outputs differs so much from the others might be that, being a simpler model, it overfitted less than random forest and support vector regression.
I still think that integrating the trajectories using the model outputs is a reasonable benchmark, if the purpose of the models is to explain the correlations between input variables and predicted drifter velocities.
In order to claim that the model is able to generalize beyond the twelve drifters presented in the paper, a test using some other drifter release (from some other starting position, in some other period) would be necessary.
I understand that drifter release is a demanding task, and I am obviously not asking the authors to plan further releases. However, in order to better understand the generalization limits, if other surface drifter trajectories are available to the authors, I suggest testing whether the trained models can reproduce them. If this is not possible, I expect these concerns to be addressed more thoroughly in the conclusions.
At the very least, the model-integrated trajectories should be compared with trajectories simulated using the ocean velocities given as input to the machine learning models, using some classical integration scheme such as RK4 or RK45.
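The baseline suggested here can be sketched with a classical fourth-order Runge-Kutta (RK4) step. This is a minimal sketch under stated assumptions: positions are in metres on a local Cartesian plane, and `rotating_field` is a hypothetical analytic velocity field standing in for an interpolator over the gridded ocean velocity product, which is not shown. The period used is only an illustrative value close to the M2 tidal period.

```python
import math


def rk4_step(velocity, x, y, t, dt):
    """Advance a particle by one RK4 step in the field velocity(x, y, t)."""
    k1u, k1v = velocity(x, y, t)
    k2u, k2v = velocity(x + 0.5 * dt * k1u, y + 0.5 * dt * k1v, t + 0.5 * dt)
    k3u, k3v = velocity(x + 0.5 * dt * k2u, y + 0.5 * dt * k2v, t + 0.5 * dt)
    k4u, k4v = velocity(x + dt * k3u, y + dt * k3v, t + dt)
    x_new = x + dt * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0
    y_new = y + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return x_new, y_new


def integrate(velocity, x0, y0, t0, dt, n_steps):
    """Return the list of (t, x, y) positions along the trajectory."""
    t, x, y = t0, x0, y0
    track = [(t, x, y)]
    for _ in range(n_steps):
        x, y = rk4_step(velocity, x, y, t, dt)
        t += dt
        track.append((t, x, y))
    return track


# Hypothetical test field: solid-body rotation with an illustrative period.
PERIOD = 12.42 * 3600.0  # seconds; assumption for illustration only
OMEGA = 2.0 * math.pi / PERIOD


def rotating_field(x, y, t):
    """Velocity (u, v) in m/s; a real run would interpolate model output."""
    return -OMEGA * y, OMEGA * x


n_steps = 1000
track = integrate(rotating_field, 1000.0, 0.0, 0.0, PERIOD / n_steps, n_steps)
t_end, x_end, y_end = track[-1]
```

In the rotating test field a particle should return to its starting point after one period, which gives a simple check on the integrator before it is applied to real velocity data.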
As a last note, even if the models are actually overfitting the data, this is not an issue for the first aim of the paper (predictor-velocity analysis), since the analysis is focused on this particular dataset and makes no claim of generalization. Some degree of overfitting might even be considered beneficial.