Refining Predictive Models for Sea Surface Currents: A Focus on Variable Configuration and Time Sequence Analysis
Abstract. Accurate prediction of sea surface currents is crucial for understanding ocean dynamics, climate variability, and marine ecosystem health. Despite advances in statistical modeling, it remains challenging to optimize model parameters and variable configurations for prediction accuracy. This study uses high-frequency (HF) radar data from the Bali Strait (2018–2021) to develop a statistical modeling approach for sea surface current prediction, with random forest regression (RFR) as the primary machine learning technique. The data pass through a preprocessing pipeline of selection, cleaning, and imputation to ensure robustness. We define 11 distinct model configurations that differ in their input variables, which include moving averages (avgh3, avgh6, and avgh12) and previous-day values (h-24, h-48, and h-72). The analysis covers three prediction schemes, one seasonal (P1) and two monthly (P2 and P3), each with its own allocation of training and testing data. Models are evaluated using the root mean square error (RMSE) and the coefficient of determination (R²). The results indicate that combining moving-average predictors significantly enhances the accuracy of long-term forecasts, whereas short-term predictions benefit from recent data. Specific variable configurations, particularly those incorporating moving averages, lead to superior performance; configurations F1, F5, and F8 yield the best results, which highlights the importance of optimizing the input variables to achieve high-accuracy predictions.
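The predictor types and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes an hourly series, reads avgh3, avgh6, and avgh12 as trailing moving averages over the preceding 3, 6, and 12 hours, and reads h-24, h-48, and h-72 as the values observed 24, 48, and 72 hours earlier; these readings, the function names, and the synthetic sine-wave data are all assumptions made for illustration, not the study's actual pipeline or HF radar data.

```python
import math

def build_features(series, avg_windows=(3, 6, 12), lags=(24, 48, 72)):
    """Build one feature row per hour t from an hourly series (hypothetical helper).

    Assumption: moving averages use only hours strictly before t, so the
    target value at t never leaks into its own predictors.
    """
    start = max(max(avg_windows), max(lags))  # first hour with full history
    rows, targets = [], []
    for t in range(start, len(series)):
        row = {}
        for w in avg_windows:
            window = series[t - w:t]
            row[f"avgh{w}"] = sum(window) / len(window)
        for lag in lags:
            row[f"h-{lag}"] = series[t - lag]
        rows.append(row)
        targets.append(series[t])
    return rows, targets

def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for a current component: a pure 24-hour cycle.
series = [math.sin(2 * math.pi * t / 24) for t in range(200)]
rows, targets = build_features(series)

# Persistence baseline: predict hour t with the value from 24 hours earlier.
baseline = [row["h-24"] for row in rows]
print(rmse(targets, baseline), r2(targets, baseline))
```

Each row is a dictionary of predictors, so a configuration in the sense of the abstract would correspond to choosing a subset of these keys before fitting a regressor such as scikit-learn's `RandomForestRegressor`. The persistence baseline here only exercises the metrics; it is not one of the study's models.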