Assessment of gap-filling techniques applied to satellite phytoplankton composition products for the Atlantic Ocean
Abstract. Phytoplankton are vital to marine biogeochemical cycles and form the base of the marine food web. Comprehensive datasets offering a spatiotemporal perspective on phytoplankton composition are essential for assessing the impacts of climate change on marine ecosystems. Phytoplankton functional types (PFTs) classify phytoplankton based on their biogeochemical functions, enabling assessments of nutrient cycling, primary productivity, and ecosystem structure. However, satellite-derived ocean colour products such as PFT chlorophyll-a (Chla) concentrations have limited temporal and spatial coverage, because data collected under non-optimal observing conditions (strong sun glint, clouds, thick aerosols, stray light, and large viewing angles) are excluded and because of the specific sensor configuration or sensor malfunction. This highlights the importance of gap-filling techniques for producing consistent datasets, which are currently lacking among operational products. This study evaluates two robust gap-filling methods for satellite observations: Data Interpolating Empirical Orthogonal Functions (DINEOF) and Data Interpolating Convolutional Auto Encoder (DINCAE). These methods were applied to three years of Sentinel-3A/B OLCI-derived Chla concentration products in several regions of the Atlantic Ocean, including total chlorophyll-a (TChla) and the Chla concentrations of five major PFTs, namely diatoms, dinoflagellates, haptophytes, green algae, and prokaryotic phytoplankton. The reconstructed datasets were assessed using test dataset evaluation and validated with in situ measurements collected during the transatlantic RV Polarstern expedition PS113 in 2018. The test dataset evaluation indicates that DINCAE outperforms DINEOF, particularly in capturing transient-scale features. In cross-validation, DINCAE achieves an average root-mean-square logarithmic error (RMSLE) that is 66 % lower for TChla and 16 % lower for PFTs than that of DINEOF.
However, external validation using in situ measurements indicates better performance for DINEOF than for DINCAE, with improved regression metrics for PFTs, including a 12.5 % better slope, a 13.6 % better intercept, and a 68 % higher coefficient of determination (R²). Compared to the original satellite data, the gap-filled datasets exhibit slightly reduced but still robust accuracy while preserving statistical trends, improving the restoration of spatial structure, and increasing the number of matchups available for validation. We conclude that DINCAE and DINEOF each have unique strengths for gap-filling ocean colour products. DINCAE performs well in complex water bodies, effectively reproducing patterns from the original satellite product. In contrast, DINEOF shows higher overall reliability, supported by independent validation, and is better suited for larger areas due to its lower computational demands.
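The RMSLE comparison reported above can be illustrated with a minimal sketch. The abstract does not spell out the exact RMSLE definition, so the code assumes the root-mean-square error of log10-transformed Chla over withheld test pixels. The function names `rmsle` and `percent_reduction` and all numbers in the usage lines are hypothetical and do not come from the study.

```python
import numpy as np


def rmsle(reference, reconstructed):
    """Root-mean-square error of log10-transformed concentrations.

    Assumption: base-10 logarithms of strictly positive Chla values.
    Pairs with a missing (NaN) or non-positive value are skipped.
    """
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    valid = np.isfinite(ref) & np.isfinite(rec) & (ref > 0) & (rec > 0)
    diff = np.log10(rec[valid]) - np.log10(ref[valid])
    return float(np.sqrt(np.mean(diff ** 2)))


def percent_reduction(baseline_error, candidate_error):
    """Relative error reduction of the candidate with respect to the baseline, in percent."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Hypothetical withheld test pixels (mg m^-3) and two hypothetical reconstructions.
withheld = np.array([0.12, 0.35, 1.10, np.nan, 0.08])
method_a = np.array([0.15, 0.30, 0.90, 0.50, 0.10])
method_b = np.array([0.13, 0.34, 1.05, 0.45, 0.08])

error_a = rmsle(withheld, method_a)
error_b = rmsle(withheld, method_b)
print(f"RMSLE A: {error_a:.3f}, RMSLE B: {error_b:.3f}")
print(f"B is {percent_reduction(error_a, error_b):.1f} % lower than A")
```

Working in log space reflects the fact that Chla spans several orders of magnitude, so a fixed ratio between reconstructed and reference values contributes equally in oligotrophic and productive waters.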