the Creative Commons Attribution 4.0 License.
Assessment of gap-filling techniques applied to satellite phytoplankton composition products for the Atlantic Ocean
Abstract. Phytoplankton are vital to marine biogeochemical cycles and form the base of the marine food web. Comprehensive datasets offering a spatiotemporal perspective on phytoplankton composition are essential for assessing the impacts of climate change on marine ecosystems. Phytoplankton functional types (PFTs) classify phytoplankton based on their biogeochemical functions, enabling assessments of nutrient cycling, primary productivity, and ecosystem structure. However, satellite-derived ocean colour products such as PFT chlorophyll-a (Chla) concentrations have limited temporal and spatial coverage, because data collected under non-optimal observing conditions (strong sun glint, clouds, thick aerosols, straylight, and large viewing angles) are excluded, and because of the specific sensor configuration and sensor malfunction. This highlights the importance of gap-filling techniques for producing consistent datasets, which are currently missing for operational datasets. This study evaluates two robust gap-filling methods for satellite observations: Data Interpolating Empirical Orthogonal Functions (DINEOF) and Data Interpolating Convolutional Auto Encoder (DINCAE). These methods were applied to three years of Sentinel-3A/B OLCI-derived Chla concentration products in several regions of the Atlantic Ocean, including total chlorophyll-a (TChla) and the Chla concentration of five major PFTs, namely diatoms, dinoflagellates, haptophytes, green algae, and prokaryotic phytoplankton. The reconstructed datasets were assessed using test dataset evaluation and validated with in situ measurements collected during the transatlantic RV Polarstern expedition PS113 in 2018. The test dataset evaluation indicates that DINCAE outperforms DINEOF, particularly in capturing transient-scale features. DINCAE achieves an average root-mean-square logarithmic error (RMSLE) in cross-validation that is 66 % lower for TChla and 16 % lower for PFTs compared to DINEOF.
However, external validation using in situ measurements indicates better performance for DINEOF than DINCAE, with improved regression metrics for PFTs, including a 12.5 % better slope, 13.6 % better intercept, and 68 % higher coefficient of determination (R²). The gap-filled datasets exhibit slightly reduced but still robust accuracy compared to the original satellite data while preserving statistical trends, improving spatial structure restoration, and increasing matchup data for validation. It is concluded that DINCAE and DINEOF each have unique strengths for gap-filling ocean colour products. DINCAE performs well in complex water bodies, effectively reproducing patterns from the original satellite product. In contrast, DINEOF shows higher overall reliability, supported by independent validation, and is better suited for larger areas due to its lower computational demands.
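To illustrate how a comparison such as "66 % lower RMSLE" can be computed, here is a minimal Python sketch. It assumes that RMSLE means the root-mean-square error of log10-transformed Chla values; the manuscript may define it differently (for example with a natural logarithm). The function names `rmsle` and `percent_lower` and all numbers are hypothetical and are not taken from the study or its code.

```python
import numpy as np


def rmsle(predicted, observed):
    """Root-mean-square error of log10-transformed values.

    Assumption: log10 is used, as is common for Chla concentrations.
    Pairs with a non-finite or non-positive value are ignored.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    valid = (
        np.isfinite(predicted)
        & np.isfinite(observed)
        & (predicted > 0)
        & (observed > 0)
    )
    diff = np.log10(predicted[valid]) - np.log10(observed[valid])
    return float(np.sqrt(np.mean(diff ** 2)))


def percent_lower(candidate, reference):
    """How much lower (in percent) `candidate` is than `reference`."""
    return 100.0 * (reference - candidate) / reference


# Illustrative values only (mg m^-3); these are not data from the study.
withheld_truth = [0.10, 0.50, 1.00, 2.00]
method_a = [0.12, 0.45, 1.10, 1.80]   # hypothetical reconstruction A
method_b = [0.20, 0.30, 1.60, 1.20]   # hypothetical reconstruction B

error_a = rmsle(method_a, withheld_truth)
error_b = rmsle(method_b, withheld_truth)
print(f"RMSLE A = {error_a:.3f}, RMSLE B = {error_b:.3f}")
print(f"A is {percent_lower(error_a, error_b):.1f} % lower than B")
```

Working in log space weights relative rather than absolute differences, which suits a quantity like Chla that spans several orders of magnitude.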
Status: open (until 07 May 2025)
CEC1: 'Comment on egusphere-2025-112 - No compliance with the policy of the journal', Juan Antonio Añel, 21 Mar 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived several parts of code that you use in your manuscript in GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please, publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we can not accept manuscripts in Discussions that do not comply with our policy.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the links and permanent identifiers (e.g. DOI) of the new repositories.
Additionally, you state that you have not been able to share one of the datasets because its size is larger than 100 GB. This is not a big amount of data, and you can easily store it in, for example, two Zenodo repositories. Therefore, we need you to reply to this comment with the size of the dataset, so that we can assess what prevents you from sharing it and whether it can qualify for an exception to our policy.
Please, note that you must address these issues and reply to this comment as soon as possible. If you do not do it, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-112-CEC1
AC1: 'Reply on CEC1', Ehsan Mehdipour, 26 Mar 2025
Dear Editor,
Thank you for your message and for highlighting the requirements of the “Code and Data Policy.”
We would like to clarify that the full codes and part of the data used for the independent validation are already archived and publicly available through Zenodo (links below), with the DOI and permanent link included in the manuscript itself. We understand that GitHub is not considered a suitable long-term archive, and that is why we have already used Zenodo for this purpose. We have not included these links in the assets section on the journal’s website, but we will gladly do so to ensure full compliance with the policy.
• Code: https://doi.org/10.5281/zenodo.14905369
• Independent validation data: https://doi.org/10.5281/zenodo.14905558
Additionally, we are prepared to upload the remaining data as well, should it be necessary. The dataset in question is approximately 100 GB, and if required, we can share it via Zenodo in multiple parts. Please let us know if this dataset is required for publication—if so, we will begin the upload process immediately and provide the corresponding links as soon as possible. We will also update the ‘Code and Data Availability’ section of the manuscript to explicitly list all Zenodo DOIs and ensure they are also included in the assets submitted via the journal’s system.
Thank you again for your guidance, and please don’t hesitate to let us know if any further changes are needed.
Best regards,
Ehsan Mehdipour
[On behalf of all co-authors]
Citation: https://doi.org/10.5194/egusphere-2025-112-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 26 Mar 2025
Dear authors,
Thanks for your reply. Please, in any future version of your manuscript remove the citation to the GitHub sites, and include instead the only valid repositories.
If your dataset is 100 GB, you must share it in a suitable repository, as the size is nothing exceptional. Please, reply to this comment with the link and permanent identifier (for example DOI or handle) of such repository as soon as you have published it.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-112-CEC2
AC2: 'Reply on CEC2', Ehsan Mehdipour, 01 Apr 2025
Dear Chief Editor,
Thank you for your reply. We will ensure that all references to GitHub are removed from any future version of the manuscript. The links to the relevant repositories, along with their corresponding permanent identifiers, are provided below and will also be included in the revised version of the manuscript.
Codes: https://zenodo.org/records/14905369 (DOI: 10.5281/zenodo.14905369)
Dataset 1 (Full reconstructed dataset using DINEOF gap-filling method): https://zenodo.org/records/15095368 (DOI: 10.5281/zenodo.15095368)
Dataset 2 (Full reconstructed dataset using DINCAE gap-filling method): https://zenodo.org/records/15102826 (DOI: 10.5281/zenodo.15102826)
Dataset 3 (Merged reconstructed datasets used for independent validation): https://zenodo.org/records/14905558 (DOI: 10.5281/zenodo.14905558)
Please let us know if any additional datasets or documentation are required.
Best regards,
Ehsan Mehdipour
[On behalf of all co-authors]
Citation: https://doi.org/10.5194/egusphere-2025-112-AC2
Interactive computing environment
Preprocessing, processing and post-processing scripts and environment Ehsan Mehdipour https://github.com/EhsanMehdipour/PFT_gapfilling
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
119 | 53 | 8 | 180 | 14 | 2 | 3 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 74 | 34 |
Germany | 2 | 36 | 16 |
France | 3 | 25 | 11 |
China | 4 | 13 | 6 |
United Kingdom | 5 | 6 | 2 |