the Creative Commons Attribution 4.0 License.
Machine learning for estimating phytoplankton size structure from satellite ocean color imagery in optically complex Pacific Arctic waters
Abstract. In response to recent advances in satellite ocean color remote sensing, the present study developed a chlorophyll-a size distribution (CSD) model using machine learning (ML) approaches for optically complex Pacific Arctic waters. Previous CSD models capture the spectral features of the satellite-estimated phytoplankton absorption coefficient (aph(λ)) through principal component analysis (PCA) and assume a strong correlation between the spectral features and phytoplankton size structure determined from the exponent of the CSD (η). A weakness of this approach is that it relies on satellite retrievals of aph(λ), which can be highly uncertain due to the optical effects of water constituents other than phytoplankton. Therefore, we tested the utility of remote sensing reflectance (Rrs(λ)) for directly deriving η and ML methods to identify other viable algorithm formulations besides PCA. Results show superior performance of the ML-based CSD models compared to the PCA-based model utilizing both Rrs(λ) and aph(λ) as predictors of η. Considering the large uncertainties in the inversion of aph(λ) from Rrs(λ), the CSD model with Rrs(λ) based on multivariable linear regression produced the best performance among all models considered. Another key finding is that more complex ML approaches do not always produce more effective models than standard linear regression. Indeed, simple linear regression outperformed other ML approaches for retrieving η directly from Rrs(λ), whereas support vector machine performed the best among diverse ML approaches in the case of aph(λ). Overall, this study found benefits in using Rrs(λ) with ML to improve the retrieval accuracy of η for Pacific Arctic waters.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-799', Anonymous Referee #1, 28 Aug 2025
- RC2: 'Comment on egusphere-2025-799', Anonymous Referee #2, 27 Sep 2025
Review of “Machine learning for estimating phytoplankton size structure from satellite ocean color imagery in optically complex Pacific Arctic waters”
This paper is interesting and timely. It tackles an important gap in Arctic Ocean remote sensing: how to retrieve phytoplankton size structure in these optically complex waters. The authors explore machine learning to deal with the high uncertainty in going from Rrs to aph. The main finding, that a simple Rrs-based multivariable linear regression model performs best for satellite applications, is significant.
Overall, the study is well-motivated, the methodology is sound, and the results are clear. Revisions are needed, however, to improve the focus and the clarity.
Major comments:
- The paper is somewhat unfocused regarding its main contribution. Is the novelty in the methodology (new CSD model) or in the application (Arctic η distribution)? The Abstract and Introduction should be revised to make this clear.
- The paper discusses the trade-off between model accuracy and interpretability, in terms of machine learning methods. Since the Rrs-based linear regression model performed best for application, the authors should emphasize its transparency and robustness. I recommend expanding the Methods with more detail on this model.
- The strong performance of the ML model with in situ aph reflects a closer fundamental optical link between η and aph than between η and Rrs (Tables 6 and 7). Please clarify that the choice of the Rrs-based model is a practical solution to inversion limitations in optically complex waters, not an indication of a stronger fundamental relationship.
- The statement that the random selection was performed only once "in order to develop and compare different models using the consistent dataset" (Line 230) is a weakness for ML studies. Please comment on the feasibility of cross-validation or at least repeat the split several times and report mean and standard deviation of the metrics.
- The paper is comprehensive but could be streamlined:
- The Discussion section repeats quantitative findings already shown in the Results. Please trim or merge with Results.
- Some supporting material (e.g., phytoplankton community groups, Figure 4, Table 3) could be moved to the Supplementary.
- Figure 5 mainly illustrates optical complexity and could also be moved to the Supplementary.
- Since many models are compared, but not all are equally important, the main text should focus on the most effective model, with detailed comparison tables in the Supplementary.
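The repeated-split evaluation requested in the major comments above (repeat the random split several times and report the mean and standard deviation of the metrics) could be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `repeated_split_rmse`, the 30 % test fraction, the 100 repeats, the choice of RMSE as the metric, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def repeated_split_rmse(X, y, n_repeats=100, test_frac=0.3, seed=0):
    """Refit an ordinary least-squares model on repeated random splits.

    Returns the mean and sample standard deviation of the test-set RMSE.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    rmses = []
    for _ in range(n_repeats):
        # Draw a fresh random partition of the sample indices.
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        # Add an intercept column and solve the least-squares problem.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        resid = y[test] - A_test @ coef
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    rmses = np.array(rmses)
    return rmses.mean(), rmses.std(ddof=1)


# Synthetic stand-in data (NOT the manuscript's dataset): 177 samples, 5 predictors.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(177, 5))
true_coef = np.array([0.5, -0.2, 0.1, 0.0, 0.3])  # invented for illustration
y_demo = X_demo @ true_coef + demo_rng.normal(scale=0.1, size=177)

mean_rmse, sd_rmse = repeated_split_rmse(X_demo, y_demo)
print(f"RMSE = {mean_rmse:.3f} +/- {sd_rmse:.3f}")
```

Reporting the spread across splits shows how much of a difference between two models could be explained by the particular partition alone, which matters with fewer than 200 samples.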
Minor comments:
- Line 32: Remove “repeat”.
- Line 55. Define “AOPs” at first use.
- Line 61. Please add relevant references on ocean biogeochemical models that use the CSD slope.
- Line 84. There are many recent publications applying machine learning in ocean colour remote sensing, e.g., https://doi.org/10.1016/j.rse.2023.113596 and https://doi.org/10.1016/j.rse.2023.113628.
- I recommend adding a table listing all symbols and abbreviations used in the paper for clarity.
Citation: https://doi.org/10.5194/egusphere-2025-799-RC2
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
- HTML: 564
- PDF: 0
- XML: 3
- Total: 567
- BibTeX: 0
- EndNote: 0
The authors develop and compare chlorophyll-a size-distribution (CSD) models to retrieve η (as an indicator of phytoplankton size structure) for an optically complex sector of the Pacific Arctic. They use an in-situ dataset (>150 stations, 2007–2021) to show that machine learning models outperform the commonly used PCA approach, although a simple linear regression on normalized Rrs appears to perform best for satellite applications.
The study is sound and offers novel and relevant insights. The main conclusions are also supported by the analyses. However, several aspects require clarification and strengthening before publication.
Major comments:
1. The dataset (N=177) is rather heterogeneous, encompassing different decades, methodologies, and water masses. With fewer than 200 samples and a random 70/30 split, there is a clear risk of bias during validation.
Before 2012, the cruises used different filter pore-size schemes. While the 5 µm vs. 2 µm cutoff for nanophytoplankton may not introduce major differences, the 20 µm vs. 10 µm cutoff applied in 2009 and 2010 could significantly affect the microphytoplankton fraction. These three cruises alone account for ~1/3 of the dataset.
I would also be cautious about merging fluorometer-derived Chl-a with HPLC-derived values in such a complex region. Is this necessary, particularly when the latter include only 10 samples? Typically, unless the two methods have been explicitly compared and shown to agree for this dataset, it may be better to exclude the HPLC samples.
Furthermore, the in-situ stations span from ~50°N to 78°N, covering the Bering, Chukchi, and Beaufort seas. This spatial heterogeneity thus likely introduces substantial variability. It is also unclear which cruises and years correspond to which regions, but it is likely that different regions were sampled in different years.
Therefore, I recommend the following:
2. For a paper focused on estimating phytoplankton size structure from satellite data, I would have expected a comparison with established PSC/PFT algorithms applied to the in-situ dataset, even if brief.
Given that you already have both Rrs and pigment data available, this would be straightforward to implement. For example, models developed by T. Hirata and B. Brewin could be applied and compared against your results. Such a comparison would help to contextualize your findings and highlight the added value of your study.
3. I recommend reducing the number of figures and tables in the main text.
Currently, there are 12 figures and 7 tables in the main body of the manuscript. This makes the manuscript, although very interesting, dense for the reader. Some of these, for instance Figure 11, could be moved to the supplementary material. Also, consider moving parts of the Methods to the supplementary material to further streamline the manuscript.
4. The paper presents monthly climatologies of η from MODIS, but it is unclear why no matchup analysis was conducted to verify that the model performs reliably with satellite data.
While the climatology figures are interesting, uncertainties are substantial in such a complex region. Ideally, the authors should identify in-situ matchups and compare η estimates derived from L2 MODIS images against their dataset. If this is not feasible, a useful alternative would be to compare climatologies restricted to the period of one or two cruises to provide at least a partial validation.
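A matchup search of the kind suggested in comment 4 could be sketched roughly as below. This is only an illustration under stated assumptions: the helper names `haversine_km` and `find_matchups`, the record layout (dicts with "time", "lat", "lon", "eta"), the 1 km and 3 h windows, and all station and pixel values are invented for the example and are not taken from the manuscript.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_matchups(stations, pixels, max_km=1.0, max_hours=3.0):
    """Pair each station with the nearest pixel inside both windows.

    stations, pixels: lists of dicts with keys "time", "lat", "lon", "eta".
    Returns a list of (in_situ_eta, satellite_eta) tuples.
    The distance and time windows are illustrative assumptions.
    """
    window = timedelta(hours=max_hours)
    pairs = []
    for st in stations:
        best = None  # (distance_km, satellite_eta) of the closest valid pixel
        for px in pixels:
            if abs(px["time"] - st["time"]) > window:
                continue
            d = haversine_km(st["lat"], st["lon"], px["lat"], px["lon"])
            if d <= max_km and (best is None or d < best[0]):
                best = (d, px["eta"])
        if best is not None:
            pairs.append((st["eta"], best[1]))
    return pairs


# Invented example records (hypothetical values, for illustration only).
stations = [
    {"time": datetime(2015, 7, 10, 22, 0), "lat": 70.0, "lon": -165.0, "eta": 1.2},
    {"time": datetime(2015, 7, 12, 22, 0), "lat": 71.0, "lon": -160.0, "eta": 0.8},
]
pixels = [
    {"time": datetime(2015, 7, 10, 23, 0), "lat": 70.005, "lon": -165.0, "eta": 1.1},
    {"time": datetime(2015, 7, 10, 23, 0), "lat": 70.5, "lon": -165.0, "eta": 0.9},
]
matchups = find_matchups(stations, pixels)
print(matchups)
```

In this toy case only the first station finds a pixel inside both windows; the second station has no pixel within the time window and is dropped. The resulting pairs could then be scored with the same metrics used for the in-situ validation.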
Minor comments: