the Creative Commons Attribution 4.0 License.
Machine learning for estimating phytoplankton size structure from satellite ocean color imagery in optically complex Pacific Arctic waters
Abstract. In response to recent advances in satellite ocean color remote sensing, the present study developed a chlorophyll-a size distribution (CSD) model using machine learning (ML) approaches for optically complex Pacific Arctic waters. Previous CSD models capture the spectral features of the satellite-estimated phytoplankton absorption coefficient (aph(λ)) through principal component analysis (PCA) and assume a strong correlation between these spectral features and phytoplankton size structure, as represented by the exponent of the CSD (η). A weakness of this approach is that it relies on satellite retrievals of aph(λ), which can be highly uncertain because of the optical effects of water constituents other than phytoplankton. We therefore tested the utility of remote sensing reflectance (Rrs(λ)) for deriving η directly, and used ML methods to identify viable algorithm formulations other than PCA. The ML-based CSD models outperformed the PCA-based model with either Rrs(λ) or aph(λ) as the predictor of η. Considering the large uncertainties in the inversion of aph(λ) from Rrs(λ), the CSD model based on multivariable linear regression with Rrs(λ) performed best among all models considered. Another key finding is that more complex ML approaches do not always produce more effective models than standard linear regression. Indeed, linear regression outperformed the other ML approaches for retrieving η directly from Rrs(λ), whereas the support vector machine performed best among the ML approaches when aph(λ) was the predictor. Overall, this study found benefits in using Rrs(λ) with ML to improve the retrieval accuracy of η for Pacific Arctic waters.
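The two model families contrasted in the abstract can be sketched as follows: (a) a PCA-based model that regresses η on the leading principal-component scores of aph(λ), and (b) a multivariable linear regression of η directly on normalized Rrs(λ) bands, both validated on a random 70/30 split. This is a minimal illustration, not the authors' code. The data are synthetic stand-ins, and the band count, the number of retained components (k = 3) and the L2 normalization of Rrs(λ) are all assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept term; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply fitted OLS coefficients (intercept first) to new predictors."""
    return coef[0] + X @ coef[1:]

def pca_fit(X, n_components):
    """PCA via SVD of mean-centred spectra; returns (band means, loadings)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_scores(X, mean, loadings):
    """Project spectra onto the retained principal components."""
    return (X - mean) @ loadings.T

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (NOT the study's measurements): 177 samples, 10 bands.
rng = np.random.default_rng(42)
n, n_bands = 177, 10
eta = rng.uniform(0.5, 1.5, n)
ramp = np.linspace(-1.0, 1.0, n_bands)
aph = (np.linspace(1.0, 0.2, n_bands) + 0.2 * np.outer(eta, ramp)
       + 0.02 * rng.normal(size=(n, n_bands)))
rrs = (0.005 * (np.linspace(1.0, 0.3, n_bands) - 0.1 * np.outer(eta, ramp))
       + 1e-4 * rng.normal(size=(n, n_bands)))
# Hypothetical normalization: scale each Rrs spectrum to unit L2 norm.
rrs_norm = rrs / np.linalg.norm(rrs, axis=1, keepdims=True)

# Random 70/30 split (53 test, 124 training samples for N = 177).
idx = rng.permutation(n)
n_test = round(0.3 * n)
test, train = idx[:n_test], idx[n_test:]

# (a) PCA-based model: eta regressed on the first k PC scores of aph.
k = 3
mean, load = pca_fit(aph[train], k)
coef_pca = fit_ols(pca_scores(aph[train], mean, load), eta[train])
eta_pca = predict_ols(coef_pca, pca_scores(aph[test], mean, load))

# (b) Multivariable linear regression directly on normalized Rrs bands.
coef_mlr = fit_ols(rrs_norm[train], eta[train])
eta_mlr = predict_ols(coef_mlr, rrs_norm[test])

print(f"RMSE PCA(aph): {rmse(eta[test], eta_pca):.3f}")
print(f"RMSE MLR(Rrs): {rmse(eta[test], eta_mlr):.3f}")
```

Because the data are synthetic, the printed errors say nothing about which model is better in Pacific Arctic waters; the sketch only shows how the two formulations differ in their predictors.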
Status: open (extended)
- RC1: 'Comment on egusphere-2025-799', Anonymous Referee #1, 28 Aug 2025
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
320 | 0 | 2 | 322 | 0 | 0
The authors develop and compare chlorophyll-a size-distribution (CSD) models to retrieve η (as an indicator of phytoplankton size structure) for an optically complex sector of the Pacific Arctic. They use an in-situ dataset (>150 stations, 2007–2021) to show that machine learning models outperform the commonly used PCA approach, although a simple linear regression on normalized Rrs appears to perform best for satellite applications.
The study is sound and offers novel and relevant insights. The main conclusions are also supported by the analyses. However, several aspects require clarification and strengthening before publication.
Major comments:
1. The dataset (N=177) is rather heterogeneous, encompassing different decades, methodologies, and water masses. With fewer than 200 samples and a random 70/30 split, there is a clear risk of bias during validation.
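The validation concern above can be illustrated with a short simulation. The sketch assumes hypothetical cruise labels (12 cruises of unequal size summing to N = 177), with the first three cruises (63 samples, roughly a third) standing in for the samples collected under a different filter pore-size scheme; none of these labels come from the manuscript. It draws one random 70/30 split and counts how many cruises end up on both sides, then repeats the split over 1,000 seeds to show how much the share of those three cruises in the test set can vary.

```python
import numpy as np

def random_split(n, test_frac, rng):
    """Random split of sample indices into (train, test)."""
    idx = rng.permutation(n)
    n_test = round(test_frac * n)
    return idx[n_test:], idx[:n_test]

n = 177
# Hypothetical cruise sizes (illustrative only); they sum to 177.
sizes = np.array([25, 20, 18, 17, 15, 15, 14, 13, 12, 11, 9, 8])
cruise = np.repeat(np.arange(len(sizes)), sizes)
# Hypothetical: the first three cruises used the older filter scheme.
legacy = cruise < 3

# One random 70/30 split: samples from the same cruise land on both sides.
train, test = random_split(n, 0.3, np.random.default_rng(0))
shared = np.intersect1d(cruise[train], cruise[test])
print(f"train={len(train)}, test={len(test)}, "
      f"cruises in both sets: {len(shared)} of {len(sizes)}")

# Repeat over many seeds: how much does the legacy share of the test set vary?
shares = np.array([
    legacy[random_split(n, 0.3, np.random.default_rng(seed))[1]].mean()
    for seed in range(1000)
])
print(f"legacy share of test set: min={shares.min():.2f}, max={shares.max():.2f}")
```

With so few samples, the composition of a single random test set depends noticeably on the seed, which is the kind of validation bias the comment warns about.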
Before 2012, the cruises used different filter pore-size schemes. While the 5 µm vs. 2 µm cutoff for nanophytoplankton may not introduce major differences, the 20 µm vs. 10 µm cutoff applied in 2009 and 2010 could significantly affect the microphytoplankton fraction. These three cruises alone account for ~1/3 of the dataset.
I would also be cautious about merging fluorometer-derived Chl-a with HPLC-derived values in such a complex region. Is this necessary, particularly when the HPLC samples number only 10? Unless the two methods have been explicitly compared and shown to agree for this dataset, it may be better to exclude the HPLC samples.
Furthermore, the in-situ stations span from ~50°N to 78°N, covering the Bering, Chukchi, and Beaufort seas. This spatial heterogeneity likely introduces substantial variability. It is also unclear which cruises and years correspond to which regions, although different regions were probably sampled in different years.
Therefore, I recommend the following:
2. For a paper focused on estimating phytoplankton size structure from satellite data, I would have expected a comparison with established PSC/PFT algorithms applied to the in-situ dataset, even if brief.
Given that you already have both Rrs and pigment data available, this would be straightforward to implement. For example, models developed by T. Hirata and B. Brewin could be applied and compared against your results. Such a comparison would help to contextualize your findings and highlight the added value of your study.
3. I recommend reducing the number of figures and tables in the main text.
Currently, the main body of the manuscript contains 12 figures and 7 tables. This makes the manuscript, although very interesting, dense for the reader. Some of these, for instance Figure 11, could be moved to the supplementary material. Also consider moving parts of the Methods to the supplementary material to further streamline the manuscript.
4. The paper presents monthly climatologies of η from MODIS, but it is unclear why no matchup analysis was conducted to verify that the model performs reliably with satellite data.
While the climatology figures are interesting, uncertainties are substantial in such a complex region. Ideally, the authors should identify in-situ matchups and compare η estimated from L2 MODIS images against their dataset. If this is not feasible, a useful alternative would be to compare climatologies restricted to the period of one or two cruises to provide at least a partial validation.
Minor comments: