This work is distributed under the Creative Commons Attribution 4.0 License.
Estimating Seasonal Global Sea Surface Chlorophyll-a with Resource-Efficient Neural Networks
Abstract. Marine chlorophyll-a is an important indicator of ecosystem health, and accurate forecasting, even at the surface level, can have significant implications for climate studies and resource management. Traditionally, these predictions have relied on computationally intensive numerical models, which require extensive domain expertise and careful parameterization.
We propose a data-driven alternative: a lightweight, resource-efficient neural architecture based on the U-Net that reconstructs surface, near-global chlorophyll-a from four physical predictors. The model uses mixed layer depth, sea surface temperature, sea surface salinity, and sea surface height as input, all of which are known to influence phytoplankton distribution and nutrient availability. By leveraging publicly available seasonal forecasts of these variables, we can generate six-month chlorophyll-a predictions in a matter of minutes.
We first validated the quality of the reconstruction by using the GLORYS12 reanalysis as input. The reconstructed time series demonstrated strong agreement with the reference GlobColour observations, with an RMSE of 0.01 and a correlation of 0.95. Extending this approach to seasonal forecasting, we used six-month SEAS5 forecasts as input and found that our predictions maintained high skill globally, with low error rates and stable correlation coefficients throughout the forecast period.
Our model accurately captures spatial and temporal chlorophyll-a patterns across a variety of regions, with an accuracy that meets or exceeds that of the numerical model of reference while significantly reducing computational costs. This approach offers a scalable, efficient alternative for long-term chlorophyll-a forecasting.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Biogeosciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1246', Anonymous Referee #1, 28 May 2025
The authors use a neural network to estimate surface chlorophyll-a, a computationally efficient approach that appears to outperform traditional approaches like mechanistic biogeochemical ocean models. The manuscript presents some compelling results, but the experimental setup is not described well enough, and it is unclear why the comparison of chl-a estimates does not include any coastal regions.
general comments:
The manuscript is mostly well written and was easy to follow -- with a major exception: the basic setup of the experiments and implementation details are not well described, and after reading through the whole manuscript I still do not quite know what, for example, "6-month predictions" are in the manuscript. Does "6-month" imply a 6-month lead time, a 6-month forecast length, a 6-month time average, or something else? Is there a distinction between "prediction" and "forecast" in the manuscript, and if so, what is it? Sentences that are meant to explain the experiments sometimes increase the reader's confusion, for instance: "These months correspond to lead-times one out of the six months of each forecast." (l. 156). Sentences like this example are confusing and could be improved considerably by rephrasing and adding some details. Please take the time and space to clarify how the experiments are set up and what is compared at what resolution (this includes space and time).
Even a reader who does not know much about marine chl-a might find it surprising that the regions where performance is evaluated, shown in Fig. 3, do not include any "yellow" values and seem to focus only on open-ocean regions (as an aside, a color bar, or at least a description of what property is shown in Fig. 3, would be useful). That is, why weren't any coastal regions with high chl-a concentrations included in the comparison? The authors mention "fisheries management" and "harmful algal blooms" but then neglect to evaluate the model in the biologically active regions where most blooms occur and fisheries are prevalent. In general, the chl-a estimates were compared mostly as a global average (Fig. 4, 5) or as averages over the large open-ocean regions (Fig. 7, 9); only Fig. 6 shows the performance on a finer spatial scale. Even in the computation of the RMSE, a spatial average appears to be used: "The spatially-averaged reconstructed time series has a RMSE of 0.01 ..." (l. 151). Why is the RMSE based on a spatial average? The use of spatial averaging is not explained well or mentioned when the RMSE is introduced. Please ensure that the reader knows at all times how key metrics are being computed. In addition, I would suggest including nearshore regions in the comparison and evaluating the model performance at a higher resolution, both in space and time.
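To make concrete why this matters, here is a minimal sketch (with entirely synthetic data; none of these numbers come from the manuscript) of the two possible readings of the RMSE. Spatial averaging before the error computation can only shrink the value:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for reference and predicted chl-a fields (time, lat, lon).
ref = rng.random((12, 4, 4))
pred = ref + rng.normal(0.0, 0.1, size=ref.shape)

# Reading (a): RMSE of the spatially-averaged time series (average first).
rmse_of_means = np.sqrt(
    np.mean((ref.mean(axis=(1, 2)) - pred.mean(axis=(1, 2))) ** 2)
)

# Reading (b): pointwise RMSE over all grid cells and time steps.
rmse_pointwise = np.sqrt(np.mean((ref - pred) ** 2))

# Errors of opposite sign cancel under spatial averaging, so (a) <= (b) always.
```

Because reading (a) is systematically smaller than reading (b), stating which one was used is essential for interpreting the reported RMSE of 0.01.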
Furthermore, the authors later ponder how the decrease in ACC observed in Fig. 6 aligns with little to no increase in RMSE and other metrics in Fig. 9. They explain that "it is likely that the neural network’s ability to capture the strong seasonal dynamics in the data (Figs. 7 and 8) is compensating for the decrease in performance with respect to the anomalies" (l. 168). That could well be, but if the RMSE is based on some spatially averaged chl-a, the averaging could have removed most of the effect of the anomalies. Unfortunately, a reader can only guess here, as it is unclear how the RMSE was computed.
Because of their skewed distribution, chl-a values are often log-transformed when plotted and compared. The authors mention once that a log-transformation was used, but it is unclear where and to what extent: "The physical ocean data was normalized using min-max normalization and the chl-a data was log-transformed" (l. 82) is the only information the reader gets. Was a log-transformation used when computing the ACC, NRMSE, etc.; are r_i and p_i in Eq. 1-4 log-transformed? How were the climatologies computed? More importantly, perhaps, was a log-transformation used in the loss function for the neural network? The authors mention that they needed to modify the loss function: "so we modified the standard mean squared error (MSE) loss function by adding a small penalty for underestimation." (l. 79). With a log-transformation applied to chl-a, one would expect underestimation to already be quite heavily penalized by the MSE. More information is needed to better interpret the results and the setup of the neural network.
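For illustration only, here is a sketch of what such a modified loss might look like; the penalty form and weight are my guesses, since the manuscript specifies neither:

```python
import numpy as np

def asymmetric_mse(y_true, y_pred, penalty=0.1):
    """Hypothetical sketch of an MSE with a small extra penalty for
    underestimation; the manuscript does not give the actual form used."""
    err = y_true - y_pred                            # positive err = underestimation
    base = np.mean(err ** 2)                         # standard MSE term
    under = np.mean(np.clip(err, 0.0, None) ** 2)    # underestimation errors only
    return base + penalty * under

# With log-transformed targets, a factor-2 underestimate and a factor-2
# overestimate give the same symmetric MSE; only the penalty term differs.
y = np.log(np.array([1.0, 2.0, 4.0]))
under_pred = np.log(np.array([0.5, 1.0, 2.0]))   # factor-2 underestimate
over_pred = np.log(np.array([2.0, 4.0, 8.0]))    # factor-2 overestimate
```

Knowing whether the loss operates on log-transformed chl-a would tell the reader how much additional weight the underestimation penalty actually carries.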
specific comments:
L 1: "Marine chlorophyll-a is an important indicator of ecosystem health, and accurate forecasting, even at the surface level, can have significant implications for climate studies and resource management [...] a lightweight, resource-efficient neural architecture based on the U-Net that reconstructs surface, near-global chlorophyll-a from four physical predictors.": Accurately forecasting/estimating surface chl-a is a good check for "traditional" mechanistic models to verify that they can recreate some key biogeochemical dynamics. How would the output of a neural network model that only estimates surface chl-a be able to inform climate studies and resource management? Maybe this is a point that could be discussed further in Section 4.
L 59: "The goal of this work is to demonstrate that we can not only estimate chl-a from these four variables, but that by using publicly available forecasts of these as input, we are able to generate an ensemble of skillful chl-a predictions for six months into the future.": Here it would be useful for the reader to be more specific: are the 6-month predictions reliant on a 6-month forecast or are they produced from input 6 months into the past?
L 74: "Skip connections link matching layers in the encoder and decoder, facilitating the transfer of information.": Does this mean the first Conv3D layer is linked to the last one, etc.?
Eq 1: It would be good to explain the terms in the equation a bit better (is the data log-transformed?) and move the equation up to where MSE and the terms are introduced.
L 85: What motivated the choice of the 12 "monthly" neural networks? How much worse is the use of a single one for all months?
L 90: "The optimal architecture found for this task has approximately six million trainable parameters...": Is this for one or all 12 of the networks?
L 97: "...provides daily and monthly data...": Here, or somewhere early on, mention if the networks produce daily or monthly mean estimates.
L 122: "lead-time two": Does this mean a 2-month lead time?
Eq 2-4: How do these metrics compare to the cost function used for training the network, why not report/show that value as well? And mention if any of these chl-a values are log-transformed in these metrics.
L 146: "Rather than a direct comparison, we use BIO4 as a benchmark, recognizing that it simulates a wide range of interconnected biogeochemical processes across various depths, whereas our data-driven approach is specifically designed for surface chl-a prediction.": This sentence is a bit confusing. It makes sense to compare the neural network approach to a more classic reference approach for estimating surface chl-a. But why is this dependent on BIO4 also estimating a wide range of other properties? Maybe I just do not understand what "direct comparison" refers to in this context.
L 150: The first sentence of Sec 3 is almost identical to that of Sec 2.2. Unfortunately, it is still not clear to me what a "set of 5-day predictions" means.
L 151: "The spatially-averaged reconstructed time series...": What kind of spatial averaging is performed here, before computing the RMSE etc.?
L 168 and following figures: Are the BIO4 estimates that are shown forecasts as well? For what lead time?
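Since several of the questions above hinge on how exactly the metrics are computed, it may help to state them explicitly in the manuscript, in equations or pseudo-code. For reference, a standard anomaly correlation coefficient against a climatology reads as follows; whether the chl-a fields enter log-transformed is precisely the point that needs clarifying:

```python
import numpy as np

def acc(pred, ref, clim):
    """Standard anomaly correlation coefficient: the Pearson correlation of
    predicted and reference anomalies relative to a common climatology.
    Whether the inputs are log-transformed chl-a is left open here, as it
    is in the manuscript."""
    pa = np.asarray(pred) - clim   # predicted anomalies
    ra = np.asarray(ref) - clim    # reference anomalies
    return np.sum(pa * ra) / np.sqrt(np.sum(pa ** 2) * np.sum(ra ** 2))

clim = np.array([1.0, 2.0, 3.0])
obs = clim + np.array([0.5, -0.2, 0.1])   # synthetic observed anomalies
```

A perfect anomaly forecast gives an ACC of 1, and a sign-flipped one gives -1, independent of any spatial averaging, which is why reporting this metric per region would directly address the inter-annual-variability questions above.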
Citation: https://doi.org/10.5194/egusphere-2025-1246-RC1
RC2: 'Comment on egusphere-2025-1246', Anonymous Referee #2, 03 Sep 2025
Review of manuscript "Estimating Seasonal Global Sea Surface Chlorophyll-a with Resource-Efficient Neural Networks" by Martinez-Balbontin et al., submitted to Biogeosciences
This article presents an interesting data-driven approach to predict chlorophyll-a (chl-a) fields on a near-global scale based on seasonal forecasts of some oceanic physical properties, which offers an alternative tool to mechanistic biogeochemical models. Previous efforts in data-driven global chl-a reconstruction from oceanic physical properties have primarily targeted long-term retrospective analyses. In contrast, this study explores shorter-term seasonal forecasts of chl-a and argues that the results compare favorably with biogeochemical models. The topic is timely, the scientific question is original and clearly formulated, and the overall structure of the manuscript is easy to follow. However, I think several points would require further elaboration and justification to make the manuscript more rigorous and convincing.
General comments
1) Choice of predictors. The rationale for selecting only four predictors deserves further justification. While they are relevant, some may contain redundancy, and some processes are not explicitly considered. It would help to discuss assumptions about neglected drivers and how this may influence the spatial and temporal variability of results. For instance, light availability is a key driver, particularly at high latitudes and in some tropical regions (see for instance Fig. 3A-B of Racault et al., 2017). Although SST may correlate with PAR seasonally, this relationship does not hold consistently at inter-annual timescales. Why was this predictor not included, for example? Is the potential improvement in performance considered negligible compared to the gain in terms of computing time for model training?
2) Temporal resolution. The use of 5-day data for training, while the final application relies on monthly inputs, is not fully explained. Clarifying the advantages and limitations of this choice would help readers understand whether it may contribute to the underestimation of inter-annual variability.
3) Reproducibility and robustness. Certain technical details are not sufficiently specified to ensure the reproducibility of the experiments (e.g., learning rate, number of training epochs, criteria applied to stop training, etc.) and to convince readers of the robustness of the approach. While possible overfitting is mentioned to explain the network's weaker ability to predict inter-annual variability compared to seasonal variability, giving more details regarding any regularization techniques used to monitor and limit overfitting would strengthen confidence in the robustness of the approach.
4) Seasonal vs inter-annual variability. The manuscript could be strengthened by further analyses and discussion on how the data-driven approach performs relative to mechanistic models in representing inter-annual variability. This variability is more difficult to reproduce but highly relevant for societal applications (e.g., ENSO impacts), whereas reproducing the climatological seasonal cycle is comparatively less challenging.
5) Machine learning positioning. Clarifying the distinction between the efficiency of the overall data-driven framework versus the neural network architecture itself would avoid confusion. Six million parameters can be seen as relatively large compared with some published CNNs, and the choice of U-Net over simpler alternatives could be justified more explicitly in terms of added value.
Specific comments
Title : I would have rephrased the title into something like : « Forecasting Seasonal Global Sea Surface Chlorophyll-a with a lightweight data-driven approach » to emphasize the forecasting dimension and the efficiency of the overall method (vs. the efficiency of the architecture itself).
L4 : I would recommend a sentence like : « We propose a data-driven resource-efficient alternative : a neural architecture based on the U-Net that reconstructs surface, [..] from four physical predictors »
L15 : I think « long-term » is not appropriate here, I would recommend « seasonal »
L69-70 : « While advances in computational power and data availability have enabled the development of more complex architectures, the simplicity and efficiency of the U-Net makes it an effective and resource-efficient choice for this task ». Although the U-Net architecture may appear simpler than more recent transformer-based models, it is still more complex than basic CNNs with fewer parameters that have been used in some previous studies. While I am convinced that a Unet is well suited for this application, I would have justified this choice differently, for example by emphasizing its ability to better capture different spatial scales.
L77 : « The hyperparameters [...] were optimized using random search ». Please provide a detailed description of all hyperparameters used and clarify on which dataset they were tuned: was optimization based on the training period (1998–2017) or the validation period (2017–2023)? If the latter, how did the authors account for potential overfitting? Why was an entirely independent time period not used to assess the model’s generalization performance more objectively? Could the authors also provide the training and validation loss curves? Finally, did the authors check that the model's learning was stable from one run to the next?
L85 : « To simplify the predictive task, which consists of using a 6-month forecast of the physics to predict chl-a for the same time period, we trained twelve dedicated neural networks, each corresponding to the starting month of the forecast ». The choice seems to increase methodological complexity while reducing the training data available for each model, potentially favoring overfitting. I would have clarified how this risk was assessed and justified this strategy more explicitly, including its potential advantages and drawbacks. Have the authors compared reconstruction performance over the six-month period when using a single model trained across the whole dataset versus the proposed approach based on twelve separate models?
L90-92 : « For context, […], has between 20 and 30 million ». I would have removed that sentence. In Ansari et al. (2022), some of the Unets mentioned have almost six times fewer parameters. If a comparison in terms of parameters is to be made, I think it would be more relevant to compare them with other deep learning models published on the topic of Chl reconstruction.
L104 : « Since remotely sensed chl-a is limited by sunlight availability, we focus on the -60° to 80° latitudes ». Spatial coverage is not symmetric in latitude, suggesting factors beyond sunlight, e.g., cloud-dependent pixel availability (higher in the Southern Ocean). I would have justified the footprint selection more objectively, for example using a minimum pixel density.
L117-L119 : « The SEAS5 forecast [..]. To avoid these from biasing […], and then subtracting this difference from SEAS5 before using it as input ». I am not sure I understand the rationale of this approach. When applying the model to future forecasts, the corresponding GLORYS12 reanalyses will not be available—how will this be addressed? Is only a fixed annual climatology subtracted?
L165 : « the ACC decreases with lead time due to the increased uncertainty in the forecasted physics». For Figure 6b (6-month lead-time ACC), have the authors checked whether the poorly predicted (blue) areas correspond to regions where the seasonal forecasts of the four physical predictors are less accurate, based on literature or error maps?
In the discussion (L235-237), the use of biogeochemical variables as predictors is mentioned. Can’t these forecasts carry larger errors than the physical fields, and couldn’t their inclusion risk degrading six-month forecast quality?
FIG 7 : Have the authors tried plotting seasonal chl‑a forecasts initialized for several months using SEAS5, in the same way as Figure 5, across the different regions? Providing these outputs in the Supplementary Material could help assess spatio-temporal heterogeneity in model performance over multiple lead-time months.
L169-170 : « These figures demonstrate that the neural network is able to capture the seasonal dynamics of the data across regions, regardless of the input physics. » This statement could be complemented by a discussion on interannual variations. ACC metrics for the different regions could support this discussion.
FIG8 & 9: Figure 8 is currently under-described; a more detailed analysis is recommended to strengthen the argument. For Figure 9, adding ACC curves would provide a metric specific to interannual variations and show their evolution over time.
L190 : « a simple neural network » : I would remove the term « simple ». Similarly, L217, I would either remove « shallow », or insist on the fact that this is a lighter approach than a classical mechanistic model. L246 : I would recommend removing « lightweight »
L249 : « These predictions maintain low error and high correlation to observations across different regions and lead-times ». I would qualify that statement, as this may not be the case with regard to interannual variability.
Technical corrections
L26: even ‘when’ limited instead of « even if limited » ?
L44 : « variables » or « predictors » instead of « parameters »
L60 : « available forecast of these as input » may be replaced by « available forecasts of those former variables as input» for clarity ?
L83 : « the physical ocean data was » → the physical ocean data were
L91 : « Ronneberger et al. » → please give the date of the reference.
L94-97 : Please specify the original spatio-temporal resolution of the different datasets used, as well as the final resolution used in this study. If any pre-processing was applied (resampling, etc.), please describe it.
L110 : « it uses the IFS cycle 43r1 for the atmosphere, NEMO for the ocean, and LIM2 for sea ice ». Please spell out these acronyms and what they correspond to for readers not familiar with numerical modeling.
Fig 2 : Perhaps I have misunderstood, but why not have six reconstructed time series, with lead times 1-6 (see lead time of 6 shown in Figure 6), instead of five in this diagram?
Fig 3 : it would be great to have the lat/lon axes plotted on the maps.
FIG 7 : I would recommend adding on the Figure some quantitative correlation metrics between the different time series to better compare performance between areas.
References
Racault, M. F., Sathyendranath, S., Brewin, R. J., Raitsos, D. E., Jackson, T., and Platt, T. (2017). Impact of El Niño variability on oceanic phytoplankton. Frontiers in Marine Science, 4, 133.
Citation: https://doi.org/10.5194/egusphere-2025-1246-RC2