Modelling seawater pCO₂ and pH in the Canary Islands region based on satellite measurements and machine learning techniques

Sánchez-Mendoza, Irene; González-Dávila, Melchor; González-Santana, David; Curbelo-Hernández, David; Estupiñan-Santana, David; González, Aridane G.; Santana-Casiano, J. Magdalena

doi:10.5194/egusphere-2025-3699

Preprints


28 Aug 2025


Abstract. The improvement of remote sensing systems, together with the emergence of new model-fitting algorithms based on machine-learning techniques, has allowed the determination of the partial pressure of carbon dioxide (pCO_2,sw) and pH (pH_T,sw) in the waters of the Canary Islands. Among all the fitted models, the most powerful was bootstrap aggregation (bagging), giving an RMSE of 2.0 µatm (R² > 0.99) for pCO_2,sw and an RMSE of 0.002 for pH_T,is. The multilinear regression (MLR), neural network (NN) and categorical boosting (catBoost) models also showed good predictive performance, with RMSE ranging from 5.4 to 10 µatm for 360 < pCO_2,sw < 481 µatm and from 0.004 to 0.008 for 7.97 < pH_T,is < 8.07. Using the most reliable model, an interannual trend of 3.51 ± 0.31 µatm yr^-1 was determined for pCO_2,sw (which surpasses the rate of increase for atmospheric CO₂ of 2.3 µatm yr^-1), together with an increase in acidity of -0.003 ± 0.001 pH units yr^-1. The increases in both atmospheric CO₂ and sea surface temperature (0.2 °C yr^-1 over the 6-year period, influenced by the unprecedented 2023 marine heat wave) contribute to this high rate. Considering the Canary Islands between 13°–19° W and 27°–30° N, the region has moved from a slight CO₂ source of 0.90 Tg CO₂ yr^-1 in 2019 to a source of 4.5 Tg CO₂ yr^-1 in 2024. After 2022, eastern locations that acted as an annual sink of CO₂ switched to acting as a source.

Received: 30 Jul 2025 – Discussion started: 28 Aug 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-3699', Anonymous Referee #1, 01 Oct 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3699/egusphere-2025-3699-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-3699-RC1
- AC1: 'Reply on RC1', Melchor Gonzalez-Davila, 27 Oct 2025
  
  We sincerely thank you for your thoughtful and constructive feedback on our manuscript. We greatly appreciate your recognition of the relevance of our study. Your comments have been carefully considered and have played a key role in improving the overall quality, clarity, and robustness of the revised manuscript. We have provided a point-by-point response to each of your comments.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3699-AC1
RC2:
'Comment on egusphere-2025-3699', Anonymous Referee #2, 15 Oct 2025

Publisher’s note: the content of this comment was removed on 20 October 2025 since the comment was posted by mistake.

Citation: https://doi.org/10.5194/egusphere-2025-3699-RC2
- RC3:
  'Reply on RC2', Anonymous Referee #2, 17 Oct 2025
  
  This review was incorrectly submitted for this manuscript. My apologies. Please disregard. I will submit a revised review soon, hopefully for the correct manuscript this time.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3699-RC3
  - AC3: 'Reply on RC3', Melchor Gonzalez-Davila, 03 Nov 2025
    
    Thank you for the clarification.
    
    Citation: https://doi.org/10.5194/egusphere-2025-3699-AC3
RC4:
'Comment on egusphere-2025-3699', Anonymous Referee #2, 23 Oct 2025

The authors have produced a short and focused study relating measured pCO2,sw in the vicinity of the Canary Islands to predictors that can be obtained from remote sensing. The core aim of the paper is modest, but worthwhile. The execution seems to have some mistakes. The paper should be returned to the authors for revisions. In addition to revisions associated with the primary recommendation below, I would urge the authors to reduce or summarize the statements comparing temperatures in various locations. These are often presented without context such that I found myself wondering why so much text was devoted to discussion of how temperature varies spatially and temporally. I feel that limiting this text could help shorten and strengthen the paper.
My primary recommendation for this paper is, perhaps ironically, the same as in the review that I erroneously submitted earlier (and apologies again for my mistake). When fitting machine learning or regression models, it is insufficient to divide the training data randomly by measurement, as appears to have been done here. This is because the measurements that are collected by seagoing work are usually nearly synoptic and are highly correlated both in space and time. Therefore, the relationships that reconstruct the training measurements along a cruise or transect almost invariably do a fantastic job of reconstructing other 'validation' measurements made along the same cruise or transect… even while failing to reconstruct the patterns of variability found at other times and locations. This tendency can be even more pronounced when using ML models with many degrees of freedom. I’m not positive, but I believe an example of this can be clearly seen in figure 4 where the ML model seems to have optimized a specific relationship for the transect with data that does not at all extend spatially into the rest of the ocean. The fix for this is pretty simple: divide up all of your measurements randomly by “cruise” or whatever identifier is appropriate for a given boat making measurements with a given instrument in a given year. Then partition the data between training and validation using random selections of these collections of data. Ideally, use k-fold validation to ensure that all data are included in both the training and validation data at various times. I would expect that the bagging routine’s performance will be much more in line with that of the other approaches after this is done.
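The cruise-wise partitioning recommended above could be sketched roughly as follows. This is a minimal illustration only, assuming each measurement carries a cruise identifier; the function name `grouped_kfold` and the example identifiers are hypothetical and not taken from the manuscript. scikit-learn's `GroupKFold` provides a comparable split for real workflows.

```python
import random
from collections import defaultdict


def grouped_kfold(cruise_ids, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs in which no cruise is split across both sets."""
    # Collect the sample indices that belong to each cruise identifier.
    by_cruise = defaultdict(list)
    for idx, cruise in enumerate(cruise_ids):
        by_cruise[cruise].append(idx)

    if n_folds > len(by_cruise):
        raise ValueError("n_folds cannot exceed the number of distinct cruises")

    # Shuffle whole cruises (not individual measurements), then deal them into folds.
    cruises = sorted(by_cruise)
    random.Random(seed).shuffle(cruises)
    folds = [cruises[i::n_folds] for i in range(n_folds)]

    for held_out in folds:
        held = set(held_out)
        val_idx = [i for c in held_out for i in by_cruise[c]]
        train_idx = [i for c in cruises if c not in held for i in by_cruise[c]]
        yield train_idx, val_idx


# Hypothetical example: 6 cruises with 4 measurements each (assumed layout).
cruise_ids = [f"cruise_{n}" for n in range(6) for _ in range(4)]
splits = list(grouped_kfold(cruise_ids, n_folds=3))
```

Each measurement lands in the validation set exactly once, and every validation cruise is entirely unseen during training, so the validation error reflects performance at times and locations the model has not been fitted to.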
Stylistically, I’ll note that the writing struggles at times (see non-exhaustive comments below), and the notation is sufficiently inconsistent that it appears to have been written piecemeal by multiple people. Please homogenize the notation.

Line by line comments:
30: line height formatting error
32: this sentence has incorrect grammar.
33: what 6 year period?
58: this reference is almost a quarter of a decade old, so it’s not ideal for making the point that this is still a problem, especially as there have been several recent studies aimed at improving coastal pCO2 products.
67: I suggest italicizing the p in pCO2, especially if you italicize the f in fCO2. This will distinguish it from the “-log10” meaning for p in pH and pe.
69: IUPAC conformant CT has the C italicized (even though it represents the element carbon)… same for AT later
86: use parentheses here, otherwise it appears as though MLR is another element in a list with multilinear regression. Also, MLR is already defined in the abstract.
153: here you have italicized the p; definitely be consistent
179: line height error
189 and 174: inconsistent italicization of x
231: earlier r was not capitalized
245: no need to indent since it’s not the start of a paragraph
273: check spacing
403: line height
413: This could be confirmed by excluding Chl from the fit and confirming that Kd,490 is then selected as a predictor variable
415: I’m confused by this claim. Why does it matter which variables were used to predict pCO2,sw for an algorithm focused on pH? Or are you talking about a calculation for pHT from TA (f(S)) and pCO2sw, in which case why does it matter what the atmospheric value was at all?
432: These sentences are not logically linked. It currently reads as though the authors are implying that there is a temporal trend in the distance from the African continent.
445: winter of 2023-2024… or JFM?
464: it seems odd that the model with the highest prediction error has better validation statistics than an alternative presented immediately afterwards
483: it is unclear what is meant if a variable controlling something is characterized by a component. Consider “The strong predictive power of this relationship is likely because pCO2sw variability is dominated by thermal changes in this region, and these changes are directly captured by satellite SST records.”
488: where does this theoretical relationship come from? Also, this relationship is referred to as a rate of change, but there is no temporal component.
Figure 4: I might be misunderstanding what I’m seeing, but it appears as though the ML method has found a way to cheat. The sharp discontinuities at the locations where data are available implies that the ML method has created local relationships specific to the times and locations of measurements intended to exactly reproduce the training/validation cruises without allowing those training data to overly affect the overall relationships. This, if I’m understanding correctly, is a strong demonstration of the hazards of not separating your training data from your validation data by transect/occupation. I reiterate that I might be misinterpreting what I’m seeing somehow.
527: it is odd to suggest that the thermal effect mitigates the expected effect from the temperature increase. I know what you mean, but many readers won’t.

Citation: https://doi.org/10.5194/egusphere-2025-3699-RC4
- AC2: 'Reply on RC4', Melchor Gonzalez-Davila, 03 Nov 2025
  
  We thank the reviewer for the insightful comment and agree that, when using in situ data collected along cruises or transects, care is needed to avoid spatial and temporal autocorrelation between the training and validation datasets.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3699-AC2


Supplement

https://doi.org/10.5194/egusphere-2025-3699-supplement


Viewed

Total article views: 1,487 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,361	91	35	1,487	45	60	46


Views and downloads (calculated since 28 Aug 2025)

Month	HTML	PDF	XML	Total
Aug 2025	103	4	3	110
Sep 2025	1,018	10	1	1,029
Oct 2025	162	31	16	209
Nov 2025	63	32	11	106
Dec 2025	15	14	4	33



Latest update: 22 Dec 2025

Short summary

This study looked at ocean CO₂ and pH near the Canary Islands using satellite and local data. Of four methods tested, the bagging machine learning worked best. More CO₂ and lower pH were found in the west due to ocean currents. CO₂ released to the air rose from 2019 to 2024, partly due to warmer seas and a 2023 heatwave. The study shows how combining long-term data and smart tools can help us understand how the ocean and air exchange CO₂ in changing coastal waters.

