the Creative Commons Attribution 4.0 License.
Modelling seawater pCO2 and pH in the Canary Islands region based on satellite measurements and machine learning techniques
Abstract. The improvement of remote sensing systems, together with the emergence of new model-fitting algorithms based on machine-learning techniques, has allowed the determination of the partial pressure of carbon dioxide (pCO2,sw) and pH (pHT,sw) in the waters of the Canary Islands. Among all the fitted models, bootstrap aggregation (bagging) performed best, giving an RMSE of 2.0 µatm (R2 > 0.99) for pCO2,sw and an RMSE of 0.002 for pHT,is, although the multilinear regression (MLR), neural network (NN) and categorical boosting (catBoost) models also have good predictive performance, with RMSE ranging from 5.4 to 10 µatm for 360 < pCO2,sw < 481 µatm and from 0.004 to 0.008 for 7.97 < pHT,is < 8.07. Using the most reliable model, it was determined that there is an interannual trend of 3.51 ± 0.31 µatm yr-1 for pCO2,sw (which surpasses the rate of increase for atmospheric CO2 of 2.3 µatm yr-1) and an increase in acidity of -0.003 ± 0.001 pH units yr-1. The increases in both atmospheric CO2 and sea surface temperature (0.2 °C yr-1 over the 6-year period, influenced by the unprecedented 2023 marine heat wave) contribute to this high rate. Considering the Canary Islands between 13°–19° W and 27°–30° N, the region has moved from a slight CO2 source of 0.90 Tg CO2 yr-1 in 2019 to a source of 4.5 Tg CO2 yr-1 in 2024. After 2022, eastern locations that had acted as an annual sink of CO2 switched to acting as a source.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-3699', Anonymous Referee #1, 01 Oct 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3699/egusphere-2025-3699-RC1-supplement.pdf
Citation: https://doi.org/10.5194/egusphere-2025-3699-RC1
AC1: 'Reply on RC1', Melchor Gonzalez-Davila, 27 Oct 2025
We sincerely thank you for your thoughtful and constructive feedback on our manuscript. We greatly appreciate your recognition of the relevance of our study. Your comments have been carefully considered and have played a key role in improving the overall quality, clarity, and robustness of the revised manuscript. We have provided a point-by-point response to each of your comments.
RC2: 'Comment on egusphere-2025-3699', Anonymous Referee #2, 15 Oct 2025
Publisher’s note: the content of this comment was removed on 20 October 2025 since the comment was posted by mistake.
Citation: https://doi.org/10.5194/egusphere-2025-3699-RC2
RC3: 'Reply on RC2', Anonymous Referee #2, 17 Oct 2025
This review was incorrectly submitted for this manuscript. My apologies. Please disregard. I will submit a revised review soon, hopefully for the correct manuscript this time.
Citation: https://doi.org/10.5194/egusphere-2025-3699-RC3
AC3: 'Reply on RC3', Melchor Gonzalez-Davila, 03 Nov 2025
Thank you for the clarification.
Citation: https://doi.org/10.5194/egusphere-2025-3699-AC3
RC4: 'Comment on egusphere-2025-3699', Anonymous Referee #2, 23 Oct 2025
The authors have produced a short and focused study relating measured pCO2,sw in the vicinity of the Canary Islands to predictors that can be obtained from remote sensing. The core aim of the paper is modest, but worthwhile. The execution seems to have some mistakes. The paper should be returned to the authors for revisions. In addition to revisions associated with the primary recommendation below, I would urge the authors to reduce or summarize the statements comparing temperatures in various locations. These are often presented without context such that I found myself wondering why so much text was devoted to discussion of how temperature varies spatially and temporally. I feel that limiting this text could help shorten and strengthen the paper.
My primary recommendation for this paper is, perhaps ironically, the same as in the review that I erroneously submitted earlier (and apologies again for my mistake). When fitting machine learning or regression models, it is insufficient to divide the training data randomly by measurement, as appears to have been done here. This is because the measurements that are collected by seagoing work are usually nearly synoptic and are highly correlated both in space and time. Therefore, the relationships that reconstruct the training measurements along a cruise or transect almost invariably do a fantastic job of reconstructing other 'validation' measurements made along the same cruise or transect… even while failing to reconstruct the patterns of variability found at other times and locations. This tendency can be even more pronounced when using ML models with many degrees of freedom. I’m not positive, but I believe an example of this can be clearly seen in figure 4 where the ML model seems to have optimized a specific relationship for the transect with data that does not at all extend spatially into the rest of the ocean. The fix for this is pretty simple: divide up all of your measurements randomly by “cruise” or whatever identifier is appropriate for a given boat making measurements with a given instrument in a given year. Then partition the data between training and validation using random selections of these collections of data. Ideally, use k-fold validation to ensure that all data are included in both the training and validation data at various times. I would expect that the bagging routine’s performance will be much more in line with that of the other approaches after this is done.
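The cruise-wise partitioning recommended above can be sketched as follows. This is only a minimal illustration, not the authors' or the referee's code: the function name `grouped_k_fold` and the cruise labels are hypothetical, and it assumes that each measurement carries an identifier for the cruise (vessel, instrument and year) it came from. Whole cruises are shuffled and dealt into k folds, so no cruise contributes to both training and validation within a fold, and every measurement is used for validation exactly once.

```python
import random


def grouped_k_fold(groups, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs in which no group is split.

    groups: one label per measurement (here, a hypothetical cruise ID).
    """
    unique = sorted(set(groups))
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of distinct groups")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal the shuffled cruise IDs round-robin into k folds.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        val_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, val_idx


# Hypothetical cruise labels, one per measurement (illustrative only).
cruise_ids = (
    ["2019_A"] * 4 + ["2020_A"] * 3 + ["2021_B"] * 5
    + ["2022_B"] * 2 + ["2023_C"] * 6 + ["2024_C"] * 4
)

splits = list(grouped_k_fold(cruise_ids, k=3, seed=42))
for train_idx, val_idx in splits:
    train_cruises = {cruise_ids[i] for i in train_idx}
    val_cruises = {cruise_ids[i] for i in val_idx}
    # No cruise appears on both sides of the split.
    assert train_cruises.isdisjoint(val_cruises)
```

Each model would then be refitted on `train_idx` and scored on `val_idx` for every fold, with the RMSE averaged across folds. scikit-learn's `GroupKFold` in `sklearn.model_selection` offers comparable group-aware splitting for those who prefer a library routine.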
Stylistically, I’ll note that the writing struggles at times (see non-exhaustive comments below), and the notation is sufficiently inconsistent that it appears to have been written piecemeal by multiple people. Please homogenize the notation.
Line by line comments:
30: line height formatting error
32: this sentence has incorrect grammar.
33: what 6 year period?
58: this reference is almost a quarter of a century old, so it’s not ideal for making the point that this is still a problem, especially as there have been several recent studies aimed at improving coastal pCO2 products.
67: I suggest italicizing the p in pCO2, especially if you italicize the f in fCO2. This will distinguish it from the “-log10” meaning for p in pH and pe.
69: IUPAC conformant CT has the C italicized (even though it represents the element carbon)… same for AT later
86: use parentheses here, otherwise it appears as though MLR is another element in a list with multilinear regression. Also, MLR is already defined in the abstract.
153: here you have italicized the p; please be consistent
179: line height error
189 and 174: inconsistent italicization of x
231: earlier r was not capitalized
245: no need to indent since it’s not the start of a paragraph
273: check spacing
403: line height
413: This could be confirmed by excluding Chl from the fit and confirming that Kd,490 is then selected as a predictor variable
415: I’m confused by this claim. Why does it matter which variables were used to predict pCO2,sw for an algorithm focused on pH? Or are you talking about a calculation for pHT from TA (f(S)) and pCO2sw, in which case why does it matter what the atmospheric value was at all?
432: These sentences are not logically linked. It currently reads as though the authors are implying that there is a temporal trend in the distance from the African continent.
445: winter of 2023-2024… or JFM?
464: it seems odd that the model with the highest prediction error has better validation statistics than an alternative presented immediately afterwards
483: it is unclear what is meant if a variable controlling something is characterized by a component. Consider “The strong predictive power of this relationship is likely because pCO2sw variability is dominated by thermal changes in this region, and these changes are directly captured by satellite SST records.”
488: where does this theoretical relationship come from? Also, this relationship is referred to as a rate of change, but there is no temporal component.
Figure 4: I might be misunderstanding what I’m seeing, but it appears as though the ML method has found a way to cheat. The sharp discontinuities at the locations where data are available imply that the ML method has created local relationships specific to the times and locations of measurements, intended to exactly reproduce the training/validation cruises without allowing those training data to overly affect the overall relationships. This, if I’m understanding correctly, is a strong demonstration of the hazards of not separating your training data from your validation data by transect/occupation. I reiterate that I might be misinterpreting what I’m seeing somehow.
527: it is odd to suggest that the thermal effect mitigates the expected effect from the temperature increase. I know what you mean, but many readers won’t.
Citation: https://doi.org/10.5194/egusphere-2025-3699-RC4
AC2: 'Reply on RC4', Melchor Gonzalez-Davila, 03 Nov 2025
We express our gratitude to the reviewer for their insightful comment and concur that, when utilizing in situ data collected along cruises or transects, it is imperative to exercise caution to prevent spatial and temporal autocorrelation between the training and validation datasets.