the Creative Commons Attribution 4.0 License.
PyESPERv1.01.01: A Python implementation of empirical seawater property estimation routines (ESPERs)
Abstract. This project produced a Python language implementation of locally interpolated regression (LIR) and neural network (NN) algorithms from empirical seawater property estimation routines (PyESPER). These routines estimate total alkalinity, dissolved inorganic carbon, total pH, nitrate, phosphate, silicate, and oxygen from geographic coordinates, depth, salinity, and 16 combinations of 0 to 4 additional predictors (temperature and biogeochemical information), and were previously available only in the MATLAB programming language. Here we document modifications to reduce discrepancies between the implementations, calculate the disagreements between the methods, and quantify Global Ocean Data Analysis Project (GLODAPv2.2022) reconstruction errors with PyESPER. While the PyESPER routine based on neural networks (PyESPER_NN) faithfully reproduces the corresponding MATLAB routine estimates of properties that do not require anthropogenic carbon change information, PyESPER_LIR and—to a lesser extent—PyESPER_NN estimates for total pH and dissolved inorganic carbon do not exactly reproduce the MATLAB routine estimates due to differences in interpolation and extrapolation methods between the programming languages. While the MATLAB and Python LIR-based estimates are not identical, we show that they are similarly skilled at reproducing the GLODAPv2.2022 data product and are thus comparable. This project increases the accessibility of ESPER algorithms by providing users with code in the freely available Python language and enables future ESPER updates to be released in multiple coding languages.
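The phrase "16 combinations of 0 to 4 additional predictors" covers every subset of four optional predictors, since 2^4 = 16. The sketch below enumerates those input sets using only the standard library. The optional predictor names are placeholders, because which four apply depends on the target property. The ordering is also not meant to reproduce ESPER's own equation numbering.

```python
from itertools import combinations

# Always-required inputs, per the abstract.
REQUIRED = ("longitude", "latitude", "depth", "salinity")

# Placeholder names for the four optional predictors (assumption: the real
# set depends on which property is being estimated).
OPTIONAL = ("temperature", "nitrate", "oxygen", "silicate")


def predictor_sets(required=REQUIRED, optional=OPTIONAL):
    """Return every input set: the required inputs plus each subset
    (of size 0 through len(optional)) of the optional predictors."""
    sets = []
    for k in range(len(optional) + 1):
        for subset in combinations(optional, k):
            sets.append(required + subset)
    return sets


if __name__ == "__main__":
    all_sets = predictor_sets()
    print(len(all_sets))  # 16, i.e. 2**4
    for s in all_sets[:3]:
        print(s[len(REQUIRED):])  # the optional part of the first three sets
```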
Status: closed
RC1: 'Comment on egusphere-2025-458', Anonymous Referee #1, 09 May 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-458/egusphere-2025-458-RC1-supplement.pdf
- AC1: 'Reply on RC1', Larissa M Dias, 16 May 2025
RC2: 'Comment on egusphere-2025-458', Anonymous Referee #2, 20 May 2025
The manuscript describes a new Python-based version of the existing ESPER algorithms. No new development or training is performed, but a detailed comparison of outcomes from both the original MATLAB and the new Python versions is described.
The manuscript is well written and clearly details what is new and how the new version performs compared to the original. It is a nice added value that the algorithms are now available in several programming languages.
I only have some minor comments. Once those issues are fixed I’m happy to see this published.
Minor issues:
Throughout the manuscript, information is needlessly repeated several times. In particular, which observational data are included is presented again and again. It should be sufficient to define once what the o and w data are, including how much data there is, and then just refer to those definitions. The many repetitions of this information make the text a bit cumbersome to read.
Line 141: "ensemble" is the more commonly used word, so I suggest using only that.
Lines 156-158: The sentence is quite awkwardly phrased. Try revising for clarity.
Line 160: I do not understand the meaning of the sentence. Please revise for clarity.
Lines 230-235: All these numbers are also given in the table, so it is unnecessary to repeat them here. The information is also more easily digestible from a table. The same goes for lines 275-279.
Table 2 caption: Try splitting the information into smaller sentences or removing some redundant information.
Captions for Figures 2, 7, B1 and B4: There is no information about what the histograms/bars on the top and right represent. This should be added.
Lines 287-288: This statement appears to contradict the information on lines 318-319. Please clarify.
Figure 4: Most differences are found in the northwest Pacific Ocean. It would be interesting if you could add a brief discussion about why this is and the implications of it.
Line 320: I suggest you rename this section. It is not intuitive that it deals with the differences in speed of calculation
Figure 6: It would be useful to have a panel showing the differences between panels a and b
Line 358: I suggest renaming the section "Future work" or "Future improvements".
Please also list the data product DOI(s) in the data availability section along with the references.
Lines 383-384: "Essentially identical" is not true for DIC and pH. That should be mentioned here too.
Table A1: Caption refers to Table S2, but that is really A2.
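Several of the comments above (the Figure 6 difference panel, and the MATLAB versus Python comparison in general) come down to summarizing the element-wise difference between two sets of values. Below is a minimal NumPy sketch. It assumes both inputs are aligned arrays with NaN marking missing values. The same function works for Python minus MATLAB estimates or for estimates minus GLODAPv2.2022 observations. The numbers in the example are made up.

```python
import numpy as np


def summarize_differences(a, b):
    """Summarize a - b over points where both values are finite.

    Returns the count, mean difference (bias), root-mean-square difference,
    and largest absolute difference.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    diff = diff[np.isfinite(diff)]  # drop points where either input is NaN
    if diff.size == 0:
        return {"n": 0, "bias": np.nan, "rmse": np.nan, "max_abs": np.nan}
    return {
        "n": int(diff.size),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff**2))),
        "max_abs": float(np.abs(diff).max()),
    }


# Made-up example: Python vs. MATLAB estimates at four points, one missing.
python_est = [2300.0, 2310.0, 2295.0, np.nan]
matlab_est = [2300.5, 2309.0, 2295.0, 2301.0]
stats = summarize_differences(python_est, matlab_est)
print(stats["n"], stats["max_abs"])  # 3 1.0
```

Mapping the per-point differences, rather than only these summary numbers, is what a difference panel would show.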
Citation: https://doi.org/10.5194/egusphere-2025-458-RC2
- AC2: 'Reply on RC2', Larissa M Dias, 28 May 2025
RC3: 'Comment on egusphere-2025-458', Anonymous Referee #3, 28 May 2025
Overview:
This manuscript introduces PyESPERv1.01.01, a Python-based implementation of empirical seawater property estimation routines (ESPERs), previously developed and made available only in MATLAB by author Carter. These routines estimate core seawater biogeochemical properties (such as total alkalinity, dissolved inorganic carbon, total pH, nitrate, phosphate, silicate, and oxygen) using inputs like geographic coordinates, depth, salinity, and up to four additional predictors (e.g., temperature and biogeochemical information). Estimates from two statistical algorithms, locally interpolated regression (LIR) and a neural network (NN), are averaged to produce a best estimate.
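The averaging step described above can be sketched as an element-wise mean of the two estimate arrays. This is a minimal sketch that assumes NaN marks a missing estimate. Falling back to whichever method returned a value is an illustrative choice, not documented ESPER behavior, and the values are invented.

```python
import numpy as np


def mixed_estimate(lir, nn):
    """Element-wise mean of LIR and NN estimates.

    NaN marks a missing estimate. Where only one method has a finite value,
    that value is returned (an illustrative choice); where neither does,
    the result is NaN.
    """
    stacked = np.vstack([np.asarray(lir, dtype=float), np.asarray(nn, dtype=float)])
    finite = np.isfinite(stacked)
    count = finite.sum(axis=0)                          # how many methods gave a value
    total = np.where(finite, stacked, 0.0).sum(axis=0)  # sum of the finite values
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


# Invented values (e.g., alkalinity in umol/kg) at three locations.
print(mixed_estimate([2300.0, 2310.0, np.nan], [2302.0, np.nan, np.nan]))
# element-wise: 2301.0, 2310.0, nan
```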
By transitioning ESPERs to Python, the authors enhance accessibility for the scientific community, as Python is an open-source language widely used in oceanographic research. The study also documents modifications made to reduce discrepancies between the Python and MATLAB implementations and evaluates the disagreements between the methods. The implementation also updates the underlying datasets using the Global Ocean Data Analysis Project (GLODAPv2.2022) dataset and addresses a couple of minor issues with the original code.
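The abstract attributes the remaining LIR discrepancies to differences in interpolation and extrapolation methods between the two languages. A 1-D toy example shows how much the out-of-range convention alone can matter:

- NumPy's `np.interp` holds the end values constant.
- MATLAB's `interp1` returns NaN outside the data range unless `'extrap'` is requested.
- Linear extrapolation extends the end segments.

As its name suggests, locally interpolated regression interpolates regression coefficients in space rather than along a single axis. This example therefore illustrates the general mechanism only. It is not the PyESPER code, and the data values are made up.

```python
import numpy as np

# Toy profile: a property sampled at three positions (made-up values).
XP = np.array([0.0, 1.0, 2.0])
FP = np.array([0.0, 10.0, 20.0])


def interp_clamped(x, xp=XP, fp=FP):
    """NumPy's default: values beyond the data range are held at the end values."""
    return np.interp(np.atleast_1d(np.asarray(x, dtype=float)), xp, fp)


def interp_nan_outside(x, xp=XP, fp=FP):
    """NaN beyond the data range (MATLAB interp1's default for out-of-range queries)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.interp(x, xp, fp, left=np.nan, right=np.nan)


def interp_linear_extrap(x, xp=XP, fp=FP):
    """Extend the first and last segments linearly (like interp1(..., 'linear', 'extrap'))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.interp(x, xp, fp)
    below, above = x < xp[0], x > xp[-1]
    y[below] = fp[0] + (fp[1] - fp[0]) / (xp[1] - xp[0]) * (x[below] - xp[0])
    y[above] = fp[-1] + (fp[-1] - fp[-2]) / (xp[-1] - xp[-2]) * (x[above] - xp[-1])
    return y


query = [-1.0, 0.5, 3.0]
for f in (interp_clamped, interp_nan_outside, interp_linear_extrap):
    print(f.__name__, f(query))
# interp_clamped -> 0, 5, 20; interp_nan_outside -> nan, 5, nan;
# interp_linear_extrap -> -10, 5, 30
```

All three agree inside the data range and disagree only outside it, which is consistent with the discrepancies being concentrated in specific regions rather than spread uniformly.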
The work submitted here will be a valuable resource to the community and required a large amount of detailed assessment and validation. I recommend publication after consideration and edits based on the range of suggestions from reviewers.
General Feedback:
This work will have a substantial impact on the field of ocean biogeochemistry and carbon cycling, as well as serve as an important resource for characterizing baseline inorganic carbon chemistry in the context of marine carbon dioxide removal (mCDR) activities. While the concepts and ideas are not new and build on the original ESPER, transitioning this tool to Python will broaden accessibility and encourage further scientific inquiry and discovery.
The calculations/algorithms used are described in precise and comprehensive detail. Care is taken to evaluate uncertainty, as well as assess internal consistency within the inorganic carbon system.
I commend the authors for making the code available on GitHub through a Jupyter Notebook example. However, two improvements would make it much more accessible to the community: (1) I am very surprised that performance was so much worse in Python than in MATLAB. Profiling the code to find where the slowdown occurs could likely lead to large performance improvements with some refactoring. (2) Providing the code as a pip- or conda-installable package would make it much more reproducible and less error-prone.
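The reviewer's suggested profiling can be done with the standard library alone. The sketch below wraps any callable with `cProfile` and returns a `pstats` report sorted by cumulative time. Here, `hypothetical_estimate` is a stand-in workload, not a PyESPER function.

```python
import cProfile
import io
import pstats


def hypothetical_estimate(n):
    """Stand-in for an estimation routine; any callable can be profiled."""
    total = 0.0
    for i in range(n):
        total += (i % 7) ** 0.5
    return total


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(hypothetical_estimate, 100_000)
print(report)
```

Sorting by cumulative time surfaces the outermost calls that dominate run time. From there, the `tottime` column points to the functions that do the work themselves.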
The overall presentation is clear, although somewhat dense. I appreciate the detailed documentation of methodology though.
Minor Feedback:
Do you have insight into why DIC and pH seem to have considerably larger Python-MATLAB differences?
L145: For clarification – NN functions were translated from scratch? Was this compared to using something ‘out of the box’ like pytorch? It would be interesting to compare both reproducibility and performance.
Figure B2: There seems to be structure in the large mismatches – For example in the North Pacific along margins, and perhaps on an A10 GO-SHIP line. Could you add discussion on this? Does this point towards potentially a data problem with one cruise?
Figure B3: The colorbar presumably represents depth, but there are no labels or units.
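On the L145 question: translating a trained feedforward network usually means reimplementing its forward pass rather than retraining it. The sketch assumes MATLAB shallow-network conventions: inputs and outputs are scaled to [-1, 1] with `mapminmax`, hidden layers use `tansig`, and the output layer is linear. Whether ESPER's networks use exactly this structure is an assumption here, and the weights are random placeholders. Because `tansig(n) = 2/(1 + exp(-2n)) - 1` is mathematically identical to `tanh(n)`, the sketch uses `np.tanh`. Loading the same weights into PyTorch would be one way to run the reproducibility and performance comparison the reviewer suggests.

```python
import numpy as np


def mapminmax_apply(x, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Scale x from [xmin, xmax] to [ymin, ymax] (MATLAB mapminmax formula)."""
    return (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin


def mapminmax_reverse(y, xmin, xmax, ymin=-1.0, ymax=1.0):
    """Undo mapminmax_apply, mapping [ymin, ymax] back to [xmin, xmax]."""
    return (y - ymin) * (xmax - xmin) / (ymax - ymin) + xmin


def forward(x, layers, in_range, out_range):
    """Forward pass for x of shape (n_samples, n_features).

    layers: list of (W, b), W shaped (n_out, n_in). Hidden layers use tanh
    (equivalent to MATLAB's tansig); the last layer is linear (purelin).
    in_range / out_range: (min, max) pairs used for input/output scaling.
    """
    a = mapminmax_apply(np.asarray(x, dtype=float), *in_range)
    for i, (W, b) in enumerate(layers):
        z = a @ W.T + b
        a = z if i == len(layers) - 1 else np.tanh(z)
    return mapminmax_reverse(a, *out_range)


# Usage with random placeholder weights: 3 inputs, 5 hidden units, 1 output.
# Input ranges (salinity, temperature, depth) and output range are made up.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(5, 3)), rng.normal(size=5)),
    (rng.normal(size=(1, 5)), rng.normal(size=1)),
]
in_range = (np.array([30.0, -2.0, 0.0]), np.array([38.0, 30.0, 6000.0]))
out_range = (2000.0, 2500.0)
x = np.array([[35.0, 10.0, 1000.0]])
print(forward(x, layers, in_range, out_range).shape)  # (1, 1)
```

A port like this can be checked by running it and the MATLAB network on identical inputs and weights and differencing the outputs.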
Citation: https://doi.org/10.5194/egusphere-2025-458-RC3
- AC3: 'Reply on RC3', Larissa M Dias, 31 May 2025
Thank you for your comments. Please see the attached file for responses. For a tracked record of changes within the manuscript, please contact the authors directly or submit a separate request (we are not allowed to post manuscript changes here).
RC4: 'Comment on egusphere-2025-458', Matthew P. Humphreys, 28 May 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-458/egusphere-2025-458-RC4-supplement.pdf
- AC4: 'Reply on RC4', Larissa M Dias, 13 Jun 2025
We thank you for your comments. Please do note that, due to the quickly approaching deadline for this review, we are still "tidying" the code in some of the ways recommended. However, we fully expect to have this completed in the next week or two. We otherwise have responded in full to all comments in the attached document. Please reach out for a tracked changes version of the manuscript.