the Creative Commons Attribution 4.0 License.
Technical note: Obtaining accurate, high-frequency and long-term seawater pH data by using coupled lab-on-chip and optode sensing technologies
Abstract. The marine science community requires accurate, cost-effective, and reliable pH sensors capable of long-term, stable operation in-situ from coastal to deep-sea environments. Spectrophotometric pH sensors based on lab-on-chip (LOC) technology have been shown to offer long-term accuracy and can sample as often as every 10 minutes. However, for applications where higher-frequency measurements are important, this maximum sampling rate may be limiting, as may the power required to operate the sensor.
In contrast, commercially available pH optodes (PyroScience GmbH) are relatively inexpensive, consume little power and have a small form factor, but with intense use the pH-sensitive membrane can photo-oxidise, causing signal drift. Combining LOC and optode technologies can therefore provide long-term, high-frequency and high-stability in-situ pH data, but protocols to correct for sensor drift need to be developed and evaluated.
To examine sensor drift and develop protocols to account for it, we suspended two LOC pH sensors with two pH optodes at 0.5 m depth from a floating pontoon within a harbour in Southampton, UK for six months (June–December 2023). This is a highly dynamic tidal environment with substantial biofouling. The optode (AquapHOx-L-pH, PyroScience GmbH) and an independent pH sensor (Deep SeapHOx V2, Sea-Bird Scientific) measured at a high frequency (e.g., ≤5 min) alongside a LOC pH sensor measuring at a lower frequency (e.g., ≤2 hr). Triplicate lab-validated co-samples were collected each week, in addition to dedicated sensors monitoring the temperature, salinity, dissolved oxygen and tidal height. We find good agreement between the SeapHOx and LOC sensors (mean ∆pH = -0.022 ± 0.023 pH units, 3,182 data points in common), in addition to individual field accuracies of <0.020 pH units. As expected, we found significant signal drift (e.g., generally ≤0.012 pH units per day) and offsets (e.g., 0.1–0.2 pH units) with the pH optodes after intensive use in a high biofouling environment. However, by coupling accurate LOC pH data to high-frequency optode data, we corrected the optode signal drift/offset and achieved a similar field accuracy (<0.02 pH units) to the SeapHOx sensor even when using ultra-low LOC pH sensor measurement frequencies (e.g., several days to weeks). Overall, this work provides the oceanographic community with guidelines on how to achieve accurate, rapid and long-term pH measurements, while also balancing power requirements, by combining two complementary pH sensing technologies.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-5566', Anonymous Referee #1, 06 Jan 2026
AC1: 'Reply on RC1', Anthony Lucio, 23 Mar 2026
Referee #1
Here the authors present data from two deployments of pH sensors that operate at different frequencies. By deploying a pH optode alongside their LOC (lab on chip) pH sensor, they were able to demonstrate that the greater accuracy provided by the LOC pH measurement could be used to correct drift in the pH optode signal. This opened up the ability to measure pH at far greater frequency than by the LOC sensor alone, and for longer duration, using the higher-frequency optode that is prone to more rapid drift. An ISFET-based pH sensor package was deployed alongside as well to provide an independent indication of optode performance and assessment of the LOC-applied correction, in addition to bottle samples. As part of the study, it was also determined how often the optode benefitted from the correction by the LOC sensor.
A few comments:
- Line 74: ISFET-based pH warmup time depends on choice of reference electrode. If using the Cl- ISE, there is a longer conditioning requirement.
Thank you for pointing that out. We see that the Deep SeapHOx V2 (used in the present study) has only an external Ag/AgCl reference electrode, without an additional internal (gelled electrolyte) Ag/AgCl reference electrode. This results in a longer conditioning time, and as such the salinity correction to the pH becomes quite important. We have added the following text highlighted in blue to the manuscript on page 3, lines 82-86 to clarify this:
“The conditioning time is dependent on the reference electrode configuration e.g., the Deep SeapHOx V2 only has an external Ag/AgCl reference electrode whereas the shallow-water SeapHOx and SeaFET units utilise an external Ag/AgCl reference electrode in addition to an internal Ag/AgCl (gelled electrolyte) reference electrode. As a result, the Deep SeapHOx V2 sensor (used in the present study) requires a longer conditioning time in the sensing environment and the salinity correction to the pH data becomes quite important.”
- Table 1 (and text above): Size of sensors is a little unusual to include without more details- why list the size of the seafet if you deployed a seaphox? Is this just the sensor or the electronics, housing, power, etc.? Sensor footprint is different from a fully autonomous package.
The LOC and optode sensors are only pH sensors, so we wanted to compare the physical size to the most relevant ISFET-based pH only sensor (hence the SeaFET dimensions), but we have now changed this to reflect the Deep SeapHOx V2 sensor used in the present study. We have added the following text highlighted in blue to the manuscript on page 5, lines 141-143 to clarify this:
“The sensor dimensions reported in Table 1 are of the exterior housing to give an indication of the overall footprint that is relevant to physical integration onto vehicles/platforms, but the optode and SeapHOx were operated as fully autonomous systems whereas the LOC sensor utilised an external power supply.”
- Line 159: I don’t recall mention of pH scales used. It is important when comparing different sensors to describe which scale is being used and where/how conversions are being applied. What is the composition of the pyroscience calibration solutions?
The pH reported is the total proton scale (pHT). We have added the following text highlighted in blue to the manuscript on page 6, lines 165-166 to clarify this:
“The pH values are reported on the total proton scale (pHT), and no signal averaging was done to any data within the present study.”
Unfortunately, the composition of the PyroScience buffer solutions is not disclosed, but they do recommend using their specific pH 2 / pH 11 buffers instead of common commercial buffers, which contain preservatives and can be coloured solutions that may impact the optical calibration.
General:
- It would be useful to discuss the decisions behind the measurement frequencies of the different sensors and the decision to change during phase 2
The SeapHOx and optode pH sensors were set to sample at a comparatively high frequency (<1 min per measurement) during phase 1. This high frequency was selected to represent use cases when high-frequency data is required in fast-changing environments and aimed to ascertain how the sensors performed in the field when sampling at this rate and to understand if the sampling frequency impacted the sensor performance. While the LOC pH sensor has a maximum sampling rate of every ~10 min, in this study it was set to sample at a much lower frequency (ca. every 1 hr) to negate the need for very frequent battery swaps.
During phase 2 the sample frequencies were decreased to test the hypothesis that the rate of sensor drift was an effect of the number of samples measured rather than simply a function of time. We have added the following two bits of text highlighted in blue to the manuscript on page 9, lines 259-261 and page 10, lines 270-273 to clarify this:
“The measurement frequency was reduced during phase 2 to examine the effect of sample frequency on the drift of the optode (e.g., via photodegradation) and to conserve battery during the colder months of the deployment.”
and
“Furthermore, the reduced measurement frequency within phase 2 does not appear to show the same rate of significant signal drifting as encountered towards the end of phase 1. This could be a result of the colder temperatures experienced in phase 2 where the temperature decreased gradually from ca. 20 °C to ca. 8 °C or an outcome of the lower measurement frequency.”
- From a scientific standpoint, it might be useful to include known observational sites or studies, BGC activities, etc. where having increased resolution would provide new or interesting information.
We have added the following text highlighted in blue to the manuscript on page 2, lines 37-42 to clarify this:
“Fast pH measurements can be useful in several settings e.g., dynamic estuarine and coastal regions or on ship underway systems which can experience rapid changes in the composition of the surface and near-surface seawater (Zheng et al., 2025; Aßmann et al., 2011). Furthermore, short-term anthropogenic perturbations such as runoff, upwelling and localised CO2 emissions can create rapid pH changes (Schaap et al., 2021; Monk et al., 2021). High-frequency pH sensors can detect these transient signals that discrete sampling would otherwise miss, which highlights the need for accurate, rapid, and autonomous seawater pH sensors.”
- Was there any averaging used in the sensor signal? And would this improve uncertainty for any of the sensors?
We appreciate the Referee for asking this question. Signal averaging can improve sensor accuracy by reducing noise (i.e., increasing the signal-to-noise ratio) and minimising errors; however, no averaging was done to the data presented within this study. The raw pH data were post-processed to account for the known temperature and salinity at the time of measurement but were otherwise not processed.
We have added the following text highlighted in blue to the manuscript on page 6, lines 165-166 to clarify this:
“The pH values are reported on the total proton scale (pHT), and no signal averaging was done to any data within the present study.”
- Deploying multiple different sensor packages is perhaps more cumbersome than a more unified package- why not just deploy the seapHOx? Some more discussion of why choosing the optode/LOC configuration would be helpful.
The combined sensor package can provide benefits over a single sensor. In the present study the LOC and optode sensors were deployed physically attached together but operating independently of each other. Due to the small form-factor of the optode, and the use of an identical MCBH-8F SubConn connector, the logistics of deploying the two sensor systems was not overly cumbersome. Future work will look at developing an integration protocol for this combined sensor package and establish communication procedures that could allow this type of data correction to be done in-situ.
We have added the following text highlighted in blue to the manuscript on page 18, lines 479-482 to clarify this:
“The LOC + optode sensor system provided complementary strengths of low/stable pH offsets (LOC) and rapid (optode) data collection. Furthermore, having two sensor technologies enhances data robustness, provides flexibility in expanding modular observational networks, and removes the reliance on a single commercial platform that can be vulnerable to supply/servicing issues.”
- Cost would be useful to include
We are hesitant to include exact cost into the discussion, but we do understand this is an important consideration for the community. These are currently all state-of-the-art oceanographic pH sensors, and cost is a relevant factor. Therefore, we have added a “cost indication” within Table 1 (page 5, line 139) using a scale of £-£££ so that readers can understand the relative cost of the sensors.
CC1: 'Comment on egusphere-2025-5566', Anthony Lucio, 16 Jan 2026
RC1: Line 74: ISFET-based pH warmup time depends on choice of reference electrode. If using the Cl- ISE, there is a longer conditioning requirement.
AJL: Thank you for pointing that out. We see that the Deep SeapHOx V2 (used in the present study) has only an external Ag/AgCl reference electrode, without an additional internal (gelled electrolyte) Ag/AgCl reference electrode. This results in a longer conditioning time, and as such the salinity correction to the pH becomes quite important. We will make sure this is clearer in the text.
RC1: Table 1 (and text above): Size of sensors is a little unusual to include without more details- why list the size of the seafet if you deployed a seaphox? Is this just the sensor or the electronics, housing, power, etc.? Sensor footprint is different from a fully autonomous package.
AJL: The LOC and optode sensors are only pH sensors, so we wanted to compare the physical size to the most relevant ISFET-based pH only sensor (hence the SeaFET dimensions). The sizes listed are of the sensor exterior housing to give an indication of their overall footprint that is relevant to physical integration onto vehicles/platforms, but we should note that (as deployed) the Deep SeapHOx V2 and AquapHOx-L-pH (optode) were operated as fully autonomous systems whereas the LOC pH sensor utilised an external power supply.
RC1: Line 159: I don’t recall mention of pH scales used. It is important when comparing different sensors to describe which scale is being used and where/how conversions are being applied. What is the composition of the pyroscience calibration solutions?
AJL: The pH reported is the total proton scale (pHT). We will make sure this is clearly stated in the text. Unfortunately, the composition of the PyroScience buffer solutions is not disclosed but they do recommend using their specific pH 2 / pH 11 buffers instead of common commercial buffers that contain preservatives.
RC1: General notes...
AJL: Thank you for highlighting a few additional comments. We will address these general notes in our formal author response to be submitted in due course.

Citation: https://doi.org/10.5194/egusphere-2025-5566-CC1
RC2: 'Comment on egusphere-2025-5566', Anonymous Referee #2, 23 Jan 2026
This manuscript presents the results of an intercomparison between several pH sensors and laboratory measurements that were deployed in a challenging (in terms of biofouling) field environment. The experimental approach was very thorough and it seems likely that the dataset is excellent for doing the presented analysis. Overall it will be a useful contribution to the field. The concept of using a high-accuracy, low-resolution sensor to calibrate a low-accuracy, high-resolution one is interesting and does need more work in the context of specific sensor setups but it is not novel. There are a couple of limitations with the analysis. Primarily more evidence is needed to support the proposed approach, e.g. comparisons with other possible approaches and improvements to mitigate identified limitations, if the authors wish to present it as a guideline for the community to follow. My major points are in titled sections below, followed by minor comments and then technical corrections.
Instability of a 2-point regression
The issue of instability in a linear regression with 2 points (lines 331-333) is the major problem with the approach presented; it is mentioned briefly but not convincingly dealt with. Were this a study focused on reporting a particular observational dataset to interpret in some environmental context, it would probably be sufficient, because the uncertainties for the method used have been calculated and reported. But given the aim of this manuscript to provide guidelines for the research community on how to do this correction, it becomes essential here to do the extra work to see if the approach can be adapted to eliminate this issue, or at least to demonstrate that adaptations don’t add any value. The authors have already collected the data needed to test these things. For example, linear fits could be made over 3+ consecutive LOC points to reduce the sensitivity to individual points. Doing linear fits also means there are sharp transitions between gradients as points are crossed; how does doing some smoothing fit (e.g. PCHIP, moving average) between the subsampled LOC points affect the quality of the corrected data?
I recognise this requires some more work, but a paper proposing community guidelines should have done the due diligence to show that the guidelines are actually the best way of doing something. (While fully accepting that “the best way” will be a balance between complexity of the approach and accuracy of the results.) Alternatively, the relevant parts should be rephrased to indicate that this is a manuscript proposing and evaluating one potential way to do something, not claiming that this is a guideline that others should follow. Taking the latter choice would also reduce the impact of this study.
Terminology: accuracy
Throughout, especially e.g. Table 3, Fig. 7 and associated discussion: the mean offset is referred to as “accuracy” which is not always helpful terminology. This leads to e.g. describing an apparent accuracy “minimum” at some intermediate correction interval (line 326). An alternative, and to me more convincing, interpretation of Table 3 is that there is some constant offset (-0.018) e.g. due to an offset between LOC and co-samples that cannot be improved upon by increasing resolution, but because this is negative and the initial offset is positive (+0.111), you necessarily have to pass through zero to get from one to the other. But it doesn’t mean that the results are “more accurate” at that intermediate point. Indeed if the initial offset happened to be more negative than this final constant value (or the constant value positive) then the apparent “accuracy minimum” would probably not appear. In this case, the accuracy minimum is a fluke and not a reproducible feature that would necessarily be found in other datasets that had a different offset between the LOC and co-samples.
Not helping my interpretation of the above is that I found it unclear exactly how this “accuracy” error was calculated. I’m assuming it’s corrected optode vs lab co-samples in the comment above. Please clarify or make more obviously explicit in the relevant parts of the discussion.
Finally the points stated to represent these local minima in the text (2 days for x and 1 day for 1-sigma) are not the lowest local minima in the table (1 week for x and 2 hours for 1-sigma), so I don’t follow why they were selected to be highlighted.
I think a big step towards a solution here would be to be more specific about what is meant in each statement and avoid using the somewhat ambiguous term “accuracy” when a more specifically meaningful alternative word is available.
What is accurate?
The manuscript refers often to producing measurements that are “accurate” but does not define what this means – what constitutes “accurate” and whether something is accurate enough? It depends on the research question being asked of the data. Please could this be addressed briefly where relevant (e.g., Introduction and Conclusions, maybe relevant parts of R&D). Often this is done with reference to the GOA-ON “weather” and “climate” uncertainty targets (Newton et al., 2015), although other approaches are possible.
Manufacturer claims
Manufacturer accuracy claims are presented in the Introduction and Methods. They are sometimes alluded to in the R&D but it might be useful to have a short paragraph or section that directly addresses if these accuracy claims could indeed be achieved by the various sensors in the tests here.
Minor comments
158 PyroScience have several different sensor caps available, with different pK values and returning results on different pH scales; please specify what was used.
159 The PyroScience optode software also has the option to add a third calibration point of a buffer (e.g., tris) within the measuring range to improve accuracy. Could the authors comment on if and how excluding this step may have affected their results and conclusions?
167 Was it really possible to always get the sample from the harbour, into the lab, in an optical cell, equilibrated to 20 °C, injected with mCP and measured in under 5 minutes? Impressive if so, but the relevant time to report would be the actual moment of measurement, not just the moment that the sample handling in the lab began – please check & confirm.
202 Does a “battery failure” refer to the battery running out of charge, or something else more dramatic? Please clarify.
Technical corrections
pH is dimensionless; please remove references to “pH units” throughout.
42 Grammar: change “and until recently, was” to e.g. (“which until recently was”).
43 If pH is calculated from DIC, TA or fCO2 then it is not a “measurement of pH”, please rephrase.
55 Provide a location for the NOC.
58 The ocean goes deeper than 6000 m, please rephrase “full ocean depth”.
67 Not clear specifically what “This” refers to.
77 Grammar: either “version” => “versions” and “are a cylinder” => “are cylinders”, or “are” => “is”. Also probably “sensors” => “sensor”.
108 Presumably “The NOC” should be “The harbour”, or make it clear that the harbour is at the NOC in the previous sentence.
155 “calibration-less” is a bit awkward; “calibration-free”?
Section 3.2 Several aspects of results that should be in past tense are written in the present. Also applies to other parts of the Results & Discussion. Some of these I have noted as technical corrections here but my list will be incomplete so please check through carefully.
290 Brackets around the two b terms at the end of Eq. (5) are unnecessary.
Table 3 Should n be the same in every row? I would have guessed it is how many LOC points were used in the calibration, which would be smaller for the longer correction intervals. If not, then please rewrite the caption to make it clearer what n means. If it is supposed to be the same then it doesn’t need to be a column in the table. Also, please mention in the caption which sensor data are being shown and what they are being compared to.
343 “30/80/2023” => “30/08/2023”.
359 “are reporting” => “were reporting” or “reported”.
376 “are tracking” => “were tracking”.
Citation: https://doi.org/10.5194/egusphere-2025-5566-RC2
AC2: 'Reply on RC2', Anthony Lucio, 23 Mar 2026
Referee #2
This manuscript presents the results of an intercomparison between several pH sensors and laboratory measurements that were deployed in a challenging (in terms of biofouling) field environment. The experimental approach was very thorough and it seems likely that the dataset is excellent for doing the presented analysis. Overall it will be a useful contribution to the field. The concept of using a high-accuracy, low-resolution sensor to calibrate a low-accuracy, high-resolution one is interesting and does need more work in the context of specific sensor setups but it is not novel. There are a couple of limitations with the analysis. Primarily more evidence is needed to support the proposed approach, e.g. comparisons with other possible approaches and improvements to mitigate identified limitations, if the authors wish to present it as a guideline for the community to follow. My major points are in titled sections below, followed by minor comments and then technical corrections.
Instability of a 2-point regression
- The issue of instability in a linear regression with 2 points (lines 331-333) is the major problem with the approach presented; it is mentioned briefly but not convincingly dealt with. Were this a study focused on reporting a particular observational dataset to interpret in some environmental context, it would probably be sufficient, because the uncertainties for the method used have been calculated and reported. But given the aim of this manuscript to provide guidelines for the research community on how to do this correction, it becomes essential here to do the extra work to see if the approach can be adapted to eliminate this issue, or at least to demonstrate that adaptations don’t add any value. The authors have already collected the data needed to test these things. For example, linear fits could be made over 3+ consecutive LOC points to reduce the sensitivity to individual points. Doing linear fits also means there are sharp transitions between gradients as points are crossed; how does doing some smoothing fit (e.g. PCHIP, moving average) between the subsampled LOC points affect the quality of the corrected data?
I recognise this requires some more work, but a paper proposing community guidelines should have done the due diligence to show that the guidelines are actually the best way of doing something. (While fully accepting that “the best way” will be a balance between complexity of the approach and accuracy of the results.) Alternatively, the relevant parts should be rephrased to indicate that this is a manuscript proposing and evaluating one potential way to do something, not claiming that this is a guideline that others should follow. Taking the latter choice would also reduce the impact of this study.
We thank the Referee for this comment and the opportunity to clarify our data fitting methodology. The linear fit used in our approach is not intended to perform a statistically rigorous regression in the traditional sense but rather functions as an enhanced offset correction. Specifically, instead of applying a simple constant offset (i.e., a correction only along the y-axis in pH space), the two-point line fit provides additional information about the local gradient in the LOC data within a user-defined correction window. This allows the correction to account for gradual drift in the high frequency optode measurements while maintaining the short-term variability captured by the optode.
Although LOC measurements were collected every 1–2 hours during this study, we demonstrate in Table 3 that a substantially reduced LOC sampling frequency (e.g., once every 24 hours) is sufficient to generate correction metrics that bring the drifting/offset high-frequency optode data into close agreement with an independent pH sensor (SeapHOx). Increasing the LOC sampling frequency would provide additional fitting points, but this comes at the expense of increased reagent consumption, power demand, and waste generation. Importantly, our analysis indicates that these additional measurements yield only marginal improvements in performance.
Nonetheless, to evaluate the Referee’s suggestion more fully, we tested a range of alternative fitting approaches, including multi-point linear fits, moving-average smoothing of LOC data, offset-only corrections, and higher-order interpolation methods (e.g., PCHIP). Performance was evaluated using the mean bias, standard deviation and root mean square error (RMSE) of the differences between corrected optode measurements and the discrete co-samples measured in the laboratory. While some alternative approaches slightly reduced bias or produced comparable precision, none consistently improved both metrics simultaneously. In particular, more complex interpolation methods tended to increase systematic bias or introduce signs of overfitting. Across the tested methods, the two-point line fitting approach described in the manuscript provided the most balanced performance, maintaining low bias while also minimising the spread of residual differences relative to the reference co-sample dataset. For this reason, and given its conceptual simplicity and low data requirements, we consider it the most appropriate method for the correction approach presented here.
To highlight this, we have now expanded ESI section 3 to also report the results of different fitting methods. The changes are highlighted in blue to the ESI on pages S7-S8, lines 78-109 and in the main manuscript (using text below) on page 11, lines 292-295.
“We evaluated several alternative correction methods e.g., offset correction, moving average smoothing, multi-point line fitting, etc. that can be seen in ESI 3. While these methods produced comparable results in some cases, the method described here provides the most favourable balance between correction performance, simplicity of implementation and a reduced LOC sampling frequency.”
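To make the enhanced offset correction concrete, a minimal Python sketch follows. This is our own illustration, not the authors' implementation: the function and variable names are hypothetical, and it assumes the correction interpolates the optode-minus-LOC difference linearly between consecutive LOC points (the "two-point line fit") before subtracting it from the high-frequency optode record.

```python
import numpy as np

def correct_optode(t_opt, ph_opt, t_loc, ph_loc):
    """Drift-correct high-frequency optode pH using sparse LOC pH.

    At each LOC measurement time, the optode-minus-LOC difference is
    evaluated (using the optode sample at or just after that time); the
    difference is then interpolated linearly between consecutive LOC
    points and subtracted from the optode record, so the correction
    tracks both offset and local drift gradient.
    """
    t_opt = np.asarray(t_opt, float)
    ph_opt = np.asarray(ph_opt, float)
    t_loc = np.asarray(t_loc, float)
    ph_loc = np.asarray(ph_loc, float)

    # Optode reading at (or just after) each LOC measurement time
    idx = np.clip(np.searchsorted(t_opt, t_loc), 0, len(t_opt) - 1)
    diff_at_loc = ph_opt[idx] - ph_loc

    # Piecewise-linear (two-point) interpolation of the difference,
    # held constant outside the LOC time range
    drift = np.interp(t_opt, t_loc, diff_at_loc)
    return ph_opt - drift
```

With a purely linear optode drift, this correction recovers the LOC baseline exactly between LOC points; an offset-only correction would instead leave a sawtooth residual growing between correction times.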
Terminology: accuracy
- Throughout, especially e.g. Table 3, Fig. 7 and associated discussion: the mean offset is referred to as “accuracy” which is not always helpful terminology. This leads to e.g. describing an apparent accuracy “minimum” at some intermediate correction interval (line 326). An alternative, and to me more convincing, interpretation of Table 3 is that there is some constant offset (-0.018) e.g. due to an offset between LOC and co-samples that cannot be improved upon by increasing resolution, but because this is negative and the initial offset is positive (+0.111), you necessarily have to pass through zero to get from one to the other. But it doesn’t mean that the results are “more accurate” at that intermediate point. Indeed if the initial offset happened to be more negative than this final constant value (or the constant value positive) then the apparent “accuracy minimum” would probably not appear. In this case, the accuracy minimum is a fluke and not a reproducible feature that would necessarily be found in other datasets that had a different offset between the LOC and co-samples.
We thank the Referee for this viewpoint. We agree that considering the mean offset alone can lead to an apparent accuracy minimum when a positively biased dataset (e.g., optode) is corrected using a second dataset that itself exhibits a negative bias (e.g., LOC) relative to validation co-samples. In this situation the mean difference must pass through zero as the correction interval changes (i.e., as the performance improves in our case), and we agree that this crossing point does not by itself imply that the measurements are most accurate at the zero point.
For this reason, in the present work we do not interpret the minimum in the mean difference alone as indicating maximum accuracy. Instead, the sensor performance is evaluated using the combined statistics of the mean difference, the standard deviation of the residuals, and the RMSE all relative to the discrete co-sample measurements. The RMSE metric incorporates both systematic bias and random variability and therefore provides a more representative measure of total error. When these metrics are considered together (Table 3), the intermediate correction intervals correspond to reduced overall error rather than simply reflecting the zero-crossing of the mean offset.
However, to remove any potential ambiguity we have amended the below text in the Results & Discussion on pages 13, lines 339-345 to clarify and define explicitly how accuracy is quantified in this study:
“Specifically, sensor performance in the present work is evaluated using three statistical metrics relative to the independent discrete co-sample reference measurements, i.e., the mean error (x̅ ΔpH), the standard deviation of the error (1σ ±ΔpH), and the root mean square error (RMSE ΔpH). The mean error (i.e., x̅) represents systematic bias (offset) relative to the reference measurements, while the standard deviation (1σ) reflects the spread of residual differences and therefore the measurement precision. The RMSE incorporates both the systematic bias and the random variability and therefore provides a combined estimate of total measurement error.”
Lastly, we have also removed nearly all inferences to “accuracy” throughout the manuscript and instead have referred to the above metrics as performance (or difference) relative to discrete validation co-samples.
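The three performance metrics defined in the quoted text can be computed directly from paired sensor/co-sample residuals. A minimal sketch (illustrative only, not the authors' code; function name is ours):

```python
import numpy as np

def performance_metrics(ph_sensor, ph_cosample):
    """Mean error (systematic bias), 1-sigma spread (precision), and
    RMSE (combined bias + spread) of sensor pH relative to discrete
    validation co-samples."""
    d = np.asarray(ph_sensor, float) - np.asarray(ph_cosample, float)
    mean_err = d.mean()              # x-bar: systematic offset
    sigma = d.std(ddof=1)            # 1-sigma: spread of residuals
    rmse = np.sqrt(np.mean(d ** 2))  # total measurement error
    return mean_err, sigma, rmse
```

Note that RMSE cannot pass through zero the way the mean error can, which is why reporting it alongside x̅ and 1σ guards against mistaking a zero-crossing of the bias for a genuine accuracy minimum.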
- Not helping my interpretation of the above is that I found it unclear exactly how this “accuracy” error was calculated. I’m assuming it’s corrected optode vs lab co-samples in the comment above. Please clarify or make more obviously explicit in the relevant parts of the discussion.
We have reviewed the manuscript and ensured that all sensor performances are reported as being relative to discrete co-sample reference data. Please refer to Author Response in above point (Referee #2, point 2) for further details.
- Finally the points stated to represent these local minima in the text (2 days for x and 1 day for 1-sigma) are not the lowest local minima in the table (1 week for x and 2 hours for 1-sigma), so I don’t follow why they were selected to be highlighted.
We thank the Referee for highlighting this point. The values referenced in the manuscript text and reported in the table were intended to illustrate local minima in the trend rather than the absolute minima. When the relationship is examined across the full range of sampling intervals (e.g., from high to low or vice versa), local minima are observed at approximately the 2-day and 1-day points for the average and standard deviation, respectively, which is why these points were highlighted in the text. We agree that the table also contains lower absolute minima; however, we have shown that their presence is likely an artefact of the fitting process itself. We also note that this is influenced by the bias of the LOC sensor, which drives the correction and is itself at a slightly negative bias relative to the discrete co-sample dataset. To clarify this we have added the below text on page 14, lines 368-372:
“Furthermore, the decreasing trend in x̅ with higher correction frequencies will pass through zero offset (relative to discrete co-samples) because the optode data are corrected by the LOC dataset, which is itself at a negative offset relative to the reference dataset. Therefore, x̅ = 0 does not necessarily indicate the point of best performance, as sensor performance is evaluated across three collective metrics (i.e., x̅, 1σ, and RMSE) relative to validation samples.”
Readers are also referred to page 14, lines 360-368 which discusses this further.
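The mechanics of how a sparse, accurate reference series can pull a drifting high-frequency series back on scale can be sketched as below. This is an illustrative piecewise-linear interpolation of the optode-minus-LOC offset; it is our simplified stand-in, and the manuscript's actual correction/fitting procedure may differ.

```python
import bisect

def drift_correct(opt_t, opt_ph, ref_t, ref_offset):
    """Correct a high-frequency optode pH series with sparse reference offsets.

    ref_offset[i] is the optode reading minus the accurate (e.g., LOC)
    pH at reference time ref_t[i] (ref_t sorted ascending). The offset
    is linearly interpolated in time and subtracted, so the optode keeps
    its high-frequency signal while inheriting the reference accuracy.
    """
    corrected = []
    for t, ph in zip(opt_t, opt_ph):
        if t <= ref_t[0]:            # clamp before the first reference point
            off = ref_offset[0]
        elif t >= ref_t[-1]:         # clamp after the last reference point
            off = ref_offset[-1]
        else:                        # interpolate between bracketing points
            i = bisect.bisect_right(ref_t, t)
            frac = (t - ref_t[i - 1]) / (ref_t[i] - ref_t[i - 1])
            off = ref_offset[i - 1] + frac * (ref_offset[i] - ref_offset[i - 1])
        corrected.append(ph - off)
    return corrected
```

Note that with this scheme the corrected series can only be as unbiased as the reference series itself, which is exactly the point made in the quoted manuscript text: a LOC dataset with a small negative bias propagates that bias into the corrected optode data.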
I think a big step towards a solution here would be to be more specific about what is meant in each statement and avoid using the somewhat ambiguous term “accuracy” when a more specifically meaningful alternative word is available.
We thank the Referee for this helpful suggestion. We agree that the term “accuracy” can be ambiguous if not explicitly defined. In the revised manuscript we have further clarified our terminology (see Author reply to Referee #2, point 2) and continue to report the specific statistical metrics used to evaluate performance. We believe this now provides a clearer and more quantitative description of sensor performance.
What is accurate?
- The manuscript refers often to producing measurements that are “accurate” but does not define what this means – what constitutes “accurate” and whether something is accurate enough? It depends on the research question being asked of the data. Please could this be addressed briefly where relevant (e.g., Introduction and Conclusions, maybe relevant parts of R&D). Often this is done with reference to the GOA-ON “weather” and “climate” uncertainty targets (Newton et al., 2015), although other approaches are possible.
We thank the Referee for pointing this out, as it helps bring context to the larger aim of sensor development. We have added the following sentences to the Introduction (page 2, lines 43-46) and Conclusion (page 19, lines 496-498) sections, respectively, to address this point:
“Current observational targets for monitoring ocean acidification have been proposed by the Global Ocean Acidification Observing Network (GOA-ON) framework, which defines approximate uncertainty thresholds of 0.02 for weather quality data and 0.003 for climate quality data (Newton et al., 2015). These benchmarks provide a useful reference point for evaluating sensor performance.”
and
“The corrected optode measurements achieved RMSE values on the order of ~0.02 relative to discrete co-samples, thus meeting the GOA-ON weather-quality observational target, which is sufficient for resolving short-term variability in coastal carbonate chemistry while minimising reagent consumption and operational complexity.”
Manufacturer claims
- Manufacturer accuracy claims are presented in the Introduction and Methods. They are sometimes alluded to in the R&D but it might be useful to have a short paragraph or section that directly addresses if these accuracy claims could indeed be achieved by the various sensors in the tests here.
We have added the below text on page 17, lines 441-444 to mention this point:
“It is worth noting that the SeapHOx pH sensor performance in the field from our work was well within the manufacturer-reported limits of ±0.050. Furthermore, the field-measured performance of the LOC pH sensor was close to achieving its manufacturer-reported performance of <0.009. Therefore, this 6-month shallow-water field study has demonstrated that it is feasible to achieve these manufacturer-reported performances.”
Minor comments
- 158 PyroScience have several different sensor caps available, with different pK values and returning results on different pH scales; please specify what was used.
This has now been updated with the following text on page 6, line 171 to clarify this:
“…a fresh optode pH cap (PHCAP-PK8T-SUB, PyroScience GmbH) was soaked…”
- 159 The PyroScience optode software also has the option to add a third calibration point of a buffer (e.g., tris) within the measuring range to improve accuracy. Could the authors comment on if and how excluding this step may have affected their results and conclusions?
This is a fair point raised by the Referee. The PyroScience optical pH sensors (e.g., the AquapHOx-L-pH used in the present study) are typically calibrated using a 1- or 2-point calibration in (non-seawater) acidic and basic pH buffers as recommended and supplied by the manufacturer, commonly pH 2 to represent the fully protonated form of the indicator and pH 11 for the fully deprotonated form. The sensor responses in these two buffers define the plateau regions of the sigmoidal response curve, which intentionally lie outside the operable range of the sensor (typically pH 7-9). An optional third calibration point, a known seawater sample near the anticipated deployment pH, may be applied to adjust the absolute offset within the measurement range. Recent work by Wirth et al. (2024) has shown that including this third point can further reduce offsets and improve absolute accuracy in laboratory settings. However, the two-point buffer calibration defines the sensor response function, and omitting the third point largely affects the absolute bias rather than the precision of the measurements. Furthermore, our findings confirm what Wirth et al. (2024) proposed in their Comments and Recommendations section: ‘It may be possible to correct for drift with multiple validation samples taken throughout the deployment. Validation samples every 3–4 weeks and at the end of the deployment are recommended to maintain the weather objective quality of 0.02 pH.’
As such, we have added the below text on page 10, lines 263-266 to clarify this point:
“Recent work has shown that using a three-point buffer calibration (in lieu of a two-point buffer calibration) can be beneficial in reducing signal offsets and improving absolute accuracy for optode-based pH sensors in laboratory settings, but for field deployments discrete co-samples are required to maintain this level of sensor performance (Wirth et al., 2024).”
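The division of labour between the two-point and three-point calibrations can be illustrated with a simplified sketch. The Henderson-Hasselbalch-style response model and the `PKA` value below are illustrative placeholders chosen by us, not the actual characteristics of the PHCAP-PK8T-SUB cap or the PyroScience firmware.

```python
import math

PKA = 8.0  # illustrative apparent pKa of the pH-sensitive dye (placeholder value)

def ph_from_signal(signal, r_acid, r_base, offset=0.0):
    """Invert a sigmoidal optode response to pH.

    r_acid and r_base are the plateau signals from the two-point buffer
    calibration (fully protonated / fully deprotonated dye); they define
    the shape of the response function. An optional third calibration
    point supplies `offset`, which shifts the absolute scale within the
    operable range without changing the curve's shape. Signals exactly
    at a plateau (frac of 0 or 1) are outside the invertible range.
    """
    frac = (signal - r_acid) / (r_base - r_acid)  # deprotonated fraction
    return PKA + math.log10(frac / (1.0 - frac)) + offset

def third_point_offset(signal, r_acid, r_base, ph_true):
    """Offset implied by one seawater sample of known pH (the third point)."""
    return ph_true - ph_from_signal(signal, r_acid, r_base)
```

Under this simplified model, omitting the third point leaves `offset = 0`, which changes every reading by the same constant but not the scatter between readings, consistent with the statement that the third point mainly affects absolute bias rather than precision.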
- 167 Was it really possible to always get the sample from the harbour, into the lab, in an optical cell, equilibrated to 20 °C, injected with mCP and measured in under 5 minutes? Impressive if so, but the relevant time to report would be the actual moment of measurement, not just the moment that the sample handling in the lab began – please check & confirm.
The lab spectrophotometer and setup were always prepared prior to discrete co-sampling, and the laboratory is directly accessible from the quayside at NOC; effectively all we had to do was inject the sample and allow it to equilibrate. However, as the temperature equilibration employed was indeed 5 minutes, we have increased the indicated total time to measurement to ≤10 minutes. The jacketed glass measurement cell is connected to a recirculating water bath, which affords relatively fast temperature equilibration of the sample. Furthermore, the relevant time to report is not the point of measurement in the lab but rather the point of sample collection in the harbour: each discrete co-sample represents a water mass collected at a specific point in time, which allows a meaningful comparison with sensor measurements of the same water mass at that time. As such, we have also updated the text on page 6, lines 181-183 to clarify which time is used for the co-sample:
“…the time between co-sampling and lab measurements was ≤10 minutes (i.e., the harbour was directly accessible from the NOC), and the time of the discrete sample was noted upon collection (i.e., the collection time was used for comparison to sensor data).”
- 202 Does a “battery failure” refer to the battery running out of charge, or something else more dramatic? Please clarify.
We have amended the text on page 7, line 218-219 to clarify this point:
“…the optode sensor’s battery was never depleted even during periods of...”
Technical corrections
pH is dimensionless; please remove references to “pH units” throughout.
We have changed the wording throughout the manuscript and ESI to not report any values as “pH units”.
42 Grammar: change “and until recently, was” to e.g. (“which until recently was”).
Done.
43 If pH is calculated from DIC, TA or fCO2 then it is not a “measurement of pH”, please rephrase.
This wording has been rephrased as suggested.
55 Provide a location for the NOC.
Done.
58 The ocean goes deeper than 6000 m, please rephrase “full ocean depth”.
This wording has been rephrased as suggested.
67 Not clear specifically what “This” refers to.
We have clarified this as “This technology has…”
77 Grammar: either “version” => “versions” and “are a cylinder” => “are cylinders”, or “are” => “is”. Also probably “sensors” => “sensor”.
This wording has been rephrased as suggested.
108 Presumably “The NOC” should be “The harbour”, or make it clear that the harbour is at the NOC in the previous sentence.
This wording has been rephrased as suggested.
155 “calibration-less” is a bit awkward; “calibration-free”?
This wording has been rephrased as suggested.
Section 3.2 Several aspects of results that should be in past tense are written in the present. Also applies to other parts of the Results & Discussion. Some of these I have noted as technical corrections here but my list will be incomplete so please check through carefully.
We have reviewed the Results & Discussion section to ensure that past tense was used.
290 Brackets around the two b terms at the end of Eq. (5) are unnecessary.
Done.
Table 3 Should n be the same in every row? I would have guessed it is how many LOC points were used in the calibration, which would be smaller for the longer correction intervals. If not, then please rewrite the caption to make it clearer what n means. If it is supposed to be the same then it doesn’t need to be a column in the table. Also, please mention in the caption which sensor data are being shown and what they are being compared to.
This has now been updated as suggested. The caption for Table 3 (page 13, line 355) has been updated as shown below to clarify this as well:
“Performance of the optode sensor, relative to discrete lab-validated co-samples, is reported as the mean sensor error (x̅ ΔpH), standard deviation of the error (1σ ±ΔpH), and the root mean square error (RMSE ΔpH). The number of samples (n) for determining these metrics was n = 44 for all correction intervals.”
343 “30/80/2023” => “30/08/2023”.
Done.
359 “are reporting” => “were reporting” or “reported”.
This wording has been rephrased as suggested.
376 “are tracking” => “were tracking”.
This wording has been rephrased as suggested.
AC2: 'Reply on RC2', Anthony Lucio, 23 Mar 2026
Here the authors present data from two deployments of pH sensors that operate at different frequencies. By deploying a pH optode alongside their LOC (lab-on-chip) pH sensor, they were able to demonstrate that the greater accuracy of the LOC pH measurement could be used to correct drift in the pH optode signal. This opened up the ability to measure pH at far greater frequency than with the LOC sensor alone, and for a longer duration than with the higher-frequency optode alone, which is prone to more rapid drift. An ISFET-based pH sensor package was deployed alongside as well to provide an independent indication of optode performance and an assessment of the LOC-applied correction, in addition to bottle samples. As part of the study, it was also determined how often the optode benefitted from correction by the LOC sensor.
A few comments:
Line 74: ISFET-based pH warmup time depends on choice of reference electrode. If using the Cl- ISE, there is a longer conditioning requirement.
Table 1 (and text above): The size of the sensors is a little unusual to include without more details. Why list the size of the SeaFET if you deployed a SeapHOx? Is this just the sensor, or the electronics, housing, power, etc.? Sensor footprint is different from a fully autonomous package.
Line 159: I don’t recall mention of the pH scales used. It is important when comparing different sensors to describe which scale is being used and where/how conversions are applied. What is the composition of the PyroScience calibration solutions?
General: