the Creative Commons Attribution 4.0 License.
Combining Neural Networks and Data Assimilation to enhance the spatial impact of Argo floats in the Copernicus Mediterranean biogeochemical model
Anna Teruzzi
Gloria Pietropolli
Luca Manzoni
Gianluca Coidessa
Gianpiero Cossarini
Abstract. Biogeochemical-Argo (BGC-Argo) float profiles provide substantial information on key vertical biogeochemical dynamics and have been successfully integrated into biogeochemical models via data assimilation approaches. Although results on the BGC-Argo assimilation are encouraging, data scarcity remains a limitation for their effective use in operational oceanography.
To address availability gaps in the BGC-Argo profiles, an Observing System Experiment (OSE) that combines a Neural Network (NN) and Data Assimilation (DA) has been performed here. The NN was used to reconstruct nitrate profiles from oxygen profiles and associated Argo variables (pressure, temperature, salinity), while a variational data assimilation scheme (3DVarBio) has been upgraded to integrate BGC-Argo and reconstructed observations in the Copernicus Mediterranean operational forecast system (MedBFM). To ensure high quality of oxygen data, a post-deployment quality control method has been developed with the aim of detecting and, where possible, correcting potential sensor drift.
The Mediterranean OSE features three different setups: a control run without assimilation; a multivariate run with assimilation of BGC-Argo chlorophyll, nitrate, and oxygen; and a multivariate run that also assimilates reconstructed observations.
The general improvement of skill performance metrics demonstrated the feasibility of integrating new variables (oxygen and reconstructed nitrate). Major benefits have been observed in reproducing specific BGC process-based dynamics such as the nitracline dynamics, primary production and oxygen vertical dynamics.
The assimilation of BGC-Argo nitrate corrects a generally positive bias of the model in most of the Mediterranean areas, and the addition of reconstructed profiles makes the corrections even stronger. The impact of enlarged nitrate assimilation propagates to ecosystem processes (e.g., primary production) at basin-wide scale, demonstrating the importance of BGC profiles in complementing satellite ocean colour assimilation.
Carolina Amadio et al.
Status: open (extended)
RC1: 'Comment on egusphere-2023-1588', Anonymous Referee #1, 19 Sep 2023
Please note that this is a co-review by an early-career and mid-career scientist.
This paper details new processing of BGC-Argo data in the Mediterranean Sea, including oxygen sensor drift correction and the use of a neural network to reconstruct nitrate from other measured variables. An existing data assimilation scheme, previously used to assimilate chlorophyll and (non-reconstructed) nitrate from BGC-Argo, is extended to also assimilate reconstructed nitrate and measured oxygen profiles. Test runs demonstrate a positive impact on model analyses of assimilating these new variables.
The study is novel, of interest to the community, and within scope for Ocean Science. The study is generally well-conceived and well-presented, but there are aspects which should be more clearly explained.
In many places, the manuscript is hard to follow and would benefit from being made clearer. Some specific examples are given in the comments below, but are not exhaustive. As a general example, the use of passive voice, in particular in the methods section, makes it challenging in some parts to distinguish your work from previous studies. As another example, in the introduction the topics that should be introduced are introduced, but the text lacks flow and links between the topics, and so does not lead to the question you are addressing.
The manuscript would also benefit from English language copy editing, but we believe the journal offers this service as standard, so will not list such technical corrections as part of this review.
The paper aims to “address availability gaps” (Line 4) but this objective is not clear throughout the paper. The introduction does not clarify how, where, or why the data gaps affect the analysis. Results of the two model runs with and without reconstructed observations clearly show differences, but these are not always linked to the change in coverage. Also, the abstract and discussion conclude with a note about Argo data being complementary to satellite ocean colour assimilation, but this study does not show that.
Line 20: “Array for Real-time Geostrophic Oceanography” - this acronym form does not appear to be widely or currently used, suggest just using the name “Argo”.
Line 29: “approximately 270,000 profiles worldwide until now” - better to put “as of [date]” rather than “until now”.
Line 39: “By improving the accuracy…” is the result of the QC and would therefore fit better at the end of the sentence to increase clarity.
Line 45: replace “encouraged” with “is necessary”.
Line 46: “optode” was not mentioned before
Line 47: Suggestion to link the topics for better flow of the text: say that purely observation-based Argo studies are regional, and using data assimilation has the potential to create a synthesis
Line 50: “DA underpins …” - suggest rephrasing this sentence slightly to articulate more clearly the aims and principles of DA.
Line 54: “NN algorithms” - “NN” hasn’t been defined yet in the main text. Need to add at least a couple of sentences introducing neural networks at the start of this paragraph.
Line 54: “match specific DA tasks” - a better phrasing might be something like “have the potential to perform specific tasks related to observation processing and DA”. The discussion of the following studies could be made clearer too, as well as stating that it is the method of Pietropolli et al. (2023) that is used in this study.
Line 58: May mention here or later that these examples are timeseries of chlorophyll, while the use of reconstructed nitrate is novel
Line 72: May be worth stating what the first release included for a full account of the developments
Line 81: Not clear at this point what “sequential modular approach” means
Line 87-100: The paragraph about the MedSea oceanography feels out of place and may be covered at the beginning of the first results section.
Line 114 & Fig. 1: the flow of information between 3DVarBio and OGSTM-BFM is implied to be one-way, but presumably it’s two-way, with OGSTM-BFM fields also an input to 3DVarBio? Also, “3DVarBio” is used in the text, but “3D-VarBio” in the figure (and on line 116) - these should be consistent.
Line 150: “preserve optimal values” - a better wording would be “preserve existing values” or “preserve background values”, there’s no guarantee they’re optimal.
Line 152: “spurious assimilation” - please be more specific. “spurious correlations”?
Line 154: “it barely affects other variables” - is it known how model-dependent this finding is? Since the models used here and in Skákala et al. (2022) are very similar, this is a reasonable approach to take here, but it could be worth clarifying that this lack of effect on other variables is in the model, not necessarily the real world.
Line 161: “we decided to not use different values of error for the two nitrate subsets in order to show the highest potential impact of the OSE.” A caveat needs adding either here or in the discussion that as a result of this decision, the assimilation may be non-optimal in terms of fitting the true state (as opposed to just fitting the observations). The same could be said about the lack of accounting for representation error.
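To illustrate why the assumed observation error matters for this caveat, here is a minimal scalar sketch of a standard analysis update. It is not the 3DVarBio scheme: the function name `scalar_analysis` and all numbers (in mmol m−3) are hypothetical and chosen only for illustration.

```python
def scalar_analysis(background, observation, sigma_b, sigma_o):
    """Scalar analysis update for a single, directly observed variable.

    The gain weights the innovation (observation minus background) by the
    ratio of background error variance to total error variance.
    """
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    analysis = background + gain * (observation - background)
    return analysis, gain


# Hypothetical nitrate values, not taken from the manuscript.
background = 4.0
observation = 3.0
sigma_b = 1.0

for label, sigma_o in [("smaller assumed obs error", 0.5),
                       ("larger assumed obs error", 1.0)]:
    analysis, gain = scalar_analysis(background, observation, sigma_b, sigma_o)
    print(f"{label}: gain={gain:.2f}, analysis={analysis:.2f}")
```

With the smaller assumed observation error, the analysis sits closer to the observation. If reconstructed nitrate is in reality less accurate than measured nitrate, assigning both the same error gives the reconstructed profiles more weight than an optimal fit to the true state would.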
Line 163-4: Is there a reference for the oxygen observation error values used? If not, please state how these values were chosen.
Section 2.3: While it is fine to refer the reader to Pietropolli et al. (2023) for details, it would be helpful to have a slightly longer and clearer description of the NN-MLP-MED methodology in this section.
Line 174: “The error of reconstructed nitrate, obtained by using the EMODnet as validation dataset, was 0.5 mmol m−3”. As this figure contrasts with the uncertainty value of 0.87 mmol m−3 given in the previous section, a little more context would be useful. For instance, introduce the EMODnet dataset (that hasn’t been done yet), state that the NN-MLP-MED method was trained on 80% of the EMODnet data, then had an RMSE of 0.5 mmol m−3 when tested against the remaining 20% of EMODnet data, and an RMSE of 0.87 mmol m−3 when the methodology is applied to BGC-Argo data that is not in EMODnet (if I have interpreted Pietropolli et al. (2023) correctly).
Line 180: It would be useful to put the information about added reconstructed profiles into context. As a suggestion, that could be in the form of stating for each aggregated region or the sub-regions how much reconstructed data is added. Having this information about added data per region may be useful in later sections e.g. when looking at RMSE changes between the DA runs, to enable linking the change in coverage to a change in RMSE (or highlighting where this does not link for any reason).
Line 184: “Adjusted and delayed mode data were selected for oxygen and chlorophyll, while exclusively DM data were considered for nitrate.” - A sentence or two explaining the reasons for these choices would be useful. In particular, what level of drift correction for oxygen has been done in these data sets?
Fig. 2: “of chlorophyll-a (red), Nitrate in-situ (orange) and reconstructed Nitrate (grey)” - this may just be a matter of perception, but the colours used don’t look like red/orange/grey.
Line 196: I may not understand the approach, but what happens if a float lives less than a year, which is when the largest drift occurs (Line 193)? Will the drift correction be applied to t0, or not because this is for operational purposes?
Line 197: Please give more details about the splitting into “inliers and outliers”.
Line 201: If drift is expected to linearly increase with depth, why use the average drift between 600 m and 800 m, rather than just the drift at 600 m? This may be reasonable (we’re not experts on oxygen sensor drift), but it’s not clear from the explanation.
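One possible reason for averaging over a layer is that a layer mean is less sensitive to noise at a single level. The sketch below contrasts the two choices. It is not the authors' procedure, and the helper `layer_mean_offset` and the offset values are hypothetical.

```python
from statistics import mean


def layer_mean_offset(depths, offsets, top=600.0, bottom=800.0):
    """Mean of the offsets whose depth lies within [top, bottom] metres."""
    selected = [o for d, o in zip(depths, offsets) if top <= d <= bottom]
    if not selected:
        raise ValueError("no samples in the requested layer")
    return mean(selected)


# Hypothetical float-minus-climatology oxygen offsets (mmol m-3),
# with depths in metres. These are illustrative values only.
depths = [500.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0]
offsets = [1.0, 4.0, 2.0, 3.0, 2.5, 3.5, 6.0]

single_level = offsets[depths.index(600.0)]       # offset at 600 m only
layer_value = layer_mean_offset(depths, offsets)  # mean over 600-800 m
print(f"600 m only: {single_level:.2f}, 600-800 m mean: {layer_value:.2f}")
```

Whether this noise argument is the actual motivation is for the authors to confirm. If the drift really does grow with depth, the layer mean also differs systematically from the value at 600 m.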
Line 207: “Marine Copernicus Service” should be “Copernicus Marine Service”.
Line 208: “initial conditions from EMODnet dataset (Details are provided in Salon et al. 2019 ).” - does this include the same spin-up procedure as in Salon et al.? That should be detailed.
Line 216-222: This paragraph needs to be clearer, especially around the oxygen saturation procedure.
Line 223-227: How were these thresholds arrived at?
Fig. 3: What does the horizontal line at 600 m represent?
Line 243: “After removing of drift, the deep oxygen concentrations results to be closer to the EMODnet climatological data, allowing to include a higher number of profiles” - does this mean that in the absence of the drift correction the profiles would be expected to fail QC checks and be excluded, rather than the uncorrected profiles being assimilated?
Line 246-247: “While for the satellite comparison the model daily averages are considered, the model first guess (i.e. the model state before the assimilation) is used for metrics based on BGC-Argo.” - This is reasonable given that BGC-Argo is assimilated and ocean colour not, but a clearer reasoning for the decision should be given. Furthermore, is the first guess instantaneous (at midnight? at the observation time?) or an average? Also, it states here that for the satellite comparison the model is a daily average, but two paragraphs later that the observations are a weekly average?
Line 248: RMSE has its place, including here, but could usefully be supplemented by other validation statistics. Furthermore, RMSE is only optimal for Gaussian variables, is this the case for the variables considered? If not, then more robust statistics may be preferable.
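As a sketch of what supplementary statistics could look like, the example below computes bias, RMSE, mean absolute error and median absolute error for the same model-observation pairs. The function name `validation_stats` and the sample values are hypothetical and are not taken from the manuscript.

```python
import math
from statistics import mean, median


def validation_stats(model, obs):
    """Return bias, RMSE, MAE and median absolute error of model minus obs."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and of equal length")
    errors = [m - o for m, o in zip(model, obs)]
    abs_errors = [abs(e) for e in errors]
    return {
        "bias": mean(errors),
        "rmse": math.sqrt(mean([e * e for e in errors])),
        "mae": mean(abs_errors),
        "median_abs_error": median(abs_errors),
    }


# Hypothetical matched pairs in which a single point is badly off.
model = [1.0, 2.0, 3.0, 4.0, 10.0]
obs = [1.0, 2.0, 3.0, 4.0, 5.0]

for name, value in validation_stats(model, obs).items():
    print(f"{name}: {value:.3f}")
```

In this toy case the single large error dominates the RMSE, while the median absolute error remains zero. This is the kind of contrast that makes robust statistics informative for non-Gaussian variables.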
Line 250: “the aggregated combination” was not mentioned before. Could be done with the description of Fig. 2.
Line 253 and following: The changes in RMSE should be linked to the change in coverage. From visual inspection, most regions of reduced RMSE are regions of higher pseudo-nitrate (Fig 2), but not all of them e.g. Nwm. Other regions have no (additional) float data yet show changes in the RMSE.
Line 272: “directly ascribed to the increased number…” – this is not clear to me as the Figures do not show how the reconstructed obs are distributed over seasons.
Fig. 5 and associated discussion: it is not at all clear what is displayed in the figure. Absolute values? RMS errors? Percentage RMS errors? Are the x-axis values identical for all variables or have they just been cut off for all except the bottom panel?
Line 281: “Assimilating oxygen profiles enable reducing the model-BGC floats RMSE” - is it possible to know how much this is due to the oxygen assimilation, and how much to the chlorophyll and nitrate assimilation? The lack of impact of reconstructed nitrate is an indicator here, but some further comment would be useful.
Section 3.3.1 may benefit from rewriting for clarity. It is difficult to pick out the key message. As a suggestion (definitely not requirement) you may test describing the BGC differences one region at a time instead of structuring the paragraph by variable. Possibly that improves the understanding.
Line 302: How do you distinguish if a region is still drifting in Fig. 7? To me, ion2 (second column) looks as if it is drifting still, but the differences have smaller magnitude than in Med (third column)
Fig 10: Experiment names on the y axis differ from the main text. Write “npp” instead of “ppn”. I think it would help to include the basin boundaries for orientation. An idea to better visualise the results may be to plot the difference in the subplots for the DA experiments compared to HIND instead of absolute values but that’s not a necessary change.
Line 340: If I understand correctly, the results thus show that nitrate suggests reduced NPP and chlorophyll enhanced NPP? Does that point to a bias in the model or representation of a specific component? (e.g. PFTs) If that’s the case, it may be worth noting in the discussion.
Line 346: “0-300 m”, Figure 11 says 0-600 m in the title
Line 345: When introducing the impact indicator, please add information about how that differs from other statistical metrics such as RMSE or a simple comparison between fields at the end of the simulation. What is the advantage of using this metric?
Line 360: Where does this threshold come from?
Line 368: Do you mean “initial conditions” as in using the analysis to initialise a forecast? If so, that may need clarification because it may be confused with general initial conditions for ocean simulations. For initial conditions in a general sense the QC’d oxygen profiles may not qualify.
Line 377: “threshold on 1mmol/m3” – can you add a value for decadal variability in the sentences before, which puts this threshold into context to illustrate it is indeed a justified choice please.
Line 399: “more than 30 profiles” - what was that before? How much larger is the data availability?
Line 401: “can effectively be constrained” is that referring to previous papers such as observing system simulation experiments? If this is meant as a conclusion from your results, this statement may need more explanation.
Line 409: The decrease in available BGC Argo observations was not mentioned before, but feels like this should be a major motivation of this work (for the introduction)
Line 415 & 443: “feed-forward” - this term is suddenly introduced in the discussion and conclusion when describing the method used, it should be introduced and explained in the methods section.
Line 436: Since ocean colour is not assimilated in this study the statement “should be used in conjunction with…” should have a reference to literature
Citation: https://doi.org/10.5194/egusphere-2023-1588-RC1