the Creative Commons Attribution 4.0 License.
Combining Neural Networks and Data Assimilation to enhance the spatial impact of Argo floats in the Copernicus Mediterranean biogeochemical model
Abstract. Biogeochemical-Argo (BGC-Argo) float profiles provide substantial information for key vertical biogeochemical dynamics and successfully integrated in biogeochemical models via data assimilation approaches. Although results on the BGC-Argo assimilation are encouraging, data scarcity remains a limitation for their effective use in operational oceanography.
To address availability gaps in the BGC-Argo profiles, an Observing System Experiment (OSE), that combines Neural Network (NN) and Data Assimilation (DA), has been performed here. NN was used to reconstruct nitrate profiles starting from oxygen profiles and associated Argo variables (pressure, temperature, salinity), while a variational data assimilation scheme (3DVarBio) has been upgraded to integrate BGC-Argo and reconstructed observations in the Copernicus Mediterranean operational forecast system (MedBFM). To ensure high quality of oxygen data, a post-deployment quality control method has been developed with the aim of detecting and eventually correcting potential sensors drift.
The Mediterranean OSE features three different setups: a control run without assimilation; a multivariate run with assimilation of BGC-Argo chlorophyll, nitrate, and oxygen; and a multivariate run that also assimilates reconstructed observations.
The general improvement of skill performance metrics demonstrated the feasibility in integrating new variables (oxygen and reconstructed nitrate). Major benefits have been observed in reproducing specific BGC process-based dynamics such as the nitracline dynamics, primary production and oxygen vertical dynamics.
The assimilation of BGC-Argo nitrate corrects a generally positive bias of the model in most of the Mediterranean areas, and the addition of reconstructed profiles makes the corrections even stronger. The impact of enlarged nitrate assimilation propagates to ecosystem processes (e.g., primary production) at basin wide scale, demonstrating the importance of BGC-profiles in complementing satellite ocean colour assimilation.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1588', Anonymous Referee #1, 19 Sep 2023
Please note that this is a co-review by an early-career and mid-career scientist.
This paper details new processing of BGC-Argo data in the Mediterranean Sea, including oxygen sensor drift correction and the use of a neural network to reconstruct nitrate from other measured variables. An existing data assimilation scheme, previously used to assimilate chlorophyll and (non-reconstructed) nitrate from BGC-Argo, is extended to also assimilate reconstructed nitrate and measured oxygen profiles. Test runs demonstrate a positive impact on model analyses of assimilating these new variables.
The study is novel, of interest to the community, and within scope for Ocean Science. The study is generally well-conceived and well-presented, but there are aspects which should be more clearly explained.
In many places, the manuscript is hard to follow and would benefit from being made clearer. Some specific examples are given in the comments below, but are not exhaustive. As a general example, the use of passive voice, in particular in the methods section, makes it challenging in some parts to distinguish your work from previous studies. As another example, in the introduction the topics that should be introduced are introduced, but the text lacks flow and links between the topics, and so does not lead to the question you are addressing.
The manuscript would also benefit from English language copy editing, but we believe the journal offers this service as standard, so will not list such technical corrections as part of this review.
The paper aims to “address availability gaps” (Line 4) but this objective is not clear throughout the paper. The introduction does not clarify how, where, or why the data gaps affect the analysis. Results of the two model runs with and without reconstructed observations clearly show differences, but these are not always linked to the change in coverage. Also, the abstract and discussion conclude with a note about Argo data being complementary to satellite ocean colour assimilation, but this study does not show that.
Line 20: “Array for Real-time Geostrophic Oceanography” - this acronym form does not appear to be widely or currently used, suggest just using the name “Argo”.
Line 29: “approximately 270,000 profiles worldwide until now” - better to put “as of [date]” rather than “until now”.
Line 39: “By improving the accuracy…” is the result of the QC and would therefore fit better at the end of the sentence to increase clarity.
Line 45: “encouraged” replace with “is necessary”
Line 46: “optode” was not mentioned before
Line 47: Suggestion to link the topics for better flow of the text: say that purely observation-based Argo studies are regional, and using data assimilation has the potential to create a synthesis
Line 50: “DA underpins …” - suggest rephrasing this sentence slightly to articulate more clearly the aims and principles of DA.
Line 54: “NN algorithms” - “NN” hasn’t been defined yet in the main text. Need to add at least a couple of sentences introducing neural networks at the start of this paragraph.
Line 54: “match specific DA tasks” - a better phrasing might be something like “have the potential to perform specific tasks related to observation processing and DA”. The discussion of the following studies could be made clearer too, as well as stating that it is the method of Pietropolli et al. (2023) that is used in this study.
Line 58: May mention here or later that these examples are timeseries of chlorophyll, while the use of reconstructed nitrate is novel
Line 72: May be worth stating what the first release included for a full account of the developments
Line 81: Not clear at this point what “sequential modular approach” means
Line 87-100: The paragraph about the MedSea oceanography feels out of place and may be covered at the beginning of the first results section.
Line 114 & Fig. 1: the flow of information between 3DVarBio and OGSTM-BFM is implied to be one-way, but presumably it’s two-way, with OGSTM-BFM fields also an input to 3DVarBio? Also, “3DVarBio” is used in the text, but “3D-VarBio” in the figure (and on line 116) - these should be consistent.
Line 150: “preserve optimal values” - a better wording would be “preserve existing values” or “preserve background values”, there’s no guarantee they’re optimal.
Line 152: “spurious assimilation” - please be more specific. “spurious correlations”?
Line 154: “it barely affects other variables” - is it known how model-dependent this finding is? Since the models used here and in Skákala et al. (2022) are very similar, this is a reasonable approach to take here, but it could be worth clarifying that this lack of effect on other variables is in the model, not necessarily the real world.
Line 161: “we decided to not use different values of error for the two nitrate subsets in order to show the highest potential impact of the OSE.” A caveat needs adding either here or in the discussion that as a result of this decision, the assimilation may be non-optimal in terms of fitting the true state (as opposed to just fitting the observations). The same could be said about the lack of accounting for representation error.
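The caveat about observation error can be illustrated with the scalar form of a minimum-variance (variational or optimal-interpolation) update, in which the analysis equals the background plus the innovation weighted by the gain sigma_b^2 / (sigma_b^2 + sigma_o^2). The sketch below is only an illustration under stated assumptions: the function name, the background and observation values, and the error standard deviations are hypothetical and are not taken from the manuscript or from 3DVarBio.

```python
def scalar_analysis(background, observation, sigma_b, sigma_o):
    """Scalar minimum-variance update: x_a = x_b + K * (y - x_b)."""
    # Gain K = sigma_b^2 / (sigma_b^2 + sigma_o^2), always between 0 and 1.
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    return background + gain * (observation - background)


# Hypothetical nitrate values in mmol m^-3 (illustrative only).
x_b, y = 2.0, 1.0
sigma_b = 0.5

# Same innovation, two assumed observation error standard deviations.
x_a_small_err = scalar_analysis(x_b, y, sigma_b, sigma_o=0.25)
x_a_large_err = scalar_analysis(x_b, y, sigma_b, sigma_o=1.0)
```

With the smaller assumed observation error the analysis moves most of the way to the observation (about 1.2), while with the larger error it stays close to the background (about 1.8). Assigning the same error to measured and reconstructed nitrate therefore lets reconstructed profiles pull the analysis as strongly as in situ data would.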
Line 163-4: Is there a reference for the oxygen observation error values used? If not, please state how these values were chosen.
Section 2.3: While it is fine to refer the reader to Pietropolli et al. (2023) for details, it would be helpful to have a slightly longer and clearer description of the NN-MLP-MED methodology in this section.
Line 174: “The error of reconstructed nitrate, obtained by using the EMODnet as validation dataset, was 0.5 mmol m−3”. As this figure contrasts with the uncertainty value of 0.87 mmol m−3 given in the previous section, a little more context would be useful. For instance, introduce the EMODnet dataset (that hasn’t been done yet), state that the NN-MLP-MED method was trained on 80% of the EMODnet data, then had an RMSE of 0.5 mmol m−3 when tested against the remaining 20% of EMODnet data, and an RMSE of 0.87 mmol m−3 when the methodology is applied to BGC-Argo data that is not in EMODnet (if I have interpreted Pietropolli et al. (2023) correctly).
Line 180: It would be useful to put the information about added reconstructed profiles into context. As a suggestion, that could be in the form of stating for each aggregated region or the sub-regions how much reconstructed data is added. Having this information about added data per region may be useful in later sections e.g. when looking at RMSE changes between the DA runs, to enable linking the change in coverage to a change in RMSE (or highlighting where this does not link for any reason).
Line 184: “Adjusted and delayed mode data were selected for oxygen and chlorophyll, while exclusively DM data were considered for nitrate.” - A sentence or two explaining the reasons for these choices would be useful. In particular, what level of drift correction for oxygen has been done in these data sets?
Fig. 2: “of chlorophyll-a (red), Nitrate in-situ (orange) and reconstructed Nitrate (grey)” - this may just be a matter of perception, but the colours used don’t look like red/orange/grey.
Line 196: I may not understand the approach, but what happens if a float lives less than a year, which is when the largest drift occurs (Line 193)? Will the drift correction be applied to t0, or not because this is for operational purposes?
Line 197: Please give more details about the splitting into “inliers and outliers”.
Line 201: If drift is expected to linearly increase with depth, why use the average drift between 600 m and 800 m, rather than just the drift at 600 m? This may be reasonable (we’re not experts on oxygen sensor drift), but it’s not clear from the explanation.
Line 207: “Marine Copernicus Service” - “Copernicus Marine Service”
Line 208: “initial conditions from EMODnet dataset (Details are provided in Salon et al. 2019 ).” - does this include the same spin-up procedure as in Salon et al.? That should be detailed.
Line 216-222: This paragraph needs to be clearer, especially around the oxygen saturation procedure.
Line 223-227: How were these thresholds arrived at?
Fig. 3: What does the horizontal line at 600 m represent?
Line 243: “After removing of drift, the deep oxygen concentrations results to be closer to the EMODnet climatological data, allowing to include a higher number of profiles” - does this mean that in the absence of the drift correction the profiles would be expected to fail QC checks and be excluded, rather than the uncorrected profiles being assimilated?
Line 246-247: “While for the satellite comparison the model daily averages are considered, the model first guess (i.e. the model state before the assimilation) is used for metrics based on BGC-Argo.” - This is reasonable given that BGC-Argo is assimilated and ocean colour not, but a clearer reasoning for the decision should be given. Furthermore, is the first guess instantaneous (at midnight? at the observation time?) or an average? Also, it states here that for the satellite comparison the model is a daily average, but two paragraphs later that the observations are a weekly average?
Line 248: RMSE has its place, including here, but could usefully be supplemented by other validation statistics. Furthermore, RMSE is only optimal for Gaussian variables, is this the case for the variables considered? If not, then more robust statistics may be preferable.
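As one way to act on this suggestion, RMSE can be reported alongside bias and more outlier-resistant measures. The sketch below is a minimal illustration in plain Python; the function name and the sample model and observation values are hypothetical, and the choice of supplementary statistics (bias, mean absolute error, median absolute error) is an assumption rather than something prescribed in the review.

```python
import math
import statistics


def validation_stats(model, obs):
    """Return RMSE, bias, MAE and median absolute error of model minus obs."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and equally long")
    diffs = [m - o for m, o in zip(model, obs)]
    abs_diffs = [abs(d) for d in diffs]
    return {
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "bias": sum(diffs) / len(diffs),
        "mae": sum(abs_diffs) / len(abs_diffs),
        "median_ae": statistics.median(abs_diffs),
    }


# Hypothetical matched model/observation pairs; the last pair is an outlier.
model_values = [1.0, 2.0, 3.0, 4.0, 10.0]
obs_values = [1.5, 1.5, 3.5, 3.5, 4.0]
stats = validation_stats(model_values, obs_values)
```

In this made-up sample a single large misfit dominates the RMSE, whereas the median absolute error stays at the typical misfit of 0.5. Reporting both makes it easier to see whether a change in RMSE reflects most matchups or only a few.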
Line 250: “the aggregated combination” was not mentioned before. Could be done with the description of Fig. 2.
Line 253 and following: The changes in RMSE should be linked to the change in coverage. From visual inspection, most regions of reduced RMSE are regions of higher pseudo-nitrate (Fig 2), but not all of them e.g. Nwm. Other regions have no (additional) float data yet show changes in the RMSE.
Line 272: “directly ascribed to the increased number…” – this is not clear to me as the Figures do not show how the reconstructed obs are distributed over seasons.
Fig. 5 and associated discussion: it is not at all clear what is displayed in the figure. Absolute values? RMS errors? Percentage RMS errors? Are the x-axis values identical for all variables or have they just been cut off for all except the bottom panel?
Line 281: “Assimilating oxygen profiles enable reducing the model-BGC floats RMSE” - is it possible to know how much this is due to the oxygen assimilation, and how much to the chlorophyll and nitrate assimilation? The lack of impact of reconstructed nitrate is an indicator here, but some further comment would be useful.
Section 3.3.1 may benefit from rewriting for clarity. It is difficult to pick out the key message. As a suggestion (definitely not requirement) you may test describing the BGC differences one region at a time instead of structuring the paragraph by variable. Possibly that improves the understanding.
Line 302: How do you distinguish if a region is still drifting in Fig. 7? To me, ion2 (second column) looks as if it is drifting still, but the differences have smaller magnitude than in Med (third column)
Fig 10: Experiment names on the y axis differ from the main text. Write “npp” instead of “ppn”. I think it would help to include the basin boundaries for orientation. An idea to better visualise the results may be to plot the difference in the subplots for the DA experiments compared to HIND instead of absolute values but that’s not a necessary change.
Line 340: If I understand correctly, the results thus show that nitrate suggests reduced NPP and chlorophyll enhanced NPP? Does that point to a bias in the model or representation of a specific component? (e.g. PFTs) If that’s the case, it may be worth noting in the discussion.
Line 346: “0-300 m”, Figure 11 says 0-600 m in the title
Line 345: When introducing the impact indicator, please add information about how that differs from other statistical metrics such as RMSE or a simple comparison between fields at the end of the simulation. What is the advantage of using this metric?
Line 360: Where does this threshold come from?
Line 368: Do you mean “initial conditions” as in using the analysis to initialise a forecast? If so, that may need clarification because it may be confused with general initial conditions for ocean simulations. For initial conditions in a general sense the QC’d oxygen profiles may not qualify.
Line 377: “threshold on 1mmol/m3” – can you add a value for decadal variability in the sentences before, which puts this threshold into context to illustrate it is indeed a justified choice please.
Line 399: “more than 30 profiles” - what was that before? How much larger is the data availability?
Line 401: “can effectively be constrained” is that referring to previous papers such as observing system simulation experiments? If this is meant as a conclusion from your results, this statement may need more explanation.
Line 409: The decrease in available BGC Argo observations was not mentioned before, but feels like this should be a major motivation of this work (for the introduction)
Line 415 & 443: “feed-forward” - this term is suddenly introduced in the discussion and conclusion when describing the method used, it should be introduced and explained in the methods section.
Line 436: Since ocean colour is not assimilated in this study the statement “should be used in conjunction with…” should have a reference to literature
Citation: https://doi.org/10.5194/egusphere-2023-1588-RC1
AC1: 'Reply on RC1', Carolina Amadio, 22 Nov 2023
RC2: 'Comment on egusphere-2023-1588', Anonymous Referee #2, 09 Oct 2023
Publisher’s note: this comment was edited on 10 October 2023. The following text is not identical to the original comment, but the adjustments were minor without effect on the scientific meaning.
egusphere-2023-1588
Combining Neural Networks and Data Assimilation to enhance the spatial impact of Argo floats in the Copernicus Mediterranean biogeochemical model
General comments:
Scientific significance:
Reconstruction of nutrient information in the CANYON-B based ANN system NN-MLP-MED relies on highly accurate in situ O2 data, while the O2 sensors on board Argo floats are known to suffer significant drift. To mitigate this drift problem, the authors have introduced a QC O2 module for further calibration of O2 profile data. This novel approach to secondary O2 calibration is a key component of this study.
Since the pioneering work of Ford et al. (2021), the sparsity of BGC-Argo profiles has been recognised as a clear issue in BGC-Argo profile data assimilation studies and operational systems. The OSE with and without NN-enlarged nitrate profile data for assimilation demonstrates that machine-learning-retrieved nutrient data can improve the model representation of surface phytoplankton dynamics under certain conditions. The impact indicator study reveals a clear impact of reconstructed nitrate profile assimilation on the model BGC state, especially in the upper macronutrient and chlorophyll-a fields.
Scientific quality:
The scientific question raised in this study is clear and important. The current density of the BGC-Argo float array does not cover even basin-scale ocean circulation, which was the original target of the Core Argo float deployment. With the advancement of NN-based BGC variable retrieval methods, it is natural to test whether such generated data can help constrain ocean models for state estimation in operational settings. This study indicates a positive impact of such data. However, many of the evaluation procedures used to judge the detailed impact of the NN-derived nitrate data are not designed effectively to achieve that goal.
While the scientific contribution of this study is significant, there is a clear problem in how it is delivered as a journal paper. Overall, many statements are “speculative” or “subjective” for a data assimilation OSE study, and many are not directly supported by the material provided. Typical examples are the presentation of RMSEs and of the differences between the HIND and DA experiments in Figure 5 and Figures 6-9. The authors ask readers to read these numbers from the figures rather than presenting the actual numbers, and the figures are generally not presented adequately for that purpose. The authors should clearly distinguish between what can be concluded and what can only be speculated from background knowledge. For example, the authors discuss the “impact of chlorophyll profile assimilation”, but the OSE setting has only a hindcast (HIND), DA with CHL, O2 and NO3 profiles (DAfl), and DA with CHL, O2 and NO3 profiles plus NN-derived NO3 profiles (DAnn). How can you discuss the sole impact of chlorophyll assimilation with this OSE setting? Other cases can be found in the comments under “Specific comments”.
Presentation quality:
While the scientific value of this study is high, its presentation quality is rather poor, as demonstrated by the long list of comments under “Technical corrections”. In general, the figures and fonts are too small in most of the figures. The choice of color scheme in Figures 4 and 5 is questionable for people with color vision deficiencies. Please see the guidelines on the preparation of graphs: https://www.biogeosciences.net/submission.html. There are many editorial issues, ranging from simple wording issues to more serious structural issues. Further detail can be found under “Technical corrections”.
Combining the issues raised under “Scientific quality” and “Presentation quality”, I recommend major revision. Comments follow below.
Specific comments:
P2.l48: model tuning (Wang et al., 2020).
Please add more recent references on this topic:
Yumruktepe et al. 2023 https://gmd.copernicus.org/preprints/gmd-2023-25/
Wang and Fennel 2023 https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022GL101220
P6.l146: “VH is built using a Gaussian filter whose correlation radius modulates the smoothing intensity”
> What is the average size of the correlation radius? This information is important for understanding how far the impact of BGC-Argo profile assimilation extends in the analysis.
P7.l184: “Adjusted and delayed mode data were selected for oxygen and chlorophyll”.
> Can you describe which QC flags were used to select “good” data for both oxygen and chlorophyll?
P8.l199: “when all four drift estimates agree in sign”
> It is not clear what you refer to here by “four drift estimates”. P7.l196 mentions that drift is evaluated at two different depths; what are the other two estimates? Or does the number four have nothing to do with that?
P8.l202-l203: “it can be assumed that O2 values at surface are already fixed by the GDACs”
> As stated in p2.l41, not every O2 sensor is calibrated in air, and air calibration is one of the most important calibration steps for making O2 data trustworthy. Do you believe O2 values at the surface are fixed even for data from old sensors without air calibration? Or are old sensor data not included in this specific study period, 2017-2018?
P9.l223: “profiles can be excluded when model-observation misfit is higher than given thresholds”
> Does it mean that some profiles are actually excluded during your DA run, or does this just describe an online data selection system? I also assume “model-observation misfit” means innovation; is that right? Can you also justify the reasoning behind this online data elimination procedure?
P10.l246-247: “While for the satellite comparison the model daily averages .. the model first guess is used for metrics based on BGC-Argo”
> I believe the choice of different RMSE metrics for satellite OC data and Argo profiles is based on the experiment setting, in which Argo profiles are assimilated while satellite OC data are not. By comparing the first guess state with not-yet-assimilated Argo profiles, you can use the Argo profiles as independent data. If this is the case, it would be better to say so here.
P10.l251: “Satellite L3 products from Copernicus Marine Service catalogue ..”
> Usage of this data set requires proper citation. In addition, this sentence is left floating without a clear connection to the rest of 3.2. Does it mean the satellite OC RMSE metric is based on these weekly averaged data? If so, does the weekly cycle coincide with an analysis cycle? Please make its significance clear.
P11.l274-l275: “Here, improvements related to chlorophyll assimilation can be observed in nwm, ion and lev in winter and at depth in tyr, ion and lev in summer (Figure 5 middle panel)”
P11.l278: “the direct chlorophyll assimilation is more effective than ..”
> Since there is no experiment assimilating only chlorophyll in this study, it is not easy to identify the degree of “improvements related to chlorophyll assimilation”, or whether direct chlorophyll assimilation is more effective than the dynamical model adjustment after nitrate assimilation. You need to provide extra analysis to support these statements.
P13. “3.3.1 Impacts on biogeochemical vertical dynamics”.
> There is no description of how Figures 6, 7, 8 and 9 are plotted. Are they sub-basin averaged values? “the basin wide averages of DAnn display .. (Figure 6)” at P13.l310 implies that these figures are basin averages, but this is never stated clearly.
P13.l297: “Nitracline depth”
> There is no definition of nitracline depth. Please be specific.
P13.l296-l297: “decreases by 8% and 11% in DAfl and DAnn runs, respectively”
> In contrast to the clear difference in the impact of nitrate assimilation between DAfl and DAnn at nwm in Figure 7, the RMSE profiles in Figure 5 (especially summer nitrate at Nwm) do not show such a difference between the two DA experiments. Can you explain why?
P13.l299: “eventually reach a stationary phase”
> What is meant by a stationary phase, and how do you measure it?
P13.l306: “As a consequence of both the direct assimilation of chlorophyll profiles and the dynamical model adjustment after nitrate assimilation”
> Again, how can you argue for a consequence of dynamical model adjustment from nitrate assimilation alone with this OSE setting? For example, why would the change in phytoplankton biomass resulting from direct assimilation of chlorophyll not affect chlorophyll concentration in the DCM through its own dynamical model adjustment? Unless extra material is provided, this statement is speculative.
P13.l313: “oxygen profiles assimilation (DAfl, second row in Figure 9) provides positive or negative corrections”
> As the authors describe in the subsequent sentences, changes in phytoplankton biomass also change oxygen through primary production and remineralization, as a dynamical model adjustment. Thus, assimilation of both chlorophyll and nitrate has the potential to alter oxygen. How can you judge that what is seen in Figure 9 is solely a consequence of oxygen assimilation? This sentence contradicts the statements that follow about the impact of reconstructed nitrate profile assimilation on oxygen.
P13.l316-l318: “The only noticeable difference ..”
> This is one of the most important findings of this study regarding the impact of reconstructed nitrate profile assimilation, but the difference between DAfl and DAnn in Figure 9 (summer period in NWM) cannot be found in the RMSE profiles in Figure 5 (summer oxygen in Nwm), so we cannot judge whether this difference in DAnn relative to DAfl is an improvement or not. Can you explain why?
P16 entire section of 3.3.2
> Since the difference between DAfl and DAnn is almost impossible to see in Figure 10, readers cannot confirm what is described in this subsection. Please reevaluate how to present the different impacts of the DA settings on NPP.
P16.l339-l341: “In fact … after chlorophyll assimilation”
> How can you establish that the weak negative correction of macronutrients is the main cause of reduced NPP, outweighing the effect of the change in phytoplankton biomass after chlorophyll assimilation? As far as I can tell, no concrete material supporting this statement is provided. Unless extra material is provided, this statement is speculative.
P17. 3.3.3
> In Figure 11, the figure title indicates that nitrate Iij(t) is evaluated over the 0-600 m depth range rather than the 0-300 m range specified in equation (2). If this is the case, please say so. If not, please fix the figure titles in Figure 11.
P17.l365: “since the same QC oxygen dataset was assimilated in DAfl and DAnn”
> But the authors have just described in P13.l316-l318 that the impact of the reconstructed nitrate is noticeable in oxygen, at least at NWM, where the density of reconstructed nitrate is large. It then does not make sense that no difference is seen between the two DA experiments. Why is no difference seen in the maps of the Iij 95th percentiles for oxygen?
P19.l396-l.397: “In this work, important impacts are also observed in summer for all variables, as a consequence of the increased number of assimilated profiles.”
> It is not clear what is meant by “a consequence of the increased number of assimilated profiles”. The increased number of nitrate profiles from DAfl to DAnn? Or something else? As far as I understand, the main reason we see an impact of DA in summer in this study's DA experiments, compared to Teruzzi et al. (2021), is that satellite OC cannot see the DCM, while Argo float profiles see the signal through multiple sensors. In that sense, you could see the impact of Argo profile assimilation no matter how small or large the number of profiles is. Please reevaluate this statement.
P19.l399-l400: “Indeed .. box every 10 days”
> I do not understand which “results” in this study support this statement. The basin coverage rate of BGC-Argo floats equipped with oxygen sensors is simply determined by the deployment plan. Or do you mean that the new O2 QC module ensures that enough O2 profiles survive to be ingested by the NN module? I read 3.1 but could not find such information. Please be clearer about the meaning of this statement.
P19.l401-l406: “while, up to … by a 3D varying correlation radius (Storto et al., 2014)”
> This discussion of improvement in mesoscale dynamics looks off topic, and I cannot see why it needs to be discussed here. It is especially confusing given that the 2.5 degree by 2.5 degree horizontal resolution in BGC profiles that could potentially be achieved by the NN with oxygen profiles is far coarser than the mesoscale-resolving resolution of O(50 km).
P19.l418: “0.50 mmol2 m−3 for nitrate”
> This information should be included in 2.2.
P19.l423-l.429: “Indeed … Li et al. (2021)”
> The MLP-based method of Sauzède et al. (2017) overcame this issue by adding pressure as an input variable to the MLP. Why do you believe that choosing another NN approach, such as a 1D CNN, is important before using pressure or depth information in NN-MLP-MED?
Technical corrections:
P1.l20: “The Array for Real-time Geostrophic Oceanography”
> Please do not use this acronym for Argo. It is not official. There is a historical background why it should never be and I have it on the authority of one of the program founders who was present on the day the Argo project was first conceived: “Argo was named as a companion project to the proposed Jason altimetric satellite missions. The words indicating a putative interpretation of the letters Argo, Array for Real-time Geostrophic Oceanography, were created in a jocose moment while celebrating in a bar afterwards. It would be best to let an idea die whose origin was mediated entirely through the action of alcohol. Argo is not and was never meant to be an acronym. It should be written "Argo" and never as "ARGO".”
P1.l2: “and successfully integrated in”
> “and are successfully integrated in”
P7.l182: 2.4 BGC-Argo data and post-deployment oxygen quality control
> I assume subsection 2.4 is about the QC-O2 module, but the module name is never mentioned in this section and appears only in the next section, 2.5. Please make it clear that this is about QC-O2.
P7.l184: “and delayed mode data were selected for oxygen and chlorophyll, while exclusively DM data were considered”
> Use only one of the expressions “delayed mode” or “DM”, not both in the same sentence. The same consistent usage would be preferable for “Adjusted/AM” and “Real Time/RT” throughout the manuscript once RT, AM and DM have been defined at p2.l32.
P8.Figure 2: Coordinate labels font is too small and almost unreadable. Please enlarge its size.
> It is almost impossible to distinguish the three dots in the figure. Please consider using different colors or separate maps for each type of profile.
P8.Figure 2: “lev=lev1+lev2+lev3+lev4; ion=ion1+ion2+ion3; tyr=tyr1+tyr2; adr=adr1+adr2; swm=swm1+swm2”
> As far as I can tell, this is the only place where the aggregated sub- or macro-basins are defined. They should be properly defined in a table, as suggested below. It is also recommended to use either “sub-” or “macro-” consistently to minimize confusion.
P9.l217-l220. “Finally, oxygen …”
> I can guess the meaning, but this long sentence is hard to understand. It needs reorganizing into shorter sentences.
P9.l229-l.230: “The oxygen post-deployment quality check method” and “The post deployment oxygen QC method”
> I assume again, the QC method is referring QC O2 module described in 2.4 or not? If it were the case, please specify so explicitly. Plus, two different ways to refer the QC O2 module at the title of 3.1 and body of 3.1 is strange.P10.l248: “is evaluated in winter (from February to April, FMA) and summer (from June to
August, JJA)”
> Since your experiment period is two years, from Jan 2017 to Dec 2018, do you use both 2017 and 2018 results for this evaluation?
P10.l255: “the eastern sub-basins”
> Please define which sub-basins (lev1, lev2, … etc.) are included in the definition of the eastern sub-basins.
P8.Figure 2 caption: “lev=lev1+lev2+lev3+lev4; ion=ion1+ion2+ion3; tyr=tyr1+tyr2; adr=adr1+adr2; swm=swm1+swm2”
P10.l257: “alb, swm and nwm”
P11.l263: “Alboran, South West Mediterranean, North West Mediterranean, Tyrrhenian, Ionian and Levantine Seas”
P11.l271: “is observed in nwm and tyr (winter) and in ion (summer).”
P11.l275: “in nwm, ion and lev in winter and at depth in tyr, ion and lev in summer”
> The association of long and short names of each sub-basin, such as Alboran (alb), South West Mediterranean (swm), etc., is never clearly defined in this article. Please do so in section 2.3 or add an extra table.
P11.Figure 4.
> The figures are so small that it is hard to distinguish the three bars in each domain. Please use larger figures.
P12.Figure 5.
> The figures are so small that it is hard to distinguish the three profiles, especially DAfl and DAnn. Plus, many figures do not have x axis labels. Please use larger figures or consider a different presentation such as scatter plots and tables of RMSEs at selected depths.
P13.l288: “two sub-basins”
> Please specify the names of the “two sub-basins” here before referring to Figure 2.
Figure 6, 7, 8, 9
> Figures and font sizes are too small. Be more specific about the definitions of the second row and the third row in the figure captions.
P16.Figure 10.
> The experiment names HIND, DAfl and DAnn appear as Hind, DaIns and Dasyn in the y axis labels of the figures. This is confusing.
P17.l348: “Here, HIND is here the reference”
> “Here, HIND is the reference”
P19.l367-l.389: Five paragraphs about oxygen QC.
> This information does not fit in the “Discussion”; it should rather be integrated into 2.4.
P20.l408: “BGC-Argo OS”
> Please define the meaning of OS. Observing system?
Citation: https://doi.org/10.5194/egusphere-2023-1588-RC2 - AC2: 'Reply on RC2', Carolina Amadio, 22 Nov 2023
EC1: 'Comment on egusphere-2023-1588', Julien Brajard, 10 Oct 2023
The work presented in this paper is of interest to the community, as outlined by the two Reviewers. Nevertheless, as it is noted by the reviewers, the paper needs to be clarified, and the main message more clearly conveyed. I hope that all the comments and suggestions by the reviewers will help to provide an improved revised version.
Other comments:
Section 2.3 I agree with Reviewer 1 that details about the neural net approach are missing. Especially the sentence "incorporating nonlinear functions, adjusting neuron count, and optimizing the training algorithm" needs to be expanded, since we could wrongly understand that the Fourier et al. approach does not incorporate nonlinear functions (while in reality, they use the nonlinear sigmoid function).
L246 "the model first guess" does it correspond to the background?
About the assimilation: how frequent is the assimilation update? Is it 10 days?
About the validation: Can you comment a bit on the choice of using the RMSE between BGC-Argo profiles and the model first guess as a validation? Since a previous measurement of a BGC-Argo profile was already assimilated, can a new measurement be considered independent? It could be interesting to have a quick discussion about the Lagrangian autocorrelation...
Citation: https://doi.org/10.5194/egusphere-2023-1588-EC1 - AC3: 'Reply on EC1', Carolina Amadio, 22 Nov 2023
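The editor's validation question can be made concrete with a small sketch. It is not the authors' procedure: it only illustrates, under stated assumptions, how an RMSE between float observations and the model first guess could be computed, and how the serial correlation between successive misfits of a single float could be checked as a rough proxy for the Lagrangian autocorrelation. The function names and the misfit values are hypothetical.

```python
import numpy as np


def rmse(obs, first_guess):
    """Root-mean-square error between observations and first-guess values."""
    obs = np.asarray(obs, dtype=float)
    first_guess = np.asarray(first_guess, dtype=float)
    return float(np.sqrt(np.mean((obs - first_guess) ** 2)))


def lag_autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    if lag <= 0 or lag >= x.size:
        raise ValueError("lag must be between 1 and len(series) - 1")
    x = x - x.mean()
    denom = np.sum(x * x)
    if denom == 0.0:
        return 0.0  # constant series: no variability to correlate
    return float(np.sum(x[:-lag] * x[lag:]) / denom)


# Hypothetical depth-averaged misfits (observation minus first guess),
# one value per successive profile of the same float.
misfits = np.array([0.8, 0.7, 0.6, 0.6, 0.4, 0.3, 0.3, 0.1, 0.0, -0.1])
r1 = lag_autocorrelation(misfits, lag=1)
print(f"lag-1 autocorrelation of misfits: {r1:.2f}")
```

A lag-1 value close to 1 would suggest that consecutive profiles of the same float carry correlated errors, so each new profile is only partly independent of the one assimilated before it.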
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1588', Anonymous Referee #1, 19 Sep 2023
Please note that this is a co-review by an early-career and mid-career scientist.
This paper details new processing of BGC-Argo data in the Mediterranean Sea, including oxygen sensor drift correction and the use of a neural network to reconstruct nitrate from other measured variables. An existing data assimilation scheme, previously used to assimilate chlorophyll and (non-reconstructed) nitrate from BGC-Argo, is extended to also assimilate reconstructed nitrate and measured oxygen profiles. Test runs demonstrate a positive impact on model analyses of assimilating these new variables.
The study is novel, of interest to the community, and within scope for Ocean Science. The study is generally well-conceived and well-presented, but there are aspects which should be more clearly explained.
In many places, the manuscript is hard to follow and would benefit from being made clearer. Some specific examples are given in the comments below, but are not exhaustive. As a general example, the use of passive voice, in particular in the methods section, makes it challenging in some parts to distinguish your work from previous studies. As another example, in the introduction the topics that should be introduced are introduced, but the text lacks flow and links between the topics, and so does not lead to the question you are addressing.
The manuscript would also benefit from English language copy editing, but we believe the journal offers this service as standard, so will not list such technical corrections as part of this review.
The paper aims to “address availability gaps” (Line 4) but this objective is not clear throughout the paper. The introduction does not clarify how, where, or why the data gaps affect the analysis. Results of the two model runs with and without reconstructed observations clearly show differences, but these are not always linked to the change in coverage. Also, the abstract and discussion conclude with a note about Argo data being complementary to satellite ocean colour assimilation, but this study does not show that.
Line 20: “Array for Real-time Geostrophic Oceanography” - this acronym form does not appear to be widely or currently used, suggest just using the name “Argo”.
Line 29: “approximately 270,000 profiles worldwide until now” - better to put “as of [date]” rather than “until now”.
Line 39: “By improving the accuracy…” is the result of the QC and would therefore fit better at the end of the sentence to increase clarity.
Line 45: “encouraged” replace with “is necessary”
Line 46: “optode” was not mentioned before
Line 47: Suggestion to link the topics for better flow of the text: say that purely observation-based Argo studies are regional, and using data assimilation has the potential to create a synthesis
Line 50: “DA underpins …” - suggest rephrasing this sentence slightly to articulate more clearly the aims and principles of DA.
Line 54: “NN algorithms” - “NN” hasn’t been defined yet in the main text. Need to add at least a couple of sentences introducing neural networks at the start of this paragraph.
Line 54: “match specific DA tasks” - a better phrasing might be something like “have the potential to perform specific tasks related to observation processing and DA”. The discussion of the following studies could be made clearer too, as well as stating that it is the method of Pietropolli et al. (2023) that is used in this study.
Line 58: May mention here or later that these examples are timeseries of chlorophyll, while the use of reconstructed nitrate is novel
Line 72: May be worth stating what the first release included for a full account of the developments
Line 81: Not clear at this point what “sequential modular approach” means
Line 87-100: The paragraph about the MedSea oceanography feels out of place and may be covered at the beginning of the first results section.
Line 114 & Fig. 1: the flow of information between 3DVarBio and OGSTM-BFM is implied to be one-way, but presumably it’s two-way, with OGSTM-BFM fields also an input to 3DVarBio? Also, “3DVarBio” is used in the text, but “3D-VarBio” in the figure (and on line 116) - these should be consistent.
Line 150: “preserve optimal values” - a better wording would be “preserve existing values” or “preserve background values”, there’s no guarantee they’re optimal.
Line 152: “spurious assimilation” - please be more specific. “spurious correlations”?
Line 154: “it barely affects other variables” - is it known how model-dependent this finding is? Since the models used here and in Skákala et al. (2022) are very similar, this is a reasonable approach to take here, but it could be worth clarifying that this lack of effect on other variables is in the model, not necessarily the real world.
Line 161: “we decided to not use different values of error for the two nitrate subsets in order to show the highest potential impact of the OSE.” A caveat needs adding either here or in the discussion that as a result of this decision, the assimilation may be non-optimal in terms of fitting the true state (as opposed to just fitting the observations). The same could be said about the lack of accounting for representation error.
Line 163-4: Is there a reference for the oxygen observation error values used? If not, please state how these values were chosen.
Section 2.3: While it is fine to refer the reader to Pietropolli et al. (2023) for details, it would be helpful to have a slightly longer and clearer description of the NN-MLP-MED methodology in this section.
Line 174: “The error of reconstructed nitrate, obtained by using the EMODnet as validation dataset, was 0.5 mmol m−3”. As this figure contrasts with the uncertainty value of 0.87 mmol m−3 given in the previous section, a little more context would be useful. For instance, introduce the EMODnet dataset (that hasn’t been done yet), state that the NN-MLP-MED method was trained on 80% of the EMODnet data, then had an RMSE of 0.5 mmol m−3 when tested against the remaining 20% of EMODnet data, and an RMSE of 0.87 mmol m−3 when the methodology is applied to BGC-Argo data that is not in EMODnet (if I have interpreted Pietropolli et al. (2023) correctly).
Line 180: It would be useful to put the information about added reconstructed profiles into context. As a suggestion, that could be in the form of stating for each aggregated region or the sub-regions how much reconstructed data is added. Having this information about added data per region may be useful in later sections e.g. when looking at RMSE changes between the DA runs, to enable linking the change in coverage to a change in RMSE (or highlighting where this does not link for any reason).
Line 184: “Adjusted and delayed mode data were selected for oxygen and chlorophyll, while exclusively DM data were considered for nitrate.” - A sentence or two explaining the reasons for these choices would be useful. In particular, what level of drift correction for oxygen has been done in these data sets?
Fig. 2: “of chlorophyll-a (red), Nitrate in-situ (orange) and reconstructed Nitrate (grey)” - this may just be a matter of perception, but the colours used don’t look like red/orange/grey.
Line 196: I may not understand the approach, but what happens if a float lives less than a year, which is when the largest drift occurs (Line 193)? Will the drift correction be applied to t0, or not because this is for operational purposes?
Line 197: Please give more details about the splitting into “inliers and outliers”.
Line 201: If drift is expected to linearly increase with depth, why use the average drift between 600 m and 800 m, rather than just the drift at 600 m? This may be reasonable (we’re not experts on oxygen sensor drift), but it’s not clear from the explanation.
Line 207: “Marine Copernicus Service” - “Copernicus Marine Service”
Line 208: “initial conditions from EMODnet dataset (Details are provided in Salon et al. 2019 ).” - does this include the same spin-up procedure as in Salon et al.? That should be detailed.
Line 216-222: This paragraph needs to be clearer, especially around the oxygen saturation procedure.
Line 223-227: How were these thresholds arrived at?
Fig. 3: What does the horizontal line at 600 m represent?
Line 243: “After removing of drift, the deep oxygen concentrations results to be closer to the EMODnet climatological data, allowing to include a higher number of profiles” - does this mean that in the absence of the drift correction the profiles would be expected to fail QC checks and be excluded, rather than the uncorrected profiles being assimilated?
Line 246-247: “While for the satellite comparison the model daily averages are considered, the model first guess (i.e. the model state before the assimilation) is used for metrics based on BGC-Argo.” - This is reasonable given that BGC-Argo is assimilated and ocean colour not, but a clearer reasoning for the decision should be given. Furthermore, is the first guess instantaneous (at midnight? at the observation time?) or an average? Also, it states here that for the satellite comparison the model is a daily average, but two paragraphs later that the observations are a weekly average?
Line 248: RMSE has its place, including here, but could usefully be supplemented by other validation statistics. Furthermore, RMSE is only optimal for Gaussian variables, is this the case for the variables considered? If not, then more robust statistics may be preferable.
Line 250: “the aggregated combination” was not mentioned before. Could be done with the description of Fig. 2.
Line 253 and following: The changes in RMSE should be linked to the change in coverage. From visual inspection, most regions of reduced RMSE are regions of higher pseudo-nitrate (Fig 2), but not all of them e.g. Nwm. Other regions have no (additional) float data yet show changes in the RMSE.
Line 272: “directly ascribed to the increased number…” – this is not clear to me as the Figures do not show how the reconstructed obs are distributed over seasons.
Fig. 5 and associated discussion: it is not at all clear what is displayed in the figure. Absolute values? RMS errors? Percentage RMS errors? Are the x-axis values identical for all variables or have they just been cut off for all except the bottom panel?
Line 281: “Assimilating oxygen profiles enable reducing the model-BGC floats RMSE” - is it possible to know how much this is due to the oxygen assimilation, and how much to the chlorophyll and nitrate assimilation? The lack of impact of reconstructed nitrate is an indicator here, but some further comment would be useful.
Section 3.3.1 may benefit from rewriting for clarity. It is difficult to pick out the key message. As a suggestion (definitely not requirement) you may test describing the BGC differences one region at a time instead of structuring the paragraph by variable. Possibly that improves the understanding.
Line 302: How do you distinguish if a region is still drifting in Fig. 7? To me, ion2 (second column) looks as if it is drifting still, but the differences have smaller magnitude than in Med (third column)
Fig 10: Experiment names on the y axis differ from the main text. Write “npp” instead of “ppn”. I think it would help to include the basin boundaries for orientation. An idea to better visualise the results may be to plot the difference in the subplots for the DA experiments compared to HIND instead of absolute values but that’s not a necessary change.
Line 340: If I understand correctly, the results thus show that nitrate suggests reduced NPP and chlorophyll enhanced NPP? Does that point to a bias in the model or representation of a specific component? (e.g. PFTs) If that’s the case, it may be worth noting in the discussion.
Line 346: “0-300 m”, Figure 11 says 0-600 m in the title
Line 345: When introducing the impact indicator, please add information about how that differs from other statistical metrics such as RMSE or a simple comparison between fields at the end of the simulation. What is the advantage of using this metric?
Line 360: Where does this threshold come from?
Line 368: Do you mean “initial conditions” as in using the analysis to initialise a forecast? If so, that may need clarification because it may be confused with general initial conditions for ocean simulations. For initial conditions in a general sense the QC’d oxygen profiles may not qualify.
Line 377: “threshold on 1mmol/m3” – can you add a value for decadal variability in the sentences before, which puts this threshold into context to illustrate it is indeed a justified choice please.
Line 399: “more than 30 profiles” - what was that before? How much larger is the data availability?
Line 401: “can effectively be constrained” is that referring to previous papers such as observing system simulation experiments? If this is meant as a conclusion from your results, this statement may need more explanation.
Line 409: The decrease in available BGC Argo observations was not mentioned before, but feels like this should be a major motivation of this work (for the introduction)
Line 415 & 443: “feed-forward” - this term is suddenly introduced in the discussion and conclusion when describing the method used, it should be introduced and explained in the methods section.
Line 436: Since ocean colour is not assimilated in this study the statement “should be used in conjunction with…” should have a reference to literature
Citation: https://doi.org/10.5194/egusphere-2023-1588-RC1 - AC1: 'Reply on RC1', Carolina Amadio, 22 Nov 2023
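Referee #1's remark on Line 248, that RMSE could be supplemented by more robust statistics when errors are not Gaussian, can be illustrated with a minimal sketch. The numbers are invented and the function name `error_summary` is hypothetical; nothing here comes from the manuscript. The sketch reports bias, RMSE and the median absolute error for the same set of model-observation pairs.

```python
import numpy as np


def error_summary(obs, model):
    """Return bias, RMSE and median absolute error of model minus obs."""
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    err = model - obs
    return {
        "bias": float(np.mean(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "median_abs_error": float(np.median(np.abs(err))),
    }


# Invented example: five small errors and one large outlier.
obs = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 1.0])
model = np.array([1.1, 1.0, 1.0, 1.1, 0.9, 6.0])
summary = error_summary(obs, model)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this example the single outlier dominates the RMSE while the median absolute error stays near the typical error size, which is the kind of contrast a skewed variable could produce.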
RC2: 'Comment on egusphere-2023-1588', Anonymous Referee #2, 09 Oct 2023
Publisher’s note: this comment was edited on 10 October 2023. The following text is not identical to the original comment, but the adjustments were minor without effect on the scientific meaning.
egusphere-2023-1588
Combining Neural Networks and Data Assimilation to enhance the spatial impact of Argo floats in the Copernicus Mediterranean biogeochemical model
General comments:
Scientific significance:
Reconstruction of nutrient information in the CANYON-B based ANN system NN-MLP-MED relies on high accuracy of in situ O2 data, while the Argo on-board O2 sensor is known to suffer significant sensor drift. To mitigate the Argo O2 data drift problem, the authors have introduced a QC O2 module for further calibration of O2 profile data. This novel approach to secondary O2 calibration is a key component of this study.
Since the pioneering work of Ford et al. (2021), the sparsity of BGC-Argo profiles has been recognised as a clear issue in state estimation and data assimilation studies and in operational systems. The OSE with and without NN-enlarged nitrate profile data demonstrated that machine-learning retrieved nutrient data can improve the model representation of surface phytoplankton dynamics under certain conditions. The impact indicator study reveals a clear impact of reconstructed nitrate profile assimilation on the model BGC state, especially in the upper macronutrient and chlorophyll-a fields.
Scientific quality:
The scientific question raised in this study is a clear and important one. The current density of the BGC-Argo float array does not cover even basin-scale ocean circulation, which was an original target of the CORE Argo float deployment. With the advancement of NN-based BGC variable retrieval methods, it is natural to test whether such generated data can help constrain ocean models for state estimation in operational settings. This study indicates a positive impact of such data. However, many of the evaluation procedures used to judge the detailed impact of the NN-derived nitrate data are not designed effectively to achieve this goal.
While the scientific contribution of this study is significant, there is a clear problem in how it is delivered as a journal paper. Overall, many statements are “speculative” or “subjective” for a data assimilation OSE study, and many are not supported directly by the provided materials. Typical examples are the presentation of RMSEs and of the differences between HIND and the DA experiments in Figure 5 and Figures 6-9. The authors ask readers to read these numbers from the figures rather than presenting the actual numbers, and the figures are generally not presented adequately for the purpose. The authors should clearly distinguish what can be concluded from what can be speculated from background knowledge. For example, the authors discuss the “impact of chlorophyll profile assimilation”, but the OSE setting has only Hindcast (HIND), DA with CHL, O2, NO3 profiles (DAfl) and DA with CHL, O2, NO3 profiles plus NN-derived NO3 profiles (DAnn). How can you discuss the sole impact of chlorophyll assimilation with this OSE setting? Other cases can be found in the comments under “Specific comments”.
Presentation quality:
While the scientific value of this study is high, its presentation quality is rather poor, as demonstrated by the long list of comments under “Technical corrections”. In general, figures and fonts are too small in most figures. The choice of color scheme in Figures 4 and 5 is questionable for people with color vision deficiencies. Please see the guideline on the preparation of graphs: https://www.biogeosciences.net/submission.html. There are many editorial issues, ranging from simple wording issues to more serious structural issues. Further detail can be found under “Technical comments”.
Combining the issues raised under “Scientific quality” and “Presentation quality”, I recommend major revision. Comments follow below.
Specific comments:
P2.l48: model tuning (Wang et al., 2020).
Please add more recent references on this topic:
Yumruktepe et al. 2023 https://gmd.copernicus.org/preprints/gmd-2023-25/
Wang and Fennel 2023 https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022GL101220
P6.l146: “VH is built using a Gaussian filter whose correlation radius modulates the smoothing intensity”
> What is the average size of the correlation radius? This information is important for understanding how far the impact of BGC-Argo profile assimilation extends in the analysis.
P7.l184: “Adjusted and delayed mode data were selected for oxygen and chlorophyll”.
> Can you describe which QC flag was used for selecting “good” data both for oxygen and chlorophyll?
P8.l199: “when all four drift estimates agree in sign”
> It is not clear what you refer to here by “four drift estimates”. In P7.l196, it is mentioned that drift is evaluated at two different depths; what are the other two estimates? Or does the number four have nothing to do with that?
P8.l202-l203: “it can be assumed that O2 values at surface are already fixed by the GDACs”
> As stated in p2.l41, not every O2 sensor is calibrated in air, and air calibration is one of the most important calibration steps to make O2 data trustworthy. Do you believe O2 values at the surface are fixed even for the old, non-air-calibrated sensor data? Or are old sensor data not included in this specific study period, 2017-2018?
P9.l223: “profiles can be excluded when model-observation misfit is higher than given thresholds”
> Does it mean some profiles are actually excluded during your DA run, or does this just describe the online data selection system? I also assume “model-observation misfit” means innovation; is that right? Can you also justify the reason behind this online data elimination procedure?
P10.l246-247: “While for the satellite comparison the model daily averages .. the model first guess is used for metrics based on BGC-Argo”
> I believe the choice of different RMSE metrics for satellite OC data and Argo profiles is based on the experiment setting, in which Argo profiles are assimilated while satellite OC data are not. By comparing the first guess state with Argo profiles that are not yet assimilated, you can use the Argo profiles as independent data. If that is the case, it would be better to say so here.
P10.l251: “Satellite L3 products from Copernicus Marine Service catalogue ..”
> Usage of this data set requires proper citation. Plus, this sentence is floating without a clear connection in 3.2. Does it mean the satellite OC RMSE metric is based on these weekly averaged data? If so, does the weekly cycle coincide with an analysis cycle? Please make its significance clear.
P11.l274-l275: “Here, improvements related to chlorophyll assimilation can be observed in nwm, ion and lev in winter and at depth in tyr, ion and lev in summer (Figure 5 middle panel)”
P11.l278: “the direct chlorophyll assimilation is more effective than ..”
> Since there is no experiment assimilating only chlorophyll in this study, it is not easy to determine the degree of “improvements related to chlorophyll assimilation”, or whether direct chlorophyll assimilation is more effective than the dynamical model adjustment after nitrate assimilation. You need to provide extra analysis to support these statements.
P13. “3.3.1 Impacts on biogeochemical vertical dynamics”.
> There is no description of how Figures 6, 7, 8 and 9 are plotted. Are they sub-basin averaged values? “the basin wide averages of DAnn display .. (Figure 6)” at P13.l310 implies these figures are basin averages, but this is never stated clearly.
P13.l297: “Nitracline depth”
> There is no definition of nitracline depth. Please be specific.
P13.l296-l297: “decreases by 8% and 11% in DAfl and DAnn runs, respectively”
> Contrary to the clear difference in the impact of nitrate assimilation between DAfl and DAnn at nwm in Figure 7, the RMSE profiles in Figure 5 (especially Summer Nitrate at Nwm) do not show such a difference between the two DA experiments. Can you explain why?
P13.l299: “eventually reach a stationary phase”
> What is meant by a stationary phase and how do you measure it?
P13.l306: “As a consequence of both the direct assimilation of chlorophyll profiles and the dynamical model adjustment after nitrate assimilation”
> Again, how can you attribute a consequence to the dynamical model adjustment from nitrate assimilation alone with this OSE setting? For example, why would a change in phytoplankton biomass due to direct assimilation of chlorophyll not affect the chlorophyll concentration in the DCM through its own dynamical model adjustment? Unless extra material is provided, this statement is speculative.
P13.l313: “oxygen profiles assimilation (DAfl, second row in Figure 9) provides positive or negative corrections”
> As described by the authors in the subsequent sentences, changes in phytoplankton biomass also change oxygen through primary production and remineralization processes as a dynamical model adjustment. Thus, assimilation of chlorophyll and of nitrate both have the potential to alter oxygen. How can you judge that what is seen in Figure 9 is solely a consequence of oxygen assimilation? This sentence contradicts the statements that follow about the impact of reconstructed nitrate profile assimilation on oxygen.
P13.l316-l318: “The only noticeable difference ..”
> This is one of the most important findings of this study regarding the impact of reconstructed nitrate profile assimilation, but the difference between DAfl and DAnn in Figure 9 (summer period in NWM) cannot be found in the RMSE profiles in Figure 5 (summer Oxygen in Nwm), and we cannot judge whether this difference in DAnn against DAfl is an improvement or not. Can you explain why?
P16 entire section of 3.3.2
> Since the difference between DAfl and DAnn is almost impossible to see in Figure 10, readers cannot confirm what is described in this subsection. Please reevaluate how to present the different impacts of the DA settings on NPP.
P16.l339-l341: “In fact … after chlorophyll assimilation”
> How can you establish that the weak negative correction of macronutrients is the main cause of reduced NPP, outweighing the effect of the change in phytoplankton biomass after chlorophyll assimilation? As far as I can tell, no concrete material supporting this statement is provided. Unless extra material is provided, this statement is speculative.
P17. 3.3.3
> In Figure 11, the figure title indicates that Nitrate Iij(t) is evaluated over the 0-600 m depth range rather than the 0-300 m range specified in equation (2). If that is the case, please say so. If not, please fix the figure titles in Figure 11.
P17.l365: “since the same QC oxygen dataset was assimilated in DAfl and DAnn”
> But the authors just described in P13.l316-l318 that the impact of the reconstructed nitrate is noticeable in oxygen, at least at NWM where the density of reconstructed nitrate is large. It then does not make sense that you see no difference between the two DA experiments. Why do you not see the difference in the Iij 95th percentile maps for oxygen?
P19.l396-l.397: “In this work, important impacts are also observed in summer for all variables, as a consequence of the increased number of assimilated profiles.”
> It is not clear what is meant by “a consequence of the increased number of assimilated profiles”. The increased number of nitrate profiles from DAfl to DAnn? Or something else? As far as I understand, the main reason why we see an impact of DA in summer in the DA experiments of this study, compared to Teruzzi et al. (2021), is that satellite OC cannot see the DCM while Argo float profiles see the signal with multiple sensors. In that sense, you could see the impact of Argo profile assimilation no matter how small or large the number of profiles is. Please reevaluate this statement.
P19.l399-l400: “Indeed .. box every 10 days”
> I do not understand which “results” in this study support this statement. Basin coverage rate of BGC-Argo floats equipped with oxygen sensor is simply determined by deployment plan. Or do you like to say that the new O2 QC module prove enough number of O2 profile survives to be ingested to nn module? I read 3.1, but could not get such information. Please be clearer about meaning of this statement.
P19.l401-l406: “while, up to … by a 3D varying correlation radius (Storto et al., 2014)”
> This discussion of improvement in mesoscale dynamics looks off topic, and I cannot see why it needs to be discussed here. It is especially confusing given that the 2.5 degree by 2.5 degree horizontal resolution in BGC profiles that could potentially be achieved by NN with oxygen profiles is far below the mesoscale-resolving resolution of O(50 km).
P19.l418: “0.50 mmol2 m−3 for nitrate”
> This information should be included in 2.2.
P19.l423-l.429: “Indeed … Li et al.(2021)”
> The MLP-based approach of Sauzède et al. (2017) overcame this issue by adding pressure as an input variable in the MLP. Why do you believe choosing another NN approach such as a 1D CNN is important before using pressure or depth information in MLP-NN-MED?
Technical corrections:
P1.l20: “The Array for Real-time Geostrophic Oceanography”
> Please do not use this acronym for Argo. It is not official. There is a historical background why it should never be and I have it on the authority of one of the program founders who was present on the day the Argo project was first conceived: “Argo was named as a companion project to the proposed Jason altimetric satellite missions. The words indicating a putative interpretation of the letters Argo, Array for Real-time Geostrophic Oceanography, were created in a jocose moment while celebrating in a bar afterwards. It would be best to let an idea die whose origin was mediated entirely through the action of alcohol. Argo is not and was never meant to be an acronym. It should be written "Argo" and never as "ARGO".”
P1.l2: “and successfully integrated in”
> “and are successfully integrated in”
P7.l182: 2.4 BGC-Argo data and post-deployment oxygen quality control
> I assume subsection 2.4 is about the QC-O2 module, but the module name is never referred to in this section; it first appears in the next section, 2.5. Please make it clear that this is about QC-O2.
P7.l184: “and delayed mode data were selected for oxygen and chlorophyll, while exclusively DM data were considered”
> Use only one expression, either “delayed mode” or “DM”, not both in the same sentence. The same consistency would be welcome for “Adjusted/AM” and “Real Time/RT” throughout the manuscript once RT, AM and DM are defined at p2.l32.
P8.Figure 2: Coordinate labels font is too small and almost unreadable. Please enlarge its size.
> It is almost impossible to distinguish the three dots in the figure. Please consider using different colors or separate maps for each type of profile.P8.Figure 2: “lev=lev1+lev2+lev3+lev4; ion=ion1+ion2+ion3; tyr=tyr1+tyr2; adr=adr1+adr2; swm=swm1+swm2”
> As far as I can tell, this is the only place where the aggregated sub- or macro-basins are defined. They should be properly defined in a table, as suggested below. I also recommend using either “sub-” or “macro-” consistently to minimize confusion.
P9.l217-l220. “Finally, oxygen …”
> I can guess the meaning, but this long sentence is hard to understand. It needs to be reorganized into shorter sentences.
P9.l229-l.230: “The oxygen post-deployment quality check method” and “The post deployment oxygen QC method”
> Again, I assume the QC method refers to the QC-O2 module described in 2.4; is that correct? If so, please state it explicitly. In addition, it is strange to refer to the QC-O2 module in two different ways in the title and the body of 3.1.
P10.l248: “is evaluated in winter (from February to April, FMA) and summer (from June to August, JJA)”
> Since your experiment period is two years, from Jan 2017 to Dec 2018, do you use both 2017 and 2018 results for this evaluation?
P10.l255. “the eastern sub-basins”
> Please define which sub-basins (lev1, lev2, etc.) are included in the definition of the eastern sub-basins.
P8.Figure 2 caption: “lev=lev1+lev2+lev3+lev4; ion=ion1+ion2+ion3; tyr=tyr1+tyr2; adr=adr1+adr2; swm=swm1+swm2”
P10.l257: “alb, swm and nwm”
P11.l263: “Alboran, South West Mediterranean, North West Mediterranean, Tyrrhenian, Ionian and Levantine Seas”
P11.l271: “is observed in nwm and tyr (winter) and in ion (summer).”
P11.l275: “in nwm, ion and lev in winter and at depth in tyr, ion and lev in summer”
> The association between the long and short names of each sub-basin, such as Alboran (alb), South West Mediterranean (swm), etc., is never clearly defined in this article. Please do so in section 2.3 or add an extra table.
P11.Figure 4.
> The figures are so small that it is hard to distinguish the three bars for each domain. Please use larger figures.
P12.Figure 5.
> The figures are so small that it is hard to distinguish the three profiles, especially DAfl and DAnn. In addition, many figures do not have x-axis labels. Please use larger figures or consider a different presentation, such as scatter plots and tables of RMSEs at selected depths.
P13.l288: “two sub-basins”
> Please specify the names of the “two sub-basins” here before referring to Figure 2.
Figure 6, 7, 8, 9
> The figures and font sizes are too small. Please be more specific about the definitions of the second and third rows in the figure captions.
P16. Figure 10. The experiment names HIND, DAfl and DAnn appear as Hind, DaIns and Dasyn in the y-axis labels of the figures. This is confusing.
P17.l348: “Here, HIND is here the reference”
> “Here, HIND is the reference”
P19.l367-l.389: Five paragraphs about oxygen QC.
> This information does not fit in the “Discussion”; it should rather be integrated into 2.4.
P20.l408: “BGC-Argo OS”
> Please define the meaning of OS. Observing system?
Citation: https://doi.org/10.5194/egusphere-2023-1588-RC2
- AC2: 'Reply on RC2', Carolina Amadio, 22 Nov 2023
EC1: 'Comment on egusphere-2023-1588', Julien Brajard, 10 Oct 2023
The work presented in this paper is of interest to the community, as outlined by the two Reviewers. Nevertheless, as it is noted by the reviewers, the paper needs to be clarified, and the main message more clearly conveyed. I hope that all the comments and suggestions by the reviewers will help to provide an improved revised version.
Other comments:
Section 2.3: I agree with Reviewer 1 that details about the neural net approach are missing. In particular, the sentence "incorporating nonlinear functions, adjusting neuron count, and optimizing the training algorithm" needs to be expanded, since one could wrongly understand that the Fourrier et al. approach does not incorporate nonlinear functions (while in reality, they use the nonlinear sigmoid function).
L246: "the model first guess": does it correspond to the background?
About the assimilation: how frequent is the assimilation update? Is it 10 days?
About the validation: Can you comment a bit on the choice of using the RMSE between the BGC-Argo profile and the model first guess as a validation? Since a previous measurement of a BGC-Argo profile was already assimilated, can a new measurement be considered independent? It could be interesting to have a quick discussion about the Lagrangian autocorrelation.
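The two quantities raised in this comment can be illustrated with a minimal sketch. It is not the authors' validation code: the function names, the sample values and the use of a lag-1 Pearson correlation as a rough proxy for Lagrangian autocorrelation are all assumptions made here for illustration.

```python
import numpy as np


def rmse(obs, first_guess):
    """Root-mean-square misfit between observed and first-guess values, ignoring NaNs."""
    diff = np.asarray(obs, dtype=float) - np.asarray(first_guess, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))


def lag_autocorrelation(series, lag=1):
    """Pearson correlation between a series and itself shifted by `lag` float cycles."""
    x = np.asarray(series, dtype=float)
    if lag < 1 or lag >= x.size - 1:
        raise ValueError("lag must leave at least two overlapping pairs")
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])


# Hypothetical nitrate values (mmol m^-3) at four depths for one float cycle.
obs = [0.5, 1.0, 3.0, 5.0]
first_guess = [1.0, 1.5, 3.5, 5.5]
profile_rmse = rmse(obs, first_guess)  # every misfit is 0.5 in magnitude

# Hypothetical mean misfits of the same float over successive cycles.
misfits = [0.5, 0.4, 0.45, 0.3, 0.35, 0.2]
r1 = lag_autocorrelation(misfits, lag=1)
```

A lag-1 correlation close to 1 would suggest that consecutive profiles of the same float carry overlapping information, so a misfit computed against a first guess that already assimilated the previous profile is not a fully independent check.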
Citation: https://doi.org/10.5194/egusphere-2023-1588-EC1
- AC3: 'Reply on EC1', Carolina Amadio, 22 Nov 2023
Anna Teruzzi
Gloria Pietropolli
Luca Manzoni
Gianluca Coidessa
Gianpiero Cossarini
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.