Using deep learning to assimilate sun-induced fluorescence satellite observations in the ISBA land surface model
Abstract. Accurate representations of the surface and vegetation are critical for simulating the terrestrial CO2 cycle in response to climate and meteorological conditions. To meet this challenge, an increasing number of satellite missions are being launched which can monitor vegetation conditions and biomass. One is the Copernicus Sentinel-5P mission, which carries the TROPOMI instrument and retrieves Solar-Induced Chlorophyll Fluorescence (SIF). As an indicator of plant photosynthetic activity, SIF can provide critical information for evaluating and parameterising gross carbon flux dynamics in surface models. This study aims to assimilate TROPOMI SIF data into the ISBA (Interactions between Soil, Biosphere and Atmosphere) land surface model developed by Meteo-France, with the objective of directly correcting the representation of Leaf Area Index (LAI) and Gross Primary Production (GPP). To achieve this, we have developed a dedicated observation operator that links the modelled LAI to the TROPOMI SIF daily product. This neural network operator was developed using deep learning and was trained using observations over Europe. This operator achieved good accuracy, was implemented in a land data assimilation system (LDAS), and was used to assimilate TROPOMI SIF in ISBA using a sequential simplified extended Kalman filter. Specific experiments were conducted to study the assimilation process over the Ebro basin in Spain. This area is known for its irrigated croplands, which are not well represented by ISBA. Some experiments assimilate TROPOMI SIF, some assimilate a Copernicus Land Monitoring Service (CLMS) LAI 10-day product, and some assimilate both. This provided us with a useful point of reference for improving the vegetation simulation. Following SIF assimilation, LAI representation improved across the domain, highlighting heavily irrigated croplands. 
The GPP derived from the analysis closes the gap between the simulated and observed values, though a significant difference remains. When compared with other assimilation experiments, assimilating SIF alone provides a benefit to LAI and GPP similar to that of a standard assimilation of a CLMS LAI product. The best improvements to the LAI and GPP results come from co-assimilating TROPOMI SIF with the CLMS LAI product, which combines the advantages of high-frequency SIF observations and robust 10-day LAI assimilation.
In this article, the authors propose using a neural network (NN) emulator as an observation operator in a data assimilation framework to assimilate SIF (sun-induced fluorescence) observations in a land surface model. The general idea is that a physical model of the observation process would be computationally prohibitive. The NN is first trained using a dataset of satellite observations, and then successfully used in data assimilation experiments. As acknowledged by the authors, the method is not new and was adapted from previous work. Nevertheless, it provides a new application that would make an interesting addition to the literature.
I am therefore overall positive about the manuscript. That being said, the manuscript is not always very clear, especially for non-experts in land surface modelling. In many cases the formulation could be improved. Some details are missing here and there about the experiments (hopefully I spotted and listed most of them). Finally, on a few occasions the method is borderline, in particular when the assimilation experiment is evaluated over the same period as the NN training set, or when the analysis is evaluated against observations that are assimilated - see the detailed remarks.
General remarks
---------------
1) On several occasions, the text alternates between two topics, which can sometimes be confusing:
- In the introduction, between the physical and machine-learning aspects of your problem.
- L91-92 between the description of the retrieval process and a post-processing that you apply (discarding negative values + averaging)
- L212-213 between description of results and conclusion
- L362-367+ between the NN results on the test (RMSE and correlation) and the application of the NN.
I would advise reorganising the text to avoid giving the reader the impression of going back and forth between topics, and to take this opportunity to make the logical connections more visible. For example, it is only after reading the whole introduction that I understood the logical links between the various aspects of the problem.
2) Eqs. 1-3: at first, I thought that there was a typo in 1a (that the argument of the model should be the analysis and not the background); only after reading Eq. 2 did I understand that there is not. Overall, the presentation of the data assimilation method is highly confusing: first because there is a temporal aspect, which means that the method should in principle be called a "smoother" and not a "filter" (I can understand that, for historical reasons, you would prefer to call it a filter, but you should at least mention this point), and second because the temporal aspect is never really explicitly explained: is the control vector the state at t=i or at t=i-1?
From what I understand, the fact that M appears in the definition of the Kalman gain means that the control vector is the state at time t=i-1, which means that there is a typo in 1b, it should be:
x^a_{i-1} = x^b_{i-1} + K...
or equivalently:
x^a_{i} = Mx^a_{i-1} = x^b_{i} + MK...
in the case of a linear model.
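(To make this concrete, here is a minimal numerical sketch, with arbitrary illustrative matrices, checking that for a linear model the update written at t=i-1 followed by propagation is equivalent to the correction written at t=i:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative linear model M, observation operator H,
# background covariance B and observation covariance R
n, p = 3, 2
M = rng.normal(size=(n, n))
H = rng.normal(size=(p, n))
B = np.eye(n)
R = 0.1 * np.eye(p)

xb_prev = rng.normal(size=n)  # background at t = i-1
y = rng.normal(size=p)        # observations at t = i

# Kalman gain with the model inside the observation operator (H o M)
HM = H @ M
K = B @ HM.T @ np.linalg.inv(HM @ B @ HM.T + R)

# Update at t = i-1, then propagate ...
xa_prev = xb_prev + K @ (y - HM @ xb_prev)
xa_i_v1 = M @ xa_prev

# ... which equals correcting the propagated background at t = i
xb_i = M @ xb_prev
xa_i_v2 = xb_i + M @ K @ (y - H @ xb_i)

assert np.allclose(xa_i_v1, xa_i_v2)
```

This is of course just distributivity of M over the sum, but making the timing explicit in the text would, in my view, remove most of the ambiguity.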
Then, the sentence L157-158 "The initial state is determined by the analysis performed over the previous 24-hour assimilation window." is confusing. The initial state for what? Or is this sentence only there to explain that the time step between t=i-1 and t=i is 24 h?
Finally, there is no mention about cycling. Are you using cycled data assimilation or is the background reset at each assimilation cycle?
Specific remarks
----------------
- The LAI acronym is introduced multiple times (L30 & L37)
- L37 "SIF observations": later in the introduction, these are called "TROPOMI SIF retrieval" (L46-47, L50). Are you referring to the same object? Please try to be consistent with your nomenclature.
- L40 a transition is missing before the second paragraph
- L42 "greater accuracy" -> "higher accuracy"?
- L43 "better suited to the requirements of an operational system" this statement is questionable. Taking the example of meteorology, machine learning weather prediction models sometimes show a lack of physical consistency in their forecasts, which is typically not what you would like in an operational setting.
- L44 "are beginning to emerge": the article cited was published 6 years ago; I wouldn't say that this is the beginning.
- L45 "Deep learning is based on the supervised training" there are many examples of applications of semi-supervised or even unsupervised learning in the geosciences
- L46 "To enable more real-time and operational monitoring..." Shouldn't you start a new paragraph here?
- L54 "seems more suitable" at this point, this is hypothetical, as it remains to be proven that such a ML-based operator can be constructed and possesses sufficient accuracy for the task. Perhaps replace "seems" by "could" or "would"?
- L63 "the methods including the dataset": I wouldn't say that a dataset can be considered a method.
- Figure 1: For panel (a), I would suggest using a rectangular projection (like PlateCarree, for example) to be consistent with the domain used. I assume that the colourmap is the same for all three panels (this is not explicitly mentioned). With this choice of colours, the map is very difficult to read; I would advise choosing a so-called categorical colourmap. Also, I am not sure whether the figure is intended to be single or double column, but I have the impression that a lot of space is lost to padding and margins.
- L86-91 I would reformulate these sentences. As a non-expert in land surface modelling, I got the impression that the same process is described three times and was totally unable to understand exactly what is done here.
- L91-92 "This means that negative values for TROPOMI SIF can occur, which we have discarded." I am a bit lost here (most probably because I don't know what exactly SIF is): is SIF supposed to be positive?
- L98-100 At this point, it is unclear for what these additional measurements will be used.
- L107-110 "if an observation is missing, the gap is not filled" and "using an arithmetic average if at least 50% of the grid points are observed (i.e. half the maximum amount)" this whole part is confusing. First, I think that it would be good to remind the reader why observations can be missing and if we are speaking of only parts of the observation missing or entire maps missing. Then, depending on the answer to that question, does the order of interpolations (space or time first) matter? Finally, the 50% threshold does not really make sense to me.
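For concreteness, here is how I read the 50% rule (a sketch of my interpretation only; the actual implementation may well differ):

```python
import numpy as np

def aggregate_cell(fine_values):
    """Average fine-grid values into one coarse cell, keeping the
    cell only if at least 50% of the points are observed (non-NaN)."""
    fine_values = np.asarray(fine_values, dtype=float)
    valid = ~np.isnan(fine_values)
    if valid.sum() < 0.5 * fine_values.size:
        return np.nan
    return fine_values[valid].mean()

aggregate_cell([1.0, np.nan, 3.0, np.nan])    # exactly half observed -> 2.0
aggregate_cell([1.0, np.nan, np.nan, np.nan])  # too few observed -> NaN
```

If this reading is correct, please state it this explicitly in the text, and clarify whether the rule is applied before or after the temporal interpolation.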
- L111 "instead" is repeated
- L112 "is spatially interpolated in the same way" no time interpolation for these observations?
- Figure 2: I would advise using a grid, otherwise the plot is difficult to read. You can also set the green axis range to (0, 5) so that the tick positions are exactly the same for all three axes. Furthermore, is there a real interest in presenting the data as points (with fancy markers that can be distracting) instead of lines? In the caption, you mention the domain in Fig. 1; I would suggest adding the (a) label to remind the reader that it is the median over the European domain. Finally, are you showing the statistics for the raw or the interpolated product?
- L118 "which is present only in the TROPOMI SIF and LAI V1 datasets": I only saw that spike for LAI V1 in the figure after zooming in (the green cross is almost hidden by a blue triangle). I think this is a good example of why lines would perhaps be more appropriate for this figure.
- L136-137 "This process involves identifying the optimal control vector according to a model that minimises discrepancies between observations and a prior background estimates" This needs to be reformulated. As is, one understands that the choice of the control vector is part of the DA system, that the model does the minimisation, that discrepancies are computed between background and observations, and that the background is to be optimised! Furthermore "prior background" seems redundant.
- L139 "(SEKF) (Mahfouf and Bilodeau, 2009)" -> "(SEKF, Mahfouf and Bilodeau, 2009)"
- L140 "x, where the subscript denotes the temporal step" Please reformulate. As is, you are referring to a subscript in "x", which has no subscript.
- L144 "where the superscripts "a", "b", and "o" stand for analysis, background and observation, respectively." I think that you can remove the "o" superscript. Also, I would use an italic font for "a" and "b" here, to match what is used in the equations.
- Eqs. 3a & 3b, K should be in \mathbf{}
- L160: is "assessed" the appropriate word here?
- Before section 2.3 (but also before section 2.2) I think that it would be good to provide some transitions.
- L172-173: "we will train a single neural network to cover all pixels." this gives the impression that you design a NN that processes the map at once, which, as far as I understand later on, is not the case: each grid point is processed independently.
- L174 "probability of belonging to a given category" Now I am confused, I thought it was a regression problem, why do you mention classification?
- L 183: "with a maximum close to zero" do you mean here the maximum of the SIF values or the mode of the distribution?
- L 184 "below-zero predictions" -> "negative predictions"?
- L184-185: "we applied the following strictly increasing function to the TROPOMI SIF values to predict.": the sentence should end with ":". Also, to prove your point, it would perhaps be easier to give the expression of the inverse function and show that it only takes positive values, no?
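To illustrate what I mean (this is my guess at the kind of transform involved, not necessarily the one you used): if the network is trained on z = f(SIF) with f an inverse softplus, then the inverse f^{-1} is the softplus, which returns a strictly positive SIF for any real-valued prediction:

```python
import numpy as np

def forward(s, eps=1e-6):
    # hypothetical transform applied to (positive) SIF before training:
    # inverse softplus, strictly increasing on s > 0, shifted by eps
    return np.log(np.expm1(s + eps))

def inverse(z):
    # softplus: strictly positive for any real-valued network prediction
    return np.logaddexp(0.0, z)

s = np.array([0.1, 0.5, 2.0])
assert np.allclose(inverse(forward(s)), s + 1e-6)
assert np.all(inverse(np.array([-50.0, 0.0, 50.0])) > 0)
```

Stating the inverse explicitly in the text would make the positivity argument immediate.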
- L190-200: I am confused here. First, you should mention explicitly what is coming from observations and what is coming from simulations. I guess that GPP, LAI V1, TROPOMI SIF are observations, but for the rest it is not explicitly said. Then, you mention a simulation database: where does it come from? Did it already include assimilation? Does it use the same model as the one you are using for the assimilation experiments? This last point is especially important: if the NN is trained to predict observations based on simulated variables (i.e. more or less something close to a background), then in a data assimilation experiment I would expect the innovation (y-Hx^b) to be essentially noise, right? In the end, I don't think it is an issue, because you only selected LAI-V1 (which, as far as I understood, is an observation product) plus some extra predictors, but this is definitely confusing.
- L204-207: I did not understand this paragraph, in particular the logical connections.
- Fig 4: perhaps it would be better to present these results as a table instead of a figure?
- L214: ZSE and LON seem to have a similar impact. Why not drop LON as well?
- L215 "even if the observation operator has learnt the model biases": how could the observation operator learn the model biases, since it only relies on observation products in the training?
- L217: "to estimates" -> "to estimate"
- L218-225: I would reformulate this paragraph a bit to make it smoother. Also, you mention "from 1 June 2018 to 31 May 2021" for the full dataset, but it is mentioned in 2.1.2 that TROPOSIF provides data from May 2018 to December 2021. Did you discard part of these data? Is the final year not used at all?
- L228-229: "As mention earlier" -> "As mentioned earlier"
- L227: "our focus is on a small area": it is unclear whether the small domain is used only for the assimilation or also for the selection of the training dataset. It is also unclear whether the forecast model used in the assimilation experiments can run on the small domain alone or needs to be run on the whole European domain.
- L231: "The assimilation period runs from 1 January 2018 to 31 December 2021": I thought that the TROPOMI data only started in May 2018?
- L233: why did you change the land cover map? If the second one is more accurate and more recent, then why not use it for the training database as well?
- L241-243: It is not entirely clear in the text that you are talking about the observation error for the SIF observations (and not for the LAI observations).
- L236-246: you don't mention the dates for the experiments assimilating SIF. EDIT: From the caption of figure 7, I get the impression that the SIF assimilation experiments cover June 2018 - May 2020, which includes the training set of the NN (June 2018 - June 2019). I would strongly advise removing this period from the statistics.
- Table 2: is there an interest in including the line without data points? Also, you haven't defined ubRMSE (I haven't checked, but there might be other undefined acronyms)
- Fig. 5: I would advise reducing the padding of the figure overall (I don't think that the ticklabels are needed) and using continuous (instead of discrete) colourmaps. Perhaps you could also change the projection to PlateCarree. Furthermore, I don't understand the choice of a diverging colourmap for correlation, unless there is something specific happening when the correlation is 0.5? Finally, even if this figure is intended for one column, I think you can use a bit more horizontal space.
- L279: "The saved weights of the neural network are implemented in the assimilation scheme" please reformulate.
- L281: "LAI representation will be our main focus on the control vector variables" I don't understand what you mean here?
- L282: "the monthly-averaged maps over the C3 crops" It is unclear to me whether the figure will show the increments only over this land category or over the entire domain. Also, is there a reason why you want to focus on this category?
- Figs. 6 & 7: Same as for the previous figure, I think that you can remove the ticklabels and reduce padding overall. For this data, you absolutely need a diverging colourmap centered on zero.
- L290: You mention an RMSE here. Can you be explicit and say between what and what this RMSE is computed? From the caption in the figure, I would assume that this is the RMSE between the LAI observations and the LAI analysis. I think it would be good to remind the reader that LAI observations are not assimilated in this setup and hence that they can be used to evaluate the analysis, provided that their errors are independent of the errors of the SIF observations.
- L295: "all cells with a p-value higher than 0.01 were removed" can you please specify which statistical test is used to compute this significance test? Is this mask also applied in the delta(RMSE) panel?
- L300: "are closer to the observed values" according to which metric?
- L306: "the reference LAI-V1 experiment" I think that you should be careful with the results of that experiment since here you are evaluating against LAI observations, i.e. the observations that are assimilated.
- Section 3.2.2: same remark for the experiments here.
- Fig. 11: The caption is not very clear: it gives no description for (a). Which analysis is plotted? I assume the whiskers show the in-situ measurements, but this is not mentioned.
- L337-338: "we must compare all in-situ SIF measurements on a given day with the simulated SIF value resulting from the analysis for the two situations shown in Fig. 11(b)": I don't understand this.
- L340-341: "the values are closer overall to the average airborne measurement value": but I thought that absolute values should not be compared because of different wavebands?
- Section 4.1 After reading the entire section, I am confused. I don't understand what you want to demonstrate here. The figure seems to show that the output of the NN more or less matches the values of TROPOMI SIF (which is what it has been trained for) and does not match another instrument: this is absolutely to be expected, or am I missing something here?
- L361: "Using this simple architecture" you are presenting the setup here, not really the architecture.
- L362: "no overfitting evident" -> "no evident overfitting"?
- L370: "update rate yielded by TROPOMI SIF assimilation is five times higher than that of 10-day synthesis assimilation": I don't understand this point.
- L374: "Using a neural network operator instead of a physical model makes propagating uncertainties difficult": What do you mean here? The uncertainties are all accounted for in the B and R matrices, no?
- L375-376: "It is also important to understand that the neural network is trained to emulate observations with instrument uncertainties rather than the true physical variable, making quantifying the observation error of the true value challenging." I don't fully agree with that statement. The instrumental errors are (in principle) unpredictable, and hence the NN cannot learn them. It is true that the use of an NN will introduce modelling error (because the process from state to observations is not perfectly represented), but this is also the case for any observation operator.
- L416-417: "information that cannot be changed" this is unclear: why couldn't you change the values of the support parameters?