This work is distributed under the Creative Commons Attribution 4.0 License.
Using deep learning to assimilate sun-induced fluorescence satellite observations in the ISBA land surface model
Abstract. Accurate representations of the surface and vegetation are critical for simulating the terrestrial CO2 cycle in response to climate and meteorological conditions. To meet this challenge, an increasing number of satellite missions are being launched which can monitor vegetation conditions and biomass. One is the Copernicus Sentinel-5P mission, which carries the TROPOMI instrument and retrieves Solar-Induced Chlorophyll Fluorescence (SIF). As an indicator of plant photosynthetic activity, SIF can provide critical information for evaluating and parameterising gross carbon flux dynamics in surface models. This study aims to assimilate TROPOMI SIF data into the ISBA (Interactions between Soil, Biosphere and Atmosphere) land surface model developed by Météo-France, with the objective of directly correcting the representation of Leaf Area Index (LAI) and Gross Primary Production (GPP). To achieve this, we have developed a dedicated observation operator that links the modelled LAI to the TROPOMI SIF daily product. This neural network operator was developed using deep learning and was trained on observations over Europe. The operator achieved good accuracy, was implemented in a land data assimilation system (LDAS), and was used to assimilate TROPOMI SIF in ISBA using a sequential simplified extended Kalman filter. Specific experiments were conducted to study the assimilation process over the Ebro basin in Spain. This area is known for its irrigated croplands, which are not well represented by ISBA, providing a useful point of reference for improving the vegetation simulation. Some experiments assimilate TROPOMI SIF, some assimilate a Copernicus Land Monitoring Service (CLMS) LAI 10-day product, and some assimilate both. Following SIF assimilation, LAI representation improved across the domain, most notably over heavily irrigated croplands.
The gross primary production derived from the analysis closes much of the gap between the simulated and observed values, though a significant difference remains. Compared with the other assimilation experiments, assimilating SIF alone provides a benefit to LAI and GPP similar to that of the standard assimilation of a CLMS LAI product. The best improvements to the LAI and GPP results come from co-assimilating TROPOMI SIF with the CLMS LAI product, which combines the advantages of high-frequency SIF observations and robust 10-day LAI assimilation.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-941', Alban Farchi, 02 Apr 2026
- RC2: 'Comment on egusphere-2026-941', Anonymous Referee #2, 10 Apr 2026
Overall quality very high. Comments are all on the presentation of the research, rather than on the calculations.
Abstract: I very much like it. It perfectly balances being specific without losing itself in too much detail.
Lines 40-45: while I have experience in ‘classical’ machine learning (e.g., random forests) and data assimilation, deep learning is something new for me. Please state briefly (and explicitly) what the difference is between deep learning and more classical machine learning approaches.
Line 46: if near-real time is the concern, you could have used near-daily SIF data. What is the advantage of using the 8-day TROPOMI? Please clarify.
A thought: the TROPOMI SIF data have never been properly calibrated/validated. They also contain quite a few negative values, so it is likely that some of the values are biased. The traditional answer here is that the variability in SIF contains most of the information, as opposed to the absolute SIF value.
Line 97-100: the airborne campaign is not mentioned in the introduction. How important is it for the research question? The ‘why an airborne campaign’ needs more context.
Line 106: the introduction (line 46) mentions an 8-day SIF product, while this line mentions a daily SIF product. Please clarify.
General thought: section 2.2 within methodology would be easier to understand if there were some kind of visual. Figure 3 is helping a bit, but it focusses on the neural network itself, rather than on the assimilation.
Line 193: why have two LAI datasets, LAI and LAIv1?
Line 195: when doing machine learning for SIF, I often note that not DOY, but sin(doy/365*2pi) and cos(doy/365*2pi) are more potent predictors compared to doy itself. What’s the reasoning for not using these?
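For concreteness, the cyclic encoding suggested here maps day-of-year onto the unit circle, so that late December and early January end up adjacent in feature space. A minimal sketch (the period of 365 and the function name are illustrative, not from the manuscript):

```python
import math

def encode_doy(doy: int) -> tuple[float, float]:
    """Encode day-of-year as (sin, cos) on the unit circle, removing the
    artificial jump between day 365 and day 1 that a raw DOY feature has."""
    angle = 2.0 * math.pi * doy / 365.0
    return math.sin(angle), math.cos(angle)
```

With this encoding, day 364 and day 2 are close in feature space even though their raw DOY values differ by 362, which is presumably why the pair is the more potent predictor for a seasonal signal such as SIF.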
Line 208: year missing in reference.
Line 228: leave out ‘as mentioned earlier in this study’. Either be specific about where it was mentioned or leave it out altogether (I recommend the latter).
Section 2.3.3: the term ‘learning process’ does not properly cover the content of this paragraph. Consider something like ‘feature selection’ or ‘training variable identification’. My main point is that 2.3.3 does not describe the neural network itself, but rather the selection of the variables and the train-test split.
Section 2.4: the role of the airplane data in the assimilation is not clear here. The methodology just mentions the existence of the dataset, but not its purpose within the study.
Line 236: a detail, but I am struck by the use of the future tense here. Why ‘will run’? Haven't you already done it? I presume you mean it's a continuous (and therefore unfinished, but already initiated) process. The future tense made me think the process hadn't even started yet.
Line 239: repeat what precisely you mean by the ‘baseline experiment’.
Line 243: again a future tense. Please don’t
Line 244: more future tense. Please don’t
Line 256: I agree on the good agreement, but the phrase ‘good agreement’ looks sloppy. Just write a metric here.
Line 260: I agree with this interpretation but Huber loss is a bit of an esoteric term and I want to encourage you to make this paper readable for a larger audience than the people that are familiar with both SIF and Huber loss. Explain it a bit more in conceptual terms.
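In conceptual terms, the Huber loss behaves like the squared error for small residuals and like the absolute error for large ones, so a handful of noisy or biased retrievals cannot dominate the fit. A minimal sketch (the threshold delta = 1.0 is illustrative, not necessarily the value used in the manuscript):

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |residual| <= delta (like MSE),
    linear beyond delta (like MAE), which damps the influence of outliers."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

A residual of 0.5 is penalised quadratically (0.125), while a residual of 3.0 incurs only a linear penalty (2.5) instead of the 4.5 that the squared error would give.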
Line 263: how do you determine a bias? What is the reference here? The difference between the TROPOSIF measurement and the modelled SIF? Or difference with the airplane?
Figure 5: the non-existent correlation above the 60th parallel is striking. Why would that be? Presence of snow?
Figure 5: I noticed you cut out mountainous regions. Why?
Conclusion: really strong, but it ends on a false note. What's stopping you from adding additional variables (you have, e.g., temperature maps at your disposal) or tuning the hyperparameters?
Citation: https://doi.org/10.5194/egusphere-2026-941-RC2
RC3: 'Comment on egusphere-2026-941', Anonymous Referee #3, 13 Apr 2026
In this manuscript, Vanderbecken et al. developed a deep learning operator to assimilate TROPOMI SIF into a land surface model (ISBA) to improve the predictions of LAI and GPP. The manuscript reflects considerable effort in the model experiments and data analysis. It has the potential to contribute to the literature by providing a novel framework for assimilating SIF into land surface models. However, I have several concerns that must be addressed.
Major comments:
- The setup of the neural network observation operator:
1) Feature selection: They used a correlation analysis and SHAP to select the features for the neural network.
However, it is unclear to me whether the considered variables are all observed or simulated by the land surface model or mixed. This clarification is important. In my view, the operator should ideally be trained on observational data, as it is intended to capture the observation-derived relationships between the input features and SIF. If, instead, these relationships are learned from model simulations, any inherent biases in the model may be embedded in the operator. As a result, during assimilation, biased model outputs fed into the operator may already be implicitly corrected by the learned relationships. Consequently, the mismatch between the operator output and the observations would be artificially reduced, undermining the effectiveness of the assimilation process.
Furthermore, the SHAP results show that surface soil moisture, GPP, and ET have very limited importance, while only LAI and several spatial and temporal positional features are important. It is hard to believe that SIF depends only on the structural variable, i.e., LAI, in addition to the positional features. SIF should at least correlate with some functional variables, e.g., GPP and/or soil moisture, which is partly shown by the correlation analysis. Could this be because SHAP cannot account for the inter-dependence between variables, so that the roles of GPP, etc., are hidden by LAI? A conditional SHAP could be used in this case. In addition, the model on which SHAP is based is not reported.
2) Temporal scale of SIF: The daily SIF is used in this study, but the correlation between SIF and GPP is usually higher at a coarser temporal resolution, e.g., 8-daily. If there is no strong reason that the assimilation must be conducted at the daily scale, I would suggest using a coarser temporal resolution, which also avoids the time interpolation of LAI.
3) The training strategy: During the training, only 40% of the one-year data is used as the training set, while 60% is used for validation. This is a very unusual training-validation split; I have never seen a split where the training share is less than 60%. Further, the data from 1 June 2019 to 31 May 2020 is used as the test set (if this is not a typo in the text), and then it seems that the data from 1 June 2020 to 31 May 2021 is not used at all, which is strange as well. Overall, the training data is very small compared to the data used for validation and testing, or left unused. A clarification is needed for this choice; otherwise, significantly more data should be used for training, which would be expected to improve the model performance.
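To make the concern concrete: the conventional approach would give training the largest, earliest share of the record, with validation and test following chronologically. A sketch under assumed dates and fractions (70/15/15 here; none of these numbers come from the manuscript):

```python
from datetime import date, timedelta

def chronological_split(start: date, end: date,
                        train_frac: float = 0.7, val_frac: float = 0.15):
    """Split [start, end) into consecutive (train, val, test) date ranges,
    so the model is always validated and tested on days after its training."""
    n_days = (end - start).days
    t1 = start + timedelta(days=int(n_days * train_frac))
    t2 = t1 + timedelta(days=int(n_days * val_frac))
    return (start, t1), (t1, t2), (t2, end)
```

Applied to one year of daily data, this keeps the majority of days for training while preserving a strictly out-of-sample test period.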
- The readability of the manuscript:
I found parts of the manuscript somewhat difficult to follow, partly due to issues with flow and the brevity of the figure and table captions. Improving these aspects would likely enhance the overall readability. Specifically:
-Ki in equations 1b and 3a differs with and without bold, which I guess represents the original Kalman filter and the extended Kalman filter, respectively. But this is not clear in the text.
-line 149: I did not understand the reason for ‘that is why’. Is the linear assumption not correct?
-lines 160-161: I did not understand this sentence. What do you mean by ‘assimilation is assessed’? What is observation space?
-line 165: unclear to me, probably because of the grammar.
-line 238: What is the open-loop simulation? Is it the baseline?
-Table 1’s caption should include a detailed description of the different experiments, so that it is clear when viewed alone, without checking the description in the main text.
-Table 2: Why are the metrics of the training dataset shown, and not validation? Also, the claim of limited overfitting is based on the train and test datasets. Although limited overfitting is true in this case, it is usually demonstrated with the training and validation losses.
-Figure 6: It is unclear to me what is shown here and why it is shown. What are increments?
-lines 295-296: unclear what the relation is between the data availability and p-value.
-Figures 7-10: should mention that the domain is the Ebro basin in the captions.
-Figure 11: For each day, does the boxplot represent the values of many pixels from the in-situ measurements?
-line 370: What is the update rate?
Other comments:
-line 9: could mention neural network here already
-lines 96-97: could also mention 0.1 resolution here to guide readers.
-line 109: ‘interpolated’-->’aggregated’?
-line 116: If the FLUXCOM remains at its original resolution, how was the training test run conducted?
-line 183: ‘maximum’. You mean peak?
-line 187: I did not understand how this function keeps the prediction positive. Wouldn't ln(SIF) be negative for SIF < 1?
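On the point at issue: ln(SIF) is indeed negative for SIF < 1, but negative training targets are harmless; what guarantees a positive prediction is the inverse (exponential) mapping applied at inference time. A generic sketch, not the manuscript's exact function (the offset eps is illustrative):

```python
import math

def to_target(sif: float, eps: float = 1e-3) -> float:
    """Forward transform applied to the data before training:
    strictly increasing, defined for sif > -eps."""
    return math.log(sif + eps)

def from_prediction(y: float, eps: float = 1e-3) -> float:
    """Inverse transform applied to the network output: since exp(y) > 0,
    the recovered SIF is always greater than -eps, however negative y is."""
    return math.exp(y) - eps
```

Even a strongly negative network output such as y = -10 maps back to a SIF value just above -eps, which is how a log-type transform can keep predictions essentially positive despite producing negative targets for small SIF.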
-lines 204-207: DOY provides the information on the seasonality but LAT and LON should help capture the spatial variability of SIF across the domain.
-Figure 5: could display as a 2*2 figure to save space. ‘were’-->’where’
-Figure 6: could make use of the horizontal space more.
-lines 321-322: To me, it looks like co-assimilating SIF alongside LAI and assimilating only LAI-V1 give similar RMSEs on LAI.
-Figure 7: could change to a divergent color scheme that could differentiate positive and negative values.
-Figure 8: FLUXCOM should not be able to capture irrigation effects, as it infers GPP largely from meteorological conditions; the additional water input from irrigation is therefore likely not incorporated. So FLUXCOM GPP cannot be treated as a ground truth here. The discussion of the relevant results should be reconsidered.
-line 339: The operator is not trained in this way. Why not obtain the SIF values for each pixel and then calculate the average value?
-lines 344-345: I did not understand this. Isn’t it an emulator of TROPOMI SIF?
Citation: https://doi.org/10.5194/egusphere-2026-941-RC3
Data sets
WP4 - supplementary data - Using deep learning to assimilate sun-induced fluorescence satellite observations in the ISBA land surface model: Datasets Pierre Vanderbecken https://doi.org/10.5281/zenodo.18668100
Model code and software
WP4 - supplementary data - Using deep learning to assimilate sun-induced fluorescence satellite observations in the ISBA land surface model: model Pierre Vanderbecken https://doi.org/10.5281/zenodo.18669437
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 158 | 88 | 19 | 265 | 13 | 25 |
RC1: 'Comment on egusphere-2026-941', Alban Farchi, 02 Apr 2026
In this article, the authors propose to use a NN emulator as observation operator in a data assimilation framework to assimilate SIF (sun-induced fluorescence) observations in a land surface model. The general idea is that a physical model of the observation process would be computationally prohibitive. The NN is first trained using a dataset of satellite observations, and then successfully used in data assimilation experiments. As acknowledged by the authors, the method is not new and was adapted from previous work. Nevertheless, it provides a new application that would make an interesting addition to the literature.
I am therefore overall positive about the manuscript. That being said, the manuscript is not always very clear, especially for non-experts in land surface modelling. In many cases the formulation could be improved. Some details are missing here and there about the experiments (hopefully I spotted and listed most of them). Finally, on a few occasions the method is borderline, in particular when the assimilation experiment is evaluated over the same period as the NN training set or when the analysis is evaluated against observations that are assimilated - see the detailed remarks.
General remarks
---------------
1) On several occasions, the text alternates between two topics, which can sometimes be confusing:
- In the introduction, between the physical and machine-learning aspects of your problem.
- L91-92 between the description of the retrieval process and a post-processing that you apply (discarding negative values + averaging)
- L212-213 between description of results and conclusion
- L362-367+ between the NN results on the test (RMSE and correlation) and the application of the NN.
I would advise reorganising the text to avoid giving the reader the impression of going back and forth between topics, and also taking this opportunity to make the logical connections more visible. For example, it is only after reading the whole introduction that I understood the logical links between the various aspects of the problem.
2) Eqs 1-3: at first, I thought that there was a typo in 1a (that the argument of the model should be the analysis and not the background). It is only after reading Eq. 2 that I understood that it is not. Overall, the presentation of the data assimilation method is highly confusing, first because there is a temporal aspect, which means that the method should in principle be called "smoother" and not "filter" (I can understand that, for historical reasons, you would prefer to call it a filter, but you should at least mention this point), and second because the temporal aspect is never really explicitly explained: is the control vector the state at t=i or t=i-1?
From what I understand, the fact that M appears in the definition of the Kalman gain means that the control vector is the state at time t=i-1, which means that there is a typo in 1b, it should be:
x^a_{i-1} = x^b_{i-1} + K...
or equivalently:
x^a_{i} = Mx^a_{i-1} = x^b_{i} + MK...
in the case of a linear model.
Then, the sentence L157-158 "The initial state is determined by the analysis performed over the previous 24-hour assimilation window." is confusing. The initial state for what? Or is this sentence only here to explain that the time step between t=i-1 and t=i is 24h?
Finally, there is no mention about cycling. Are you using cycled data assimilation or is the background reset at each assimilation cycle?
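For reference, the cycled step discussed above can be written out in one consistent reading (a sketch in the remark's notation, where M is the forecast model, H the observation operator, and HM the Jacobian of their composition; this is my reconstruction, not necessarily the authors' formulation):

```latex
\begin{align*}
  \mathbf{x}^{a}_{i-1} &= \mathbf{x}^{b}_{i-1}
    + \mathbf{K}\left[\mathbf{y}^{o}_{i}
    - \mathcal{H}\!\left(\mathcal{M}(\mathbf{x}^{b}_{i-1})\right)\right], \\
  \mathbf{K} &= \mathbf{B}\,(\mathbf{HM})^{\top}
    \left[(\mathbf{HM})\,\mathbf{B}\,(\mathbf{HM})^{\top}
    + \mathbf{R}\right]^{-1}, \\
  \mathbf{x}^{b}_{i} &= \mathcal{M}(\mathbf{x}^{a}_{i-1})
    \quad \text{(cycling: the analysis starts the next 24-hour window).}
\end{align*}
```

Under this reading, the control vector is the state at t=i-1, the innovation is computed at t=i, and the background is cycled rather than reset; the text should say explicitly whether this is what is done.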
Specific remarks
----------------
- The LAI acronym is introduced multiple times (L30 & L37)
- L37 "SIF observations": later in the introduction, these are called "TROPOMI SIF retrieval" (L46-47, L50). Are you referring to the same object? Please try to be consistent with your nomenclature.
- L40 a transition is missing before the second paragraph
- L42 "greater accuracy" -> "higher accuracy"?
- L43 "better suited to the requirements of an operational system" this statement is questionable. Taking the example of meteorology, machine learning weather prediction models sometimes show a lack of physical consistency in their forecasts, which is typically not what you would like in an operational setting.
- L44 "are beginning to emerge" the article cited has been published 6 years ago, I wouldn't say that it is the beginning.
- L45 "Deep learning is based on the supervised training" there are many examples of applications of semi-supervised or even unsupervised learning in the geosciences
- L46 "To enable more real-time and operational monitoring..." Shouldn't you start a new paragraph here?
- L54 "seems more suitable" at this point, this is hypothetical, as it remains to be proven that such a ML-based operator can be constructed and possesses sufficient accuracy for the task. Perhaps replace "seems" by "could" or "would"?
- L63 "the methods including the dataset" I wouldn't say that a dataset can be considered as a method.
- Figure 1: For panel (a), I would suggest using a rectangular projection (like PlateCarree, for example) to be consistent with the domain used. I assume that the colourmap is the same for all three panels (it is not explicitly mentioned). With this choice of colours, the map is very difficult to read. I would advise choosing a so-called categorical colourmap. Also, I am not sure whether the figure is intended to be single or double column, but I have the impression that a lot of space is lost in padding and margins.
- L86-91 I would reformulate these sentences. As a non-expert in land surface modelling, I got the impression that the same process is described three times and was totally unable to understand exactly what is done here.
- L91-92 "This means that negative values for TROPOMI SIF can occur, which we have discarded." I am a bit lost here (most probably because I don't know what exactly SIF is): is SIF supposed to be positive?
- L98-100 At this point, it is unclear for what these additional measurements will be used.
- L107-110 "if an observation is missing, the gap is not filled" and "using an arithmetic average if at least 50% of the grid points are observed (i.e. half the maximum amount)" this whole part is confusing. First, I think that it would be good to remind the reader why observations can be missing and whether we are speaking of only parts of the observation missing or entire maps missing. Then, depending on the answer to that question, does the order of interpolations (space or time first) matter? Finally, the 50% threshold does not really make sense to me.
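As I read it, the 50% rule is a coverage threshold on the spatial aggregation: a coarse cell only receives a value if at least half of its fine grid points are observed, and otherwise the gap is left unfilled. A sketch of that reading (my interpretation, not the authors' code):

```python
import math

def cell_average(values, min_coverage=0.5):
    """Aggregate fine-grid observations into one coarse cell: return their
    arithmetic mean if at least `min_coverage` of the points are observed,
    otherwise NaN (the gap is not filled by interpolation)."""
    valid = [v for v in values if v is not None and not math.isnan(v)]
    if len(valid) < min_coverage * len(values):
        return float("nan")
    return sum(valid) / len(valid)
```

If this is indeed the scheme, the order of operations only matters when the temporal step applies a similar threshold, which is part of what the remark asks the authors to clarify.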
- L111 "instead" is repeated
- L112 "is spatially interpolated in the same way" no time interpolation for these observations?
- Figure 2: I would advise to use a grid, otherwise the plot is difficult to read. You can also set the green axis range to (0, 5) so that the tick positions are exactly the same for all three axes. Furthermore, is there a real interest to present the data as points (with fancy markers that can be distracting) instead of lines? In the caption, you mention the domain in Fig 1, I would suggest to add the (a) label to remind the reader that it is the median over the European domain. Finally, are you showing the statistics for the raw or interpolated product?
- L 118 "which is present only in the TROPOMI SIF and LAI V1 datasets" I only saw that spike for LAI V1 in the figure after zooming (the green cross is almost hidden by a blue triangle). I think that this is a good example why lines would be perhaps more appropriate for this figure.
- L136-137 "This process involves identifying the optimal control vector according to a model that minimises discrepancies between observations and a prior background estimates" This needs to be reformulated. As is, one understands that the choice of the control vector is part of the DA system, that the model does the minimisation, that discrepancies are computed between background and observations, and that the background is to be optimised! Furthermore "prior background" seems redundant.
- L139 "(SEKF) (Mahfouf and Bilodeau, 2009)" -> "(SEKF, Mahfouf and Bilodeau, 2009)"
- L140 "x, where the subscript denotes the temporal step" Please reformulate. As is, you are referring to a subscript in "x", which has no subscript.
- L144 "where the superscripts "a", "b", and "o" stand for analysis, background and observation, respectively." I think that you can remove the "o" superscript. Also, I would use an italic font for "a" and "b" here, to match what is used in the equations.
- Eqs. 3a & 3b, K should be in \mathbf{}
- L160: is "assessed" the appropriate word here?
- Before section 2.3 (but also before section 2.2) I think that it would be good to provide some transitions.
- L172-173: "we will train a single neural network to cover all pixels." this gives the impression that you design a NN that processes the map at once, which, as far as I understand later on, is not the case: each grid point is processed independently.
- L174 "probability of belonging to a given category" Now I am confused, I thought it was a regression problem, why do you mention classification?
- L 183: "with a maximum close to zero" do you mean here the maximum of the SIF values or the mode of the distribution?
- L 184 "below-zero predictions" -> "negative predictions"?
- L184-185: "we applied the following strictly increasing function to the TROPOMI SIF values to predict." The sentence should end with ":". Also, to prove your point, it would be perhaps easier to give the expression of the inverse function and show that it has positive values, no?
- L190-200: I am confused here. First, you should mention explicitly what is coming from observations and what is coming from simulations. I guess that GPP, LAI V1, TROPOMI SIF are observations, but for the rest it is not explicitly said. Then, you mention a simulation database: where does it come from? Did it already include assimilation? Does it use the same model as the one you are using for the assimilation experiments? This last point is especially important: if the NN is trained to predict observations based on simulated variables (i.e. more or less something close to a background), then in a data assimilation experiment I would expect the innovation (y-Hx^b) to be essentially noise, right? In the end, I don't think it is an issue, because you only selected LAI-V1 (which, as far as I understood, is an observation product) plus some extra predictors, but this is definitely confusing.
- L204-207: I did not understand this paragraph, in particular the logical connections.
- Fig 4: perhaps it would be better to present these results as a table instead of a figure?
- L214: ZSE and LON seem to have similar impact. Why not dropping LON as well?
- L215 "even if the observation operator has learnt the model biases": how could the observation operator learn the model biases, since it only relies on observation products in the training?
- L217: "to estimates" -> "to estimate"
- L218-225: I would reformulate this paragraph a bit to make it smoother. Also, you mention "from 1 June 2018 to 31 May 2021" for the full data, but it is mentioned in 2.1.2 that Troposif provides data from May 2018 to December 2021. Did you discard part of this data? Is the final year not used at all?
- L228-229: "As mention earlier" -> "As mentioned earlier"
- L227: "our focus is on a small area" it is unclear whether the small domain is used only for the assimilation or for the selection of the training dataset as well? Also, it is unclear whether the forecast model used in the assimilation experiments is able to run on the small domain only or needs to be run on the whole European domain?
- L231: "The assimilation period runs from 1 January 2018 to 31 December 2021" I thought that the Tropomi data only started in May 2018?
- L233: why did you change the land cover map? If the second is more accurate and more recent, then why not use it for the training database?
- L241-243: It is not entirely clear in the text that you are talking about the observation error for the SIF observations (and not for the LAI observations).
- L236-246: you don't mention the dates for the experiments assimilating SIF. EDIT: From the caption of figure 7, I get the impression that the SIF assimilation experiments cover June 2018 - May 2020, which includes the training set of the NN (June 2018 - June 2019). I would strongly advise removing this period from the statistics.
- Table 2: is there an interest in including the line without data points? Also, you haven't defined ubRMSE (I haven't checked, but there might be other undefined acronyms)
- Fig. 5: I would advise reducing the padding of the figure overall (I don't think that the ticklabels are needed) and using continuous (instead of discrete) colourmaps. Perhaps you could also change the projection to PlateCarree. Furthermore, I don't understand the choice of a diverging colourmap for correlation, unless there is something specific happening when the correlation is 0.5? Finally, even if this figure is intended for one column, I think you can use a bit more horizontal space.
- L279: "The saved weights of the neural network are implemented in the assimilation scheme" please reformulate.
- L281: "LAI representation will be our main focus on the control vector variables" I don't understand what you mean here?
- L282: "the monthly-averaged maps over the C3 crops" It is unclear to me whether the figure will show the increments only over this land category or over the entire domain. Also, is there a reason why you want to focus on this category?
- Figs. 6 & 7: Same as for the previous figure, I think that you can remove the ticklabels and reduce padding overall. For this data, you absolutely need a diverging colourmap centered on zero.
- L290: You mention an RMSE here. Can you be explicit and say between what and what this RMSE is computed? From the caption in the figure, I would assume that this is the RMSE between the LAI observations and the LAI analysis. I think it would be good to remind the reader that LAI observations are not assimilated in this setup and hence that they can be used to evaluate the analysis, provided that their errors are independent of the errors of the SIF observations.
- L295: "all cells with a p-value higher than 0.01 were removed" can you please specify which statistical test is used to compute this significance test? Is this mask also applied in the delta(RMSE) panel?
- L300: "are closer to the observed values" according to which metric?
- L306: "the reference LAI-V1 experiment" I think that you should be careful with the results of that experiment since here you are evaluating against LAI observations, i.e. the observations that are assimilated.
- Section 3.2.2: same remark for the experiments here.
- Fig. 11: The caption is not very clear. It doesn't give a description for (a). Which analysis is plotted? I assume the whisker show the in-situ measurements but it is not mentioned.
- L337-338: "we must compare all in-situ SIF measurements on a given day with the simulated SIF value resulting from the analysis for the two situations shown in Fig. 11(b)" I don't understand this.
- L340-341: "the values are closer overall to the average airborne measurement value": but I thought that absolute values should not be compared because of different wavebands?
- Section 4.1 After reading the entire section, I am confused. I don't understand what you want to demonstrate here. The figure seems to show that the output of the NN more or less matches the values of TROPOMI SIF (which is what it has been trained for) and does not match another instrument: this is absolutely to be expected, or am I missing something here?
- L361: "Using this simple architecture" you are presenting the setup here, not really the architecture.
- L362: "no overfitting evident" -> "no evident overfitting"?
- L370: "update rate yielded by TROPOMI SIF assimilation is five times higher than that of 10-day synthesis assimilation" I don't understand this point.
- L374: "Using a neural network operator instead of a physical model makes propagating uncertainties difficult" What do you mean here? The uncertainties are all accounted for in the B and R matrices, no?
- L375-376: "It is also important to understand that the neural network is trained to emulate observations with instrument uncertainties rather than the true physical variable, making quantifying the observation error of the true value challenging." I don't fully agree with that statement. The instrumental errors are (in principle) unpredictable and hence the NN cannot learn them. It is true that the use of a NN will introduce modelling error (because the process from state to observations is not perfectly represented), but this is also the case for any observation operator.
- L416-417: "information that cannot be changed" this is unclear: why couldn't you change the values of the support parameters?