Advancing Crop Modeling and Data Assimilation Using AquaCrop v7.2 in NASA's Land Information System Framework v7.5

De Lannoy, Gabriëlle J. M.; Busschaert, Louise; Bechtold, Michel; Lanfranco, Niccolò; de Roos, Shannon; Heyvaert, Zdenko; Mortelmans, Jonas; Scherrer, Samuel A.; Van den Bossche, Maxime; Kumar, Sujay; Mocko, David M.; Kemp, Eric; Heng, Lee; Steduto, Pasquale; Raes, Dirk

doi:10.5194/egusphere-2025-4417

Preprints

https://doi.org/10.5194/egusphere-2025-4417

Preprints

06 Oct 2025

| 06 Oct 2025

Advancing Crop Modeling and Data Assimilation Using AquaCrop v7.2 in NASA's Land Information System Framework v7.5

Gabriëlle J. M. De Lannoy, Louise Busschaert, Michel Bechtold, Niccolò Lanfranco, Shannon de Roos, Zdenko Heyvaert, Jonas Mortelmans, Samuel A. Scherrer, Maxime Van den Bossche, Sujay Kumar, David M. Mocko, Eric Kemp, Lee Heng, Pasquale Steduto, and Dirk Raes

Abstract. This paper introduces the open-source AquaCrop v7.2 model as a new process-based crop model within NASA's Land Information System Framework (LISF) v7.5. The LISF enables high-performance crop modeling with efficient geospatial data handling, and paves the way for scalable satellite data assimilation into AquaCrop. Through three exploratory showcases, we demonstrate the current capabilities of AquaCrop in the LISF, along with topics for future development. First, coarse-scale crop growth simulations with various crop parameterizations are performed over Europe. Satellite-based estimates of land surface phenology are used to inform spatially variable crop parameters. These parameters improve canopy cover simulations in growing degree days compared to using uniform crop parameters in calendar days. Second, ensembles of coarse-scale simulations over Europe are created by perturbing meteorological forcings and soil moisture. The resulting uncertainties in root-zone soil moisture and biomass are often greater in water-limited regions than elsewhere. The third showcase aims to improve fine-scale agricultural simulations through satellite data assimilation. Fine-scale canopy cover observations are assimilated with an ensemble Kalman filter to update the crop state over winter wheat fields in the Piedmont region of Italy. The state updating is beneficial for the intermediary biomass estimates, but leads to only small improvements in yield estimates relative to reference data. This is due to strong model (parameter) constraints and limitations in the assimilated satellite observations and reference yield data. The showcases highlight pathways to improve or advance future crop estimates, e.g. through crop parameter updating and multi-sensor and multi-variate data assimilation.

Received: 13 Sep 2025 – Discussion started: 06 Oct 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-4417', Anonymous Referee #1, 06 Nov 2025

This work presents crop modeling using the AquaCrop model within NASA’s Land Information System Framework (LISF). First, the authors describe how crop growth is simulated in the AquaCrop model. Then, they present three experiments that inform new LISF features in parameter perturbation and data assimilation. They find that updated crop parameters based on satellite estimates of land surface phenology improve canopy cover simulations. Using data from crop states in winter wheat fields in the Piedmont region of Italy, biomass and canopy cover improved, though crop yield did not. The manuscript is well written. However, some comments need to be addressed before it can be considered for publication.

Comments:

Line 5. Please specify the coarse-scale, for example: coarse-scale (>10km?)

For the showcase 2, I don’t understand how exactly to perform the perturbations on shortwave radiation, precipitation, soil moisture. In my understanding, perturbations are used to understand the model sensitivity to the changes of forcing data or other targets. Typically, large ensemble simulations are performed with an increase or decrease of a target (using precipitation as an example) to understand how crop growth responds to variations in precipitation. Some studies perform such perturbations one at a time to understand the impact of a single perturbation on the simulation. In Table 2, I see that SW and P are multiplied by a standard deviation ratio. Is that correct? If so, then SW and P are both decreased and soil moisture is increased. This type of perturbation seems odd to me because it only shows how the crops respond to decreased SW and P. Why do these perturbation experiments matter for the study?

Line 290. Why is a perturbation bias correction is needed here? Again, in my understanding, perturbation experiment is just to vary the forcing data to an acceptable range and see how crop growth repones to these changes.

Line 292. What are these 24 members? With which perturbation combinations?

Line 294. Why use only three years results?

For Showcase 3, I think the goal is to compare the DA results and the original results, why was the ensemble model performed? What is the relationship between the ensemble runs in showcase 2 and 3? I suggest deleting the OL results because they distract from the main points of Showcase 3.

Line 354. “The years 2017 through 2023” why the simulations are carried out for these years? Is it because of observation availability?

Table 3. Please explain why DA does not generate a better yield simulation.

Figure 10. Please show a plot for obs vs Det.

Line 482. Could you perform process-based deeper analysis to confirm there are parameter constraints.

Citation: https://doi.org/10.5194/egusphere-2025-4417-RC1
- AC1: 'Reply on RC1', Gabriëlle De Lannoy, 06 Nov 2025
  
  We thank the reviewer for the timely and constructive comments. We list the comments in bold fonts below and provide answer in normal fonts, with suggestions for updated text in italic (additions are underlined). Line numbers refer to the submitted manuscript.
  1. Line 5. Please specify the coarse-scale, for example: coarse-scale (>10km?)
  Sure, we will add in the abstract that the resolution is 0.1 degree.
  2. For the showcase 2, I don’t understand how exactly to perform the perturbations on shortwave radiation, precipitation, soil moisture. In my understanding, perturbations are used to understand the model sensitivity to the changes of forcing data or other targets. Typically, large ensemble simulations are performed with an increase or decrease of a target (using precipitation as an example) to understand how crop growth responds to variations in precipitation. Some studies perform such perturbations one at a time to understand the impact of a single perturbation on the simulation. In Table 2, I see that SW and P are multiplied by a standard deviation ratio. Is that correct? If so, then SW and P are both decreased and soil moisture is increased. This type of perturbation seems odd to me because it only shows how the crops respond to decreased SW and P. Why do these perturbation experiments matter for the study?
  The perturbation setup follows what is done in state-of-the-art ensemble land surface modeling (e.g. Kumar et al., 2008, Heyvaert et al. 2023). The goal is to obtain a full random error (uncertainty) estimate on the model simulations, introduced by a combination of all possible errors (perturbations) in the input. We chose to perturb the input forcing and state variables (assuming that random errors in the parameters are implicitly captured in errors on the state variables) to get an integrated dynamic estimate of the uncertainty in CC, biomass and other output variables. See also response to comment 3 below.
  We will re-order sentence describing the purpose of ensembles as follows:
  L.275: “...to create an ensemble of crop model trajectories to determine the model sensitivity to these aspects individually. However, ensembles are also used to quantify (i) the total time-varying uncertainty of the simulation output, or forecast error, and (ii) the correlation of the forecast errors between the various simulated variables. These dynamic ensemble uncertainty estimates are particularly important for DA.... most ensemble simulations with AquaCrop have been performed to study the model sensitivity to crop parameters, and not to estimate the total model uncertainty in response to errors in the state or meteorological estimates.”
  The SW and P are perturbed through multiplication with a factor 1+/- a random number taken from a distribution with standard deviation (std) 0.3 or 0.4. For example, for a std of 0.3, 68% of the distribution of the multiplication factor is thus randomly sampled between 0.7 and 1.3.
  We will add in the caption of Table 2: “The std is shown relative to the mean (1 or 0) multiplicative or additive perturbation value.”
  Furthermore, we will update the text with a reference to earlier studies that use similar tables for land surface modelling, and we will repeat details from the Table 2 caption in the text:
  L.286: “The perturbation parameters are spatially and temporally constant, as summarized in Table 2. The setup is inspired by state-of-the-art land surface data assimilation studies (Kumar et al., 2008, Heyvaert et al., 2023). The resulting random perturbations are applied (i) hourly to the hourly MERRA-2 shortwave radiation and precipitation, with a 24 hour temporal autocorrelation to obtain meaningful daily perturbed forcings as input to AquaCrop, and (ii) daily to soil moisture estimate without any temporal autocorrelation. The hourly perturbed MERRA-2 data are converted to daily AquaCrop forcing input of ETo and P as in Busschaert et al. (2022).”
  3. Line 290. Why is a perturbation bias correction is needed here? Again, in my understanding, perturbation experiment is just to vary the forcing data to an acceptable range and see how crop growth repones to these changes.
  The crop model is non-linear and the zero-centered perturbations to the input can lead to biases in the output. The perturbation bias correction ensures that the ensemble open loop remains unbiased relative to the deterministic simulation as mentioned in the paper. The ensembles in this study are meant to estimate the integrated uncertainty only (to serve later in a data assimilation system), not systematic deviations or biases.
  We will edit this as follows:
  “to keep the soil moisture ensembles centered around the unperturbed deterministic simulation, and avoid that biases in soil moisture propagate into the biomass uncertainty estimates.”
  4. Line 292. What are these 24 members? With which perturbation combinations?
  The perturbation combinations are random. One member is one entire model trajectory using a combination of slightly perturbed SW, P, and soil moisture in multiple compartments. The perturbations are random, and different at each time step (even if there is some autocorrelation in the perturbations for the meteorological forcings). References will be added as proposed in response to comment 2.
  5. Line 294. Why use only three years results?
  This is a showcase, and 3 years already makes the point that we are able to construct good ensemble uncertainty estimates and provide scientific insight in them. Also note that ensemble simulations are computationally intensive.
  6. For Showcase 3, I think the goal is to compare the DA results and the original results, why was the ensemble model performed? What is the relationship between the ensemble runs in showcase 2 and 3? I suggest deleting the OL results because they distract from the main points of Showcase 3.
  The DA simulations are based on an ensemble simulation to obtain forecast uncertainty estimates. The reference without DA is thus an ensemble OL. Because of nonlinearities, the ensemble mean OL and deterministic simulations are never perfectly identical and we want to disentangle this effect from the DA update process. We therefore prefer to keep the OL results in the paper, in line with most DA publications for transparency.
  7. Line 354. “The years 2017 through 2023” why the simulations are carried out for these years? Is it because of observation availability?
  This is because of the combined availability of crop maps, yield data, and assimilated observations.
  8. Table 3. Please explain why DA does not generate a better yield simulation.
  This is explained in L. 467-474.
  9. Figure 10. Please show a plot for obs vs Det.
  We can add the figure in supplement, if needed, but it would distract from the main message of what the DA is doing relative to its reference OL. By adding the deterministic and OL run, we would need to introduce more discussion of why ensembles deteriorate the deterministic run, which is in fact beyond the goal of this showcase. See also our response to comment 6.
  10. Line 482. Could you perform process-based deeper analysis to confirm there are parameter constraints.
  Some of these parameter constraints are explained in Line 464-466, but might have been lost in the discussion. We will explicitly add the word “parameter” in these lines and add references to the respective sections for clarity:
  “CC_i cannot be updated above the CC_pot,sf,I parameter (Section 2.2, Appendix A)....and the yield range is limited by the CC_pot,sd,i and HI_o parameters.”
  
  Citation: https://doi.org/10.5194/egusphere-2025-4417-AC1
- AC2:
  'Reply on RC1', Gabriëlle De Lannoy, 17 Dec 2025
  We would like to complete our response to the valuable suggestions of Reviewer 1 with two additional remarks:
  We will further edit the text on how ensembles are generated, as described in our response to Reviewer 2.
  
  In addition to addressing all comments, we reran the simulations for Showcase 1&2 with ERA5 forcings instead of MERRA2 forcings, because the reference evapotranspiration obtained with MERRA2 in LIS is at the high end for crop simulations. The Figures 4 through 7 will thus be replaced for consistency, but without any consequence for the scientific findings.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4417-AC2
RC2:
'Comment on egusphere-2025-4417', Matthew McCabe, 28 Nov 2025

The work by De Lannoy et al. applies the AquaCrop model within NASA’s Land Information System Framework (LISF) to support high-performance crop modeling and scalable satellite data assimilation (DA). The capabilities of this integrated system are demonstrated through three exploratory showcases: (1) coarse-scale simulations across Europe, where incorporating spatially varying Growing Degree Day (GDD) crop parameters, derived from satellite-based phenology estimates, improved canopy cover (CC) simulations compared to uniform calendar-day parameters; (2) coarse-scale ensemble simulations revealing that uncertainties in root-zone soil moisture and biomass (B) are greater in water-limited regions; and (3) fine-scale satellite DA using an Ensemble Kalman Filter (EnKF) over winter wheat fields in Italy, where CC state updating benefited biomass estimates but provided limited improvements in yield estimates. Overall, the authors concluded that the resulting computational efficiency is encouraging for advancing regional-scale crop modeling and DA.
The manuscript is well written and addresses an important application. In the review below, I outline several suggestions and related comments that the authors might consider to help with the interpretation and readability of the manuscript. These comments are provided in two categories; (1) page- and line-specific comments, and (2) general comments related to the conceptual clarity of broader sections.
(1) Comments specific to Page and Line:
Page 7. Lines 177-178. The statement from “The entire…” to “…simulation year.”
This statement appears to imply that this condition holds for both GDD and Calendar-day configurations. However, I don't think this is the case when AquaCrop is run in calendar-day mode. It would be helpful if the authors could clarify this distinction and explicitly state that the described behavior pertains only to the GDD configuration.
Page 8. Figure 3.
The parameters related to rooting depth evolution (Zn and Zx) did not appear to be defined in the text prior to this figure. It would be helpful to clarify their meaning in the caption (Zn: minimum rooting depth and Zx: maximum rooting depth).
In addition, I understand that the crop stage timings derived from the GLSP product correspond to points A through F, and these should align with AquaCrop’s crop stages, as they represent equivalent physiological events. However, in Figure 3, the GLSP points did not appear to align temporally with the corresponding AquaCrop’s crop stages (with the exception of point A, which matched the (I) time to emergence). Could the authors clarify whether this misalignment is due to the visualization, or whether there is a conceptual reason for the offset?
Page 9. Lines 213-215. The statement from “Furthermore, only….” to “…future versions.”
Are these parameters (i.e., spatially variable vs. uniform parameters) listed anywhere in the manuscript? It would be helpful for readers if these parameters were explicitly stated.
Page 10. Lines 231-233. The statement from “The most…” to “…and HIo.”
First, this statement would be stronger if supported by one or more references. Ideally, the authors might cite studies that have conducted sensitivity analyses for AquaCrop and identified these parameters as the most important crop parameters. As it stands, making this claim without supporting evidence or analysis is not fully convincing.
Second, there appears to be a typo, GCG should be CGC (Crop Growth Coefficient). The acronym is used as CGC earlier in the manuscript, so this should be corrected here.
Page 10. Lines 250-251. The statement from “The mapping…” to “…in Appendix A.”
This comment is related to the earlier comment raised for Page 8, Figure 3, but with an additional point that requires clarification.
Table 1 clearly explained the relationships between AquaCrop and GLSP crop stages; however, in Figure 3, these parameters did not align on the same points on the x-axis. AquaCrop’s crop stages consistently appeared earlier than the corresponding GLSP, except for one stage. It would be helpful if the authors could clarify whether this misalignment is intentional (e.g., due to methodological reasons) or a visualization issue.
Additionally, the definition of stage “II. Time to maximum rooting depth” being set at 0.7*D needs further explanation, why was 0.7 chosen specifically? What is the basis for this proportion?
Page 12. Line 285. The statement “….an ensemble of 24 members.”
It would be helpful to provide a brief explanation (either here or in an appendix) of how the perturbations to the meteorological forcing and soil moisture state variables generated the ensemble of 24 members. A short description of the perturbation scheme or the combination logic would greatly improve clarity for readers.
Page 12. Lines 290-291. The statement from “A perturbation…” to “…simulation.”
To clarify the workflow: was the perturbation bias correction applied before generating and running the ensemble members?
Page 12. Lines 294-295. The statement from “…multi-year average” to “…croplands in Europe.”
Given the description of Section 3.2.2, I was anticipating that the authors might report the multi-year average ensemble standard deviation for CC, alongside biomass and root-zone soil moisture.
While I recognized that the primary objective of this showcase is biomass (as stated on Page 9, Line 221), the inclusion of root-zone soil moisture, despite it not being explicitly part of the stated objective, suggests that the analysis is not limited strictly to B. Was there a reason that CC was excluded from this uncertainty assessment?
Page 12. Lines 297-299. The statement from “The spread…” to “…water content (RSW).”
The explanation of the spread in B as directly resulting from perturbations in soil moisture and radiation (through ETo) is conceptually sound - but doesn't it simplify AquaCrop’s internal behavior? As a process-based model, AquaCrop includes multiple threshold-driven stress functions and nonlinear feedbacks among canopy cover, transpiration, and biomass. Consequently, the ensemble variability in B likely reflects not only the imposed input perturbations but also the model’s inherent stress-response interactions.
The statement that “The ensemble spreads of soil moisture and B are related to each other, and to environmental conditions, more specifically to RSW” would benefit from additional mechanistic clarification. Although RSW is a physically meaningful indicator of soil wetness, AquaCrop’s stress functions translate this variable into nonlinear physiological responses. Hence, the relationship between B and soil moisture spreads cannot be attributed to RSW alone without considering these embedded physiological thresholds and nonlinear responses. Clarifying this connection would likely make the interpretation more consistent with AquaCrop’s process-based structure.
Page 13. Table 2.
I could not find where the choice of probability distributions for the ensemble perturbations (normal for additive variables & lognormal for multiplicative variables) was explicitly justified in the manuscript. It would be useful to explain the rationale behind these selections, as the assumed distribution directly influences the shape and magnitude of the propagated uncertainty.
It was also a little unclear whether the perturbed parameter values remain within physically plausible ranges. Depending on the perturbation amplitude and distribution, it is possible that the sampled values may fall outside the realistic domain for the study region. Such unrealistic values could introduce artificial spread in the outputs and potentially complicate interpretation of the resulting uncertainty patterns. Clarifying whether constraints, bounds, or validity checks were applied to the perturbed parameters would help readers assess the robustness of the ensemble design.
Page 13. Lines 321-323. The statement from “The cumulative…” to “…assimilation frequency.”
It is unclear whether this was implemented in your experiment or just mentioned conceptually.
Page 14. Lines 329-330. The statement from “Except for…” to “…updating in AquaCrop.”
I believe there are at least two published studies that have reported on the assimilation of satellite-based fractional vegetation cover into AquaCrop for CC state updating. For example:
https://www.mdpi.com/2073-4395/11/11/2265
https://www.mdpi.com/2073-4395/9/7/404
The authors might want to review these works and perhaps revise the statement on Page 14, Lines 329-330.
Page 14. Lines 346-347. The statement from “moisture state variables…” to “…in Table 2.”
Was there a reason why soil moisture state variables were excluded from the perturbation setup in Showcase 3? Although Section 4.4 provides justification, specifically the lag between soil and crop responses, it would strengthen the clarity of this section if that rationale were also mentioned here.
Page 15. Equation 11.
Equation 11 lists both CC and B as state variables updated by the EnKF. However, as far as I understood, FCOVER was assimilated primarily to update CC. It would be helpful if the authors clarified how B is handled during the update step: is B explicitly updated by the EnKF, or does it adjust only indirectly through model propagation after CC is updated?
This point is related to the earlier comment on Page 13, Lines 321-323, and addressing both would improve clarity regarding the DA update mechanism.
Page 15. Lines 383-387. The statement from “At this…” to “…the year 2018.”
There is no issue having this qualitative evaluation of AquaCrop estimates - but to make the evaluation more meaningful, you might also include some quantitative error metrics, such as RMSE, bias, or other standard performance measures, to assess the discrepancy between AquaCrop time-series estimates and the independent satellite data. If possible, incorporating these metrics would likely strengthen the evaluation and provide a clearer picture of how well the model performs at coarse scale.
Page 16. Line 391. The statement “…responds to the temperature pattern.”
The authors might consider including a map (either into Figure 5 or placed in an appendix) illustrating the temperature pattern referenced here.
Page 16. Lines 397-398. The statement from “For ΔB…” to “….part of the domain.”
For CC, the manuscript provided possible explanations for the strong performance reduction in Norway, but no similar interpretation was offered for ΔB. The authors might like to discuss some of the potential reasons why simulations using GDD crop parameterization did not yield improvements in other regions of Europe (particularly western and northern areas).
(2) General Comments:
Appendix C. AquaCrop Crop Parameters.
What is the difference between the parameter (Total length of crop cycle in growing degree-days) and the parameter (GDDays: from transplanting to maturity)?
Based on their physical meaning, they appear to represent the same concept, which suggests they should be assigned the same value. However, in the second column of the table (Generic GDD), the authors gave Total length of crop cycle in growing degree-days a value of 3123 GDD, while GDDays: from transplanting to maturity was assigned F, meaning each had a different value. This differentiation did not appear in the third column (Winter wheat GDD), where both inputs had the same value (2694 GDD). Might be helpful if this were clarified.
Showcase 1. Crop Parameterization.
The authors described how inputs related to climate, crop, soil, and field management were parameterized (for field management; only one input - soil fertility level). However, there was no explanation of how other inputs, such as those related to irrigation management, groundwater table, or additional field management, were handled? How were these inputs/parameters parameterized, especially given that the simulations were conducted at a coarse scale where substantial spatial variation exists across the study area?
Showcase 3. Satellite-based Data Assimilation.
A simple exploration of the rationale for choosing fractional vegetation cover (FCOVER) as the assimilated variable in AquaCrop i.e. why was FCOVER selected specifically, rather than other variables - might prove useful here.

Citation: https://doi.org/10.5194/egusphere-2025-4417-RC2
- AC3: 'Reply on RC2', Gabriëlle De Lannoy, 17 Dec 2025
  
  We thank Matt McCabe for the timely and constructive comments. They have helped us further improve our paper. A list with all comments and our answers is attached.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4417-AC3

Model code and software

AquaCrop v7.2 Gabriëlle J. M. De Lannoy et al. https://github.com/KUL-RSDA/AquaCrop

Viewed

Total article views: 609 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
342	230	37	609	17	17

HTML: 342
PDF: 230
XML: 37
Total: 609
BibTeX: 17
EndNote: 17

Views and downloads (calculated since 06 Oct 2025)

Month	HTML	PDF	XML	Total
Oct 2025	151	43	10	204
Nov 2025	107	65	12	184
Dec 2025	84	122	15	221

Cumulative views and downloads (calculated since 06 Oct 2025)

Month	HTML	PDF	XML	Total
Oct 2025	151	43	10	204
Nov 2025	107	65	12	184
Dec 2025	84	122	15	221

Viewed (geographical distribution)

Total article views: 586 (including HTML, PDF, and XML) Thereof 586 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 27 Dec 2025

Short summary

To facilitate regional crop growth simulations at any spatial resolution, with a range of different input sources for meteorology, soil and crop parameters, we have incorporated the AquaCrop model into the NASA Land Information System. This system also facilitates the assimilation of satellite data to update the crop and water conditions during model simulations. We present three exploratory applications to highlight the possibilities and pathways for future research on crop estimation.


Total:	0
HTML:	0
PDF:	0
XML:	0