Building a random forest machine learning model for carbon budget estimation in agricultural fields using discontinuous atmospheric Eddy Covariance measurements
Abstract. Atmospheric CO2 exchanges in agroecosystems are strongly controlled by plant phenology, management and microclimatic conditions, yet obtaining finely resolved flux estimates across heterogeneous agricultural landscapes remains challenging. This study evaluates the ability of a single Eddy Covariance (EC) system combined with wind-sector partitioning and Random Forest (RF) modelling to estimate annual carbon budgets of adjacent fields (wheat, mixed-grain, permanent grassland) in the Marais Poitevin wetland. Fluxes were measured over the period 2023-01-01–2024-01-31 and attributed to the different fields by wind sectors. Two modelling strategies were compared: (i) a single global RF trained on all sectors and (ii) sector-specific RFs for each adjacent field. RF models globally showed good overall performance (R2 ≈ 0.68–0.95 depending on sector), while the sectoral approach better reproduced phenological dynamics and responses to management events (harvest, grazing) than the global model, which tended to smooth site-specific signals. Annual carbon budgets estimated from the sectoral models indicate that the permanent grassland and the wheat field acted as net sinks (-259 and -216 g C m-2 yr-1, respectively), whereas the mixed grain and the hybrid field behaved as net sources (+182 and +231 g C m-2 yr-1). Main limitations include spatial attribution uncertainty related to the EC footprint under stable conditions, flux disturbances during stormy episodes, and the limited one-year observation period. This study highlights the novelty and practical value of coupling a single EC system with wind-sector partitioning and machine learning approaches to resolve carbon fluxes at the field scale within heterogeneous agricultural landscapes. This integrated approach provides a cost-effective alternative to traditional multi-tower setups, offering new opportunities to monitor spatial carbon dynamics and management effects in real agricultural mosaics.
Beyond methodological innovation, the goal of this work is to establish a comprehensive carbon budget not merely for a single agroecosystem, but for the terrestrial component of a wetland area, capturing the complexity of its ecological and biogeochemical interactions.
This manuscript presents a novel gap-filling method for CO₂ fluxes measured by a single eddy covariance tower located in the middle of four fields with different vegetation types. The proposed approach relies on machine learning, specifically a random forest algorithm, to reconstruct fluxes when the wind does not come from the target area. The random forest model is trained on a set of environmental variables, including temperature, radiation and soil moisture, as well as information on crop management practices and the day of year. This approach shows strong potential for reconstructing CO₂ fluxes, as well as energy fluxes, in heterogeneous landscapes using measurements from a single flux tower, and is therefore of great interest for extending the spatial representativeness of eddy covariance measurements.
However, although the method is promising and deserves particular attention, the manuscript would benefit from more detail on the technical aspects of the method. In particular, it would benefit from computing the uncertainties of the reconstructed/gap-filled CO2 flux in each wind sector. This would make it possible to assess the statistical significance of the differences in CO2 fluxes between wind sectors (each specific to a vegetation/soil type). The authors could also use spatially explicit data to evaluate whether the observed CO2 flux dynamics are sound; NDVI or EVI from Sentinel data may be of interest. Because uncertainty and significance analyses are lacking, the discussion in its current form remains somewhat speculative regarding the reconstructed CO₂ fluxes for the plots surrounding the tower.
I would therefore recommend major revision, integrating uncertainty and significance analyses before interpreting and discussing potential differences between crop types.
Specific comments:
L155-164: in the equation, ρ should be the "dry" air density (ρd), and a conversion from g to mol is missing. Please also give the units of w and s.
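To make the unit issue raised for L155-164 concrete, here is a minimal sketch of the flux computation with the g-to-mol conversion written out. It assumes that s is a CO2 mass mixing ratio in g per kg of dry air, which may not match the manuscript's definition; the function names are illustrative only.

```python
# Assumption: s is the CO2 mass mixing ratio in g CO2 per kg of dry air.
# The manuscript's actual definition of s may differ.
M_CO2 = 44.01  # molar mass of CO2, g mol-1


def covariance(a, b):
    """Covariance of two equally long series (mean product of fluctuations)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def co2_flux_umol(rho_d, w, s):
    """CO2 flux in umol m-2 s-1.

    rho_d : dry air density, kg m-3
    w     : vertical wind speed samples, m s-1
    s     : CO2 mass mixing ratio samples, g kg-1 (assumed)
    """
    flux_g = rho_d * covariance(w, s)  # g CO2 m-2 s-1
    return flux_g / M_CO2 * 1e6        # g -> mol -> umol
```

Stating the units of w and s next to the equation, as in the docstring above, would let readers check the dimensional consistency directly.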
L190: why is a time delay used when the CO2/H2O and sonic instruments are co-located?
L200: the u* threshold is usually adapted to local conditions. Justify the use of a single value.
Figure 3 (L219): you could select only neutral conditions here to obtain a closer relationship with the roughness length z0.
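As an illustration of the suggestion for Figure 3, the sketch below keeps only near-neutral records and derives z0 from the neutral logarithmic wind profile. The stability limit of 0.1 on |(z - d)/L|, the displacement height d and the function names are assumptions made for the example, not values taken from the manuscript.

```python
import math

VON_KARMAN = 0.4  # von Karman constant


def is_neutral(z_m, d, obukhov_length, limit=0.1):
    """True when |(z - d) / L| is below `limit` (the 0.1 default is an assumption)."""
    return abs((z_m - d) / obukhov_length) < limit


def roughness_length(z_m, d, wind_speed, u_star):
    """z0 from the neutral log profile u = (u*/k) * ln((z - d) / z0)."""
    if u_star <= 0:
        raise ValueError("u_star must be positive")
    return (z_m - d) * math.exp(-VON_KARMAN * wind_speed / u_star)


def neutral_z0(records, z_m, d):
    """z0 for each (wind_speed, u_star, obukhov_length) record that is near-neutral."""
    return [
        roughness_length(z_m, d, u, ust)
        for u, ust, length in records
        if is_neutral(z_m, d, length)
    ]
```

Restricting the analysis this way removes the stability correction term from the wind profile, so the scatter that remains in the figure is more directly attributable to surface roughness.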
L244-250: this is somewhat redundant with the previous paragraph.
L265-270: the temporal subdivision, and all parameters of the RF approach, should be made explicit in a table (possibly in an appendix).
L280: could you explain what a qualitative day-night indicator is?
L282: does PAR not also capture seasonal changes in radiation regimes? Is this not redundant?
Section 2.3.3: in the predictive model, a measure of the leaf area index would be very valuable and would integrate spatial variations. I would recommend trying a high-resolution NDVI or EVI satellite product (from Sentinel data).
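For the NDVI suggestion in Section 2.3.3, a per-sector predictor could be derived along the following lines. This is only a sketch: it assumes the authors have already extracted surface reflectances for the pixels of each wind sector (for Sentinel-2, the near-infrared band B8 and the red band B4), and the function names are hypothetical.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed from surface reflectances."""
    total = nir + red
    if total == 0:
        return float("nan")
    return (nir - red) / total


def sector_mean_ndvi(pixels):
    """Mean NDVI over the (nir, red) reflectance pairs of one wind sector."""
    values = [ndvi(nir, red) for nir, red in pixels]
    valid = [v for v in values if v == v]  # NaN is the only value not equal to itself
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)
```

One value per sector and acquisition date, interpolated in time, could then be added to the RF predictors or used as an independent check on the reconstructed flux dynamics.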
Section 2.3.4: this is the critical point of this manuscript. I suggest that the authors compute an uncertainty for the RF-reconstructed CO2 fluxes. This is necessary to interpret the results and to determine whether significant differences can be retrieved between wind sectors. I am not an expert in RF modelling, but I expect that some uncertainty could be computed. At the very least, uncertainties could be deduced from the comparison between modelled and observed data.
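One possible way to obtain such an uncertainty from the model-observation comparison alone is sketched below. It resamples validation residuals onto the modelled half-hourly series and looks at the spread of the resulting budgets. This is not the authors' method; it assumes half-hourly fluxes in µmol m-2 s-1 and independent residuals, so it would likely underestimate the uncertainty if errors are autocorrelated, and all names are illustrative.

```python
import random
import statistics

SECONDS_PER_HALF_HOUR = 1800
G_C_PER_UMOL = 12.011e-6  # g of carbon per umol of CO2


def budget_g_c(fluxes_umol):
    """Sum half-hourly fluxes (umol m-2 s-1) into a budget in g C m-2."""
    return sum(fluxes_umol) * SECONDS_PER_HALF_HOUR * G_C_PER_UMOL


def budget_uncertainty(observed, modelled_at_obs, modelled_series,
                       n_draws=500, seed=0):
    """Mean and standard deviation of the budget under resampled residuals.

    observed, modelled_at_obs : paired validation fluxes for one wind sector
    modelled_series           : full modelled half-hourly series to integrate
    Assumes independent residuals (no autocorrelation), which is a simplification.
    """
    rng = random.Random(seed)
    residuals = [o - m for o, m in zip(observed, modelled_at_obs)]
    draws = []
    for _ in range(n_draws):
        perturbed = [m + rng.choice(residuals) for m in modelled_series]
        draws.append(budget_g_c(perturbed))
    return statistics.mean(draws), statistics.stdev(draws)
```

Running this separately for each wind sector would give one budget uncertainty per field, which is the quantity needed to judge the differences discussed later in the manuscript.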
L314-318: I would recommend avoiding any use of GPP here, as NEEday is not GPP. This paragraph could be rephrased to clarify that NEE was simply split between day and night.
Section 3.1: Figures 7 and 8 would benefit from showing the difference between the modelled and observed CO2 fluxes rather than plotting the two fluxes on top of each other. In fact, both could be shown: the modelled fluxes and their difference from the observations. This graph should include the uncertainties of the modelled fluxes.
The rest of the results and the discussion provide valuable arguments but really need uncertainties to evaluate whether the differences discussed are significant, especially when comparing annual budgets.
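Once budget uncertainties are available, a simple significance check on the difference between two annual budgets could look like the sketch below. It uses a normal approximation and assumes the two uncertainties are independent; the ±30 g C m-2 yr-1 values in the usage lines are placeholders, since the manuscript reports no uncertainties.

```python
import math


def difference_z_score(budget_a, sigma_a, budget_b, sigma_b):
    """z-score of the difference between two budgets with independent uncertainties."""
    return (budget_a - budget_b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)


def two_sided_p_value(z):
    """Two-sided p-value under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Budgets from the abstract (grassland vs wheat); the 30.0 uncertainties
# are hypothetical placeholders, not values from the manuscript.
z = difference_z_score(-259.0, 30.0, -216.0, 30.0)
p = two_sided_p_value(z)
```

The same check applied to each pair of fields would show which of the reported sink/source contrasts are distinguishable given the reconstruction error.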