the Creative Commons Attribution 4.0 License.
Building a random forest machine learning model for carbon budget estimation in agricultural fields using discontinuous atmospheric Eddy Covariance measurements
Abstract. Atmospheric CO2 exchanges in agroecosystems are strongly controlled by plant phenology, management and microclimatic conditions, yet obtaining finely resolved flux estimates across heterogeneous agricultural landscapes remains challenging. This study evaluates the ability of a single Eddy Covariance (EC) system combined with wind-sector partitioning and Random Forest (RF) modelling to estimate annual carbon budgets of adjacent fields (wheat, mixed-grain, permanent grassland) in the Marais Poitevin wetland. Fluxes were measured over the period 2023-01-01–2024-01-31 and attributed to the different fields by wind sector. Two modelling strategies were compared: (i) a single global RF trained on all sectors and (ii) sector-specific RFs for each adjacent field. RF models showed good overall performance (R² ≈ 0.68–0.95 depending on sector), while the sectoral approach better reproduced phenological dynamics and responses to management events (harvest, grazing) than the global model, which tended to smooth site-specific signals. Annual carbon budgets estimated from the sectoral models indicate that the permanent grassland and the wheat field acted as net sinks (-259 and -216 g C m-2 yr-1, respectively), whereas the mixed-grain and the hybrid fields behaved as net sources (+182 and +231 g C m-2 yr-1, respectively). Main limitations include spatial attribution uncertainty related to the EC footprint under stable conditions, flux disturbances during stormy episodes, and the limited one-year observation period. This study highlights the novelty and practical value of coupling a single EC system with wind-sector partitioning and machine learning approaches to resolve carbon fluxes at the field scale within heterogeneous agricultural landscapes. This integrated approach provides a cost-effective alternative to traditional multi-tower setups, offering new opportunities to monitor spatial carbon dynamics and management effects in real agricultural mosaics.
Beyond methodological innovation, the goal of this work is to establish a comprehensive carbon budget not merely for a single agroecosystem, but for the terrestrial component of a wetland area, capturing the complexity of its ecological and biogeochemical interactions.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-524', Anonymous Referee #1, 17 Mar 2026
- RC2: 'Comment on egusphere-2026-524', Anonymous Referee #2, 17 Apr 2026
The manuscript “Building a random forest machine learning model for carbon budget estimation in agricultural fields using discontinuous atmospheric Eddy Covariance measurements” by Pery et al. applies Random Forest models to eddy covariance CO₂ flux data from a heterogeneous agricultural landscape over a full year, comparing a single global model with sector and crop specific models for different crop types. Both approaches use the same meteorological and soil predictors and are evaluated over one year using standard performance metrics. Results show that both strategies capture broad seasonal NEE patterns with generally similar predictive performance, while the crop specific model outperforms the global model when representing phenological stages and harvest events. A major strength of the study is the very thorough and careful analysis of wind sectors and turbulence characteristics, which provides a solid and transparent foundation for footprint attribution and sector based modelling. At the same time, a major limitation of the study is that several key conclusions are driven by differences in model structure, data availability, and temporal segmentation rather than independent evidence of process representation, while incomplete carbon accounting, sparse data coverage for many vegetation types, and the absence of uncertainty quantification limit the strength and generality of the interpretations.
The document has very few, minor grammatical or spelling errors (e.g., line 58: "Gaz" instead of "Gas", "Campbellsci" instead of "Campbell Scientific", Logan Utah, etc.). Furthermore, several sections reintroduce acronyms such as NEE or GPP, which only need to be defined at their first mention.
Major concerns
The comparison between the global and sector specific Random Forest models is difficult to interpret. The sector specific models are trained on more homogeneous data and are independently optimized, which inherently increases their flexibility. Their improved performance is therefore largely expected and does not provide evidence for distinct mechanistic controls on NEE. Rather, the comparison demonstrates spatial and management driven heterogeneity within the flux footprint and should be interpreted as descriptive rather than explanatory. This issue is compounded by the strong imbalance in data coverage across vegetation types. Vegetation classes with sparse observations, representing only 2 to 4 percent of the dataset and showing limited variability, exhibit the highest R² values, whereas wheat, which represents the largest and most dynamic subset, shows substantially lower performance, particularly post harvest. High R² values in sparsely sampled systems are therefore more likely to reflect limited signal complexity than superior predictive skill, and cross vegetation comparisons should be interpreted with caution.
The use of 10 minute averaging intervals instead of the standard 30 minute eddy covariance averaging period requires stronger justification. Shorter averaging windows may exclude low frequency flux contributions and bias NEE estimates, particularly under stable or disturbed conditions. The manuscript does not provide supporting diagnostics such as ogives or spectral analyses to demonstrate that flux carrying frequencies are adequately retained within the 10 minute interval. In the absence of such evidence, the validity of the calculated fluxes and their subsequent interpretation remains uncertain.
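As an illustration of the diagnostic requested here, an ogive is the cospectrum of vertical wind speed and CO2 accumulated from the highest to the lowest frequency. If the curve flattens before reaching the frequency that corresponds to the averaging period, little flux is lost by that averaging choice. The following is a minimal sketch on synthetic data, not the processing used in the manuscript; the function names, the direct DFT, and the two-component test signal are assumptions made for illustration.

```python
import cmath
import math


def cospectrum(w, c):
    """One-sided cospectrum of two equal-length series (direct DFT, O(n^2))."""
    n = len(w)
    w_mean, c_mean = sum(w) / n, sum(c) / n
    wp = [x - w_mean for x in w]  # fluctuations w'
    cp = [x - c_mean for x in c]  # fluctuations c'
    co = []
    for k in range(1, n // 2 + 1):
        basis = [cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
        W = sum(a * b for a, b in zip(wp, basis))
        C = sum(a * b for a, b in zip(cp, basis))
        # Positive and negative frequencies fold together, except at Nyquist.
        weight = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        co.append(weight * (W * C.conjugate()).real / n**2)
    return co  # co[i] belongs to frequency index k = i + 1


def ogive(co):
    """Accumulate the cospectrum from the highest frequency downwards."""
    og, total = [], 0.0
    for value in reversed(co):
        total += value
        og.append(total)
    og.reverse()
    return og  # og[0] equals the covariance of the full record


# Synthetic record (hypothetical): a slow eddy at k = 3 and a fast eddy at k = 20.
n = 120
w = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 20 * t / n)
     for t in range(n)]
c = [2.0 * math.sin(2 * math.pi * 3 * t / n) + math.sin(2 * math.pi * 20 * t / n)
     for t in range(n)]

og = ogive(cospectrum(w, c))
total_flux = og[0]
# Share of the covariance carried at frequency indices k >= 10, i.e. by
# fluctuations with periods no longer than one tenth of the record.
fraction_retained = og[9] / og[0]
```

In this synthetic case most of the covariance sits in the slow component, so a short window would miss a large share of it. With real 10 Hz data, the same comparison between the 10 minute and 30 minute cut-offs would show whether the shorter window is defensible.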
The separation of NEE into daytime and nighttime subsets does not constitute a valid flux partitioning. Daytime NEE cannot be assumed to represent GPP, and nighttime NEE cannot be equated with ecosystem respiration without an explicit partitioning approach. Respiration occurs continuously throughout the day and night, and eddy covariance does not directly measure GPP. Consequently, cumulative NEEday and NEEnight sums cannot be interpreted as physiological components of the carbon balance. The justification based on changing wind direction does not resolve this conceptual limitation, and the resulting interpretation of metabolic fluxes and annual component budgets therefore requires substantial revision.
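To make the distinction concrete, the sketch below shows what an explicit nighttime-based partitioning does differently from simply summing daytime and nighttime NEE: a respiration model is fitted to nighttime fluxes, extrapolated to daytime temperatures, and GPP is obtained as the difference. This is a simplified illustration, not the algorithm of Reichstein et al. (2005): it assumes the Lloyd and Taylor (1994) temperature response with the temperature sensitivity E0 held fixed at 308.56 K, fits only the reference respiration by least squares, and uses the sign convention NEE = Reco - GPP. All data values are synthetic.

```python
import math

T_REF = 283.15  # reference temperature, K (10 degC)
T_0 = 227.13    # Lloyd-Taylor regression constant, K
E_0 = 308.56    # temperature sensitivity, K (assumed fixed in this sketch)


def lloyd_taylor(temp_k, r_ref, e0=E_0):
    """Ecosystem respiration at temperature temp_k for reference respiration r_ref."""
    return r_ref * math.exp(e0 * (1.0 / (T_REF - T_0) - 1.0 / (temp_k - T_0)))


def fit_r_ref(night_temp_k, night_nee, e0=E_0):
    """Least-squares estimate of r_ref from nighttime NEE (model is linear in r_ref)."""
    g = [lloyd_taylor(t, 1.0, e0) for t in night_temp_k]
    return sum(y * gi for y, gi in zip(night_nee, g)) / sum(gi * gi for gi in g)


def partition(nee, temp_k, is_night):
    """Return (r_ref, Reco series, GPP series) with GPP = Reco - NEE."""
    night_t = [t for t, flag in zip(temp_k, is_night) if flag]
    night_f = [f for f, flag in zip(nee, is_night) if flag]
    r_ref = fit_r_ref(night_t, night_f)
    reco = [lloyd_taylor(t, r_ref) for t in temp_k]
    gpp = [r - f for r, f in zip(reco, nee)]
    return r_ref, reco, gpp


# Synthetic example: four nighttime and two daytime records (hypothetical values).
temp_k = [278.15, 280.15, 283.15, 285.15, 288.15, 293.15]
is_night = [True, True, True, True, False, False]
true_gpp = [0.0, 0.0, 0.0, 0.0, 10.0, 15.0]
nee = [lloyd_taylor(t, 2.0) - g for t, g in zip(temp_k, true_gpp)]

r_ref, reco, gpp = partition(nee, temp_k, is_night)
```

Note that daytime NEE in this example is less negative than -GPP because respiration continues during the day, which is the point made in the comment above.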
Given the uneven data coverage, the lack of flux partitioning, and the absence of uncertainty quantification, the study would benefit from a clearer discussion of limitations. Without uncertainty estimates, it is difficult to assess whether differences between crops or modelling strategies are meaningful, particularly during disturbance and post harvest periods when eddy covariance data are highly non stationary.
Finally, some of the literature cited in support of the interpretations is more than 15 years old. While not inherently problematic, more recent studies, particularly those addressing grazing systems, may provide more relevant context given the management practices present at the site.
Specific notes
Line 43: Intensive agriculture is typically associated with heavy inputs, which generally do not correlate with increased carbon storage in soils. Perhaps clarifying which systems are meant here would help.
Line 135: Please indicate how many cattle were present, as their density might influence the CO2 balance as well.
Line 161: Measured fluxes were averaged every 10 minutes. This needs further explanation.
Section 2.3.2 The previous section describes in great detail how different fields were isolated based on turbulence wind sector analysis, which included some areas such as shading by the anemometer, etc, but in this section the first scenario RF is phrased as if all sectors were now included. But shouldn’t at least the ditch and the tower shaded area be excluded from this model? Perhaps this is what was meant here. Further explanation is needed.
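The question about which sectors enter the global model can be made explicit with a small attribution routine. The sketch below assigns each record to a field by wind direction and drops directions that fall in excluded sectors before any model is trained. The sector bounds, the field keys, and the single excluded sector are hypothetical placeholders, not the values used in the manuscript.

```python
# Hypothetical sector bounds in degrees (from, to), clockwise from north.
FIELD_SECTORS = {
    "wheat": (20.0, 110.0),
    "mixed_grain": (110.0, 200.0),
    "grassland": (200.0, 290.0),
    "hybrid": (330.0, 20.0),  # wraps through north
}
# Hypothetical excluded sector (e.g. ditch and tower shadow).
EXCLUDED_SECTORS = {
    "ditch_and_tower_shadow": (290.0, 330.0),
}


def in_sector(direction_deg, start, end):
    """True if the direction lies in [start, end), handling wrap-around at 360."""
    d = direction_deg % 360.0
    if start <= end:
        return start <= d < end
    return d >= start or d < end


def attribute(direction_deg):
    """Return the field name for a wind direction, or None if it is excluded."""
    for start, end in EXCLUDED_SECTORS.values():
        if in_sector(direction_deg, start, end):
            return None
    for name, (start, end) in FIELD_SECTORS.items():
        if in_sector(direction_deg, start, end):
            return name
    return None


def training_rows(records):
    """Keep (field, flux) pairs only for attributable wind directions."""
    rows = []
    for direction_deg, flux in records:
        field = attribute(direction_deg)
        if field is not None:
            rows.append((field, flux))
    return rows
```

Under this reading, a global model trained on "all sectors" would still receive only the rows returned by `training_rows`, with the field name available as a categorical predictor.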
Results
Section 3.1 The manuscript states that vegetation type is included in the global RF model via a qualitative variable (Scenario 1), but this variable is not clearly defined in the predictor list. Clarification is needed, which should be included in the methods section 2.3.3.
Line 369: The term maximum measured productivity could be misleading, please use the standard NEE instead. Further, since this is mentioned for Scenario 1, it should also be mentioned for Scenario 2, as the goal is to compare their performance.
Line 373: In the sentence starting with “However,”, do you mean actually observed in the field, or in a dataset? If a dataset, which one is referenced here?
Line 377: less productive is again a bit misleading, perhaps better “reduces photosynthetic activity”
Section 3.3 Given the noise and non‑stationarity of EC data, particularly during disturbance and post‑harvest periods, the absence of uncertainty quantification limits interpretation and makes it difficult to assess the significance of differences between vegetation types and modelling strategies.
Line 422 The description of cumulative NEEdaytime is confusing and potentially misleading. The reported “peak” value refers to a cumulative minimum rather than a flux, and the shift from -844 to -764 gC m⁻² indicates a reversal toward daytime carbon release after harvest, not merely a slowdown in accumulation. Moreover, interpreting daytime NEE as productivity or photosynthetic uptake is conceptually incorrect without flux partitioning.
Section 3.4 Again for this section including uncertainties is very important in order to distinguish between sinks and sources.
Discussion
Line 509 The interpretation that reduced model performance during storm periods is “not problematic” because the RFs do not reproduce “physically unrealistic fluxes” is not justified. Failure of a Random Forest model to predict certain flux values indicates that these values are not explainable by the chosen predictors, but does not demonstrate that they are artefacts rather than genuine storm‑driven ecosystem responses or unmodelled processes. Using model behaviour to infer data validity is circular and should be avoided. This passage would benefit from substantially more cautious language and explicit acknowledgement that these periods represent increased uncertainty rather than invalid fluxes.
Line 516 This should be noted in the methods, that this period was excluded from cum sum evaluations in section 2.4. Further description is also needed regarding how the models were run with their exclusion.
Line 521 The discussion of static and automated chamber measurements is confusing, as no such data are presented or used in this study. While chamber studies from the literature are cited to illustrate potential agreement with EC fluxes, this does not constitute validation of the EC measurements or RF model applied here. This section should be removed or rephrased.
Line 546 I’m not sure that Reichstein et al. 2005 is an appropriate citation here, since the focus is on riparian zones and higher soil moisture contents.
Line 596 The term “behaved” is imprecise and anthropomorphic when referring to a winter wheat carbon balance.
Line 596 the harvest data should be introduced in the results section.
Line 604 this sentence is very hard to understand and should be broken up into different sections. Further, it is hard to see the connection to the present study, when the measurement timeframes differ by at least 5 years. Further, different flux pathways, spatial domains, and time scales are mixed without clear signposting, giving the impression of a quantitative comparison with cumulative NEE that is not actually performed. Clearer framing is needed to distinguish contextual background from results included in the present carbon balance.
Conclusion
The concluding claim that the approach provides a “scalable template for integration into larger‑scale carbon inventories” appears overstated, as no spatial or temporal scaling is demonstrated and key components of full carbon accounting are acknowledged to be incomplete. This statement would benefit from more cautious, prospective wording.
Citation: https://doi.org/10.5194/egusphere-2026-524-RC2
- CC1: 'Comment on egusphere-2026-524', Xiangmin Sun, 24 Apr 2026
This study measured CO2 fluxes from one eddy covariance tower over a heterogeneous agricultural landscape, and separated fluxes from each sector based on wind direction and the associated radians. These CO2 fluxes, labeled by each land use, contained many gaps and were filled using a random forest algorithm for annual cumulative evaluation. A few of my questions related to eddy covariance data are:
Using a 10-minute averaging period for eddy covariance introduces random error, as it may fail to capture low-frequency turbulence. You may need to run an ogive analysis and a stationarity test for verification. In line 239, you mentioned that wind direction changes quickly. Can you validate based on your wind data?
Could you refer to any study that used u*/u to delineate or separate the sectors? I would suggest including a brief interpretation of “u*/u” in the introduction section.
The ditch is very close to the tower and could be negligible. Have you investigated the along-wind distance providing the highest (peak) contribution x_peak and x_10% relative to turbulent fluxes? Footprint analysis along wind directions at different percentages can help you estimate whether the ditch’s contribution is negligible.
EC measurements have limitations, including energy budget closure, equipment costs, and technical challenges in data processing. But I am inclined to argue that short gaps during maintenance and rainfall are the limitations, as these gaps are trivial. The assumption of EC measurements is homogeneous field across the fetch; thus EC is not applicable to sites with high footprint heterogeneity. This is not a limitation of EC measurement itself; the challenge is separating the footprints for this complicated mosaic.
Data availability is 39% (line 235) for your site, so the data for each land cover will mostly cover less than 20%. While the meteorological data are measured at the tower, not in each individual sector, they carry the same gaps as the CO2 flux. How do you deal with gaps in meteorological variables from each sector/land use type?
Some minor questions are listed below for the first half of this manuscript. A detailed proofreading is recommended before you resubmit.
Line 43-44: I am not sure I understand the sentence. From my understanding, conventional agricultural practices, such as intensive tillage and heavy reliance on synthetic inputs, can lead to soil degradation and increased greenhouse gas emissions. On the other hand, sustainable farming practices, such as no-till farming and leaving crop residue (stalks, leaves) on the soil surface, can reduce soil erosion and boost organic matter/soil carbon. Residue retention can potentially reduce greenhouse gas emissions compared to traditional residue removal or burning.
Line 51-54: seasonal uptake of “carbon”? Forgot to add “carbon” before the “uptake”? What are “no-harvest practices”? Cover crop? Can you add some specifics? Maybe add a “comma ,” before “and local environment”? Does “social-economic factors determining land-use intensity” refer to “farming common practices”?
Line 53: Maybe add “response” after “fast”, or rephrase it to “high temporal frequency anemometer ”. It should be “gas”, not “gaz”.
Line 63: Could you be more specific about “dozens of”? There should be more representative articles besides “Baldicchi, 2020” to refer to for the developments in flux research.
Line 73: Any reason for the abbreviation of Q10? Scenario when the temperature increases by 10 degrees.
Line 105: not sure of the meaning of “retro-littoral”.
Line 114-115: If the meteorological and soil variables are measured at the tower location, you need to make sure they are representative of all three fields. Are these variables homogeneous among the three fields? Since soil cannot be measured at the tower, you may add “spot” or “location” after “the tower”.
Line 115: Not sure how many percent of gaps were filled for each parcel? I assume each site might have more than 50% gaps.
Line 128: “with 62% of its surface spent to …”, this expression seems not a common phrase.
Line 145: Please check the grammar of this sentence.
Figure 1: The left map can be smaller, showing the general location. Please mark the study site in a more visible and clear way in the left map. It might be helpful to add the wind rose map in this figure.
Line 153: the eddies might not be “homogeneous”, with different frequencies and rotation sizes.
Line 154-161: the basic principle equation of eddy covariance measurement is common in textbook, and can be skipped for journal paper. In addition, the equation is not numbered and not correct technically. Air density should be dry air density. There should be separate parts, bar over rho, and bar over (w’ and s’).
Line 177: I am afraid that I can’t see the rainfall sensor in Figure 2.
Line 205: for more details of data processing protocol?
Figure 3: Note the meaning of the dashed red lines in the figure.
Line 224: usually the riparian corridor and the ditch is very close to the tower and should not be the major contributor. Did you check the flux footprint and the flux peak distance?
Citation: https://doi.org/10.5194/egusphere-2026-524-CC1
- RC3: 'Comment on egusphere-2026-524', Anonymous Referee #3, 30 Apr 2026
The manuscript by Pery et al. presents an interesting approach to gap-fill CO2 flux densities based on wind-sector partitioning, to account for field heterogeneity at the experimental site. This method employs the same set of predictors, while separating the input data to the model in wind sectors and gap-filling them separately. While the method is sound and it has potential to improve NEE estimates over heterogeneous sites, without the costs and logistic difficulties associated to the deployment of several EC stations, there are some hindrances in this approach and how it was validated and presented. Therefore, I would recommend major revisions before this manuscript is accepted for publication.
Major comments
1.- Were meteorological data gap-filled? How did you deal with these data for the specific wind sectors and periods? Please provide all these details as they are very relevant for the RF parametrization.
2.- Using 10-min resolution for processing high-frequency data would require stronger justification; heterogeneity itself could be one, as rapidly shifting wind conditions could drive measuring over different sectors and land covers, but this can come with hindrances that should be accounted for, unless it is adequately demonstrated.
3.- In the results section, the figures should be presented following a logical order and in a more complete but concise way. Currently, this section is somewhat disorganized, and Figures 5 and 6 are mentioned before any explanation of Figure 4. Please try to rewrite this to help readers follow the story.
4.- Perhaps the RF could benefit from adding time cycle variables, for day/night and for seasons. This was implemented e.g. in Vekuri et al. (2023), using cyclical indicators day-night and for months of the year, as well as a linear indicator of the day of year. Maybe this removes in an easier way the uncertainty in gap filling related to changes in the ecosystem properties due to management events such as harvest.
5.- The inclusion of some events, like the November storm, could be interesting if these specific phenomena were studied. However, as this is not the scope of the text and it can bias the results of the modeling, I would suggest either removing it or addressing the specific effect it may have on the gap-filled data. Furthermore, if the storm was relevant for CO2, but no physiological activity was taking place at the moment, could it be an effect of poor analyzer performance, which would then justify removing it from the time series? Or which explanation did you find for it?
6.- The manuscript could benefit from adding the partitioning of NEE into GPP and RECO using either Reichstein et al. (2005) or Lasslop et al. (2010), or both. This would allow to skip the separation into nighttime and daytime NEE. These models could be applied to the filled data for the whole dataset and for the different wind sectors and periods. Please also be careful with naming daytime NEE as productivity or production, because this is the net CO2 flux density and not photosynthesis.
7.- Carbon budgets usually account for terms like carbon export through harvest, carbon input through organic fertilizer, etc. The most important are typically NEE and carbon export. Please avoid the term "carbon budget" throughout the text if no other lateral C transport is considered. Additionally, if there are data on C export through harvest, they should be presented in the Results section, together with annual NEE sums, and then discussed appropriately later on. If no other terms are considered, then also the title of the manuscript should change to avoid confusion.
8.- The results and method described in the paper could be put in context with more literature describing how to address heterogeneity effects over EC sites. There are plenty of studies using either footprint modeling (e.g. Chen et al., 2011) or multiple towers (e.g. Oren et al., 2006), or a distinct approach (e.g. Wang et al., 2016). Furthermore, related to a comment in the conclusions, (line 632), if the focus is the cost of the EC stations, there have been in recent years developments of lower-cost eddy covariance systems; some studies validated the systems (Hill et al., 2017; van Ramshorst et al., 2024; Rannik et al., 2026) and some focused on their application to improve spatial representativity of measurements (Cunliffe et al., 2022; Callejas-Rodelas et al., 2025). The text could put this also in context.
9.- The manuscript would benefit from re-wording many sentences and sections for technical completeness and coherence, particularly in the Methods section. Some of these comments are included in what follows, but they do not cover the whole text.
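Regarding major comment 4 above, the time-cycle predictors can be sketched as follows. Sine and cosine pairs place 23:50 next to 00:00 and December next to January, which a raw hour or month number does not, while a linear day-of-year term lets the model track gradual change. This is a generic encoding written for illustration; the exact feature set of Vekuri et al. (2023) is not reproduced here, and the function and key names are assumptions.

```python
import math
from datetime import datetime


def time_features(ts):
    """Cyclical and linear time predictors for one timestamp."""
    hour = ts.hour + ts.minute / 60.0
    day_of_year = ts.timetuple().tm_yday
    return {
        "hour_sin": math.sin(2 * math.pi * hour / 24.0),
        "hour_cos": math.cos(2 * math.pi * hour / 24.0),
        "month_sin": math.sin(2 * math.pi * (ts.month - 1) / 12.0),
        "month_cos": math.cos(2 * math.pi * (ts.month - 1) / 12.0),
        "doy_linear": day_of_year,
    }


def cyclic_distance(a, b, prefix):
    """Euclidean distance between two feature dicts in one sin/cos plane."""
    return math.hypot(a[prefix + "_sin"] - b[prefix + "_sin"],
                      a[prefix + "_cos"] - b[prefix + "_cos"])


# Two 10 minute records on either side of midnight (hypothetical timestamps).
late = time_features(datetime(2023, 6, 30, 23, 50))
early = time_features(datetime(2023, 7, 1, 0, 0))
midnight_gap = cyclic_distance(late, early, "hour")
```

The resulting columns can be appended to the existing meteorological and soil predictors without changing the rest of the RF workflow.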
Minor comments
1.- Section 2.2.2 would benefit from a more clear structure and presenting more details about the EC data processing; particularly, as this is a paper presenting a novel method, all important details in data processing are necessary. For instance, which method did you apply for high- and low-frequency spectral losses?
2.- Line 154: Please try to write the description of EC method in a more correct way, such as "This method measured vertical wind speed and molar density or fraction of the gas of interest at high frequency. The flux density is then derived by calculating the covariance between w and s, and multiplied by air density".
3.- Line 161: Please rewrite - Fluxes are not measured directly, but calculated from high-frequency measurements.
4.- Line 183: "to ensure accurate CO2 flux measurements" - this sentence is not necessary. Other software (EddiSoft, EddyUH, etc.) also produce standardized and similarly accepted calculations; furthermore, the fluxes are not measurements, but calculations from raw measurements.
5.- Line 194: "calculated using the linear detrending method" - this is not a method for calculating fluxes, but just one of the steps along the processing routine. Please rewrite this.
6.- Line 196 to line 204: Please try to re-organize this section; it is not conceptually wrong, but maybe it would help for clarity to organize the different filters that were applied. "After flux processing, data were filtered to remove outliers using MAD (explain), to exclude periods with weak or non-existent turbulence (ustar filter, explain) and to ()".
7.- Section 2.3.1: Please include information on the footprint model. How were footprints calculated? Did you parametrize something specific, like aerodynamic canopy height (as in Chu et al., 2018)?
8.- Line 227: Do you have an explanation why this happens? Is this the case for all these anemometers? Please justify it.
9.- Lines 248 to 252: this should be discussed in Discussion; other methods could complement this, like footprint modeling with land use/cover attribution.
10.- Lines 254-257: this paragraph belongs to discussion or conclusions.
11.- Line 266: "key temporal subdivision" - do you mean the separate models were applied to wind sector and specific periods? Please make this more clear in this sentence.
12.- Line 267: "Specifically, for each crop parcel ..." -- this should be commented in Discussion in the context of whether this is accurate also for several years and how the transition from senescence or dormancy to growing season... by daily sum of NEE, by meteorological inputs, by phenology, etc. Generally, crops' carbon sink during growing season starts to decline before harvest, as the crop ripens, therefore it could have an impact on when and how the growing season is defined.
13.- Line 280: How was the qualitative day-night indicator defined? And why was it qualitative? Couldn´t it be better to define day or night based on, e.g., radiation? Please clarify this.
14.- Line 306: "This evaluation framework..." - this sentence is used in similar forms in other subsections in Methods; maybe it could be avoided or just write it once in discussion to summarize why this approach was adopted.
15.- Line 310: "Predictions were made ..." - this is not necessary, as it was clear from before that the time resolution is 10 min, and the units of CO2 flux densities.
16.- Line 311: The factor 0.0072 comes from the number of seconds in a year, multiplied by molar mass of carbon and the conversion from umol to mol. However, either this should be clarified for unexperienced readers or be left out as it is common knowledge in EC studies and it is just a simple calculation.
17.- Line 314: Please rewrite the whole paragraph for clarity.
18.- Line 326-329: This paragraph could be in Methods.
19.- Line 342: particularly at small flux values.
20.- Line 343: what is meant by unmodeled management practices? Please clarify. As I understand it, no specific management was introduced in the RF models, just the predictor variables and the classification in wind sectors and periods.
21.- Line 368: This is only a particular 30-min period, so it shouldn´t be taken as the example; furthermore, a flux density of -74 umol m-2 s-1 seems too large. Looking at the Figure, aren´t these values occurring in the hybrid parcel? Maybe it is my mistake, but I do not see these values happening in the wheat parcel, but in the hybrid.
22.- Line 504: "This lower performance...". Why is this happening? It shouldn´t be the case.
23.- Line 512: if there was noise introduced by some fluxes that were not filtered, why not trying a different filtering approach, maybe more strict, or using e.g. hard limits?
24.- Line 516: from this sentence it is not clear whether the spurious values were included or not in the dataset that was used to train the model. Please clarify.
25.- Line 576: The annual NEE sums (not carbon budgets unless carbon export was considered, as commented previously) should be reported in results.
26.- Line 514: maybe this table could go in Appendix, as it is just a reference for literature values.
27.- Generally, the past tense is used for Methods and Results, and partially in the Discussion when referring to results. Although this is the authors' choice, I would recommend checking this for consistency across the manuscript.
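As a check on the arithmetic behind the factor 0.0072 raised in minor comment 16, the sketch below assumes the factor converts a 10 minute mean flux density in µmol CO2 m-2 s-1 into g C m-2 per interval. Under that assumption the value is reproduced by the 600 s in one averaging interval, the molar mass of carbon, and the µmol to mol conversion; summing the converted intervals then gives the cumulative NEE. The constant and function names are illustrative only.

```python
MOLAR_MASS_C = 12.011  # g C per mol
INTERVAL_S = 600       # seconds in one 10 minute averaging interval
UMOL_TO_MOL = 1e-6     # mol per micromol

# g C m-2 per interval for a flux of 1 umol CO2 m-2 s-1
FACTOR = INTERVAL_S * UMOL_TO_MOL * MOLAR_MASS_C


def cumulative_nee(fluxes_umol_m2_s):
    """Sum 10 minute flux densities (umol m-2 s-1) into g C m-2."""
    return sum(flux * FACTOR for flux in fluxes_umol_m2_s)


# A constant uptake of 1 umol m-2 s-1 over a 365 day year of 10 minute intervals.
intervals_per_year = 365 * 24 * 6
annual_sum = cumulative_nee([-1.0] * intervals_per_year)
```

Stating the conversion once in the Methods, in this form or as a single equation, would address the comment for readers unfamiliar with EC units.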
Technical corrections
Line 37: remove "as".
Line 42: suggestion to re-write as "as both sink or sources of atmospheric CO2".
Line 78: meteorological.
Line 93: write "a RF-based" instead of "an RF-based" as RF is read as "random forest".
Line 154: These eddies occur/are located.
Line 172: ... includes the following sensors positioned at the top of a 3-m tower mast (Fig. 2b): an ultrasonic anemometer...
Line 175: The station also included additional sensors...
Line 261: The first strategy (Scenario 1) involved...
Line 265: In the second strategy (Scenario 2) separate RF models were trained for...
Line 354: Box in red represents November's storm.
Line 400: Please avoid statements like "clear".
REFERENCES
Chen, B., Coops, N. C., Fu, D., Margolis, H. A., Amiro, B. D., Barr, A. G., Black, T. A., Arain, M. A., Bourque, C. P.-A., Flanagan, L. B., Lafleur, P. M., McCaughey, J. H., and Wofsy, S. C.: Assessing Eddy-Covariance Flux Tower Location Bias across the Fluxnet-Canada Research Network Based on Remote Sensing and Footprint Modelling, Agricultural and Forest Meteorology, 151, 87–100, https://doi.org/10.1016/j.agrformet.2010.09.005, 2011.
Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier, P., Bernhofer, C., Buchmann, N., Gilmanov, T., Granier, A., Grünwald, T., Havránková, K., Ilvesniemi, H., Janous, D., Knohl, A., Laurila, T., Lohila, A., Loustau, D., Matteucci, G., Meyers, T., Miglietta, F., Ourcival, J.-M., Pumpanen, J., Rambal, S., Rotenberg, E., Sanz, M., Tenhunen, J., Seufert, G., Vaccari, F., Vesala, T., Yakir, D., and Valentini, R.: On the Separation of Net Ecosystem Exchange into Assimilation and Ecosystem Respiration: Review and Improved Algorithm, Global Change Biology, 11, 1424–1439, https://doi.org/10.1111/j.1365-2486.2005.001002.x, 2005.
Levy, P., Drewer, J., Jammet, M., Leeson, S., Friborg, T., Skiba, U., and Oijen, M. V.: Inference of Spatial Heterogeneity in Surface Fluxes from Eddy Covariance Data: A Case Study from a Subarctic Mire Ecosystem, Agr. Forest Meteorol., 280, 107783, https://doi.org/10.1016/j.agrformet.2019.107783, 2020.
Wang, H., Jia, G., Zhang, A., and Miao, C.: Assessment of Spatial Representativeness of Eddy Covariance Flux Data from Flux Tower to Regional Grid, Remote Sens., 8, 742, https://doi.org/10.3390/rs8090742, 2016.
Lasslop, G., Reichstein, M., Papale, D., Richardson, A. D., Arneth, A., Barr, A., Stoy, P., and Wohlfahrt, G.: Separation of Net Ecosystem Exchange into Assimilation and Respiration Using a Light Response Curve Approach: Critical Issues and Global Evaluation, Global Change Biology, 16, 187–208, https://doi.org/10.1111/j.1365-2486.2009.02041.x, 2010.
Chu, H., Baldocchi, D. D., Poindexter, C., Abraha, M., Desai, A. R., Bohrer, G., Arain, M. A., Griffis, T., Blanken, P. D., O’Halloran, T. L., Thomas, R. Q., Zhang, Q., Burns, S. P., Frank, J. M., Christian, D., Brown, S., Black, T. A., Gough, C. M., Law, B. E., Lee, X., Chen, J., Reed, D. E., Massman, W. J., Clark, K., Hatfield, J., Prueger, J., Bracho, R., Baker, J. M., and Martin, T. A.: Temporal Dynamics of Aerodynamic Canopy Height Derived From Eddy Covariance Momentum Flux Data Across North American Flux Networks, Geophys. Res. Lett., 45, 9275-9287, https://doi.org/10.1029/2018GL079306, 2018.
Cunliffe, A. M., Boschetti, F., Clement, R., Sitch, S., Anderson, K., Duman, T., Zhu, S., Schlumpf, M., Litvak, M. E., Brazier, R. E., and Hill, T. C.: Strong Correspondence in Evapotranspiration and Carbon Dioxide Fluxes Between Different Eddy Covariance Systems Enables Quantification of Landscape Heterogeneity in Dryland Fluxes, J. Geophys. Res.-Biogeo., 127, e2021JG006240, https://doi.org/10.1029/2021JG006240, 2022.
Callejas-Rodelas, J. Á., Knohl, A., Mammarella, I., Vesala, T., Peltola, O., and Markwitz, C.: Does increased spatial replication above heterogeneous agroforestry improve the representativeness of eddy covariance measurements?, Biogeosciences, 22, 4507–4529, https://doi.org/10.5194/bg-22-4507-2025, 2025.
Vekuri, H., Tuovinen, J.-P., Kulmala, L., Papale, D., Kolari, P., Aurela, M., Laurila, T., Liski, J., and Lohila, A.: A Widely Used Eddy Covariance Gap-Filling Method Creates Systematic Bias in Carbon Balance Estimates, Scientific Reports, 13, 1720, https://doi.org/10.1038/s41598-023-28827-2, 2023.
van Ramshorst, J. G. V., Knohl, A., Callejas-Rodelas, J. Á., Clement, R., Hill, T. C., Siebicke, L., and Markwitz, C.: Lower cost eddy covariance for CO2 and H2O fluxes over grassland and agroforestry, Atmospheric Measurement Techniques, 17, 6047-6071, https://doi.org/10.5194/amt-17-6047-2024, 2024.
Hill, T., Chocholek, M., and Clement, R.: The Case for Increasing the Statistical Power of Eddy Covariance Ecosystem Studies: Why, Where and How?, Global Change Biology, 23, 2154–2165, https://doi.org/10.1111/gcb.13547, 2017.
Rannik et al.: Good performance of low-cost carbon dioxide sensor based on intercomparisons with the standard eddy-covariance system, preprint, https://doi.org/10.5194/egusphere-2026-144.
Citation: https://doi.org/10.5194/egusphere-2026-524-RC3
This manuscript presents a novel gap-filling method for CO₂ fluxes measured by a single eddy covariance tower located in the middle of four fields with different vegetation types. The proposed approach relies on machine learning, specifically a random forest algorithm, to reconstruct the fluxes of a target field when the wind does not come from that field. The random forest model is trained on a set of environmental variables, including temperature, radiation and soil moisture, as well as information on crop management practices and the day of year. This approach shows strong potential for reconstructing CO₂ fluxes, as well as energy fluxes, in heterogeneous landscapes using measurements from a single flux tower, and is therefore of great interest for extending the spatial representativeness of eddy covariance measurements.
However, although the method is promising and deserves particular attention, the manuscript would benefit from more detail on the technical aspects of the method. In particular, it would benefit from computing the uncertainties of the reconstructed/gap-filled CO₂ flux in each wind sector. This would make it possible to assess the statistical significance of the differences in CO₂ fluxes between wind sectors (each specific to a vegetation/soil type). The authors could also use spatially explicit data to evaluate whether the observed CO₂ flux dynamics are sound; NDVI or EVI from Sentinel data may be of interest. Because an uncertainty and significance analysis is lacking, the discussion in its current form remains somewhat speculative regarding the reconstructed CO₂ fluxes for the plots surrounding the tower.
I would therefore recommend major revision, integrating an uncertainty and significance analysis before interpreting and discussing potential differences between crop types.
Specific comments:
L155-164: in the equation, ρ should be the "dry" air density (ρd), and a conversion from g to mol is missing. Also give the units of w and s.
L190: why use a time delay with colocated CO2/H2O and sonic instruments?
L200: the u* threshold is usually adapted to local conditions. Justify the use of a single value.
Figure 3 (L219): you could select only neutral conditions here to obtain a closer relationship with the roughness length z0.
L244-250: this is somewhat redundant with the previous paragraph.
L265-270: the temporal subdivision, and all parameters of the RF approach, should be made explicit in a table (possibly in an appendix).
L280: could you explain what a qualitative day-night indicator is?
L282: PAR also captures seasonal changes in radiation regimes. Is this not redundant?
Section 2.3.3: in the predictive model, a measure of the leaf area index would be very valuable and would integrate spatial variations. I would recommend trying a high-resolution NDVI or EVI satellite product (from Sentinel data).
Section 2.3.4: this is the critical point of this manuscript. I would suggest that the authors compute an uncertainty for the RF-reconstructed CO2 fluxes. This is necessary to interpret the results and to determine whether significant differences can be retrieved between wind sectors. I am not an expert in RF modelling, but I expect that some uncertainty could be computed. At the least, uncertainties could be deduced from the comparison between modelled and observed data.
L314-318: I would recommend avoiding any use of GPP here, as NEEday is not GPP. This paragraph could be rephrased to clarify that NEE was simply split between day and night.
Section 3.1: Figures 7 and 8 would benefit from showing the difference between the modelled and observed CO2 fluxes rather than plotting the two fluxes on top of each other. In fact, both could be shown: the modelled fluxes and their difference from the observations. This graph should include the uncertainties in the modelled fluxes.
The rest of the results and the discussion provide valuable arguments, but uncertainties are really needed to evaluate whether the differences discussed are significant, especially when comparing annual budgets.
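To illustrate the kind of uncertainty estimate suggested in the comment on Section 2.3.4, the following is a minimal sketch of a residual bootstrap for a sector budget. It is not the authors' method. The function names (`umol_to_gc`, `budget_interval`), the 30-minute time step and the assumption that residuals are independent between time steps are all hypothetical choices made for illustration; a block bootstrap would be more conservative if residuals are autocorrelated.

```python
import random

# Assumption: fluxes are half-hourly means in µmol CO2 m-2 s-1.
SECONDS_PER_STEP = 1800          # 30-minute averaging period (assumed)
G_C_PER_UMOL = 12.011e-6         # grams of carbon per µmol of CO2


def umol_to_gc(flux_umol):
    """Convert a mean flux (µmol m-2 s-1) to g C m-2 per time step."""
    return flux_umol * SECONDS_PER_STEP * G_C_PER_UMOL


def budget_interval(modelled, residuals, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap interval for the summed flux of one wind sector.

    modelled  : RF-predicted flux for every time step of the budget period
    residuals : (observed - modelled) on held-out data from the same sector
    Returns (low, high) in the units of sum(modelled).
    """
    rng = random.Random(seed)
    central = sum(modelled)
    n = len(modelled)
    sums = []
    for _ in range(n_boot):
        # One residual per time step, drawn with replacement.
        # This treats residuals as independent, which is an assumption.
        noise = sum(rng.choices(residuals, k=n))
        sums.append(central + noise)
    sums.sort()
    low = sums[int((alpha / 2) * n_boot)]
    high = sums[int((1 - alpha / 2) * n_boot) - 1]
    return low, high
```

Applied to each wind sector separately, intervals that clearly do not overlap would give a first, rough indication that two annual budgets differ; overlapping intervals would suggest that the difference should be interpreted with caution.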