Exploring Alternative SMAP Level-4 Carbon Model Formulations for the North American Arctic–Subarctic Growing Season

Madelon, Rémi; Endsley, K. Arthur; Kimball, John S.; De Lannoy, Gabriëlle J. M.; Sonnentag, Oliver; Alcock, Haley; Mavrovic, Alex; Williamson, Scott N.; Maire, Vincent; Mialon, Arnaud; Roy, Alexandre

doi:10.5194/egusphere-2026-720

Preprints

https://doi.org/10.5194/egusphere-2026-720

16 Feb 2026

Abstract. The Soil Moisture Active Passive Level-4 Terrestrial Carbon Flux model (hereafter referred to as the L4C model) provides daily estimates of net ecosystem CO₂ exchange (NEE), gross primary production (GPP), and ecosystem respiration (ER) at a global scale. The model is based on direct mechanistic forcing–response relationships between CO₂ fluxes and energy proxies (absorbed photosynthetically active radiation and temperature) and moisture proxies (soil moisture and vapor pressure deficit). Although the L4C model aims to provide a representative estimation of the CO₂ budget of Arctic and Subarctic (AS) environments, a deeper understanding of carbon cycle processes and targeted refinements are needed to improve its accuracy. In this study, alternative model formulations are proposed for the North American AS regions during the growing season. These formulations are calibrated and evaluated using NEE-derived GPP and ER from 20 eddy covariance towers across western Canada and Alaska, covering the period from 2015 to 2022. Refinements in the representation of energy proxies resulted in greater improvements in model performance than adjustments to moisture proxies. Specifically, implementing a light-response curve in GPP estimation reduced unbiased root mean squared error and bias, while incorporating growing degree days improved correlation. Adjustments to rootzone and surface soil moisture in GPP and ER estimation, respectively, did not yield conclusive performance improvements. Vapor pressure deficit showed limited importance as a driver of GPP in upland tundra and wetlands, whereas it had a stronger impact in taiga forests. Finally, the litterfall scheme used to represent SOC dynamics in the L4C ER model formulation in version 8 demonstrated improved performance relative to version 7. 
These results highlight opportunities to enhance the accuracy of the L4C model for the North American AS growing season but also underscore the need for further research on ER modeling.

Received: 06 Feb 2026 – Discussion started: 16 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-720', Anonymous Referee #1, 29 Mar 2026
This paper by Madelon et al. tests whether adjusting how the SMAP L4C model responds to light, temperature, soil moisture, and vapor pressure deficit can improve GPP and ER estimates across Arctic-Subarctic tundra, taiga forests, and wetlands in North America. Twenty eddy covariance towers across western Canada and Alaska provide the reference data, covering 2015 to 2022. The study is well-timed, and the incremental formulation design is genuinely useful because it lets you see what each modification actually does, rather than attributing a combined effect to several changes at once. The finding that energy proxies, specifically the nonlinear light-response and growing degree days, consistently outperform moisture adjustments across all three ecosystem types is a clean and practically useful result. The discussion in section 6.5, where the authors frankly acknowledge the circular relationship between flux-partitioning methods and model calibration targets, is one of the stronger parts of the paper and a point that rarely gets enough attention in similar work. That said, several issues need to be addressed before the manuscript is ready for publication. The following comments are for further improvement.

Major Comments
The evaluation in Tables 4 and 5 uses all available data, including data the model was calibrated on. The 100-iteration resampling with 70% subsets reduces but does not solve this problem. There is no truly held-out test set anywhere in the study. For simpler formulations this is less of a concern, but GPP5 has 9 free parameters and ER4 has 6, both substantially more than their baselines, and the reported performance improvements for these models cannot be clearly attributed to better generalization rather than better calibration fit. The authors should either implement a proper train/test split, for example by withholding complete site-years per ecosystem type, or provide a much more explicit and quantified discussion of overfitting risk for the higher-complexity formulations.
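To make the suggested site-year withholding concrete, the following is a minimal sketch and not the authors' procedure. The record keys (`site`, `year`, `ecosystem`), the function name `split_by_site_year`, and the 30% test fraction are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict


def split_by_site_year(records, test_fraction=0.3, seed=0):
    """Withhold complete site-years, stratified by ecosystem type.

    `records` is a list of dicts with hypothetical keys 'site', 'year',
    and 'ecosystem'; any other keys are passed through untouched.
    """
    rng = random.Random(seed)

    # Collect the distinct (site, year) groups within each ecosystem type.
    groups = defaultdict(set)
    for r in records:
        groups[r["ecosystem"]].add((r["site"], r["year"]))

    held_out = set()
    for site_years in groups.values():
        ordered = sorted(site_years)
        rng.shuffle(ordered)
        # Hold out at least one site-year per type when more than one exists.
        n_test = max(1, round(test_fraction * len(ordered))) if len(ordered) > 1 else 0
        held_out.update(ordered[:n_test])

    # Every daily record follows its site-year, so no site-year is split.
    train = [r for r in records if (r["site"], r["year"]) not in held_out]
    test = [r for r in records if (r["site"], r["year"]) in held_out]
    return train, test


# Synthetic example: 4 sites, 2 years each, 10 daily records per site-year.
records = [
    {"site": s, "year": y, "ecosystem": e, "doy": d}
    for s, e in [("A", "tundra"), ("B", "tundra"), ("C", "taiga"), ("D", "taiga")]
    for y in (2015, 2016)
    for d in range(150, 160)
]
train, test = split_by_site_year(records)
```

Because whole site-years move together, no day from a withheld site-year can appear in the calibration set, which is the property a random daily 70/30 split does not provide.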

There is a deeper conceptual issue worth flagging. The entire calibration and evaluation framework assumes that GPPEC and EREC from flux partitioning are reliable targets. But these quantities are not measured; they are modeled from NEE using methods that themselves rely on temperature and light as primary drivers. This means the AS-adapted formulations are being trained to reproduce outputs of algorithms that share some of the same structural assumptions as the L4C model itself. The strong performance of APAR and GDD adjustments may partly reflect this shared structure rather than genuine independent improvement. The authors touch on this in section 6.5, but do not go far enough. It is worth asking openly whether the model is getting better at capturing carbon dynamics or simply getting better at agreeing with a partitioning algorithm it already resembles.

Equation 9c appears to contain a typographical error. The denominator of the SRZSM logistic ramp reads g(MNTmin), which looks like a copy-paste error from the SMNT equation directly above it. It should presumably read g(RZSMmin). This needs to be verified and corrected, since it directly affects reproducibility of the RZSM formulation in GPP3.

The scoring system in Equation 14 is difficult to defend as currently described. Ranks are used instead of raw metric values, which throws away information about how much better one formulation is relative to another. A formulation that narrowly beats another gets the same rank benefit as one that beats it by a wide margin. The penalty factor is a simple linear ratio of parameter counts applied as a multiplier on the average rank, with no comparison to established model selection criteria such as AIC or BIC. The authors should either justify this design explicitly or test whether the final formulation selection changes under alternative scoring approaches.
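For reference, the established criteria mentioned above can be written in a least-squares form. This is a sketch that assumes independent Gaussian residuals; the symbols $n$ (number of observations), $k$ (number of free parameters), and $\mathrm{RSS}$ (residual sum of squares) are introduced here for illustration and are not taken from the manuscript.

```latex
\begin{align}
  \mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}, \\
  \mathrm{BIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n + \text{const}.
\end{align}
```

Both criteria retain the size of the misfit through $\mathrm{RSS}$, which a rank discards, and both penalize additional parameters additively. Daily flux residuals are typically autocorrelated, so the independence assumption is unlikely to hold exactly and the criteria would be indicative rather than exact for this kind of data.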

For wetlands, the temporal correlation of NEE drops from 0.50 in the original L4C model to 0.33 in the AS-adapted formulation. This is not a small degradation. It means the adapted model tracks wetland carbon exchange less accurately in time than the model it is supposed to improve. The authors mention error compensation in section 6.5 but do not examine which sites or years drive this result, or whether the scoring-based selection of GPP4 and ER3 for wetlands is itself contributing to the problem. This finding needs a dedicated discussion, not a brief mention.

ER1 and ER2 use a single SOC pool rather than the original three, because reference SOC data were not available for recalibration. This is a reasonable practical decision but it means the comparison between ER1/ER2 and ERL4C is not a clean test of the Lfall allocation scheme. It simultaneously tests a different pool structure. The authors should acknowledge this confounding more clearly in the methods and in the discussion of section 6.2.

GDD is normalized annually per site using each year's own minimum and maximum values. For data points early in the growing season, this normalization requires information from later in the same year. The model's seasonal shape scalar for April implicitly uses what happened in August. The authors should clarify whether this creates an information leakage issue in the evaluation and whether a climatological or cross-year normalization was considered.
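The concern can be illustrated with a small sketch. It assumes a cumulative GDD with a base temperature of 0 °C and a plain min-max scaling; neither is confirmed by the text above, and all function names are hypothetical.

```python
def cumulative_gdd(daily_mean_temp, base=0.0):
    """Running sum of degrees above `base` (assumed definition of GDD)."""
    total, out = 0.0, []
    for t in daily_mean_temp:
        total += max(t - base, 0.0)
        out.append(total)
    return out


def minmax(values, lo, hi):
    """Scale values with the given bounds; returns zeros if bounds coincide."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_per_year(gdd_by_year):
    # Each year is scaled by its own minimum and maximum, so an early-season
    # value depends on the end-of-season total of the same year.
    return {y: minmax(g, min(g), max(g)) for y, g in gdd_by_year.items()}


def normalize_cross_year(gdd_by_year, target_year):
    # Bounds come only from the other years, so no same-year future
    # information enters the scaling of `target_year`.
    others = [v for y, g in gdd_by_year.items() if y != target_year for v in g]
    return minmax(gdd_by_year[target_year], min(others), max(others))


# Two synthetic years with identical temperatures until the last step.
gdd_by_year = {
    2015: cumulative_gdd([0.0, 5.0, 10.0, 10.0]),  # ends at 25 degree-days
    2016: cumulative_gdd([0.0, 5.0, 10.0, 20.0]),  # ends at 35 degree-days
}
per_year = normalize_per_year(gdd_by_year)
cross_2016 = normalize_cross_year(gdd_by_year, 2016)
```

In this example the second step has the same thermal history in both years, yet the per-year scaling assigns it different values because the season totals differ. The cross-year variant removes that dependence at the cost of allowing values above 1 in unusually warm years.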

GPP and ER formulations are selected independently based on their individual performance scores and then combined to produce NEE. But minimizing GPP error and minimizing ER error separately does not guarantee minimizing NEE error, since the two error terms can be correlated or offsetting. The authors acknowledge this briefly in section 6.5 but do not test whether selecting formulations based directly on NEE performance would produce different combinations or better results, particularly for wetlands where the current approach visibly underperforms.
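The point about offsetting errors can be made explicit with a short derivation. It assumes the sign convention $\mathrm{NEE} = \mathrm{ER} - \mathrm{GPP}$, which is not stated in the text above, and writes $\varepsilon$ for model-minus-reference error.

```latex
\begin{align}
  \varepsilon_{\mathrm{NEE}} &= \varepsilon_{\mathrm{ER}} - \varepsilon_{\mathrm{GPP}}, \\
  \mathbb{E}\!\left[\varepsilon_{\mathrm{NEE}}^{2}\right]
    &= \mathbb{E}\!\left[\varepsilon_{\mathrm{ER}}^{2}\right]
     + \mathbb{E}\!\left[\varepsilon_{\mathrm{GPP}}^{2}\right]
     - 2\,\mathbb{E}\!\left[\varepsilon_{\mathrm{ER}}\,\varepsilon_{\mathrm{GPP}}\right].
\end{align}
```

Under this convention, positively correlated GPP and ER errors partly cancel in NEE. A pair of formulations with smaller individual errors can therefore give a larger NEE error if the change also shrinks the cross term, which is why selecting on GPP and ER separately does not guarantee the best NEE.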

Minor Comments
The term "AT" appears in section 6.4 referring to air temperature but is not defined in the abbreviation table and does not appear anywhere else in the paper. The manuscript consistently uses MNT for minimum air temperature throughout. This should be made consistent.

Figure 8 is cited repeatedly in sections 5.1 and 5.2 without specifying which panel is being referred to. With six subpanels across three ecosystem types, the reader is left guessing. Panel-level citations, for example Figure 8A1 or Figure 8B2, would help significantly.

Line 452: "does not provides any benefits neither" should read "does not provide any benefits either."

The 10th percentile threshold used to exclude shoulder-season data in section 4.2 is described by the authors themselves as arbitrary. A brief note on sensitivity to this threshold, or at least a reference to comparable choices in prior work, would add confidence.

The description of EC flux-partitioning methods across lines 97 to 111 runs quite long for a modeling paper. Most of this is standard material. Trimming it or moving essential detail to supplementary information would improve the flow of the introduction.

The justification for using MERRA-2 instead of GEOS-5 FP for VPD and MNT is reasonable, but a brief confirmation that the two products agree closely at the study sites would close a potential concern about whether this substitution introduces systematic differences between the AS-adapted models and the operational L4C configuration.

The abbreviation table is a helpful addition given the notation density of the paper, but "AT" and the SOC pool structure (labile, structural, recalcitrant) are referred to in the text without full entries in the table. A quick check for completeness would be worthwhile.
Citation: https://doi.org/10.5194/egusphere-2026-720-RC1
- AC1: 'Reply on RC1', Rémi Madelon, 12 May 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-720/egusphere-2026-720-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-720-AC1
RC2:
'Comment on egusphere-2026-720', Anonymous Referee #2, 22 Apr 2026

I have reviewed the manuscript ‘Exploring alternative SMAP Level-4 Carbon Model Formulations for the North American Arctic-Subarctic Growing Season’. This work explored several different formulations to predict GPP and ER using data from Eddy Covariance (EC) measurements, evaluated performance of different formulations, and made comparison among different formulations as well as with the ‘baseline’ from SMAP L4C model. After the comparison, the authors found the best model formulation by plant functional type, and discussed and pointed out directions to improve GPP and ER predictions with SMAP data.
Major comments
The manuscript is in general well-written with good details and clarity. The discussion is also helpful for the community by giving directions on optimizing formulations to improve prediction of GPP, ER, and NEE based on this study and literature from the field.
However, a major concern I have, which is not addressed or mentioned at all, is spatio-temporal auto-correlation. The work is based on the assumption that one or more of the proposed model formulations show improved accuracy compared to the SMAP L4C product. The evaluation results indeed support this assumption. However, it seems the training and evaluation data were randomly split 70%:30% for training:testing. Given that the authors take each unique combination of EC measurement and day as a data point (Lines 110-111), there is high spatial and temporal auto-correlation between the training and evaluation datasets. Because of this, the output accuracy metrics are affected by this auto-correlation, suggesting that the accuracy we are seeing is closer to training accuracy than to testing accuracy. In our own work, I have seen testing accuracy drop from R² = 0.72 to R² = 0.15 after removing the impact of auto-correlation. This suggests the improved accuracy metrics from the proposed model formulations might be a result of this auto-correlation rather than real model improvement compared to the SMAP L4C baseline. The authors need to clarify this.
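One common way to limit temporal auto-correlation between training and testing data is to hold out a contiguous block of days and discard a buffer around it. The sketch below is an illustration only: the function name `blocked_split` and the 7-day buffer are hypothetical, and a suitable buffer length would depend on the decorrelation time of the fluxes at each site.

```python
def blocked_split(days, test_start, test_end, buffer_days=7):
    """Split day indices into train/test using one contiguous test block.

    Days within `buffer_days` of the test block are discarded so that
    training samples are not immediate neighbors of test samples.
    """
    train, test = [], []
    for d in days:
        if test_start <= d <= test_end:
            test.append(d)
        elif d < test_start - buffer_days or d > test_end + buffer_days:
            train.append(d)
        # Otherwise the day falls inside the buffer and is dropped.
    return train, test


# Synthetic growing season: day of year 150 to 249, test block 190 to 209.
days = list(range(150, 250))
train_days, test_days = blocked_split(days, test_start=190, test_end=209)
```

Applied per site, this keeps every training day at least eight days away from the nearest test day in the example, unlike a random daily split where adjacent days routinely land on opposite sides.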
It is also unclear what data were used to evaluate and compare the NEE estimation between NEE_L4C and NEE_AS (Table 6). The authors mention this in Lines 350-351 without specifying what data were used. The 70%:30% split approach was used to optimize model parameters of the proposed formulations, after which there seem to be no independent data left to evaluate NEE with the optimized GPP and ER formulations.
I encourage the authors to provide a line number for each line. Also, there are too many abbreviations, which makes the paper difficult to follow. Are abbreviations like B for bias and L_fall for litterfall necessary?
Below I provide detailed comments.
Methods
Lines 348-350 fit better in Section 4.3, Model formulation calibration. I was wondering how NEE was calculated when reading Section 4.3.
Line 341, Equation 14: n_min should be a fixed number based on the description; please specify the number.
Lines 345-346: the authors briefly mention the evaluation of temporal performance without enough detail. Was the median across EC towers used for Figure 8? Also, the authors mention spatio-temporal performance throughout, and temporal performance several times in the results and discussion, but the manuscript does not show any temporal dynamics of the GPP/ER/NEE predictions. This type of figure would greatly help readers to understand the research.
Results
Lines 354-355: are these two lines necessary, given that section titles such as 5.1 and 5.1.1-5.1.3 already make the content clear?
Discussions
Lines 509-512: I believe you need to add references there to support your discussion and the following recommendation.
Lines 523-525: you need to add references, as the comparison between V8 and V7 is apparently not from this work.
Lines 527-529: it is not clear why aboveground biomass (AGB) is mentioned here, as the paper gives no context for discussing AGB.
Lines 534-535: the writing is confusing and contradicts Table 4, which clearly suggests GPP3 is better than GPP2. I understand the differences in metrics are not that big, but the authors rely on similarly small differences to pick the better formulation between GPP4 and GPP5.
Lines 537-538: the phrase ‘but the added value may not justify the increased complexity required to implement this adjustment’ is not straightforward; are you trying to say the model is too complicated and over-fitted? I suggest the authors increase clarity and conciseness throughout the paper. Another example where clarity can be improved is Lines 549-550: ‘clear evidence is lacking to suggest that GPP in wetlands does not exhibit diminishing returns under high RZSM conditions’.
Conclusions
Lines 630-634: the importance of winter and shoulder seasons is only mentioned in the Conclusions, which is not good writing practice. I recommend the authors add this to Section 6.5, Limitations, to acknowledge that this research did not focus on these seasons. The authors can then briefly mention it in the Conclusions.

Citation: https://doi.org/10.5194/egusphere-2026-720-RC2
- AC2: 'Reply on RC2', Rémi Madelon, 12 May 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-720/egusphere-2026-720-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-720-AC2

Viewed

Total article views: 1,341 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
786	481	74	1,341	130	222

Views and downloads (calculated since 16 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	229	202	43	474
Mar 2026	424	200	17	641
Apr 2026	90	49	5	144
May 2026	39	28	9	76
Jun 2026	4	2	0	6

Viewed (geographical distribution)

Total article views: 1,362 (including HTML, PDF, and XML) Thereof 1,362 with geography defined and 0 with unknown origin.

Latest update: 04 Jun 2026

Short summary

This study aims to improve estimates of carbon dioxide release and uptake in the North American Arctic and subarctic regions. Several modeling approaches were tested, showing that a better representation of sunlight and temperature effects on ecosystems leads to improved estimates. This work provides new perspectives to better assess whether these regions act as sources or sinks of greenhouse gases and how they may influence the climate system by amplifying or slowing global warming.

