Spatiotemporal dynamics of atmospheric CO2 across China revealed by long-term, high-resolution satellite-derived data
Abstract. Understanding the spatiotemporal dynamics of atmospheric carbon dioxide (CO2) is fundamental for advancing climate change research and designing effective mitigation strategies. Yet current analyses are constrained by two key limitations: sparse observations that hinder intra-urban assessment and relatively short monitoring periods that limit long-term consistency. To overcome these challenges, we developed a long-term atmospheric CO2 hindcast modeling framework that generates daily 1-km column-averaged dry-air mole fraction of CO2 (XCO2) across China for 2000–2020. The framework adapts the proven PM2.5 hindcast approach to CO2 estimation by training an Extremely Randomized Trees model on the residuals between OCO-2 observations and CarbonTracker simulations. The model integrates a comprehensive set of physically interpretable predictors—including MAIAC aerosol optical depth, NO2, peroxyacetyl nitrate, meteorological variables, and land-use indicators—linking CO2 variability to co-emitted tracers and boundary-layer processes. Rigorous evaluation demonstrated high reliability (cross-validation R2 = 0.94–0.97, RMSE = 0.82–1.29 ppm; independent validation R2 = 0.82–0.97). The resulting long-term, high-resolution dataset reveals distinct carbon hotspots and their evolution: the North China Plain remained persistently elevated with rapid increases during 2000–2010, while southern China exhibited accelerated growth after 2010. Enhancement analyses identified consistent intra-regional hotspots in southeastern Beijing-Tianjin-Hebei and northern Zhejiang, with emissions declining after 2012 and rebounding after 2018. During the Wuhan COVID-19 lockdown, urban cores showed sharper reductions than suburban areas. The proposed XCO2 hindcast modeling framework and the resulting dataset provide a valuable foundation for advancing carbon-neutrality assessments and guiding climate policy across multiple spatial scales.
"Spatiotemporal Dynamics of Atmospheric CO2 across China Revealed by Long-Term, High-Resolution Satellite Data"
General Comments
This manuscript presents a long-term, high-resolution satellite-derived XCO2 dataset over China using a machine learning approach. While the topic is timely and the dataset potentially valuable, the manuscript in its current form suffers from several fundamental methodological concerns that undermine confidence in the validity and interpretability of the results. Key issues include: (1) an inadequately described and physically questionable methodology; (2) an overstated and poorly validated claim that the model improves upon existing products; (3) a misuse of the XCO2 enhancement framework, which cannot be straightforwardly interpreted as a proxy for CO2 emissions without accounting for wind speed and other confounding factors; and (4) several factual errors, inconsistent terminology, and unclear or internally inconsistent figure descriptions. Taken together, these issues represent substantial weaknesses that require major revision before the manuscript can be considered suitable for publication. Given the number and severity of the concerns raised below, the manuscript is not recommended for publication in its current form.
Specific Comments
Lines 37–40:
The argument that atmospheric CO2 data can be used to assess mitigation effectiveness and inform sustainable development is stated too broadly and lacks specificity. Connecting column-averaged XCO2 retrievals to surface emissions is scientifically challenging due to CO2's long atmospheric lifetime, its associated large and variable background signal, a low signal-to-noise ratio relative to emission-driven enhancements, and the difficulty of separating anthropogenic signals from biospheric fluxes. The authors should provide a more rigorous and nuanced discussion of how their dataset could be used for these purposes—acknowledging these limitations.
Line 50:
The manuscript states that OCO-2 has a "daily revisit capability." This is incorrect. According to the OCO-2 mission documentation, the satellite operates on a 16-day repeat cycle. This should be corrected, and the relevant citation should be verified accordingly.
Lines 99–101:
The citation of Crisp et al. (2017) as a reference for the CarbonTracker (CT) XCO2 product is inappropriate. Crisp et al. (2017) describes the algorithm theoretical basis for OCO-2 Level 2 retrievals, not the CarbonTracker data assimilation system. The authors should replace this with an appropriate reference for CarbonTracker CT2022 (e.g., Peters et al. or the relevant NOAA/GML documentation).
Lines 101–104:
The description of how the coarse-resolution CarbonTracker data (3° × 2°) are resampled to a 0.01° grid is insufficient; the resampling method should be stated explicitly. The authors should also clarify how the final XCO2 estimate is reconstructed, specifically whether the predicted residual is added back to the CT XCO2 value.
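For concreteness, the workflow we would expect the authors to document can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes nearest-cell assignment of the coarse CT field to fine-grid points and assumes that the final estimate is the CT value plus the predicted residual. Neither assumption is confirmed by the manuscript, and all function and variable names are hypothetical.

```python
def ct_value_at(lat, lon, ct_grid, lat0, lon0, dlat=2.0, dlon=3.0):
    """Nearest-cell lookup of a coarse CT field at a fine-grid point.

    ct_grid[i][j] holds the CT XCO2 (ppm) of the cell whose south-west
    corner is (lat0 + i * dlat, lon0 + j * dlon). Nearest-cell assignment
    is an assumption; an interpolation scheme would give smoother fields.
    """
    i = int((lat - lat0) // dlat)
    j = int((lon - lon0) // dlon)
    return ct_grid[i][j]


def training_residual(oco2_xco2, ct_xco2):
    """Model target: satellite observation minus the CT simulation (ppm)."""
    return oco2_xco2 - ct_xco2


def reconstruct_xco2(ct_xco2, predicted_residual):
    """Assumed final estimate: CT value plus model-predicted residual (ppm)."""
    return ct_xco2 + predicted_residual
```

If the authors use a different resampling or reconstruction step, the manuscript should state it at this level of detail.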
Lines 115–117:
The use of daily-mean meteorological and air pollution data to predict XCO2 at the satellite overpass time (approximately 13:30 local time) requires physical justification. Atmospheric properties such as planetary boundary layer height, temperature, humidity, and trace gas concentrations can vary substantially throughout the day. Using daily-mean values rather than values contemporaneous with the satellite overpass may introduce systematic biases. The authors should either provide a physical justification for this approach or conduct a sensitivity analysis to evaluate how the temporal sampling of input predictors affects model performance for the target overpass time.
Lines 163–167:
The leave-one-year-out cross-validation strategy described here appears to withhold one year from within the 2015–2020 period for evaluation. However, the authors claim this approach allows assessment of model performance for the pre-2015 period. This logic is not convincing: withholding a year from the middle of the training period does not simulate the extrapolation challenge posed by hindcasting to years before the OCO-2 era, which may involve structural differences in the predictor-XCO2 relationships. The authors should clarify the validation approach and, if the goal is to assess pre-2015 performance, adopt a more appropriate out-of-sample evaluation strategy (e.g., training exclusively on post-2015 data and evaluating against ground-based observations in pre-2015 years).
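The difference between the two evaluation designs can be illustrated with a short sketch. The first function mirrors the leave-one-year-out scheme as we understand it from the manuscript; the second is a hypothetical backward-extrapolation split in which the test year always precedes every training year. The second scheme is our own illustration of the kind of test that mimics hindcasting, offered as a complement to evaluation against pre-2015 ground-based data; the function names and the `min_train_years` parameter are assumptions.

```python
def leave_one_year_out(years):
    """Yield (train_years, test_year), withholding each year once.

    Training years can lie on both sides of the test year, so this tests
    interpolation in time rather than extrapolation.
    """
    years = sorted(set(years))
    for held_out in years:
        yield [y for y in years if y != held_out], held_out


def backward_extrapolation_splits(years, min_train_years=2):
    """Yield (train_years, test_year) with the test year before all training years.

    This mimics a hindcast: the model never sees data from the test year
    or from any earlier year.
    """
    years = sorted(set(years))
    for k in range(len(years) - min_train_years):
        yield years[k + 1:], years[k]


oco2_years = range(2015, 2021)
loyo = list(leave_one_year_out(oco2_years))
backward = list(backward_extrapolation_splits(oco2_years))
```

In the backward scheme, skill that degrades as the gap between the training and test years grows would be a direct warning sign for the 2000–2014 hindcast.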
Lines 183–185:
The use of a 10:30–16:30 time window to average ground-based observations for comparison with OCO-2 retrievals (overpass time ~13:30) is overly broad and requires justification. Atmospheric CO2 concentrations at surface sites exhibit pronounced diurnal variability driven by boundary layer dynamics and biospheric fluxes, meaning that measurements taken at the edges of this window (late morning or late afternoon) may differ substantially from those at the overpass time. Averaging over a six-hour window centered on the overpass time may therefore introduce significant biases in the validation. The authors should either narrow the averaging window (e.g., ±1–2 hours around the overpass time) or provide a sensitivity analysis demonstrating that this choice does not materially affect the validation statistics.
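The requested sensitivity analysis is simple to set up. The sketch below averages ground-based values within a symmetric window around the overpass time and compares a ±3 h window with a ±1 h window. The hourly values are synthetic and purely illustrative (they are not taken from the manuscript or from any station), and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def window_mean(records, overpass, half_width_hours):
    """Mean of values observed within +/- half_width_hours of the overpass.

    records is a list of (timestamp, value_ppm) pairs. Returns None when
    no observation falls inside the window.
    """
    half = timedelta(hours=half_width_hours)
    selected = [value for time, value in records if abs(time - overpass) <= half]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Synthetic hourly values from 10:30 to 16:30 with a late-morning drawdown.
start = datetime(2016, 7, 1, 10, 30)
synthetic_ppm = [414.0, 412.0, 411.0, 410.0, 410.0, 410.0, 409.0]
records = [(start + timedelta(hours=h), v) for h, v in enumerate(synthetic_ppm)]
overpass = datetime(2016, 7, 1, 13, 30)

wide_mean = window_mean(records, overpass, 3)    # 10:30-16:30 window
narrow_mean = window_mean(records, overpass, 1)  # 12:30-14:30 window
window_bias = wide_mean - narrow_mean            # effect of window choice, ppm
```

Reporting the validation statistics for two or three window widths would show whether a difference of this kind is negligible relative to the reported RMSE.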
Lines 196–199:
The XCO2 enhancement method, while widely used in exploratory analysis, cannot be directly interpreted as an indicator of surface CO2 emissions without accounting for atmospheric transport, particularly wind speed and direction. An XCO2 enhancement above a background value reflects a combination of upstream emissions, atmospheric dilution (governed by wind speed), boundary layer height, and biospheric signals. High XCO2 enhancements under calm wind conditions may not correspond to higher emissions than lower enhancements under strong winds. The authors should either (1) incorporate a wind-speed correction or apply a more physically rigorous emission estimation framework, or (2) explicitly characterize the temporal variability of wind conditions over the study regions and discuss the extent to which this limits the interpretation of enhancements as emission proxies.
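The dependence on wind speed can be made explicit with a simple steady-state mass balance. The sketch below assumes a uniform area source with flux q over an along-wind fetch L, ventilated by a constant wind u, so that the column enhancement is q·L/u; this is then converted to an XCO2 enhancement using an approximate dry-air column. This idealized model is our illustration, not a method used in the manuscript, and the flux, fetch, and wind values are hypothetical.

```python
G = 9.80665             # standard gravity, m s^-2
M_DRY_AIR = 0.0289644   # molar mass of dry air, kg mol^-1


def dry_air_column(surface_pressure_pa=101325.0):
    """Approximate dry-air column in mol m^-2, neglecting water vapour."""
    return surface_pressure_pa / (G * M_DRY_AIR)


def xco2_enhancement_ppm(flux_mol_m2_s, fetch_m, wind_m_s,
                         surface_pressure_pa=101325.0):
    """XCO2 enhancement (ppm) for a uniform area source in steady state.

    Column enhancement = flux * fetch / wind (mol m^-2), divided by the
    dry-air column and scaled to parts per million.
    """
    if wind_m_s <= 0:
        raise ValueError("wind speed must be positive")
    column_enhancement = flux_mol_m2_s * fetch_m / wind_m_s
    return 1e6 * column_enhancement / dry_air_column(surface_pressure_pa)


# Same hypothetical emission flux and fetch, two different wind speeds.
calm = xco2_enhancement_ppm(1e-5, 50_000.0, 2.0)
windy = xco2_enhancement_ppm(1e-5, 50_000.0, 5.0)
```

Under these assumptions the identical source produces an enhancement 2.5 times larger at 2 m/s than at 5 m/s, which is why enhancement maps alone cannot rank emissions between regions or periods with different ventilation.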
Figure 1g:
A visible data gap appears in the estimated XCO2 time series for approximately 2002–2003. Is this gap attributable to missing input data (e.g., predictor variables) or to a deliberate data exclusion decision? This should be addressed explicitly in the text.
Section 3.1.3:
This section discusses feature importance but focuses almost exclusively on MAIAC AOD, neglecting several other highly ranked predictors. Notably, total column water vapour emerges as the most important predictor in both the YRD and PRD regions, yet this is not discussed. The authors should provide a physical explanation for this result—is this relationship physically meaningful or potentially spurious? Similarly, Day of Year appears to be among the most important features across regions, raising the question of whether the model's representation of XCO2 seasonality is driven primarily by this temporal index rather than by physically meaningful predictors. The implications of this for spatial generalization and for hindcast periods should be discussed.
Lines 270–272:
There is a typographical error in the figure caption: "Figure 2. of each …" should be corrected.
Lines 276–280:
The comparison of R² and RMSE values across different models to argue that the present model outperforms previous studies is methodologically inappropriate. Model skill metrics are highly sensitive to the spatial domain, temporal coverage, and resolution of the evaluation dataset, as well as the choice of validation sites and periods. Without a controlled, identical evaluation framework applied to all compared models, such inter-model comparisons cannot support strong claims of superiority. The authors should reframe this comparison more cautiously, noting that direct performance comparisons are not possible across studies with differing spatiotemporal coverage and evaluation protocols.
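A small numerical example shows why such comparisons are fragile. The sketch below computes the coefficient of determination and the RMSE for two synthetic evaluation sets with identical prediction errors but different spreads in the observed values. The numbers are illustrative only, and R² is taken here as 1 − SSE/SST, which is an assumption about the definition used in the compared studies.

```python
def r2_and_rmse(observed, predicted):
    """Return (R2, RMSE) with R2 defined as 1 - SSE / SST."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst, (sse / n) ** 0.5


errors = [1.0, -1.0, 1.0, -1.0]  # identical errors in both cases, ppm

# Narrow range of observed values (e.g., a short period or small domain).
narrow_obs = [409.0, 410.0, 411.0, 412.0]
narrow_pred = [o + e for o, e in zip(narrow_obs, errors)]

# Wide range of observed values (e.g., many years with a strong trend).
wide_obs = [400.0, 405.0, 410.0, 415.0]
wide_pred = [o + e for o, e in zip(wide_obs, errors)]

narrow_r2, narrow_rmse = r2_and_rmse(narrow_obs, narrow_pred)
wide_r2, wide_rmse = r2_and_rmse(wide_obs, wide_pred)
```

Both cases have an RMSE of 1 ppm, yet R² is 0.2 for the narrow set and about 0.97 for the wide set. A multi-year evaluation dominated by the secular XCO2 trend will therefore yield a high R² almost regardless of fine-scale skill.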
Lines 294–296:
The claim that previous machine-learning studies producing daily 1-km XCO2 estimates have been limited to post-2015 data is made without citation. Please add appropriate references to support this statement.
Lines 303–305 and 309–310:
The claim that the high-resolution product captures intra-urban XCO2 variations "with greater accuracy" than CT is not substantiated. To support this claim, the authors must provide a direct comparison of CT XCO2 and the model-estimated XCO2 against independent validation data (i.e., data not used in training), with evaluation metrics reported for both. Without this, the claim that the machine learning model improves upon CT at fine spatial scales remains unverified.
Figure 3a:
A sharp spatial gradient in XCO2 is visible around the Taklamakan Desert region, which appears physically implausible for a remote arid region with minimal anthropogenic activity. The authors should investigate whether this feature is present in the OCO-2 retrieval data or the CarbonTracker product, and if not, identify which input variable(s) are driving this artifact.
Lines 351–352:
The terms "CO2," "XCO2," and "mixing ratio/concentration/level" are used inconsistently throughout the manuscript. XCO2 is a column-averaged dry-air mole fraction expressed in parts per million (ppm), not a concentration in the physical chemistry sense (e.g., mol/m³). The authors should adopt consistent and scientifically precise terminology throughout the manuscript and avoid referring to ppm values as "concentrations."
Figures 5c and 5d:
The interpretation of XCO2 enhancements as indicators of urban CO2 emissions in these figures is problematic. If a single background XCO2 value is used for each region, the spatial patterns shown primarily reflect the climatological XCO2 gradient across the region, which integrates regional wind transport, biospheric fluxes, and large-scale background variations rather than local emission differences between cities. For example, the larger enhancements observed in the southern portion of the BTH region do not necessarily imply greater emissions than those in Beijing; they may simply reflect transport or boundary layer conditions that favor accumulation. The authors should clarify the background correction methodology and substantially revise the interpretation of these figures.
Lines 383–385:
The description of Figure 5d is inconsistent with its caption. The text refers to "a striking difference in the spatial patterns of XCO2 enhancements between the two years" and reports specific percentage reductions during the COVID-19 lockdown in Wuhan, but the figure caption states it shows the "20-year mean XCO2 enhancements for the YRD region." It is unclear which figure the text is actually referring to. The authors should ensure that all in-text figure references are accurate and that figure captions correctly describe the displayed content.