Inferring European Fossil Fuel CO2 Emissions using TROPOMI NO2 Data and Sector-Based NOx:CO2 Emission Ratios
Abstract. Accurate monitoring of fossil fuel CO2 (ffCO2) emissions is essential for tracking climate mitigation, yet natural carbon-cycle fluxes often obscure human-induced signals in atmospheric observations. This study presents a satellite-driven data assimilation framework that uses nitrogen oxides (NOx = NO + NO2) − short-lived trace gases co-emitted with CO2 − to estimate ffCO2 emissions. We estimate European NOx emissions for 2021 by assimilating TROPOMI NO2 observations into an Ensemble Kalman Filter (EnKF) framework, optimised within the GEOS-Chem atmospheric transport model. We use a computationally efficient offline treatment of NOx chemistry, enabling large-ensemble inversions while retaining sensitivity to changes in photochemistry. Assimilating these data leads to a systematic reduction in the state vector uncertainty, with a mean ensemble-based error reduction of 4.2–5.7 %, and an overall improvement in model agreement with observations that corresponds to an annual correlation increase, ∆r =0.12. By leveraging sector-specific NOx:CO2 emission ratios, we translate our posterior NOx flux estimates into improved ffCO2 estimates that capture enhanced seasonal variability. Our inferred ffCO2 emissions exhibit elevated values in autumn and winter, spatially concentrated over major source regions and consistent with surface temperature variability. Independent evaluation against in situ measurements confirms significant improvements in mean error statistics. While OCO-2 CO2 column data remain dominated by biogenic signals, our NO2-driven approach successfully isolates the fossil fuel component. This study demonstrates the potential of ensemble data assimilation and reduced-complexity chemistry to provide physically consistent constraints on European ffCO2 estimates, establishing a vital foundation for future joint NO2–CO2 inversion systems.
Overall Comments
This manuscript addresses a topic that is both timely and scientifically important. The use of TROPOMI NO2 observations within an ensemble Kalman filter framework to infer European fossil fuel CO2 emissions is a promising direction, and the reduced-complexity NOx chemistry module is a useful technical contribution. I appreciate the authors' transparency in acknowledging several key limitations.
That said, I have some concerns about the framing and interpretation of the results. Some parts of the manuscript seem to overstate the capability of the proposed technique, while the methodology as presented seems to be at the proof-of-concept demonstration stage. This is not a criticism of the science itself, but I think the manuscript would benefit from more clearly reflecting the current limitations and the gap between what is demonstrated here and what would be needed for practical operational application. In particular, the assumption of temporally constant NOx:CO2 emission ratios across sectors, seasons, and countries is a significant simplification whose implications for the posterior CO2 estimates are not fully explored. The authors do acknowledge this limitation toward the end of the paper, but given how central it is to the reliability of the results, it deserves more discussion and supporting analyses.
I am also concerned about the large posterior adjustments in national ffCO2 emissions, which range from 20% to 91% relative to the prior. While I recognize that bottom-up inventories can carry significant uncertainties, adjustments of this magnitude at national scales are surprising for the state-of-the-art inventory used in this study, and the manuscript currently provides no way to assess whether these reflect genuine inventory biases or artifacts of the inversion framework itself. I think this finding deserves more careful and critical discussion, including comparison against independent bottom-up emission estimates or other top-down inversion studies over Europe. Without such benchmarking, it is difficult to evaluate the reliability of the main quantitative results of the paper.
I recommend major revision before this manuscript is suitable for publication.
Specific Comments
Lines 6-7: “We estimate European NOx emissions for 2021 by assimilating TROPOMI NO2 observations into an Ensemble Kalman Filter (EnKF) framework, optimised within the GEOS-Chem atmospheric transport model.”
Although this statement is true, the manuscript never presents posterior NOx emissions. Consider adding a new figure dedicated to NOx emissions, or adding a NOx layer to an existing figure (e.g., Figure 6).
Lines 20–22: "Accurately monitoring these emissions is essential for tracking progress toward national mitigation goals, yet it is complicated by the difficulty of isolating fossil fuel signals from large, seasonally varying biogenic fluxes."
As written, this sentence implies that atmospheric observation-based monitoring is the primary or only method for tracking mitigation progress. In practice, self-reported national inventories, global bottom-up emission datasets, and other approaches all contribute to this goal. I suggest revising to acknowledge the broader monitoring landscape and to clarify that the atmospheric top-down approach complements rather than replaces these efforts.
Figure 1b: The blue (ICOS) and green (DECC) triangles are difficult to distinguish visually. Please consider changing the color or marker style of one of the two site types.
Figure 1b (continued): One of the NOAA in situ CO2 measurement sites appears to be located over a marine area. Could the authors confirm whether this is the Ocean Station Norway (STM) site? If so, my understanding is that observations at this site were terminated in 2009, and it would be important to clarify which dataset is actually being used here.
Lines 84–91: The authors describe the EEA NO2 network and the ICOS/DECC/NOAA CO2 sites used for evaluation, but there is no discussion of site representativeness. Many ICOS and NOAA tall tower sites are specifically designed to sample regional background air, which may make them poorly suited for detecting near-source ffCO2 emission changes. Similarly, some EEA NO2 sites may be located far from major combustion sources. Please add a brief characterization of the site types used here — urban, suburban, rural background — and discuss how their representativeness affects the interpretation of the model evaluation results.
Lines 101–103: "Prior combustion emissions of NOx and CO2 are taken from the CAMS-REG v8.1 emissions inventory at 0.05°×0.1° resolution."
Given how central this dataset is to the entire study, a brief description of how these emissions are estimated would be helpful, for example whether they are based on national inventory reports, fuel consumption statistics, or activity data. Also, the CO2 and NOx maps in Figure 2 appear to show patterns that align with national boundaries. Is this a result of aggregating independent national inventories? If so, what are the implications for the inversion results (e.g., do uncertainties vary by country)?
Lines 105–106: "Emissions from non-combustion sources in the CAMS-REG v8.1 inventory — which include fugitives, waste, solvents, and agriculture — are kept fixed at their prior values."
What is the justification for fixing non-combustion NOx emission sources while only adjusting combustion sources? This decision deserves more discussion. In particular, biogenic soil NOx emissions can represent a significant fraction of total NOx over agricultural areas during summer, and fixing these at potentially biased values could systematically drive spurious adjustments in the posterior combustion NOx estimates. Please justify this choice more explicitly or acknowledge it as a limitation with a discussion of its potential seasonal implications.
Lines 110–114: The offline NOx chemistry parameterization is described and validated in Schooling et al. (2025) for emission perturbations on the order of ±20%, if I understand correctly. However, the posterior results reported in Section 3.1 show adjustments of up to +91% at the national scale and widespread winter increases of 10–177% at the grid level. It is unclear whether perturbations of this magnitude fall within the validated range of the linear scaling approximation. Please add a discussion of the expected error in the offline chemistry scheme under these larger perturbations, and assess whether this could introduce systematic biases in the posterior results.
Lines 123–125: The stability of the NO2:NOx partitioning ratio under emission perturbations is cited as a key assumption enabling the conversion of modeled NOx to NO2 columns for comparison with TROPOMI. However, similar to the point above, the range of perturbation magnitudes over which this stability holds is not discussed. What is the uncertainty introduced by this assumption, and how is it accounted for in the inversion framework? Given the large posterior adjustments found in this study, this assumption warrants more explicit validation or at minimum a sensitivity discussion.
Line 165: "the values of σn, σc and r are prescribed from the CAMS-REG v8.1 uncertainty estimates and the CORSO cross-species correlation product, and are assumed to be constant in time."
A few questions on this. First, which emission sector carries the largest uncertainty for NOx and CO2 in CAMS-REG v8.1? Based on Figure 3, northern Africa appears to have substantially higher uncertainty for both species, which seems to drive larger error reductions (Figure 4a) and larger CO2 flux increments (Figure 5b) in that region, yet no in situ evaluation is available there. Please discuss the implications of this large uncertainty for the NOx inversion and the derived CO2 estimates over both North Africa and the broader European domain. Second, do the NOx and CO2 prior flux uncertainties have seasonality? If so, how would this affect the inversion result, and what are the implications for interpreting the seasonality in the monthly posterior increments (Figure 5)? Please address this.
Figure 3: Please clarify the units of the color bar. Do all four panels share the same units?
Lines 211–212: The authors note that MAE and bias improvements are largest in winter and attribute this to "challenges in bottom-up inventories for domestic heating." While plausible, this explanation is presented as a general conclusion without supporting evidence or citation. Does the prior error covariance have a seasonal component that could contribute to this pattern? Also, the seasonal pattern in prior model vs. TROPOMI agreement (higher errors in winter, lower in summer) is interesting, but is not explained. Please expand this discussion and provide supporting references for the domestic heating hypothesis, if available.
Lines 214–215: It is unclear how the posterior ffCO2 fluxes are actually computed from the posterior NOx scaling factors. The current methods section does not include an explicit description of this step. I strongly recommend adding a dedicated subsection describing the NOx-to-CO2 conversion. Specifically: is the same scalar scaling factor applied uniformly to all sectors at each grid cell? What are the implications of this approach when different sectors co-locate on the same grid cell with very different NOx:CO2 ratios? This NOx-to-CO2 conversion works when the inventory error originates from activity data but breaks down when the error comes from emission factors. This distinction is important for interpreting the posterior CO2 results and should be made explicit in the paper.
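To make the activity-data versus emission-factor distinction concrete, the following is a minimal sketch of a single grid cell with two co-located sectors. It assumes, purely for illustration, that one scalar factor fitted to total NOx is applied to total CO2; I do not know whether this is what the authors do. The sector names, emission values, and the function `co2_from_uniform_scaling` are hypothetical and not taken from the manuscript.

```python
# Toy grid cell with two co-located sectors. All values are hypothetical
# and in arbitrary mass units; none are taken from the manuscript.
PRIOR_NOX = {"power": 10.0, "road": 30.0}
PRIOR_CO2 = {"power": 8000.0, "road": 6000.0}


def co2_from_uniform_scaling(true_nox):
    """Scale total prior CO2 by the single factor that fits total NOx."""
    factor = sum(true_nox.values()) / sum(PRIOR_NOX.values())
    return factor * sum(PRIOR_CO2.values())


# Case A: activity is 20 % too low in both sectors, so NOx and CO2 share the error.
est_a = co2_from_uniform_scaling({"power": 12.0, "road": 36.0})
true_a = 1.2 * sum(PRIOR_CO2.values())

# Case B: only road activity is 20 % too low; the sectors have different NOx:CO2 ratios.
est_b = co2_from_uniform_scaling({"power": 10.0, "road": 36.0})
true_b = 8000.0 + 1.2 * 6000.0

# Case C: the road NOx emission factor is 20 % too low; true CO2 equals the prior.
est_c = co2_from_uniform_scaling({"power": 10.0, "road": 36.0})
true_c = sum(PRIOR_CO2.values())

for label, est, true in [("A", est_a, true_a), ("B", est_b, true_b), ("C", est_c, true_c)]:
    error_pct = 100 * (est / true - 1)
    print(f"case {label}: estimated {est:.0f}, true {true:.0f}, error {error_pct:+.1f} %")
```

In this toy setup the uniform factor recovers the true CO2 only in case A. Cases B and C present identical NOx to the inversion, yet the implied CO2 is too high by roughly 6 % and 15 % respectively, which is why an explicit description of the conversion step matters for interpreting the posterior.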
Lines 223–226: Please clarify whether "emissions" here refers specifically to ffCO2 emissions or NOx emissions, as the text could be read either way.
Figure 6: A few comments. First, the log-scale y-axis for CO2 flux makes it difficult to visually assess the magnitude of prior-to-posterior adjustments across countries. For example, the corrections for Turkey and Sweden appear visually similar to those for other countries despite being much larger in relative terms. Unless the log-scale axis is required for some reason, consider using either a linear scale or showing the relative adjustment explicitly. Second, for readers unfamiliar with European geography, adding country labels to at least one of the map figures (Figures 4 and 5) would make it much easier to connect the spatial patterns in those maps to the country-level results in Figure 6.
On the large posterior adjustments (Lines 229–231): The reported annual increases of 20–91% across all analyzed countries are surprising and require deeper discussion. CAMS-REG v8.1 is a state-of-the-art inventory, and systematic errors of this magnitude would be unusual and would likely have been identified in prior literature. Changes exceeding 6σ of prior uncertainty for Spain and Turkey should prompt careful examination of whether the inversion framework is functioning as intended, including possible issues with boundary conditions, NOx chemistry, or the NOx:CO2 ratio assumptions under large perturbations. Please provide any independent evidence or plausible physical hypothesis that could support adjustments of this magnitude, and discuss alternative explanations more explicitly. A comparison against other bottom-up inventories (e.g., EDGAR) would provide additional context to this result.
On the large posterior adjustments and surface bias (Figure 7): Relatedly, the posterior barely improves the surface NO2 bias (-18.8 to -16.4 µg/m³) despite national emission increases of 20–91%. This is quite surprising and is not directly addressed. How would the evaluation look if restricted to in situ sites located in or near the countries with the largest adjustments (e.g., near cities or combustion sources)? This analysis would help assess how the posterior emission increases are reflected in near-surface concentrations.
On the temperature–ffCO2 relationship: The negative correlation between ffCO2 flux and surface temperature is a physically intuitive and visually compelling result. This relationship is one of the key scientific findings of the paper and would benefit from additional support, such as from bottom-up inventory analyses. A potential point of discussion: the temporal emission profiles in the prior (Figure 2, bottom row) already encode higher winter emissions for heating sectors. If the inversion is primarily amplifying a seasonal signal that was already present in the prior, the strengthened temperature-flux relationship in the posterior may partly reflect prior structure being scaled up rather than genuinely new information from TROPOMI. Also, how should energy demand for summer cooling (air conditioning) affect the ffCO2–temperature relationship?
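The prior-structure point can be illustrated with a minimal sketch, assuming a hypothetical posterior that is simply the prior multiplied by a constant. The monthly temperature and flux values below are invented for illustration and are not from the manuscript. Under uniform scaling the regression slope against temperature steepens while the correlation coefficient is unchanged, so reporting both for prior and posterior would help separate amplitude changes from new seasonal information.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical monthly mean temperature (deg C) and prior flux (arbitrary units).
temperature = [2, 3, 7, 11, 15, 19, 21, 21, 17, 12, 7, 3]
prior_flux = [130, 125, 110, 100, 90, 85, 80, 82, 92, 105, 118, 128]

# Hypothetical posterior: the prior scaled by a constant, with no new seasonal shape.
scaled_flux = [1.5 * f for f in prior_flux]

r_prior = pearson_r(temperature, prior_flux)
r_scaled = pearson_r(temperature, scaled_flux)
slope_prior = ols_slope(temperature, prior_flux)
slope_scaled = ols_slope(temperature, scaled_flux)

print(f"correlation: prior {r_prior:.3f}, scaled {r_scaled:.3f}")
print(f"slope:       prior {slope_prior:.2f}, scaled {slope_scaled:.2f}")
```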
On Figure A2 and the weekday-weekend signal: The appendix figure showing that the prior has two distinct branches in the temperature-flux scatter — corresponding to weekday and weekend emission regimes — that merge into a single coherent relationship in the posterior is an interesting finding that deserves more prominence and discussion in the main text. Is this merging physically meaningful, suggesting the inversion is correctly capturing a smoothed emission signal? Or could it reflect the Savitzky-Golay smoothing washing out the existing weekly cycle? It would also be informative to show the equivalent NOx flux vs. temperature scatter plot (a NOx version of Figure 6) to assess whether the same pattern holds for the directly constrained species.
On the OCO-2 evaluation (Figure 9): The near-absence of improvement in the OCO-2 comparison is acknowledged and expected given the dominance of biogenic signals in XCO2. However, the positive OCO-2 model bias that worsens in autumn and winter in the posterior is worth examining more carefully — what does this imply about the biospheric fluxes used in the inversion? More broadly, including OCO-2 as an evaluation dataset when it demonstrably cannot detect the signal being estimated risks creating a misleading impression of multi-dataset validation. Please either clarify the specific role and limitations of the OCO-2 evaluation more explicitly, or reconsider whether it adds sufficient scientific value to warrant inclusion.
On spatial evaluation: The model evaluation is conducted as temporally aggregated statistics across all available sites. Given that the inversion produces different results across countries (large increases in Turkey and Spain, modest changes in the Netherlands and Belgium), a spatially resolved evaluation would provide much more diagnostic insight than domain-wide statistics alone. Even a simple map of site-level prior vs. posterior bias would help assess whether the regional patterns in the posterior are supported by independent observations.
On the TROPOMI overpass time: TROPOMI observes at approximately 13:30 local time, meaning the inversion effectively constrains a single daily snapshot of emissions. The extrapolation to full daily totals relies on the sector-based diurnal scaling factors, and any errors in the assumed diurnal profile shape propagate directly into the posterior flux estimates. Please add a discussion of this limitation and its implications, and consider how complementary approaches such as geostationary satellite observations (e.g., TEMPO, which is directly relevant here), ground-based monitoring networks, or bottom-up activity data could help constrain the diurnal cycle more robustly.
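A minimal sketch of how a diurnal-profile error could propagate into daily totals is given below. It treats the overpass constraint as an instantaneous emission rate at a single hour, which is a simplification because NO2 columns integrate emissions over the preceding hours. The cosine-shaped profiles, the peak hours, the amplitude, and the function `diurnal_profile` are hypothetical choices for illustration only.

```python
import math


def diurnal_profile(peak_hour, amplitude):
    """Hypothetical hourly weights (summing to 1) with one cosine-shaped daily peak."""
    raw = [1.0 + amplitude * math.cos(2 * math.pi * (h - peak_hour) / 24) for h in range(24)]
    total = sum(raw)
    return [w / total for w in raw]


OVERPASS_HOUR = 13  # approximate TROPOMI local overpass time, rounded down to the hour

# Profile assumed in the inversion versus a hypothetical "true" profile peaking later.
assumed = diurnal_profile(peak_hour=13, amplitude=0.4)
actual = diurnal_profile(peak_hour=18, amplitude=0.4)

true_daily_total = 100.0  # arbitrary units

# Emission in the overpass hour, as a perfect midday constraint would recover it.
observed_hourly = true_daily_total * actual[OVERPASS_HOUR]

# Extrapolate to a daily total using the assumed (incorrect) profile.
inferred_daily_total = observed_hourly / assumed[OVERPASS_HOUR]
relative_error = inferred_daily_total / true_daily_total - 1.0

print(f"inferred daily total: {inferred_daily_total:.1f} (true {true_daily_total:.1f})")
print(f"relative error: {100 * relative_error:+.1f} %")
```

With these invented profiles, a five-hour shift in the emission peak alone biases the inferred daily total by about 21 % low, even though the midday constraint is perfect. The actual sensitivity would depend on the sector profiles used in the manuscript.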
On the broader methodological context: Several complementary approaches have emerged in recent years for inferring ffCO2 emissions from satellite observations, including direct plume detection methods, machine learning-based approaches, and observed cross-tracer ratio techniques. A brief discussion situating the present approach within this landscape — including its relative strengths and limitations compared to these alternatives — would help readers better understand the contribution and applicability of the method.