This work is distributed under the Creative Commons Attribution 4.0 License.
TROPOMI/WFMD v2.0: Improved retrievals of XCH4 and XCO with XGBoost-based quality filtering
Abstract. The TROPOspheric Monitoring Instrument (TROPOMI) on board the Sentinel-5 Precursor satellite provides daily global observations of atmospheric methane (CH4) and carbon monoxide (CO) at relatively high spatial resolution. The dense spatial and temporal coverage is achieved by the instrument’s wide swath, which permits detailed mapping of the worldwide distribution of these important atmospheric constituents. The adaptation and optimisation of the Weighting Function Modified Differential Optical Absorption Spectroscopy (WFMD) algorithm for the simultaneous retrieval of the column-averaged dry-air mole fractions XCH4 and XCO from TROPOMI’s shortwave infrared (SWIR) radiance measurements has proven to be a valuable complement and alternative to the operational TROPOMI products.
The latest release of the TROPOMI/WFMD product (version 2.0) includes several improvements expanding its suitability for a wider range of scientific applications. Data yield at mid and high latitudes has increased, accompanied by improved accuracy and precision according to the validation with the ground-based Total Carbon Column Observing Network (TCCON). These advancements are primarily due to more refined quality filtering that has been accomplished by replacing the previous Random Forest Classifier with the more efficient and potentially higher performing Extreme Gradient Boosting (XGBoost) algorithm in conjunction with improved training data incorporating an updated cloud product from the Visible Infrared Imaging Radiometer Suite (VIIRS) and the TROPOMI Aerosol Index. This enhanced training data set enables more reliable identification of cloudy scenes and mitigates issues related to specific aerosol events over bright surfaces. Importantly, as with previous product versions, the actual quality classification does not depend on the real-time availability of these external data products, which are only required during the training phase.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Atmospheric Measurement Techniques.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: open (until 28 Feb 2026)
- RC1: 'Comment on egusphere-2025-5422', Anonymous Referee #1, 10 Feb 2026
- RC2: 'Comment on egusphere-2025-5422', Anonymous Referee #2, 22 Feb 2026
This manuscript describes the TROPOMI/WFMD v2.0 product, specifically highlighting the integration of an XGBoost-based quality filter and a number of related retrieval improvements. The manuscript is well written and technically sound. The validation with TCCON and regional results are detailed, and the enhancement of data density, especially at mid and high latitudes, while preserving or improving bias and precision, is well demonstrated. This is a strong and relevant manuscript. The following comments are intended to enhance clarity and address a number of methodological issues.
Main comments
1. Motivation for replacing the Random Forest classifier
The manuscript describes the implementation of the XGBoost classifier in detail, but the motivation for replacing the previous Random Forest (RF) classifier could be stated more explicitly. While XGBoost is introduced as efficient and potentially higher performing, it is not entirely clear what concrete limitations of the RF-based filter prompted this transition.
A short summary of the main shortcomings of the previous RF approach (e.g. overly conservative filtering in certain regimes, misclassification behaviour, computational aspects, etc.) would help frame the update more clearly in terms of scientific added value.
2. Choice of classification threshold (p₀ = 0.5)
The decision threshold is set to p₀ = 0.5. While this is a common default in binary classification, in the context of atmospheric retrieval filtering the two types of misclassification do not have equal consequences. In particular, retaining cloud-affected scenes may introduce systematic biases in the retrieved columns, whereas rejecting valid scenes mainly reduces data yield. It would therefore be helpful to clarify whether alternative thresholds were evaluated and whether the selected value was assessed with respect to downstream geophysical performance (e.g. bias and scatter relative to TCCON).

3. Representativeness of the training dataset
The classifier is trained on 38 randomly selected days from 2020–2021. Although an independent year is used for validation, it would be helpful to provide some additional information on the representativeness of this training sample. In particular, a brief summary of:
- the seasonal distribution of the selected days,
- their geographical coverage (including high latitudes),
- and whether specific regimes such as strong aerosol or dust events are included,
would help the reader better assess the generalisation capability of the model.
4. Transparency of the feature set and model generalisation
Since the quality filter is the central methodological component of this paper, the complete list of input features should be provided explicitly. I understand that the feature set is described in previous publications; however, for clarity and to keep the manuscript self-contained, it would be helpful to include the full list directly in the present paper, even if it is inherited from earlier versions.
In this regard: some features (e.g. surface altitude, lat, lon etc.) may correlate with climatological cloudiness or regional characteristics. It would therefore be useful to briefly discuss how the model avoids learning region-specific patterns rather than physically meaningful indicators of retrieval quality. A short discussion of generalisation across regimes would strengthen confidence in the robustness of the approach. An analysis of feature importance could be useful in this context.
Additional comment(s)

Attribution of improvements in v2.0: Version 2.0 introduces several updates simultaneously (spectral window adjustments, hybrid vertical grid, post-processing refinements, and the new XGBoost classifier). While the validation results clearly show improvements relative to v1.8, it would be helpful to briefly clarify to what extent these gains can be attributed specifically to the updated quality filter versus the retrieval physics changes.
Citation: https://doi.org/10.5194/egusphere-2025-5422-RC2
Data sets
TROPOMI/WFMD XCH4 and XCO v2.0 Oliver Schneising https://www.iup.uni-bremen.de/carbon_ghg/products/tropomi_wfmd/
The manuscript is well written and extremely informative. It provides a thorough explanation of how the filter was trained, along with a robust assessment of its performance through validation using TCCON data and spatial analyses demonstrating improved filtering relative to the previous version. The manuscript also includes an analysis of potential albedo-related biases, which helps address concerns raised by some users regarding possible albedo effects seen in the operational product, where stringent filtering substantially reduces data throughput, particularly at high latitudes.
If the authors can address the following comments, I see no reason why this study should not be published.
Major comment:
Information and analysis regarding the features used as input to the XGBoost algorithm are missing, even though this is a key component of the work. The manuscript refers to Schneising et al. (2023) for the feature set, which in turn references Schneising et al. (2019), where a list of 25 features is provided in Section 2.5.2 Quality filter.
I understand that in previous studies the filtering was one of several methodological improvements and therefore described in less detail. However, in the present manuscript the quality filter is the central focus, and the specific input features should be explicitly listed and described to ensure reproducibility.
The manuscript should include the list of features used as input to the algorithm, along with their definitions and a brief justification of how each feature relates to cloud-contaminated scenes.
It would also be very helpful to include an analysis of feature importance (e.g., a plot analogous to Figure 2 of Keel et al., 2023). One option is to use the SHAP (SHapley Additive exPlanations) package to quantify the marginal contribution of each feature in the XGBoost model (see the SHAP documentation: https://shap.readthedocs.io/en/latest/).
Minor comments:
On line 139, the manuscript states that 26 features are used. Is this a typo? In Schneising et al. (2019), a total of 25 features are listed.
Assuming that the features are exactly the same as those used in Schneising et al. (2019), many of them appear to be reasonable proxies for cloud-related information. However, it is unclear why latitude, longitude, and altitude are included. These variables could bias the algorithm toward regions that are climatologically cloudy, rather than toward features that directly indicate cloud-contaminated retrievals. If these variables are still included in the current feature set, please justify their use.
Did the authors consider eliminating existing features or introducing new ones to improve the filtering performance of the updated algorithm? If so, please describe this process and its impact on performance. If not, could this be an area of improvement for the next version, to be discussed in the conclusions?
The filtering approach focuses specifically on cloud contamination; are there other retrieval limitations that could introduce biases and should therefore be considered in the filtering process? For example, could sub-footprint altitude variability impact retrieval quality? Accurately modelling surface pressure in the presence of significant altitude variations within a footprint may be particularly challenging and could affect the reliability of the retrieval.
The absolute biases at North American TCCON sites are larger than at most other sites (e.g., Eureka ~13 ppb, ETL ~5 ppb, Park Falls ~8 ppb, Lamont ~7 ppb, Edwards ~2 ppb, and Caltech ~4 ppb). In contrast, most European sites show absolute biases below ~2 ppb, with Garmisch being the exception. Can the authors comment on the reasons for this discrepancy?
Some TCCON sites are impacted by wildfires in the summer months (as can be clearly seen at ETL, Park Falls, etc.). Even with the tight coincidence criteria, there is a good chance that one instrument is affected by wildfire emissions while the other is not. Perhaps some additional filtering could mitigate these impacts, e.g. excluding certain summer months in years when fires were particularly strong.
Typo:
line 402 measurements from the TCCON should be measurements from TCCON