the Creative Commons Attribution 4.0 License.
Technical Note: DACNO2 – A Multi-Constraint Deep Learning Framework for High-Resolution 3D NO2 Field Estimation
Abstract. Accurate, high-resolution 3D fields of nitrogen dioxide (NO2) are critical for air quality management and satellite retrievals, yet traditional chemistry-transport models (CTMs) face challenges in fine-scale modeling. Machine learning (ML) alternatives often struggle with generalization and transferability, inheriting biases from CTMs or being limited by sparse surface measurements. We present the Deep Atmospheric Chemistry NO2 model (DACNO2), a deep learning model that generates daily 2 km 3D NO2 fields over Western Europe. The model's three-phase, multi-constraint training strategy begins by pre-training on European Copernicus Atmosphere Monitoring Service (CAMS) reanalysis data to learn large-scale atmospheric patterns, then fine-tunes with both CAMS and in-situ European Environment Agency (EEA) surface data to correct biases and refine local detail, and concludes with adaptive fine-tuning to capture evolving trends. An evaluation for 2023 shows that DACNO2 reproduces broad-scale 3D CAMS fields (R2 = 0.90) while improving agreement with independent EEA stations over the CAMS reanalysis (R2 enhanced from 0.61 to 0.66; bias reduced from -1.15 to -0.38 µg/m3). The model resolves more spatial detail and learns physically interpretable relationships. This hybrid training approach fuses the physical consistency of a process-based model with the real-world accuracy of surface measurements, overcoming the limitations of using either constraint dataset alone. Applying DACNO2 a-priori profiles to TROPOMI retrievals increases tropospheric NO2 columns by 3 % on average over those using European CAMS profiles, with larger enhancements over emission hotspots. These results demonstrate the framework's potential to advance air quality monitoring and satellite remote sensing.
Competing interests: The co-author, Michel Van Roozendael, is a member of the editorial board of Atmospheric Chemistry and Physics. The other authors declare no competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: open (until 05 Jan 2026)
- RC1: 'Comment on egusphere-2025-4259', Anonymous Referee #1, 24 Nov 2025
- RC2: 'Comment on egusphere-2025-4259', Anonymous Referee #2, 08 Dec 2025
This technical note presents a new machine-learning-based method for estimating high-resolution 3D NO₂ concentrations across the atmosphere. The topic is important and has strong potential for applications in satellite retrievals, exposure assessment, and air quality research. However, several issues need clarification.
Major Comments
1. Justification for the 2 km Resolution
The authors claim the model provides high-resolution estimates “up to 2 km,” but the choice of 2 km resolution is not fully justified. The input features vary widely in native resolution—from ~100 m (geography) to ~25 km (meteorology). Why was 2 km chosen rather than 1 km or 500 m? The original CAMS 10 km grid is relatively coarse, and may not match well with EEA ground observations. Is 2 km sufficient to address this representativeness mismatch? Since the model applies a fine-tuning procedure assuming that EEA observations represent ground-truth conditions at the target 2 km grid, the manuscript should provide evidence or rationale demonstrating that matching EEA stations to 2 km grids is appropriate. This justification is important for establishing the validity of the downscaling strategy.
2. Downscaling from 10 km to 2 km: Informational Limitations
The manuscript should more clearly explain which features actually provide meaningful spatial information for downscaling from 10 km to 2 km. Only a few indicators—such as geography, nighttime lights, and population—have resolution finer than 10 km, and these are largely time-independent. More dynamic and influential features (e.g., emissions, meteorology; shown in Figure 5) remain at 10–25 km resolution. Given this, it is unclear how the model captures high-resolution temporal variability.
Table 2 shows r/R², but the manuscript does not describe how r and R² were computed. I assume the metrics were calculated across all available records (i.e., combining space and time). To demonstrate that the model captures temporal variation—not only spatial variation—the authors should evaluate site-specific time-series performance, e.g.: compute r/R² for each site over time, then average these values across all sites. This would clarify whether the downscaling approach provides meaningful temporal improvements.
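To illustrate the suggested evaluation, a minimal sketch is given below; it assumes a paired table of daily observations and predictions with hypothetical column names 'station', 'obs', and 'pred', which are not defined in the manuscript.
```python
# Minimal sketch: per-site temporal r and R2, then averaged across sites.
# Column names and the pairing of observations/predictions are assumptions.
import numpy as np
import pandas as pd

def site_temporal_scores(df, min_days=30):
    """df: one row per (station, day) with columns 'station', 'obs', 'pred'."""
    rows = []
    for station, g in df.dropna(subset=["obs", "pred"]).groupby("station"):
        if len(g) < min_days:
            continue                                   # too few paired days for a stable score
        r = np.corrcoef(g["obs"], g["pred"])[0, 1]     # temporal correlation at this site
        ss_res = np.sum((g["obs"] - g["pred"]) ** 2)
        ss_tot = np.sum((g["obs"] - g["obs"].mean()) ** 2)
        rows.append({"station": station, "r": r, "R2": 1.0 - ss_res / ss_tot})
    per_site = pd.DataFrame(rows)
    return per_site, per_site[["r", "R2"]].mean()      # site-level scores and their average
```
Averaging site-level scores in this way isolates temporal skill from the spatial contrasts that tend to dominate a pooled space-time evaluation.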
3. Model Generalization and the Role of Fine-Tuning
For the best-performing model (Phase 3), the approach resembles a machine-learning data fusion method, since it directly fine-tunes using current observations. This raises concerns about whether it remains a fair comparison to Phase 1 and Phase 2, which do not use test data for training. The loss function appears unconstrained, relying heavily on ground measurements. This may lead to limited generalization, especially given sample imbalance, since EEA sites are often concentrated in high-pollution areas. I recommend analyzing the EEA observation distribution relative to the full domain, including concentration distributions, vertical profiles, and spatial representativeness.
Because ground stations are mostly located in urban or high-NO₂ regions, the fine-tuning may bias the model toward overestimating suburban and rural concentrations. This may explain the higher Phase-2 biases in suburban and rural areas (Table 2) and why DACNO₂ performs worse than CAMS-2km in these regions (Tables 2 and 3). Addressing this issue may require physical constraints, emission priors, or sample rebalancing strategies (one reweighting option is sketched below). Even if not feasible within the current work, it warrants further discussion.
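As one possible rebalancing option, a hedged sketch of inverse-frequency sample weighting follows; binning by observed concentration is an assumption, and weighting by station environment or spatial density would be analogous.
```python
# Hypothetical sketch: down-weight over-represented (typically urban, high-NO2)
# fine-tuning samples by the inverse frequency of their concentration bin.
import numpy as np

def inverse_frequency_weights(no2_obs, n_bins=20, w_min=0.2, w_max=5.0):
    """no2_obs: 1-D array of observed surface NO2 used for fine-tuning."""
    counts, edges = np.histogram(no2_obs, bins=n_bins)
    bin_idx = np.clip(np.digitize(no2_obs, edges[1:-1]), 0, n_bins - 1)
    w = 1.0 / np.maximum(counts[bin_idx], 1)   # rare concentration ranges get larger weights
    w *= w.size / w.sum()                      # renormalise to a mean weight of one
    return np.clip(w, w_min, w_max)            # cap extremes to keep training stable

# The weights would then scale the per-sample loss, e.g.
# loss = np.mean(weights * (prediction - observation) ** 2)
```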
4. Vertical Profiles and 3D Validation
Since the main goal is to develop a 3D NO₂ field, the lack of vertical validation is a concern. Are there any available observed vertical profiles (e.g., MAX-DOAS, aircraft, lidar) that could be used? The manuscript should include a comparison of vertical profiles, not only absolute values but also relative changes across urban, suburban, and rural environments. Physically, one would expect enhanced horizontal resolution to preserve regional total columns while increasing near-surface NO₂ in urban areas and decreasing upper-layer NO₂ in rural areas due to plume dynamics. However, Figure 6 shows DACNO₂ (which version?) with substantially higher concentrations across all vertical levels over Paris. Has the vertical profile been influenced by data fusion with observations? Clarifying the source of this behavior is important for evaluating the scientific validity of the 3D fields.
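To make the requested comparison concrete, a minimal sketch is given here; it assumes co-located model profiles have already been extracted per station, with an urban/suburban/rural class label for each station, neither of which is provided in the manuscript.
```python
# Hypothetical sketch: mean vertical profiles per station environment and the
# relative change of DACNO2 with respect to CAMS; array shapes are assumptions.
import numpy as np

def mean_profiles_by_class(profiles, classes):
    """profiles: (n_stations, n_levels) array; classes: length-n_stations array of labels."""
    return {env: profiles[classes == env].mean(axis=0) for env in np.unique(classes)}

def relative_profile_change(dacno2_profiles, cams_profiles, classes):
    """Percentage change of the class-mean DACNO2 profile relative to CAMS, per level."""
    new = mean_profiles_by_class(dacno2_profiles, classes)
    ref = mean_profiles_by_class(cams_profiles, classes)
    return {env: 100.0 * (new[env] - ref[env]) / ref[env] for env in new}
```
Reporting these relative changes per environment would show directly whether the near-surface enhancement and upper-level behavior follow the expected pattern, or whether fusion with surface observations shifts entire profiles.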
Minor Comments
(Line 21) Abstract: “Applying DACNO2 a-priori profiles to TROPOMI retrievals increases tropospheric NO2 columns by 3% on average over those using European CAMS profiles,” would benefit from a brief explanation. Is the increase primarily due to improved spatial resolution, enhanced vertical accuracy, or another factor?
(Line 156) “Notably, satellite-derived NO2 products were deliberately excluded from the input features for two key reasons.” Since satellite-derived NO₂ products are excluded, the training effectively relies on learning the relationships from emissions and meteorology to concentrations, similar to process-based CTMs. Please clarify whether there are sufficient training samples to support this, and discuss how the performance under this setup is validated.
(Line 176) “CAMS NO2 was processed by averaging hourly data to daily values and by bilinearly interpolating its horizontal resolution from 10 km to 8 km to match the model's scaling scheme.” Should this read 2 km? Is bilinear interpolation to 2 km scientifically reasonable here? This approach appears to rely solely on mathematical interpolation without additional physical or high-resolution informational support. Please justify this choice.
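For reference, a minimal sketch of what such purely mathematical regridding does is shown below; the grids and values are placeholders, and no physical downscaling is involved.
```python
# Hypothetical sketch: bilinear regridding of a coarse CAMS-like field to a finer
# grid with xarray; it only smooths between coarse grid points and adds no
# sub-10 km information.
import numpy as np
import xarray as xr

coarse = xr.DataArray(
    np.random.rand(40, 60),                                  # placeholder ~10 km field
    coords={"lat": np.linspace(44.0, 52.0, 40),
            "lon": np.linspace(-2.0, 10.0, 60)},
    dims=("lat", "lon"),
)

lat_fine = np.linspace(44.0, 52.0, 200)                       # illustrative ~2 km target grid
lon_fine = np.linspace(-2.0, 10.0, 300)
fine = coarse.interp(lat=lat_fine, lon=lon_fine, method="linear")  # bilinear interpolation
```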
(Line 320) “DACNO2-Phase-2 was fine-tuned using only EEA NO2 data from training stations during the test period (2023 in this study),” This raises concerns regarding information leakage and fairness when comparing Phase-2 to other models that do not use test-year data for training. Clarification or additional justification is needed.
Tables 2-3 are difficult to interpret due to their complexity. Consider simplifying the presentation, e.g., using a comparison bar chart or reorganizing the tables to more clearly highlight performance differences across models.
Citation: https://doi.org/10.5194/egusphere-2025-4259-RC2
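As one way to implement the presentation suggested for Tables 2-3, a minimal grouped bar chart sketch is given below; all values are placeholders, not results from the paper.
```python
# Hypothetical sketch: grouped bar chart comparing models across station
# environments; the R2 values below are placeholders only.
import numpy as np
import matplotlib.pyplot as plt

models = ["CAMS-2km", "DACNO2 Phase 1", "DACNO2 Phase 2", "DACNO2 Phase 3"]
environments = ["urban", "suburban", "rural"]
r2 = np.array([[0.60, 0.50, 0.40],   # placeholder R2 per model (rows) and environment (columns)
               [0.55, 0.48, 0.42],
               [0.62, 0.52, 0.43],
               [0.66, 0.55, 0.47]])

x = np.arange(len(environments))
width = 0.2
fig, ax = plt.subplots()
for i, model in enumerate(models):
    ax.bar(x + i * width, r2[i], width, label=model)   # one bar per model within each group
ax.set_xticks(x + 1.5 * width)
ax.set_xticklabels(environments)
ax.set_ylabel("R$^2$")
ax.legend()
plt.show()
```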
Data sets
Technical Note: DACNO2 – A Multi-Constraint Deep Learning Framework for High-Resolution 3D NO2 Field Estimation Wenfu Sun et al. https://doi.org/10.5281/zenodo.16986854
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 138 | 23 | 11 | 172 | 16 | 6 | 8 |