the Creative Commons Attribution 4.0 License.
High-resolution mapping of urban NO2 concentrations using Retina v2: a case study on data assimilation of surface and satellite observations in Madrid
Abstract. Urban air pollution poses a significant health risk, with over half the global population living in cities where air quality often exceeds World Health Organization (WHO) guidelines. A comprehensive understanding of local pollution levels is essential for addressing this issue. Recent advancements in low-cost sensors and satellite instruments offer cost-efficient complements to reference stations, but integrating these diverse data sources into useful monitoring tools is not straightforward. This study presents the updated Retina v2 algorithm, which generates high-resolution urban air pollution maps by assimilating heterogeneous measurements into a portable urban dispersion model. Tested for NO2 concentrations in Madrid during March 2019, it shows improved speed and accuracy over its predecessor, with the ability to incorporate satellite data. Retina v2 balances performance with modest computational demands, delivering results similar to or better than those of complex dispersion models and machine learning approaches that require extensive datasets. Using only TROPOMI satellite data, citywide NO2 simulations show an RMSE of 19.3 μg/m³, with better results when hourly in-situ measurements are included. Relying on data from a single ground station can introduce biases, which can be mitigated by incorporating satellite data or multiple ground stations. Including more stations improves accuracy, with 24 stations yielding a correlation of 0.90 and an RMSE of 13.0 μg/m³. The benefit of TROPOMI diminishes when data from five or more ground stations are available, but it remains valuable for many cities that have limited monitoring networks.
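The abstract summarizes model skill with RMSE and correlation. As a minimal sketch, assuming both statistics are computed over pooled pairs of simulated and observed hourly NO2 values (the study may instead compute them per station and then average), they can be written as follows. The function names and the sample numbers are hypothetical and purely illustrative.

```python
import math
from typing import Sequence


def rmse(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error of paired simulated and observed values."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient of paired simulated and observed values."""
    n = len(simulated)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two pairs of equal length")
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_s == 0.0 or var_o == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_s * var_o)


# Illustrative numbers only (not from the study): hourly NO2 in ug/m3,
# pooled over all station-hour pairs.
sim = [40.0, 55.0, 30.0, 62.0]
obs = [42.0, 50.0, 35.0, 60.0]
print(f"RMSE = {rmse(sim, obs):.1f} ug/m3, r = {pearson_r(sim, obs):.2f}")
```

Pooling all station-hour pairs yields a single spatio-temporal score; per-station scores would instead show where the model struggles.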
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2025-202 - No compliance with the policy of the journal', Juan Antonio Añel, 21 Mar 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on a Git server. However, Git servers are not suitable repositories for scientific publication. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories (see the list in our policy) and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, please include the relevant input CAMS data used for your work, and include a statement on it in the "Code and Data Availability" section of your manuscript, not only a generic mention in the data section of the manuscript. Please note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the information of the new repositories.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-202-CEC1
AC1: 'Reply on CEC1', Bas Mijling, 28 Mar 2025
Dear Executive Editor,
Thank you for pointing this out. We have stored the source code of the Retina model and the associated input data in a new repository at Zenodo, available at https://doi.org/10.5281/zenodo.15096616. Note that the CAMS data used for background concentrations, as described in Section 2.2.1, are included in the tables "background_NO2" and "background_O3" in the SQLite database "madrid_observations_2019.sqlite". For convenience, we added excerpts of these tables in CSV format and included them in the repository as "madrid_background_cams_no2.csv" and "madrid_background_cams_o3.csv".
The "Code and Data Availability" section has been rewritten as follows:
The source code of the Retina v2 model used in this study is available at https://doi.org/10.5281/zenodo.15096616 (Mijling, 2025). The input data needed to reproduce the results in this study can also be found here, such as meteorology from ECMWF, background concentrations derived from the CAMS regional ensemble, and hourly traffic data in Madrid.
The reference of Mijling (2025) in the Reference section has been changed to:
Mijling, B.: High-resolution mapping of urban NO2 concentrations using Retina v2: a case study on data assimilation of surface and satellite observations in Madrid (v1.0), Zenodo, https://doi.org/10.5281/zenodo.15096617, 2025.
The updated manuscript has been sent to the responsible editor. We hope that with these changes we now fully comply with the journal's policy.
Citation: https://doi.org/10.5194/egusphere-2025-202-AC1
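The reply states that the CAMS background concentrations live in the tables "background_NO2" and "background_O3" of "madrid_observations_2019.sqlite", with CSV excerpts provided for convenience. A minimal sketch of producing such a CSV from the database with Python's standard sqlite3 and csv modules follows. The table schema is not given here, so the function copies whatever columns it finds; the function name is hypothetical, and the repository's own CSV excerpts may contain a different selection of rows or columns.

```python
import csv
import sqlite3
from pathlib import Path


def export_table_to_csv(db_path: str, table: str, csv_path: str) -> int:
    """Copy all rows of `table` from a SQLite file into a CSV file with a header row.

    Returns the number of data rows written. The database is opened read-only,
    so a mistyped path raises an error instead of creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        tables = {name for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        if table not in tables:
            raise ValueError(f"table {table!r} not found in {db_path}")
        # Table names cannot be bound as SQL parameters; the name was checked
        # against sqlite_master above and embedded quotes are escaped here.
        quoted = table.replace('"', '""')
        cur = con.execute(f'SELECT * FROM "{quoted}"')
        header = [column[0] for column in cur.description]
        rows_written = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            for row in cur:
                writer.writerow(row)
                rows_written += 1
        return rows_written
    finally:
        con.close()
```

For example, `export_table_to_csv("madrid_observations_2019.sqlite", "background_NO2", "background_no2.csv")` would dump the NO2 background table (output file name hypothetical). Opening the file read-only via a URI avoids the default behaviour of `sqlite3.connect`, which silently creates a new empty database when the path does not exist.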
RC1: 'Comment on egusphere-2025-202', Anonymous Referee #1, 10 Apr 2025
This paper presents a thorough update on previous work, highlighting several important improvements in the Retina algorithm. The topic is relevant, and the paper is both well-structured and clearly written. As the authors emphasize, the most significant advancement over their earlier work is the integration of satellite data into the data assimilation scheme, which apparently adds value only when fewer than five monitoring stations are available in a given city.
Specific comments:
- Since the main novelty of the presented methodology lies in the integration of satellite data (TROPOMI) for urban NO₂ modeling, a longer validation period would be highly valuable. The current one-month evaluation period may not fully capture the seasonal variability in satellite retrieval quality, atmospheric dynamics, and emissions. Statistical performance metrics may therefore exhibit seasonal dependence, potentially leading to less robust conclusions regarding the added value of satellite data.
- The method used to estimate background NO₂ concentrations via a line integral over the municipal perimeter raises several questions: (i) Does $\vec{e}_v$ represent the local wind direction? If so, at what altitude or vertical level is the wind taken from? (ii) Eq. 1 resembles a mass conservation approach, but it lacks a temporal term—how is accumulation of pollutants within the domain accounted for? (iii) Since the method depends on integrating along the perimeter, does this imply that the background concentration depends on the chosen perimeter? (iv) In the special case where $\vec{e}_v \cdot \vec{n} > 0$ for the entire perimeter (i.e. all wind is outflow), the integral appears ill-posed. How is this handled in the analysis? (v) Is this a novel approach? If so, could the authors justify its use and provide a comparison with background concentrations derived from station data within the domain?
- The manuscript suggests that the added value of TROPOMI measurements becomes negligible when data from 5 or more stations are available. However, this rule of thumb may not be sufficiently robust, as it oversimplifies the issue. Other factors (such as city size, NO₂ concentration levels, local meteorological conditions, …) can significantly influence this threshold.
- The manuscript states that traffic flow between counting locations is estimated using inverse-distance weighting interpolation, applied separately for highways and primary roads. However, since traffic volumes can vary significantly over short distances, especially in complex urban settings, this method might lead to unrealistic flow patterns. Could the authors justify the use of this interpolation approach and provide information on how its performance was assessed? Specifically, has any cross-validation been performed (e.g., removing some sensors and comparing interpolated vs. observed counts)?
- Pg. 27, line 518: The manuscript compares the Retina model’s performance in Madrid with that of Kim et al. (2021), who trained a model using TROPOMI and 340 reference stations in Switzerland and northern Italy, obtaining a similar spatio-temporal correlation (0.79). However, several important differences limit the validity of this comparison: (i) Kim et al.'s study covers a much longer period (June 2018 to May 2020), including winter months, when satellite data are more frequently missing due to cloud cover, especially over complex alpine orography, which also affects the satellite’s ability to translate column densities into surface concentrations. (ii) Elevated regions like the Alps can introduce systematic biases in satellite-derived NO₂ due to vertical gradients in the NO₂ distribution and reduced sensitivity near the surface. (iii) Additionally, the number of stations is much higher in the Kim et al. study (340 stations vs. 24 in Madrid), making their results spatially and statistically more robust. I suggest the authors reconsider the framing of the comparison or add more nuance to highlight the limitations and contextual differences that affect model performance in each case.
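The referee asks how the inverse-distance-weighting (IDW) interpolation of traffic counts was assessed and suggests removing sensors and comparing interpolated with observed counts. A minimal sketch of that leave-one-out cross-validation follows, assuming planar site coordinates and an IDW power of 2 (a common default, not necessarily the value used in the manuscript). Following the manuscript's description, it would be run separately for each road class (highways, primary roads); all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def idw(target: Point, sites: Sequence[Point], values: Sequence[float],
        power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at `target` from counts at `sites`."""
    if not sites or len(sites) != len(values):
        raise ValueError("need at least one site and one value per site")
    num = 0.0
    den = 0.0
    for (sx, sy), v in zip(sites, values):
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return float(v)  # target coincides with a counting site
        w = d ** -power
        num += w * v
        den += w
    return num / den


def leave_one_out_rmse(sites: Sequence[Point], values: Sequence[float],
                       power: float = 2.0) -> float:
    """Hold out each counting site in turn, predict it from the others, return the RMSE."""
    if len(sites) < 2 or len(sites) != len(values):
        raise ValueError("need at least two sites with one value each")
    sq_errors: List[float] = []
    for i in range(len(sites)):
        other_sites = [p for j, p in enumerate(sites) if j != i]
        other_values = [v for j, v in enumerate(values) if j != i]
        predicted = idw(sites[i], other_sites, other_values, power)
        sq_errors.append((predicted - values[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Comparing this score across candidate powers, or against a simple baseline such as the class-wide mean count, would indicate whether IDW captures the spatial structure of traffic flows on a given road class.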
Minor comments:
- pg 9 line 199: “See 0”
- pg 17 line 360 “Sect. 0.”
- pg 21 line 447: correct “Sect. 0”
- pg 26, the manuscript states that "Direct assimilation of NO₂ satellite observations is not very useful due to the relatively short lifetime of NO₂ [...]" I would suggest that the issue may not lie in the inherent utility of the data, but rather in how the data is adapted and integrated into the model.
Citation: https://doi.org/10.5194/egusphere-2025-202-RC1
AC2: 'Reply on RC1', Bas Mijling, 02 Jul 2025
RC2: 'Comment on egusphere-2025-202', Anonymous Referee #2, 10 May 2025
Dear authors,
I would like to thank you for a very interesting read! I have a couple of questions that I'd like you to reflect on, but overall I am very happy with the quality of the manuscript and the described research.
Eq.1
Looking back at Figure 2, the assumption that b is constant along the perimeter of the city seems a bit optimistic. There is a factor of six difference in concentration along the border between the north and the south of the city in March 2019, which suggests that a westerly or easterly wind would cause a much higher flux across the border in the south than in the north. Relatedly, you write (L179): “Other sectoral emission, e.g. from industry, will be accounted for indirectly in either an increased background field or in additional residential emissions.” Such industrial sites are likely not evenly distributed along the border, further increasing inhomogeneities in the background border flux.
So my question is the following: why not instead discretize the border along l and z, and apply the same dispersion kernel that is used inside the city?
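Both referees discuss Eq. 1, a line integral over the municipal perimeter involving the wind direction $\vec{e}_v$ and the outward normal $\vec{n}$. Eq. 1 itself is not reproduced here, so the sketch below assumes one plausible reading: the background b is the perimeter average of the regional concentration, weighted by the inflowing wind component max(-e_v·n, 0). It discretizes the perimeter into polygon edges evaluated at their midpoints and returns None in the all-outflow case raised by Referee #1. The function name and arguments are hypothetical and this is not necessarily the manuscript's formulation.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def inflow_weighted_background(
    perimeter: Sequence[Point],
    conc: Callable[[float, float], float],
    wind_dir: Callable[[float, float], Point],
) -> Optional[float]:
    """Assumed form: b = integral(c * w dl) / integral(w dl), w = max(-e_v . n, 0).

    perimeter: polygon vertices (closed implicitly), either orientation.
    conc: background concentration at (x, y), e.g. sampled from a regional model.
    wind_dir: wind direction vector (ideally unit length) at (x, y).
    Returns None when no edge has inflow (the all-outflow case).
    """
    pts: List[Point] = list(perimeter)
    if len(pts) < 3:
        raise ValueError("perimeter needs at least three vertices")
    edges = list(zip(pts, pts[1:] + pts[:1]))
    # Shoelace formula: reorder to counter-clockwise so that (dy, -dx) / length
    # is the outward normal of every edge.
    twice_area = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges)
    if twice_area < 0:
        pts.reverse()
        edges = list(zip(pts, pts[1:] + pts[:1]))
    num = den = 0.0
    for (x0, y0), (x1, y1) in edges:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # skip duplicated vertices
        nx, ny = dy / length, -dx / length
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)  # evaluate at the edge midpoint
        ex, ey = wind_dir(xm, ym)
        weight = max(-(ex * nx + ey * ny), 0.0) * length  # inflowing edges only
        num += weight * conc(xm, ym)
        den += weight
    if den == 0.0:
        return None
    return num / den
```

Because the concentration is sampled edge by edge, a north-south gradient such as the one Referee #2 points out in Figure 2 directly changes b for different wind directions. For a spatially uniform wind and a closed perimeter, inflow and outflow always balance (the closed line integral of e_v·n vanishes), so the all-outflow case can only occur when the wind direction varies along the perimeter.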
L204
Is it reasonable to assume that residential emissions are similar on weekdays and at weekends?
L230-L234
Can you explain a bit how AERMOD treats dispersion through street canyons? If only one dispersion kernel is calculated for each combination of wind speed and direction, stability, and boundary layer height, the model cannot deal with variations in the built environment or even in roughness length (which I imagine can vary a lot from the sparsely populated northern area to downtown Madrid), correct? Do you expect this to result in large errors?
L269-L270
The dispersion model calculates concentrations of NOx, but rather than assimilating NOx measurements, you assimilate NO2 measurements. You get these from the XGBoost algorithm, which introduces non-linearity into the system. While you explicitly mention that you ignore the dependence on O3 (L85-L86 of the SI), there is also the dependence on, e.g., the temperature and the SEA (L264-L264). How do you reconcile this with the fact that a Kalman filter assumes a linear measurement operator H?
Minor comments
L199: “See 0?”
L575: Mijling (2000) should be Mijling (2020)
Mult: Please also check the manuscript for the many occurrences of "Sec. 0".
Citation: https://doi.org/10.5194/egusphere-2025-202-RC2
AC3: 'Reply on RC2', Bas Mijling, 02 Jul 2025
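Referee #2 asks how a nonlinear NOx-to-NO2 step is reconciled with the linear measurement operator H of a Kalman filter. One standard way to handle a mildly nonlinear observation operator (not necessarily what Retina v2 does) is the extended Kalman filter: linearize h around the forecast state and use the Jacobian as H. The minimal sketch below assumes NumPy, a forward-difference Jacobian, and hypothetical names (x_f, p_f, h); when h is linear it reduces to the standard Kalman analysis step.

```python
from typing import Callable

import numpy as np


def numerical_jacobian(h: Callable[[np.ndarray], np.ndarray], x: np.ndarray,
                       eps: float = 1e-6) -> np.ndarray:
    """Forward-difference Jacobian dh/dx evaluated at x, shape (m, n)."""
    hx = np.atleast_1d(np.asarray(h(x), dtype=float))
    jac = np.empty((hx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (np.atleast_1d(np.asarray(h(x + step), dtype=float)) - hx) / eps
    return jac


def ekf_analysis(x_f, p_f, y, r, h):
    """One extended-Kalman-filter analysis step.

    x_f: forecast state (n,), e.g. gridded NOx; p_f: forecast covariance (n, n)
    y: observations (m,), e.g. NO2; r: observation-error covariance (m, m)
    h: observation operator from state to observation space (may be nonlinear)
    """
    x_f = np.asarray(x_f, dtype=float)
    p_f = np.asarray(p_f, dtype=float)
    h_lin = numerical_jacobian(h, x_f)  # H linearized around the forecast
    innovation = np.asarray(y, dtype=float) - np.atleast_1d(np.asarray(h(x_f), dtype=float))
    s = h_lin @ p_f @ h_lin.T + np.asarray(r, dtype=float)  # innovation covariance
    gain = np.linalg.solve(s, h_lin @ p_f).T  # K = P H^T S^-1 (P and S symmetric)
    x_a = x_f + gain @ innovation
    p_a = (np.eye(x_f.size) - gain @ h_lin) @ p_f
    return x_a, p_a
```

The linearization is only as good as h is smooth near the forecast. Strong nonlinearity, for instance a large dependence on temperature or solar elevation that the state does not carry, would leave errors that the filter's Gaussian assumptions do not account for.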
Data sets
Retina v2 code and input data for Madrid case study Bas Mijling https://doi.org/10.21944/retina-v2-madrid-2019
Model code and software
Retina v2 code and input data for Madrid case study Bas Mijling https://doi.org/10.21944/retina-v2-madrid-2019
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
254 | 81 | 18 | 353 | 26 | 13 | 22