the Creative Commons Attribution 4.0 License.
Interpolating station quantile biases for tropospheric ozone MDA8 bias correction
Abstract. Chemistry transport models (CTMs) consistently exhibit systematic errors in ozone concentrations, which can be partly compensated by bias correction. Several bias correction strategies are suitable for use with station data, but they are likely to introduce statistical artifacts when applied at high resolution. We propose a new bias correction strategy based on parametric interpolation of quantile biases (PIQB), designed for high-resolution simulations and intended to avoid such artifacts. In this study, we evaluate and compare the performance of our strategy with established strategies, focusing on ambient maximum daily 8-h average ozone concentrations (MDA8). Our experimental setup consists of two simulations from the CTMs WRF-Chem and CAMx at a horizontal resolution of 9 km over the period 2007–2016, together with 165 ground-based stations in central Europe. Our results show that each strategy brings the simulated MDA8 closer to observations, but PIQB performs best at mitigating systematic errors while retaining the fine-resolution structure of modeled spatial variability. We conclude that, of the considered strategies, PIQB is the most suitable for bias correction at high resolution, suggesting possible applications in correcting climate projections of ozone MDA8.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-6218', Anonymous Referee #1, 23 Mar 2026
AC1: 'Reply on RC1', Jan Peiker, 31 Mar 2026
Dear Referee #1,
We are pleased to hear that you find our manuscript worthy of publication with minor revisions and we thank you for your remarks. Below, we address your valuable comments as available in the public discussion.
1) Lines 170-173: The authors appear to be removing very low hourly concentrations from observations: it is not uncommon for ozone levels to be zero or close to zero overnight – these are not unrealistic and therefore should not be being removed purely for quality control reasons. Similarly, there is no removal of high values, leading to an artificial increase in the seasonal mean following the removal of these values. Please can the authors justify the reasons for removing low, but not high, hourly values.
Response: We acknowledge the plausibility of measuring very low ozone concentrations at various station sites, especially at night-time. Firstly, we would like to clarify that we did not remove low values, but rather the stations measuring them, thus avoiding the artificial increase in the seasonal mean you mention. The goal was to select only stations suitable for our purposes, i.e., stations whose character corresponds to the central European rural background plausibly captured by a model at 9 km horizontal resolution. Secondly, if a station measures low values too often, it indicates either (i) close proximity to a source of NOx emissions (the NOx-limited regime occurs more naturally in central Europe, although we are aware that not necessarily exclusively) or (ii) some form of local effect impossible to resolve by our simulations (local fine orography, specific fine-scale circulation patterns, etc.). In the former case, the station may have been classified as rural background even though that classification no longer applied in our studied period, while in the latter case we would be penalizing our simulations for their resolution in the validation phase and introducing local errors into them in the correction phase. Lastly, the choice of stations follows Karlický et al. (2024), making the presented manuscript consistent with a previous study. We believe that minor changes in the text would be satisfactory and certainly possible.
2) Line 167 + Lines 173-175: It appears the authors started with 165 stations (line 167), removed some stations (line 174), but then ended with the same number of 165 stations (line 175). Please can these numbers be checked, or clarified as to what process has been undertaken to limit to 165 stations – are all the starting stations still used?
Response: We apologize for the misunderstanding. Out of all EEA stations in our domain, 298 stations passed the test of having at least 50% of valid values within the period. Then, 27 stations were removed as unsuitable for model validation for the aforementioned reasons (271 remained), and the remainder was iteratively reduced according to the stations' relative positions to obtain the final number of 165. This is consistent with Karlický et al. (2024), which considered the exact same choice of stations. We can easily incorporate this clarification into the paragraph.
3) Line 181: The value of α varies monthly. What impact would this have on the results given the choice to look at daily MDA8 values, in particular around the change of months (e.g. 31st compared to 1st of a month), which would presumably impact the predicted values considerably?
Response: Generally, in climate-model-oriented studies, it is standard to perform analyses and bias correction per climatological season, i.e., over 3-month periods, so our choice of monthly corrections is already above the general standard. We are also afraid that 10 years would not suffice for the correction to be performed at a temporal resolution higher than 1 month. Discontinuous jumps around the change of months are therefore unavoidable potential artifacts, though fortunately less pronounced than they would have been had the corrections been made seasonally (as done by, e.g., Rieder et al., 2015). On the other hand, as seen in Fig. 3, the minima of HD, for PIQB gauss in particular, are not so sharp, and so the result is less sensitive to the specific value of the parameter.
4) Figure 9,10: It may be helpful to guide the reader by distinguishing between exceedances of observations and of the simulation contours in different colours (e.g. use a different outline colour for observations).
Response: Such a change is certainly possible. We are attaching a figure with an updated color scheme, including a different outline color for observations. We also verified the attached figure against color-blindness simulations.
5) Lines 433-434: How confident are the authors that the systematic errors of the models in the present-day climate would still exist under future climate projections, or will model quantile bias characteristics and spatial variabilities change over time?
Response: In our opinion, this question concerns the limitations of climate modeling in general. In other studies, this is typically an unspoken implicit assumption, which we explicitly state in the introduction (line 37). We are very confident that bias correction can reliably mitigate present-day systematic errors originating from model resolution (e.g., uncertainties regarding emission inputs, boundary conditions, etc.), and errors of this type are bound to occur in the corresponding projections as well, regardless of the definition of an error (e.g., quantile bias, mean bias, etc.). However, as also stated in the introduction (line 41; Liu et al., 2022), the bias patterns may differ depending on the chemical processes taking place. To our knowledge, a bias correction strategy involving the individual model processes has not yet been introduced; therefore, one must rely on corrections that follow some definition of an error, as compared by, e.g., Staehle et al. (2024). For completeness, we may state that the reliability of the correction should be high for similar climate conditions (e.g., near-future projections), whereas for projections in the distant future it remains reliable only with respect to resolution-induced deficiencies and is otherwise potentially unreliable (a trait shared with any other current bias correction strategy, due to potentially different bias patterns). We agree that some statements on this matter throughout the manuscript could be phrased more cautiously.
Additionally, we thank you for notifying us of several technical errors, which we believe we can resolve easily.
Thank you once again, and we sincerely hope that we have addressed all your comments adequately.
Kind regards,
Jan Peiker and the co-authors
References:
Karlický, J., Rieder, H. E., Huszár, P., Peiker, J., and Sukhodolov, T.: A cautious note advocating the use of ensembles of models and driving data in modeling of regional ozone burdens, Air Quality, Atmosphere & Health, 17, 1415–1424, https://doi.org/10.1007/s11869-024-01516-3, 2024.
Liu, Z., Doherty, R. M., Wild, O., O'Connor, F. M., and Turnock, S. T.: Correcting ozone biases in a global chemistry–climate model: implications for future ozone, Atmospheric Chemistry and Physics, 22, 12543–12557, https://doi.org/10.5194/acp-22-12543-2022, 2022.
Rieder, H. E., Fiore, A. M., Horowitz, L. W., and Naik, V.: Projecting policy-relevant metrics for high summertime ozone pollution events over the eastern United States due to climate and emission changes during the 21st century, Journal of Geophysical Research: Atmospheres, 120, 784–800, https://doi.org/10.1002/2014JD022303, 2015.
Staehle, C., Rieder, H. E., Fiore, A. M., and Schnell, J. L.: Technical note: An assessment of the performance of statistical bias correction techniques for global chemistry–climate model surface ozone fields, Atmospheric Chemistry and Physics, 24, 5953–5969, https://doi.org/10.5194/acp-24-5953-2024, 2024.
RC2: 'Comment on egusphere-2025-6218', Anonymous Referee #2, 24 Mar 2026
The manuscript introduces the Parametric Interpolation of Quantile Biases (PIQB) as a novel strategy for ozone bias correction in high-resolution simulations. While the methodology is promising, several areas regarding temporal continuity, statistical reliability for policy application, and the depth of spatial analysis require further clarification and improvement.
- The authors optimize the interpolation parameter on a discrete monthly basis. While this captures seasonal shifts, it risks introducing artificial "jumps" or discontinuities at the boundaries of each month. Since the study already utilizes a 3-month "moving season" for cross-validation, it is unclear why a similar sliding window was not applied to the evolution of to ensure temporal smoothness.
- Figure 5 shows a significant decrease in the Pearson correlation coefficient after correction. The authors attribute this to the "shuffling" or permutation of data when using neighboring months for calibration. For policymakers, the timing of peak ozone events is as critical as the absolute magnitude. If the model loses its ability to capture when pollution events occur, the reliability of the correction for real-world health alerts is compromised. The authors should discuss the implications of this "temporal de-correlation" in more detail.
- The text frequently mentions that traditional methods (like Adjoint PDFs) introduce statistical artifacts. However, the results section lacks a direct, high-contrast visual comparison that explicitly highlights these artifacts versus the PIQB results. Adding zoomed-in panels for complex terrain (e.g., the Alps) would better substantiate the claim that PIQB avoids these pitfalls.
- The core of the PIQB strategy relies on a hybrid formulation (Eq. 4–7). For readers with a non-mathematical background, a flowchart or conceptual diagram illustrating how "model support" and "station quantile biases" are fused would greatly enhance the paper’s accessibility.
- In Section 3.3, the discussion of spatial gradients (zonal/meridional) in Figures 7 and 8 is largely qualitative. While the authors note that Obs. IDW "completely smooths out" variability, they should provide quantitative metrics—such as spatial correlation coefficients or spatial RMSE—to rigorously compare the "Spatial Integrity" of PIQB against the other strategies.
- The current Section 3.3 is quite dense and covers multiple validation dimensions simultaneously. To improve readability and logical flow, I recommend subdividing this section into the following thematic subsections: 1) Statistical Fidelity: focus on , NMB, and PDF matching at station sites; 2) Temporal Dynamics: analyze the annual cycle and the impact of correction on temporal correlation; 3) Spatial Integrity: quantitatively evaluate the preservation of spatial gradients and model-resolved features; 4) Policy-Relevant Metrics: focus on the exceedance days (MDA8 ) and the success rates of the confusion matrix.
Citation: https://doi.org/10.5194/egusphere-2025-6218-RC2
AC2: 'Reply on RC2', Jan Peiker, 31 Mar 2026
Dear Referee #2,
We thank you for your valuable comments. Below, we provide answers to them to the best of our abilities. We also provide two figures in the attached .zip file.
1) The authors optimize the interpolation parameter on a discrete monthly basis. While this captures seasonal shifts, it risks introducing artificial "jumps" or discontinuities at the boundaries of each month. Since the study already utilizes a 3-month "moving season" for cross-validation, it is unclear why a similar sliding window was not applied to the evolution of to ensure temporal smoothness.
Response: In general climatology, it is standard practice to evaluate model performance, conduct bias correction and postprocessing, etc., on a discrete seasonal basis. We opted for a monthly basis to increase the temporal resolution of our corrections, precisely to also capture shifts at the boundaries of the seasons. The 3-month moving season was introduced to prevent overfitting, since performing purely monthly bias correction on 10-year-long simulations may introduce statistical errors. Regarding temporal smoothness, we find the annual cycles of the optimal parameters already smooth enough for climatological purposes, and the residual errors in the current Fig. 3 show a clear annual cycle as well. Additionally, we are unsure what you meant by the assurance of temporal smoothness, as we suspect a word is missing in your comment, but we nevertheless hope that our answer is satisfactory.
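For concreteness, the scheme described above (one correction per calendar month, fitted on the pooled data of that month's 3-month moving season) can be sketched as follows. This is a minimal illustrative sketch, not the manuscript's implementation: the function names are ours, and a toy mean offset stands in for the actual optimized interpolation parameter.

```python
import numpy as np

def moving_season_months(month):
    """Return the 3-month moving season centered on `month` (1..12),
    wrapping around the year boundary (e.g. January uses Dec-Jan-Feb)."""
    return [((month - 2) % 12) + 1, month, (month % 12) + 1]

def fit_monthly_offsets(values, months):
    """Fit one correction parameter per calendar month, estimated from
    the pooled data of its 3-month moving season. A simple mean offset
    is used here as a stand-in for the optimized parameter."""
    offsets = {}
    for m in range(1, 13):
        window = moving_season_months(m)
        pooled = values[np.isin(months, window)]
        offsets[m] = float(pooled.mean())
    return offsets

# Toy data: one value per calendar month, equal to the month number.
months = np.arange(1, 13)
values = months.astype(float)
offsets = fit_monthly_offsets(values, months)
```

The correction applied within a given month is still a single monthly value (hence the possible jumps at month boundaries), but each value is estimated from three months of data, which reduces overfitting on a 10-year record.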
2) Figure 5 shows a significant decrease in the Pearson correlation coefficient after correction. The authors attribute this to the "shuffling" or permutation of data when using neighboring months for calibration. For policymakers, the timing of peak ozone events is as critical as the absolute magnitude. If the model loses its ability to capture when pollution events occur, the reliability of the correction for real-world health alerts is compromised. The authors should discuss the implications of this "temporal de-correlation" in more detail.
Response: Chemistry-climate simulations are typically designed to inform policy-makers about the plausible seasonal span of values in an area. Our model experiments were designed similarly to a body of literature (see references in our manuscript, e.g., Schnell et al., 2014; Rieder et al., 2015; Mar et al., 2016; Staehle et al., 2024), i.e., regardless of the specific inter-diurnal evolution captured by the simulations. The work of Karlický et al. (2024) describes our model setup in more detail, and it is evident that the meteorological and chemical boundary conditions of the simulations often do not even share the same meteorology, which affects the correlation accordingly. Consistently, in our study, we only wish to address the plausible spans of values before and after correction, and to show that various strategies perform such adjustments differently. Furthermore, we believe that the exact dates of possible MDA8 peaks in climate projections (e.g., 1 Jul 2070) have little to no relevance to present-day policy-makers; our work focuses primarily on demonstrating the tools on historic simulations to show that they can be used to correct climate projections. We find the bias correction of such possible peaks and of their season of occurrence to be of much higher interest, as it provides realistic values in the corresponding seasons, which has been achieved. For these reasons, we do not see the need to discuss the implications any further in the manuscript, and we hope to have clarified any possible misunderstandings.
3) The text frequently mentions that traditional methods (like Adjoint PDFs) introduce statistical artifacts. However, the results section lacks a direct, high-contrast visual comparison that explicitly highlights these artifacts versus the PIQB results. Adding zoomed-in panels for complex terrain (e.g., the Alps) would better substantiate the claim that PIQB avoids these pitfalls.
Response: We believe that zoomed-in panels would not provide the readers with any new information. Instead, we suggest displaying fields of quantile biases as predicted by each strategy for the 5th, 50th and 95th quantiles (i.e., subtracting the corrected quantiles from the original quantiles). This way, it can be shown that, e.g., Adjoint PDFs lower the already underestimated central European MDA8 while overshooting the already overestimated median MDA8 in the Po Valley. We are attaching a sample figure for your consideration, and we are open to any further suggestions.
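The proposed diagnostic (the original-minus-corrected quantile field per quantile level) could be computed along these lines. This is a minimal numpy sketch under assumed conventions: the function name, the (time, y, x) array layout, and the toy uniform "strategy" are illustrative, not the manuscript's actual data structures.

```python
import numpy as np

def quantile_bias_fields(original, corrected, quantiles=(5, 50, 95)):
    """For each requested quantile level, return the gridded field of
    original-minus-corrected quantiles, i.e. the quantile bias removed
    by the correction strategy at each grid cell.

    original, corrected: arrays of shape (time, y, x) holding daily
    MDA8 fields before and after correction (illustrative layout)."""
    q = np.asarray(quantiles)
    q_orig = np.percentile(original, q, axis=0)   # shape (len(q), ny, nx)
    q_corr = np.percentile(corrected, q, axis=0)
    return q_orig - q_corr

# Toy example: a "strategy" that uniformly lowers MDA8 by 5 units,
# so every quantile at every cell shifts down by 5.
rng = np.random.default_rng(0)
orig = rng.normal(60.0, 10.0, size=(365, 4, 4))   # (time, y, x)
corr = orig - 5.0
diff = quantile_bias_fields(orig, corr)
```

Mapping `diff` for each quantile level would make visible, for instance, a strategy that lowers quantiles in a region where the model already underestimates MDA8.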
4) The core of the PIQB strategy relies on a hybrid formulation (Eq. 4–7). For readers with a non-mathematical background, a flowchart or conceptual diagram illustrating how "model support" and "station quantile biases" are fused would greatly enhance the paper’s accessibility.
Response: We understand the issue of accessibility, and we agree to include an illustration of the methods used. We are attaching a possible draft for your consideration and, once again, we are open to any further suggestions. The chart displays three steps: the left panel shows the definition of quantile bias, the middle panel shows the difference between interpolating station data and interpolating station quantile biases, and the right panel demonstrates the optimization procedure.
5) In Section 3.3, the discussion of spatial gradients (zonal/meridional) in Figures 7 and 8 is largely qualitative. While the authors note that Obs. IDW "completely smooths out" variability, they should provide quantitative metrics—such as spatial correlation coefficients or spatial RMSE—to rigorously compare the "Spatial Integrity" of PIQB against the other strategies.
Response: We are afraid that comparing all other methods to PIQB (with whichever interpolator) would only confuse the reader, since we do not know what the "ground truth" mean MDA8 field looks like, and doing so could lead to the misleading conclusion that we claim PIQB to provide the actual MDA8 fields. This is the main reason for resorting to qualitative discussion, as there is no quantity to compare against. On the other hand, the figure we suggested in response to one of your previous remarks could serve to demonstrate such phenomena, and we are willing to include such figures in our manuscript.
6) The current Section 3.3 is quite dense and covers multiple validation dimensions simultaneously. To improve readability and logical flow, I recommend subdividing this section into the following thematic subsections: 1) Statistical Fidelity: focus on , NMB, and PDF matching at station sites; 2) Temporal Dynamics: analyze the annual cycle and the impact of correction on temporal correlation; 3) Spatial Integrity: quantitatively evaluate the preservation of spatial gradients and model-resolved features; 4) Policy-Relevant Metrics: focus on the exceedance days (MDA8 ) and the success rates of the confusion matrix.
Response: We agree to subdivide the current Section 3.3 into subsections, although into 3 instead of 4. As explained above, we do not find the discussion of "temporal dynamics" with regard to correlation fruitful, and so we would consequently move the current Fig. 6 to the supplement, as it is unnecessary to have a subsection discussing a single figure. Furthermore, as also stated above, quantitative discussions are not possible for certain aspects of our work, or at least not in the fashion you suggest in one of your previous remarks; we consider it important to show comparisons which do not a priori highlight any of the presented strategies, in order to retain objectivity. Other than that, we agree with the suggested overall layout, and it is certainly possible to subdivide our current Section 3.3.
We hope to have addressed all your remarks sufficiently, and we look forward to continuing the discussion.
We thank you once again.
Kind regards,
Jan Peiker and the co-authors
General Comments
This is a well written paper of good scientific quality that introduces a novel bias correction strategy. The methods are well presented and easy to follow, while the results include all the analysis that would be expected for this type of work. However, some caution should be given to the ability to extrapolate the results to future climate projections, as the characteristics of model quantile biases and their spatial patterns in the present day may be altered under future climate scenarios. This is, however, a minor caveat of the presented work, and the methods used could also be applicable to other present-day uses.
I would therefore recommend minor revisions prior to publication.