This work is distributed under the Creative Commons Attribution 4.0 License.
BiXiao: An AI-driven Atmospheric Environmental Forecasting Model with Non-continuous Grids
Abstract. High-precision and efficient atmospheric environmental forecasting is essential for protecting public health and supporting environmental management. However, traditional physics-based numerical models, while mechanistically interpretable, struggle to balance computational cost and forecast accuracy. Although artificial intelligence (AI) has advanced rapidly in meteorological forecasting, most existing AI models are not optimized for atmospheric environmental prediction and rely heavily on gridded inputs, limiting both their ability to integrate site observations and their operational applicability. To overcome these limitations, we develop BiXiao, a new-generation AI-based atmospheric environmental forecasting model. BiXiao features a heterogeneous architecture with non-continuous grids, coupling independent meteorological and environmental modules for synergistic use of multi-source data. The meteorological module employs a 3D Swin Transformer (Swin3D) to process structured meteorological fields, while the environmental module directly assimilates discrete station data, enabling operational urban-scale forecasts. Testing in the Beijing-Tianjin-Hebei region shows that BiXiao completes 72-hour forecasts for six major pollutants across all key cities within 30 seconds. Compared with mainstream numerical models (CAMS and WRF-Chem), BiXiao achieves substantially higher computational efficiency and forecast accuracy, particularly during heavy pollution events.
Status: open (until 28 Jun 2026)
- RC1: 'Comment on egusphere-2025-5589', Anonymous Referee #1, 01 Jun 2026
- RC2: 'Comment on egusphere-2025-5589', Anonymous Referee #2, 06 Jun 2026
Statement:
This manuscript presents BiXiao, an AI-based atmospheric environmental forecasting model that couples a 3D Swin Transformer meteorological module with an environmental module for pollutant prediction over the Beijing–Tianjin–Hebei region. The topic is timely and relevant, and the idea of combining meteorological background fields with air-quality observations is promising. The reported short-range forecast skill and computational efficiency also suggest potential operational value.
However, several central claims are not yet fully supported by the current experiments. In particular, the added value of the proposed AI architecture, the treatment of station observations, the fairness of the model comparisons, and the generality of the conclusions need to be clarified or strengthened. The manuscript is promising, but substantial additional evidence and more careful framing are needed before the main claims can be considered convincing.
Major comments:
1. More suitable baselines are needed.
The current evaluation mainly compares BiXiao with CAMS and WRF-Chem. These are useful references, but they do not show whether BiXiao outperforms simpler station-based forecasting methods. Since the environmental module uses pollutant observations at T+0 as input, short-term skill may partly come from persistence or temporal autocorrelation.
The authors should add simple baselines such as persistence, autoregressive regression, or simple LSTM/GRU/TCN models. These types of models are commonly used in air-quality time-series or station-level forecasting studies (Zheng et al., 2015; Li et al., 2017; Bai et al., 2018). A BiXiao variant without meteorological inputs would be especially useful.
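To illustrate what such baselines involve, here is a minimal sketch, assuming a single hourly station series, of a persistence forecast (the T+0 observation carried forward) and a least-squares AR(1) model iterated to longer lead times, both scored with RMSE. The function names and the toy series are hypothetical and are not taken from the manuscript or the cited studies.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def persistence(series, lead):
    """Persistence baseline: forecast y[t + lead] as y[t].

    Returns (predictions, aligned observations); lead must be >= 1.
    """
    return series[:-lead], series[lead:]

def fit_ar1(series):
    """Ordinary least-squares fit of y[t + 1] = a + b * y[t]."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def ar1_forecast(series, lead, a, b):
    """Iterate the fitted AR(1) model `lead` steps ahead from every start time."""
    preds = []
    for start in series[:-lead]:
        y = start
        for _ in range(lead):
            y = a + b * y
        preds.append(y)
    return preds, series[lead:]

# Toy hourly PM2.5 series (hypothetical values, ug m-3). In a real test the
# AR(1) coefficients would be fitted on a training period, not in-sample.
pm25 = [80.0, 95.0, 110.0, 120.0, 90.0, 60.0, 55.0, 70.0, 85.0, 100.0]
a, b = fit_ar1(pm25)
for lead in (1, 3):
    p_pred, p_obs = persistence(pm25, lead)
    ar_pred, ar_obs = ar1_forecast(pm25, lead, a, b)
    print(f"lead {lead} h: persistence RMSE {rmse(p_pred, p_obs):.1f}, "
          f"AR(1) RMSE {rmse(ar_pred, ar_obs):.1f}")
```

Reporting BiXiao's skill alongside scores like these, lead time by lead time, would show how much of its short-range skill exceeds what persistence alone provides.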
2. The station-level claim needs clarification.
The manuscript states that BiXiao directly uses discrete station observations and predicts station-level/site-level concentrations. However, the actual processing maps 79 stations to 29 ERA5-aligned valid grids, with the observations of multiple stations within the same grid averaged.
This weakens the claim of direct station-level prediction. The authors should either provide true station-level experiments or revise the wording to describe the output as station-derived, ERA5-aligned non-continuous grid predictions.
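The processing described above can be sketched as follows, assuming stations are snapped to the nearest node of a regular 0.25° latitude-longitude grid (the standard ERA5 regular-grid spacing) and averaged per node. The grid spacing, the snapping rule, and the station IDs and coordinates are illustrative assumptions; the manuscript's exact procedure may differ.

```python
import math
from collections import defaultdict

GRID_STEP = 0.25  # assumed grid spacing in degrees (ERA5-like regular grid)

def cell_index(lat, lon, step=GRID_STEP):
    """Snap a coordinate to the nearest grid node (round half up)."""
    return math.floor(lat / step + 0.5), math.floor(lon / step + 0.5)

def stations_to_cells(stations, step=GRID_STEP):
    """Average the values of all stations that snap to the same grid node.

    stations: iterable of (station_id, lat, lon, value).
    Returns {(i, j): (mean_value, [station_ids])}. Only nodes that contain
    at least one station appear, so the output is a non-continuous set of cells.
    """
    groups = defaultdict(list)
    for sid, lat, lon, value in stations:
        groups[cell_index(lat, lon, step)].append((sid, value))
    return {
        cell: (sum(v for _, v in members) / len(members), [s for s, _ in members])
        for cell, members in groups.items()
    }

# Hypothetical stations: S1 and S2 fall in the same cell, S3 in another.
stations = [
    ("S1", 39.93, 116.40, 80.0),
    ("S2", 39.95, 116.42, 100.0),
    ("S3", 39.08, 117.20, 60.0),
]
print(stations_to_cells(stations))
# {(160, 466): (90.0, ['S1', 'S2']), (156, 469): (60.0, ['S3'])}
```

The output is indexed by grid cell rather than by station, which is why such predictions are more accurately described as station-derived grid values than as station-level forecasts.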
3. The generality of the claims should be toned down.
The experiments are limited to the Beijing–Tianjin–Hebei region and 29 valid environmental grids. This is a useful regional demonstration, but it is not enough to support broad claims such as a “new paradigm” for fine-scale urban environmental forecasting or future nationwide applications.
The authors should soften these claims or add broader tests, such as spatial holdout experiments across stations, grids, or cities.
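A spatial holdout of this kind can be organised as leave-one-city-out folds, where each fold trains on the grids of all other cities and tests on the held-out one. The sketch below only builds the folds; the grid identifiers and the grid-to-city mapping are hypothetical.

```python
def leave_one_group_out(items, group_of):
    """Yield (held_out_group, train_items, test_items), one fold per group.

    items:    identifiers such as grid cells or stations.
    group_of: function mapping an item to its group (e.g. its city).
    """
    for g in sorted({group_of(it) for it in items}):
        train = [it for it in items if group_of(it) != g]
        test = [it for it in items if group_of(it) == g]
        yield g, train, test

# Hypothetical assignment of environmental grid cells to cities
grid_city = {"g01": "Beijing", "g02": "Beijing", "g03": "Tianjin", "g04": "Shijiazhuang"}
for city, train, test in leave_one_group_out(list(grid_city), grid_city.get):
    print(f"hold out {city}: train on {train}, test on {test}")
```

For array-based pipelines, scikit-learn's `LeaveOneGroupOut` produces the same splits from a vector of group labels.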
4. Ablation experiments would improve interpretation.
The paper argues that meteorological background fields are important, but it does not show which meteorological inputs matter most. As an optional but valuable improvement, the authors could add grouped ablations, such as removing wind, temperature, humidity, surface variables, or T+1 meteorology.
These tests would help explain the physical relevance of the model and why performance differs among O3, PM2.5, PM10, and other pollutants.
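Full grouped ablation means retraining the model without each input group. A cheaper proxy, sketched below under that caveat, is mean substitution at inference time: replace one group of input channels with its sample mean and record the increase in error. The toy linear model, the channel names (`u10`, `v10`, `t2m`) and the grouping are assumptions for illustration, not BiXiao's inputs or code.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def ablate_groups(model, samples, targets, channel_groups):
    """RMSE increase when each channel group is replaced by its sample mean.

    model:          callable mapping a dict {channel: value} to a prediction.
    samples:        list of input dicts (all with the same channels).
    channel_groups: {group_name: [channel, ...]}.
    """
    base = rmse([model(s) for s in samples], targets)
    means = {c: sum(s[c] for s in samples) / len(samples) for c in samples[0]}
    deltas = {}
    for group, channels in channel_groups.items():
        ablated = [{c: (means[c] if c in channels else v) for c, v in s.items()}
                   for s in samples]
        deltas[group] = rmse([model(s) for s in ablated], targets) - base
    return deltas

def model(s):
    """Toy model that depends on wind only (hypothetical channels)."""
    return 2.0 * s["u10"] + s["v10"]

samples = [{"u10": 1.0, "v10": 0.0, "t2m": 280.0},
           {"u10": 3.0, "v10": 2.0, "t2m": 290.0}]
targets = [model(s) for s in samples]
groups = {"wind": ["u10", "v10"], "temperature": ["t2m"]}
print(ablate_groups(model, samples, targets, groups))
# {'wind': 3.0, 'temperature': 0.0}
```

The two approaches can disagree: a retrained model may compensate for a removed group using correlated inputs, whereas mean substitution measures how much the already-trained model relies on that group.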
Minor comments:
Line 1: There is a typo in the title: “AI-dirven” should be “AI-driven”.
Lines 249–253 / Fig. 3 caption: The text says Fig. 3 analyzes PCC at 6 h, 48 h, and 72 h, but the Fig. 3 caption labels the columns as 24 h, 48 h, and 72 h. Please make the lead times consistent.
Lines 342–344: Please check the spelling of the aerosol scheme “MADE/SOGARM”; it is commonly written as “MADE/SORGAM”.
Fig. 8: The legend in the top panel should read “O3 Obs”.
Line 400: The sentence beginning with “Figure 12. Average pollutant concentrations...” appears to be a figure caption accidentally placed in the main text. Please move it to the Fig. 12 caption or rewrite it as normal prose.
Lines 407–412: The manuscript uses “non-continuous grids” elsewhere, but the conclusion switches to “non-uniform grid design”. These terms may imply different grid structures. Please use consistent terminology throughout the paper.
References:
Bai, S., Kolter, J. Z., and Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, arXiv:1803.01271, 2018.
Li, X., Peng, L., Yao, X., Cui, S., Hu, Y., You, C., and Chi, T.: Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation, Environmental Pollution, 231, 997–1004, https://doi.org/10.1016/j.envpol.2017.08.114, 2017.
Zheng, Y., Yi, X., Li, M., Li, R., Shan, Z., Chang, E., and Li, T.: Forecasting fine-grained air quality based on big data, in: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2267–2276, https://doi.org/10.1145/2783258.2788573, 2015.
Citation: https://doi.org/10.5194/egusphere-2025-5589-RC2
Statement:
This manuscript introduces BiXiao, an AI forecasting tool, with applications to air quality parameters in the Beijing-Tianjin-Hebei (BTH) region. The model features the ability to use non-continuous grids and considers not only static and 2D surface variables but also a 3D representation of the atmosphere, trained on 3D ERA5 and CAMS fields. It takes in multiple environmental datasets and contains independent meteorological and environmental modules to handle the combination of structured gridded data and discrete station measurements in the BTH area.
The article is interesting and engaging. Overall, it’s well written and informative. As a whole, I believe it is scientifically sound. However, the paper’s explanation of some of the methodology is too brief, making it a challenge to judge the scientific soundness and reproducibility of some of the specific methodological details. It also contains many technical terms that traditional environmental modellers may not be familiar with. Some of the sections are not clearly explained and need to be expanded. The manuscript also needs more background research, especially to be able to put it in greater context. The reference list is currently surprisingly short, and there should be more references to justify some of the statements made by the authors. I have a few major comments and several minor suggestions, mentioned below. I recommend this article be published after incorporating these revisions.
Major comments:
The manuscript needs more explanation of how BiXiao differs from other AI models, some of which can also accept non-continuous data.
Lines 192–193 state, “specific station information can be found in the Appendix.” However, no appendix or supplementary material has been included in the preprint. I am not able to provide a judgment on the use of this dataset since it is not explained. Please ensure the appendix is included in the final version.
The authors explain that this is a regional study, focused on the BTH region, and they justify this in the manuscript, for instance, by explaining the computational demands of the model. I would suggest that the regionality of this study be more clearly stated in the abstract and early in the introduction. I also suggest that the authors elaborate on the scalability of the model for future applications on a larger scale.
Figure 10: Please indicate times on each of the panels.
Minor comments:
Throughout the manuscript, there are many erroneous quotation marks, including around the model name BiXiao, and they are used inconsistently. I noticed this, for example, on lines 68, 69, 126, 167, 206, 212, 249, 254, 259, 264, 410, 434, etc. These quotation marks are not needed, and I suggest the authors go through the manuscript and remove the unnecessary quotation marks for better readability and a more professional-looking article.
Line 11: CAMS would be more accurately described as a gridded reanalysis dataset, not a numerical model. I suggest the authors adjust this terminology to be more accurate.
Line 16: I’m not seeing a connection between the statement made by the authors and the Meng et al. (2023) reference. I suggest that the authors find a more relevant reference.
Line 22: In the context of BiXiao, I would suggest the term “large-scale modelling” rather than “Earth system modelling”.
Line 39: Inness et al. (2019) is a good reference for the CAMS reanalysis datasets, but this sentence should also have a reference for Aurora.
The paragraph from lines 58-61 is not needed.
Lines 90–91: Additional references are needed to support this background info.
Please expand on section 2.2.3, as the current version is too brief.
Please also expand on section 2.2.4 and provide more references in this section. For instance, please explain what the SwinTransformerBlock3D module is.
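As background for this request, the core idea of a 3D Swin Transformer block (a `SwinTransformerBlock3D` class of this kind appears in the Video Swin Transformer reference code) is that self-attention is computed only within non-overlapping 3D windows, and alternate blocks cyclically shift the grid by about half a window so that information crosses window boundaries. The sketch below shows only the window assignment and the shift on voxel indices; the grid and window sizes are arbitrary, and the attention itself, the boundary mask, and patch embedding are omitted.

```python
from collections import defaultdict

def window_partition_3d(shape, window, shift=(0, 0, 0)):
    """Assign every voxel (d, h, w) of a 3D grid to a non-overlapping window.

    shape:  (D, H, W), each assumed divisible by the matching window size.
    window: (wd, wh, ww) window size; attention would run inside each window.
    shift:  cyclic shift used in alternate ("shifted-window") blocks,
            typically half the window size.
    Returns {window_index: [voxel, ...]}.
    """
    D, H, W = shape
    wd, wh, ww = window
    sd, sh, sw = shift
    windows = defaultdict(list)
    for d in range(D):
        for h in range(H):
            for w in range(W):
                # Position of this voxel after rolling the grid by -shift per axis
                pd, ph, pw = (d - sd) % D, (h - sh) % H, (w - sw) % W
                windows[(pd // wd, ph // wh, pw // ww)].append((d, h, w))
    return dict(windows)

# Toy grid: 4 levels x 4 latitudes x 4 longitudes, 2x2x2 windows
regular = window_partition_3d((4, 4, 4), (2, 2, 2))
shifted = window_partition_3d((4, 4, 4), (2, 2, 2), shift=(1, 1, 1))
print(len(regular), len(shifted))        # 8 8
print(regular[(0, 0, 0)][:2])            # [(0, 0, 0), (0, 0, 1)]
print((3, 3, 3) in shifted[(1, 1, 1)])   # True: wrap-around voxels share a window
```

In the full architecture, an attention mask prevents voxels that only became neighbours through the wrap-around, such as (0, 0, 0) and (3, 3, 3) here, from attending to each other.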
Section 2.2.6 needs more clarification. Is this the part that outputs a 3D grid similar to WRF-Chem? Please expand on this section to be more clear about the methodology.
Section 2.2.7 is also too brief. Please provide some description/examples of the dynamic features captured by the model.
Lines 187–189: Could the authors please clarify whether BiXiao was trained only on the lowest four ERA5 model levels, and whether only these four levels are used in forecasting?
Section 3.1.3: Are these datasets from CNEMC? From the description, I think that is the case, but without the appendix, I can’t confirm. If these datasets are from CNEMC, then they should be properly cited. For example, Song et al. (2017) and Tao et al. (2016) are suitable references, and there are likely newer references also available.
Lines 198–201: How do the authors justify averaging all stations in a grid, even though the stations may not be evenly spaced in the grid and therefore may not be equally representative of the grid? Why did the authors not choose to use a weighted average or an ML-based method to determine the representative value for each grid box?
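As one concrete alternative to the plain mean, stations could be weighted by inverse distance to the grid-cell centre. The sketch below contrasts the two; the weighting power, the equirectangular distance approximation, and the station values are assumptions, and inverse-distance weighting is only one option alongside representativeness-based weights or a learned aggregation.

```python
import math

def flat_distance_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate within a grid cell tens of km across."""
    km_per_deg = 111.2  # approximate length of one degree of latitude
    dx = (lon2 - lon1) * km_per_deg * math.cos(math.radians((lat1 + lat2) / 2))
    dy = (lat2 - lat1) * km_per_deg
    return math.hypot(dx, dy)

def cell_value(stations, centre, mode="idw", power=2.0):
    """Aggregate the stations inside one grid cell into a single value.

    stations: list of (lat, lon, value); centre: (lat, lon) of the cell.
    mode="mean": unweighted average (every station counts equally).
    mode="idw":  inverse-distance weights 1 / d**power to the cell centre.
    """
    if mode == "mean":
        return sum(v for _, _, v in stations) / len(stations)
    num = den = 0.0
    for lat, lon, v in stations:
        d = flat_distance_km(lat, lon, *centre)
        if d < 1e-6:  # station sits on the centre: use its value directly
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den

# Hypothetical cell with one station near the centre and one near the edge
centre = (40.0, 116.5)
stations = [(40.01, 116.5, 100.0), (40.10, 116.5, 40.0)]
print(cell_value(stations, centre, mode="mean"))   # 70.0
print(round(cell_value(stations, centre), 1))      # 99.4
```

With one station near the centre and one near the cell edge, the simple mean gives them equal influence, whereas the inverse-distance estimate is dominated by the central station.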
Line 234: The sentence starting with “These three metrics” is not necessary.
Lines 273–275: Is there a formatting error in the PDF for these lines? They appear to be out of place.
Figure 3: The results in this figure appear to be presented as one result for each district. However, the methodology of this is not clearly explained. I suggest the authors explain how the model output was regridded by district for presentation in this figure.
Lines 330–333: The WRF-Chem model should be properly cited in this section.
Lines 363–366: Underestimation of pollutants, especially during haze periods, is a well-known issue in WRF-Chem as well as in other large-scale models such as CMAQ and GEOS-Chem (Gao et al., 2022; Sokhi et al., 2022; Saide et al., 2020; Li et al., 2023). I would suggest that the authors discuss this in further detail, because the improvement that BiXiao offers over WRF-Chem is an interesting result. Providing further details on what BiXiao improves would be of interest to the large-scale modelling community.
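A stratified bias statistic would make this comparison more specific. The sketch below computes the normalised mean bias (NMB) separately for observations above and below a threshold, so that haze-period underestimation is quantified apart from clean periods. The default threshold of 75 µg m⁻³ (the 24-h PM2.5 Grade II limit in China's GB 3095-2012 standard) and the toy values are assumptions for illustration.

```python
def nmb(pred, obs):
    """Normalised mean bias, sum(pred - obs) / sum(obs); negative = underestimation."""
    return sum(p - o for p, o in zip(pred, obs)) / sum(obs)

def stratified_nmb(pred, obs, threshold=75.0):
    """NMB separately for samples with obs >= threshold ("haze") and below ("clean")."""
    strata = {"haze": [], "clean": []}
    for p, o in zip(pred, obs):
        strata["haze" if o >= threshold else "clean"].append((p, o))
    return {name: nmb([p for p, _ in pairs], [o for _, o in pairs])
            for name, pairs in strata.items() if pairs}

# Toy PM2.5 values (ug m-3): small bias in clean hours, large underestimation in haze
obs = [40.0, 60.0, 150.0, 200.0]
pred = [44.0, 57.0, 120.0, 160.0]
print(stratified_nmb(pred, obs))   # {'haze': -0.2, 'clean': 0.01}
```

Applied to both BiXiao and WRF-Chem, haze and clean NMB values side by side would show whether BiXiao's advantage is concentrated in the high-concentration regime.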
Line 400: Is this line supposed to be a figure caption, or is this a typo?
Lines 415–416: Most traditional, physics-based models are written to run on CPUs, not GPUs. Are the authors saying that the AI model, run on a GPU, is more efficient because of the fact that it is run on a GPU instead of a CPU? Please clarify this statement.
Figure 12: BiXiao seems to show better performance in northern areas and somewhat weaker performance towards the east and south of the BTH region. Interestingly, WRF-Chem seems to show approximately the same pattern. Could the authors speculate on why this might be?
References:
Gao, C., Xiu, A., Zhang, X., Tong, Q., Zhao, H., Zhang, S., Yang, G., and Zhang, M.: Two-way coupled meteorology and air quality models in Asia: a systematic review and meta-analysis of impacts of aerosol feedbacks on meteorology and air quality, Atmos. Chem. Phys., 22, 5265–5329, https://doi.org/10.5194/acp-22-5265-2022, 2022.
Li, J., Zhang, H., Li, L., Ye, F., Wang, H., Guo, S., Zhang, N., Qin, M., and Hu, J.: Modeling Secondary Organic Aerosols in China: State of the Art and Perspectives, Current Pollution Reports, 9, 22–45, https://doi.org/10.1007/s40726-022-00246-3, 2023.
Sokhi, R. S., Moussiopoulos, N., Baklanov, A., Bartzis, J., Coll, I., Finardi, S., Friedrich, R., Geels, C., Grönholm, T., Halenka, T., Ketzel, M., Maragkidou, A., Matthias, V., Moldanova, J., Ntziachristos, L., Schäfer, K., Suppan, P., Tsegas, G., Carmichael, G., Franco, V., Hanna, S., Jalkanen, J.-P., Velders, G. J. M., and Kukkonen, J.: Advances in air quality research – current and emerging challenges, Atmos. Chem. Phys., 22, 4615–4703, https://doi.org/10.5194/acp-22-4615-2022, 2022.
Saide, P. E., Gao, M., Lu, Z., Goldberg, D. L., Streets, D. G., Woo, J.-H., Beyersdorf, A., Corr, C. A., Thornhill, K. L., Anderson, B., Hair, J. W., Nehrir, A. R., Diskin, G. S., Jimenez, J. L., Nault, B. A., Campuzano-Jost, P., Dibb, J., Heim, E., Lamb, K. D., Schwarz, J. P., Perring, A. E., Kim, J., Choi, M., Holben, B., Pfister, G., Hodzic, A., Carmichael, G. R., Emmons, L., and Crawford, J. H.: Understanding and improving model representation of aerosol optical properties for a Chinese haze event measured during KORUS-AQ, Atmos. Chem. Phys., 20, 6455–6478, https://doi.org/10.5194/acp-20-6455-2020, 2020.
Song, C., Wu, L., Xie, Y., He, J., Chen, X., Wang, T., Lin, Y., Jin, T., Wang, A., Liu, Y., Dai, Q., Liu, B., Wang, Y., and Mao, H.: Air pollution in China: Status and spatiotemporal variations, Environ. Pollut., 227, 334–347, https://doi.org/10.1016/j.envpol.2017.04.075, 2017.
Tao, M., Chen, L., Li, R., Wang, L., Wang, J., Wang, Z., Tang, G., and Tao, J.: Spatial oscillation of the particle pollution in eastern China during winter: Implications for regional air quality and climate, Atmos. Environ., 144, 100–110, https://doi.org/10.1016/j.atmosenv.2016.08.049, 2016.