the Creative Commons Attribution 4.0 License.
OIRF-LEnKF v1.0: A Self-evolving Data Assimilation System by Integrating Incremental Machine Learning with a Localized EnKF for Enhanced PM2.5 Chemical Component Forecasting and Analysis
Abstract. Assimilating observational data into numerical forecasts is crucial for accurately estimating the spatiotemporal distribution of PM2.5 chemical components (NH₄⁺, NO₃⁻, SO₄²⁻, OC, and BC), which helps quantify the impact of aerosols on the environment, climate change, and human health. However, chemical transport model (CTM)-based data assimilation (DA) is computationally inefficient for large ensemble sizes and offers limited improvements in forecasting, as it solely provides optimal initial conditions. This paper introduces a machine learning (ML)-based self-evolving data assimilation system (OIRF-LEnKF v1.0) that achieves high efficiency and high quality in the forecast and analysis fields of chemical components. Computational efficiency tests indicate that the total time consumed by OIRF-LEnKF v1.0 constitutes only 11.41–16.60 % of that of CTM-based DA, and only 0.13–0.20 % during the forecasting process. Sensitivity tests demonstrate that the self-evolution mechanism in our system enhances the Pearson correlation coefficient (CORR) and reduces the RMSE during the forecasting process by 2.28–11.75 % and 32.94–40.98 %, respectively, compared to the stationary training mechanism. A 2-month DA experiment reveals that the RMSE values of chemical components after DA are less than 7.80 µg m⁻³ and 2.36 µg m⁻³ during the forecasting and analysis processes, respectively, indicating reductions of at least 26.38 % and 68.99 % compared to values without DA. Notably, the RMSE values of our system during the forecasting process exhibit a significant reduction of 33.16–90.10 % compared to those of the CTM-based DA, highlighting the superior forecasting capability of our system. Furthermore, the spatial overestimation and underestimation of chemical components have been significantly mitigated following DA. Compared to multiple reanalysis datasets of inorganic salt aerosols (CORR: 0.56–0.89, RMSE: 2.55–8.52 µg m⁻³), the dataset generated by OIRF-LEnKF v1.0 (CORR: 0.97, RMSE: 1.12 µg m⁻³) demonstrates higher data quality.
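The abstract reports skill as Pearson correlation (CORR), RMSE, and percentage reductions in RMSE. The sketch below shows how such numbers are commonly computed. It is not the authors' evaluation code. The function names (`rmse`, `corr`, `percent_reduction`) are hypothetical, the arrays are synthetic values chosen only for illustration, and the relative-reduction formula `100 * (baseline - improved) / baseline` is an assumption about how the percentages were derived.

```python
import numpy as np


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def corr(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.corrcoef(pred, obs)[0, 1])


def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent.

    Assumed definition: 100 * (baseline - improved) / baseline.
    """
    return 100.0 * (baseline - improved) / baseline


# Synthetic concentrations in µg m^-3 (illustrative only, not from the paper).
obs = np.array([10.0, 12.0, 8.0, 15.0, 11.0])      # observations
no_da = np.array([14.0, 9.0, 12.0, 10.0, 15.0])    # forecast without DA
with_da = np.array([10.5, 11.5, 8.5, 14.0, 11.5])  # forecast with DA

rmse_no_da = rmse(no_da, obs)
rmse_with_da = rmse(with_da, obs)

print(f"RMSE without DA: {rmse_no_da:.2f}")
print(f"RMSE with DA:    {rmse_with_da:.2f}")
print(f"RMSE reduction:  {percent_reduction(rmse_no_da, rmse_with_da):.1f} %")
print(f"CORR with DA:    {corr(with_da, obs):.2f}")
```

Under this definition, a reported reduction of "at least 26.38 %" would mean that the RMSE with DA is at most 73.62 % of the RMSE without DA for every component.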
Status: open (until 25 Nov 2025)
- CEC1: 'No compliance with the policy of the journal', Juan Antonio Añel, 11 Oct 2025
- AC1: 'Reply on CEC1', Ting Yang, 13 Oct 2025
  As attached.
- CEC2: 'Reply on AC1', Juan Antonio Añel, 13 Oct 2025
  Dear authors,
  Thanks for the clarification. Unfortunately, this does not solve the problem. First, the declaration on the software and data used in your manuscript should be in the "Code and data availability" section, not in Section 2.2 ("Data"). Secondly, of the sites listed in the table in your reply, only the one for NP2 is a repository. The others are not acceptable. Therefore, you must store all the data that you have used from the datasets mentioned in that table in a suitable repository according to our policy.
  Juan A. Añel
  Geosci. Model Dev. Executive Editor
  Citation: https://doi.org/10.5194/egusphere-2025-3960-CEC2
- AC2: 'Reply on CEC2', Ting Yang, 20 Oct 2025
  As attached.
- CEC3: 'Reply on AC2', Juan Antonio Añel, 21 Oct 2025
  Dear authors,
  Many thanks for addressing the outstanding issues. We can consider the current version of your manuscript now in compliance with the Code and Data policy of our journal.
  Juan A. Añel
  Geosci. Model Dev. Executive Editor
  Citation: https://doi.org/10.5194/egusphere-2025-3960-CEC3
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
1,202 | 36 | 15 | 1,253 | 8 | 12 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have published the ChinaHighPMC data in a restricted repository, and this does not comply with our policy, which requires that all the code and data used to produce a manuscript submitted to the journal be publicly available at the time of submission. Therefore, the current situation with your manuscript is irregular. Please publish the ChinaHighPMC data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the information of the new repository.
I must note that if you do not fix this problem, we cannot accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor