the Creative Commons Attribution 4.0 License.
Spatiotemporal Dual-Stream Transformers for Cloud Microphysical Parameterization
Abstract. Accurate precipitation forecasting is essential for mitigating weather-related disasters. Numerical Weather Prediction (NWP) precipitation forecasting accuracy is largely constrained by microphysical parameterization schemes, which rely on simplifying assumptions that introduce uncertainties. Deep learning provides a promising approach for data-driven modeling of complex microphysical relationships. We propose to model the cloud microphysical process via the Learned Microphysics Transformer (LMP-Tr). LMP-Tr employs a hybrid Convolutional Neural Network (CNN)–Transformer architecture that alternately integrates multi-scale convolutional modules and dual-pathway attention modules to capture both local cloud-scale features and long-range atmospheric dependencies. The key innovation lies in the systematic alternation of multi-scale convolutional modules for local feature extraction and dual-pathway attention modules for global dependency modeling. The proposed model enables progressive refinement of atmospheric representations through height-variable attention pathways and cross-module attention mechanisms. Extensive evaluation on a WRF simulation dataset demonstrates superior performance of the proposed method. LMP-Tr provides a practical and effective solution for enhancing cloud microphysics representation in operational NWP systems, offering improved accuracy and physical consistency compared to other Artificial Intelligence (AI)-based parameterization approaches.
Status: open (until 13 Aug 2026)
-
CEC1: 'Comment on egusphere-2026-3243 - No compliance with the policy of the journal', Juan Antonio Añel, 27 Jun 2026
-
AC1: 'Reply on CEC1', Ting Shu, 28 Jun 2026
Dear Prof. Juan Antonio Añel,
Thank you for your professional comments. We are sorry that the previously submitted manuscript did not include the right version of the data and code. We have uploaded all the data (https://doi.org/10.5281/zenodo.20965731) and code (https://doi.org/10.5281/zenodo.20481658) used in the paper to Zenodo, and have also rewritten the "data and code availability" section in the new manuscript (please check the attachment):
The training and testing dataset used in this study was generated by post-processing output from Weather Research and Forecasting (WRF) model version 4.2.1 simulations. The archived dataset contains the machine-learning samples used for training, validation, and independent testing of the LMP-Tr model, including input features and target variables extracted and processed from the WRF simulation outputs. The dataset includes samples from 30 precipitation events over Southern China during 2018--2020 and is openly available at \url{https://doi.org/10.5281/zenodo.20965731} (\cite{huang_2026_20965731}). The source code, experiment scripts, and post-processing scripts for LMP-Tr are openly available at \url{https://doi.org/10.5281/zenodo.20481658} (\cite{huang_2026_20481658}). The initial and boundary conditions for the WRF simulations were derived from ECMWF analysis data; the original ECMWF data products are not redistributed in this dataset.
Best Regards,
Ting Shu (on behalf of all co-authors)
-
CEC3: 'Reply on AC1', Juan Antonio Añel, 28 Jun 2026
Dear authors,
Thanks for your reply. Unfortunately, it does not address the issues noted in my previous comment, and we cannot consider the situation solved. We must insist that you provide repositories for the WRF code and the ECMWF data used in your work. Also, as you provide part of the data in binary files that can only be accessed with the proprietary software Matlab, you must at least clearly identify the Matlab version necessary to open them. Matlab does not ensure compatibility between versions when accessing old files. A better solution would be to provide the data in a format that does not depend on proprietary software to read the files.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2026-3243-CEC3
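The point above about identifying the Matlab version can be partly checked from the files themselves. The following is a minimal sketch, assuming the archived binaries are Level 5 or 7.3 MAT-files, which begin with a 128-byte header (116 bytes of descriptive text, an 8-byte subsystem offset, a 2-byte version field, and a 2-byte endian indicator). The function name `read_mat_header` is hypothetical and is not part of the authors' code. Note that the header reports the MAT-file format level and whatever text the writing program chose to store, which is not necessarily the exact Matlab release.

```python
import struct
from pathlib import Path


def read_mat_header(path):
    """Read the 128-byte header of a Level 5 / 7.3 MAT-file.

    Hypothetical helper for illustration; assumes the standard layout:
    bytes 0-115 text, 116-123 subsystem offset, 124-125 version,
    126-127 endian indicator.
    """
    raw = Path(path).read_bytes()[:128]
    if len(raw) < 128:
        raise ValueError("file too short to hold a MAT-file header")

    # Free-form description, typically naming the format level,
    # platform, and creation date.
    text = raw[:116].decode("ascii", errors="replace").rstrip(" \x00")

    # The indicator is the 16-bit value 'MI'; on disk it appears as
    # b"IM" when the file was written in little-endian byte order.
    endian = raw[126:128]
    if endian == b"IM":
        byte_order = "<"
    elif endian == b"MI":
        byte_order = ">"
    else:
        raise ValueError("no MAT-file endian indicator found")

    (version,) = struct.unpack(byte_order + "H", raw[124:126])
    return {
        "text": text,
        "version": hex(version),
        "little_endian": byte_order == "<",
        # Version 7.3 files are HDF5 containers and can usually be
        # opened with generic HDF5 tools rather than Matlab.
        "hdf5_based": text.startswith("MATLAB 7.3"),
    }
```

Calling `read_mat_header` on each archived file and recording the returned `text` field in the data description would give readers a starting point for choosing a compatible reader. It does not replace converting the data to an open format such as NetCDF.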
-
CEC2: 'Comment on egusphere-2026-3243 - No compliance with the policy of the journal', Juan Antonio Añel, 27 Jun 2026
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
To access the WRF code you cite a GitHub site. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. In addition, to access part of the data used you provide a generic link to the ECMWF data portal; however, the ECMWF data portal does not fulfil GMD’s requirements for a persistent data archive because:
- It does not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist). Actually, their Terms of Use clearly state that data access can be terminated at any point and without notice.
- It does not appear to issue a persistent identifier such as a DOI or Handle for each precise dataset.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
Later, if the Topical Editor decides to continue with the review or publication process of your manuscript and you are requested to upload a new version of it, then the 'Code and Data Availability' section of your manuscript must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2026-3243-CEC2
-
AC2: 'Reply on CEC2', Ting Shu, 28 Jun 2026
Please check the reply in "Reply on CEC1". Thx!
Citation: https://doi.org/10.5194/egusphere-2026-3243-AC2
Data sets
Generating-MPS-Dataset-via-WRF-4.2.1 Ting Shu https://doi.org/10.5281/zenodo.19177453
Model code and software
LMP-Tr v1.0: Source code and experiment scripts for cloud microphysical parameterization Yijun Huang and Ting Shu https://doi.org/10.5281/zenodo.20481658