Python-Fortran Hybrid Programming for Deep Incorporation of AI and Physics Modeling and Data Assimilation (Hf2pMDA_1.0)

Zhu, Xianrui; Lin, Zikuan; Zhang, Shaoqing; Lu, Zebin; Wu, Songhua; Hou, Xiangyun; Xiao, Zhisheng; Ren, Zhicheng; Li, Jiangyu; Xu, Jing; Gao, Yang; Hao, Rixu; Yu, Xiaolin; Li, Mingkui

doi:10.5194/egusphere-2025-6479

Preprints

https://doi.org/10.5194/egusphere-2025-6479

09 Mar 2026

Abstract. Artificial intelligence (AI) provides an unprecedented opportunity to advance physics-based numerical modeling, including data assimilation, which is an efficient and critically important tool for advancing our understanding of the Earth system and its applications. At the same time, deep incorporation of AI into physical modeling can strongly advance AI itself by injecting into it the rich physics accumulated over the long development of physics-based models. However, because such physics models are conventionally coded in Fortran whereas AI algorithms are usually designed in Python, it is difficult to incorporate AI algorithms directly into physics models, and vice versa. Here, based on the f2py protocol, we have developed a procedure that implements an infrastructure for Python-Fortran hybrid modeling and data assimilation (Hf2pMDA), forming a single program entity in which AI algorithms and physical models can invoke each other. As examples, within Hf2pMDA, a climate weakly coupled data assimilation (WCDA) system is naturally upgraded to a strongly coupled DA (SCDA) system, and a 1 km high-resolution weather DA system is conveniently implemented within a multi-layer downscaling model that has multiscale DA in different nesting layers. In the climate SCDA system, a coupled general circulation model (CGCM) and a multiscale filtering algorithm are integrated by a Python main controller (PMC) that calls the Fortran CGCM components and WCDA modules as well as a data-trained SCDA algorithm based on a latent-space variational autoencoder (VAE) in Python. In the high-resolution weather DA system, the downscaling model, consisting of traditional Fortran DA modules in all mother domains and a Python VAE DA algorithm in the central child domain, is integrated by a PMC that organizes these components. By conveniently realizing deep incorporation of any AI algorithm and physics model, Hf2pMDA has great potential to advance both AI and scientific modeling.
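
The controller pattern described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function names, the damping "model", and the nudging update are all hypothetical stand-ins chosen so the example runs without a compiler. In a real setup, the model step would presumably be a Fortran subroutine exposed through an f2py-built extension module, and the analysis step would be a trained AI algorithm.

```python
"""Sketch of a Python main controller (PMC) alternating a model step and a DA step.

Assumption: in an f2py-based setup, the model step would come from an extension
module built with something like
    python -m numpy.f2py -c model.f90 -m model_lib
where `model.f90` and `model_lib` are hypothetical names. Here a pure-Python
stand-in replaces it so the sketch is self-contained.
"""
from typing import Callable, List, Sequence

State = List[float]


def fortran_model_step(state: State, dt: float) -> State:
    """Stand-in for an f2py-wrapped Fortran subroutine: linear damping toward zero."""
    return [x - dt * 0.1 * x for x in state]


def python_da_update(background: State, obs: State, gain: float) -> State:
    """Stand-in for a Python-side analysis step (e.g. an AI-based DA algorithm).

    Simple nudging: analysis = background + gain * (obs - background).
    """
    return [b + gain * (o - b) for b, o in zip(background, obs)]


def run_pmc(
    state: State,
    obs_series: Sequence[State],
    dt: float,
    gain: float,
    model_step: Callable[[State, float], State] = fortran_model_step,
    da_update: Callable[[State, State, float], State] = python_da_update,
) -> State:
    """Advance the model one step per observation, then assimilate that observation."""
    for obs in obs_series:
        state = model_step(state, dt)          # "Fortran" side: forecast
        state = da_update(state, obs, gain)    # "Python" side: analysis
    return state


if __name__ == "__main__":
    observations = [[1.0, 2.0], [0.9, 1.9], [0.8, 1.8]]
    print(run_pmc([1.5, 2.5], observations, dt=1.0, gain=0.5))
```

Passing the two steps in as callables keeps the controller indifferent to whether a component lives in Fortran or Python, which is the property the abstract attributes to the PMC.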

Received: 25 Dec 2025 – Discussion started: 09 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

CEC1:
'Comment on egusphere-2025-6479 - No compliance with the policy of the journal', Juan Antonio Añel, 28 Mar 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
For access to the WRF 3.7.1 code and the ERA5 and OISST v2.1 datasets, you link to web pages that are not suitable repositories for scientific publication. They do not fulfil GMD’s requirements for a persistent data archive because:
* They do not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).

* They do not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy which makes removal of materials only possible in exceptional circumstances and subject to an independent curatorial decision.

* For the case of WRF, it does not appear to issue a persistent identifier such as a DOI or Handle for it.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance of replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
The 'Code and Data Availability’ section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-6479-CEC1
- AC1:
  'Reply on CEC1', Xianrui Zhu, 29 Mar 2026
  Dear Executive Editor,
  
  Thank you very much for your careful assessment of our manuscript and for drawing our attention to the requirements of the Geoscientific Model Development Code and Data Policy.
  
  We have now taken the following actions and completed the required archiving:
  We have archived the ERA5 and OISST data used in this study, together with our observation dataset, at https://doi.org/10.5281/zenodo.19272242.
  
  We have archived WRF v3.7.1 at https://doi.org/10.5281/zenodo.19271007.
  
  Together with our previous archiving, these revisions ensure that the exact code and data used in this study are now available through persistent archives with permanent DOIs. We have revised the Code and Data Availability section as shown below, and we will add the corresponding statement to the manuscript at the next manuscript upload to ensure full compliance with the GMD Code and Data Policy.
  
  Kind regards,
  Xianrui Zhu
  on behalf of all co-authors
  
  The revised Code and Data Availability section:
  
  The original ERA5 dataset (Hersbach et al., 2020) can be obtained from https://doi.org/10.24381/cds.adbb2d47.
  The original OISST v2.1 sea surface temperature dataset (Huang et al., 2021) can be obtained from https://www.ncei.noaa.gov/products/optimum-interpolation-sst.
  The CM2.1 model (Delworth et al., 2006a) can be obtained from https://github.com/mom-ocean/MOM5 and the CM2.1 model version with DA modules is also archived on Zenodo (https://doi.org/10.5281/zenodo.18883209; Delworth et al., 2006b).
  
  The Weather Research and Forecasting model version 3.7.1 (WRF v3.7.1; Skamarock et al., 2008) can be obtained from https://www2.mmm.ucar.edu/wrf/users/download/get_source.html. The exact version used in this study has also been archived to ensure reproducibility and long-term accessibility (https://doi.org/10.5281/zenodo.19271007; University Corporation for Atmospheric Research and NSF National Center for Atmospheric Research, 2015).
  
  The model code for Hf2pMDA-CM2CDA and Hf2pMDA-WRFDA developed in this study is archived at https://doi.org/10.5281/zenodo.18800167 (Zhu et al., 2026b). The datasets used in the experiments, including the observation data and the exact ERA5 and OISST data used in this study, are archived at https://doi.org/10.5281/zenodo.19272242 (Zhu et al., 2026a).
  
  Citation: https://doi.org/10.5194/egusphere-2025-6479-AC1
  - CEC2: 'Reply on AC1', Juan Antonio Añel, 30 Mar 2026
    
    Dear authors,
    Thanks for addressing this issue so quickly. I have checked the repositories, and we can now consider the current version of your manuscript to be in compliance with the code policy of the journal.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-6479-CEC2
RC1:
'Comment on egusphere-2025-6479', Anonymous Referee #1, 06 Apr 2026

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6479/egusphere-2025-6479-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-6479-RC1
- AC2: 'Reply on RC1', Xianrui Zhu, 28 Apr 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6479/egusphere-2025-6479-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-6479-AC2
RC2:
'Comment on egusphere-2025-6479', Anonymous Referee #2, 09 Apr 2026

The manuscript presents a Python-Fortran hybrid programming framework (Hf2pMDA) designed to enable a more straightforward integration between AI algorithms (Python) and physics-based models (Fortran). The framework makes use of f2py to conduct two-way interaction between Python and Fortran components. The authors test their system using a coupled DA case based on a VAE.
Overall, I do believe the topic is timely and relevant, given the growing interest in ML and our vast knowledge of physical modeling and DA. However, I found several issues in clarity, structure, and articulation of the main contributions and so I’m recommending rejection.
1. The manuscript is very difficult to follow and I don’t say that lightly. The authors have put in a lot of work into the manuscript yet the presentation significantly limits accessibility and understanding. If this is to be revised, the authors need to put in substantial effort on restructuring and language revision. Here are some of the issues I faced while reading:
a. The text contains many long, complex, and grammatically awkward (sometimes wrong) sentences.
b. Key ideas are introduced rather abruptly without sufficient explanation/context.
c. The logical progression between sections is unclear/misleading with frequent back-and-forth between concepts (e.g., infrastructure, DA method, ML details).

2. I also have an issue with the novelty of the work. At first, I thought this needs a software engineer (rather than a scientist like me) to understand all of the details. But then I saw an integration workflow and later a VAE application with strongly coupled DA. Yet, in all of these components I struggled to find novelty. The use of f2py is well-established. How is this different from existing Python-Fortran coupling approaches? You didn’t explain the relationship between the infrastructure and the SCDA application. Imagine I want to adapt this wrapper to an already existing DA library (say PDAF or DART); what steps would be needed?
3. The presented results show modest improvements (around 4% in some cases), but the significance of these improvements is not discussed in depth. It’s unclear whether the improvements are robust across different configurations/datasets. I suggest discussing the statistical significance (probabilistic metrics other than RMSE) and practical impacts of the results. Also, I’d include some text describing any limitations of the approach.
4. Figure 6 is kind of hopeless. There is a lot of data and numbers on the figure, making it impossible to read or understand the training procedure. I think this should be simplified or split into a figure and a table.
5. Line 310: Can you describe more in detail the Cross Attention Step? This seems to be an important detail within the general VAE framework.
6. Lines 345-347: I am confused about which is the background loss and which is the obs loss. The text contradicts the figure. In any case, I was expecting both loss functions to decrease during minimization. Why is the orange curve (not sure if it’s obs or background) increasing?
7. Section 4.2.1: Recompiling the model to make subroutine callable in a separate module seems intrusive to me. The beauty about ensemble DA systems is that they treat the physical models as black boxes. But now, it seems with the addition of ML this property is no longer available. In addition, some models cannot be recompiled into a subroutine callable library, so what happens in that case?
8. What is the classic multiscale DA? Please provide some details or references. There are many approaches to multi-scale DA.
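
As an illustration of the kind of significance check raised in comment 3, the sketch below estimates a confidence interval for a relative RMSE reduction using a paired bootstrap. It is only one possible approach and is not taken from the manuscript; the function names and sample errors are hypothetical. It assumes the paired errors are independent samples with nonzero control errors. For serially correlated verification series, a block bootstrap would likely be more appropriate.

```python
import math
import random
from typing import List, Tuple


def rmse(errors: List[float]) -> float:
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_rmse_reduction(
    ctrl_err: List[float],
    exp_err: List[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower, upper) of the relative RMSE reduction.

    The reduction is 1 - RMSE(experiment) / RMSE(control). Control and
    experiment errors are resampled with the same indices so that the
    pairing between the two runs is preserved.
    Assumption: every resample has a nonzero control RMSE.
    """
    if len(ctrl_err) != len(exp_err) or not ctrl_err:
        raise ValueError("error lists must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(ctrl_err)
    point = 1.0 - rmse(exp_err) / rmse(ctrl_err)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = rmse([ctrl_err[i] for i in idx])
        e = rmse([exp_err[i] for i in idx])
        samples.append(1.0 - e / c)
    samples.sort()
    lower = samples[int((alpha / 2) * n_boot)]
    upper = samples[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


if __name__ == "__main__":
    # Hypothetical paired errors from a control run and an experiment run.
    gen = random.Random(1)
    control = [gen.gauss(0.0, 1.0) for _ in range(200)]
    experiment = [0.96 * c for c in control]
    print(bootstrap_rmse_reduction(control, experiment))
```

If the resulting interval excludes zero, the improvement is unlikely to be a sampling artifact under the stated independence assumption.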

Citation: https://doi.org/10.5194/egusphere-2025-6479-RC2
- AC3: 'Reply on RC2', Xianrui Zhu, 28 Apr 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-6479/egusphere-2025-6479-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-6479-AC3

Data sets

Hf2p CM2.1_SCDA & WRF_LDA Dataset. Xianrui Zhu, Zikuan Lin, Zebin Lu, Shaoqing Zhang, Songhua Wu. https://doi.org/10.5281/zenodo.18799861

Model code and software

Hf2p CM2.1_SCDA and WRF_LDA. Xianrui Zhu, Zikuan Lin, Shaoqing Zhang, Zebin Lu, Songhua Wu, Xiangyun Hou, Zhisheng Xiao, Zhicheng Ren, Jiangyu Li, Jing Xu, Yang Gao, Rixu Hao, Xiaolin Yu, Mingkui Li. https://doi.org/10.5281/zenodo.18800167

CM2.1 Model. Thomas L. Delworth, Anthony J. Broccoli, Anthony Rosati, Ronald J. Stouffer, V. Balaji, John A. Beesley, William F. Cooke, Keith W. Dixon, John Dunne, K. A. Dunne, Jeffrey W. Durachta, Kirsten L. Findell, Paul Ginoux, Anand Gnanadesikan, C. T. Gordon, Stephen M. Griffies, Rich Gudgel, Matthew J. Harrison, Isaac M. Held, Richard S. Hemler, Larry W. Horowitz, Stephen A. Klein, Thomas R. Knutson, Paul J. Kushner, Amy R. Langenhorst, Hyun-Chul Lee, Shian-Jiann Lin, Jian Lu, Sergey L. Malyshev, P. C. D. Milly, V. Ramaswamy, Joellen Russell, M. Daniel Schwarzkopf, Elena Shevliakova, Joseph J. Sirutis, Michael J. Spelman, William F. Stern, Michael Winton, Andrew T. Wittenberg, Bruce Wyman, Fanrong Zeng, and Rong Zhang. https://doi.org/10.5281/zenodo.18883209

Short summary

Deep integration of artificial intelligence (AI) algorithms and traditional scientific models is crucial for progress, but Fortran-based scientific codes and Python-based AI are difficult to combine. We develop a Python–Fortran hybrid procedure that enables mutual invocation of AI and scientific modules. Applied to climate and weather models, it supports strongly coupled data assimilation and high-precision prediction, promoting future advances in both AI and scientific modeling.

