Advancing data assimilation with the renewed Parallel Data Assimilation Framework (PDAF V3.1)
Abstract. The Parallel Data Assimilation Framework (PDAF) is a widely used open-source software for data assimilation (DA) with complex, high-dimensional Earth system models and other applications for research and operational use. PDAF is structured to provide a framework for ensemble integrations and to assimilate observations. For the DA, it provides ensemble Kalman filters and smoothers, nonlinear ensemble DA methods, and 3D variational methods. PDAF's observation module interface (PDAF-OMI) further provides a system for structured observation handling, enabling the management of large numbers of different observation types. With the recent upgrade to version 3, PDAF underwent significant code modernization and functionality enhancements to unify more than 20 years of developments since its first release. This study provides a comprehensive overview of PDAF's functionality and concepts, including the new features introduced with the major revision. These are, in particular, a new universal interface that allows for the application of any ensemble filter and smoother method, or any 3D variational method, without changes to the source code. Further, a class of ensemble Kalman filters that processes observations serially, new diagnostics for the ensemble and observations, model-independent support for incremental analysis updating (IAU), and an explicit mode for file-based offline coupling between the model and the DA were added in the major revision. PDAF can apply the same DA methods to idealized toy models as well as realistic high-dimensional models without recoding. This speeds up the development and testing of new DA methods. The implementation of a DA system with PDAF is performed utilizing a set of template files. These allow for the implementation in a very limited time, supported by extensive documentation, as is demonstrated by an example implementation with a toy model. 
For high-dimensional applications, PDAF allows developers to start the implementation of a new DA system with low complexity and to extend its functionality stepwise, enabling an easy start with DA. Couplings to more than 30 models have been implemented by the PDAF developers and the user community and many of them are publicly available. Further, PDAF's internal interface to DA methods provides a clear approach to add new algorithms that can leverage PDAF's framework functionality for observations, localization, and state vector handling. This allows developers of DA methods to make their methods available to the PDAF community.
This manuscript describes the most recent release of the PDAF data assimilation software system. PDAF is widely used and provides valuable functionality for Earth system science, so the material is a welcome addition to the literature. The first three sections do an admirable job of describing the newest release, in particular how it can be interfaced to models and what capabilities it provides. I found sections 2 and 3 quite mature; they do not need significant modification before final publication.
I found the examples in section 4 less useful. It is not clear to me which readers this section is aimed at. The simple model results provide little evidence of the quality of the PDAF software, and the description is not detailed enough to be helpful without consulting the PDAF documentation. Personally, I think the manuscript would be better without these simple model results.
I was somewhat concerned about the lack of any description of computational performance. A single sentence suggests that performance is expected to be the same as in earlier versions. I would hope that a revised manuscript could provide more information about the performance for a reasonably sized Earth system model.
The manuscript does not discuss how easily PDAF can be compiled on a variety of computing platforms. Many Earth system models require extensive effort to compile and run across different compilers and hardware. Can the authors give any insight into how easy it is to port PDAF?
I was a bit surprised by the small volume of code required to implement a new model with arbitrary grids. Accurate interpolation on unstructured grids often requires sophisticated, grid-specific computation. This can be particularly important for avoiding systematic bias when computing forward operators for Earth system models. The problem is more common in the vertical but can also occur in the horizontal for grids with spatially varying grid density. Could the authors provide more detail about how the default PDAF tools perform interpolation in support of forward operators?
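To make the interpolation concern concrete, the following is a minimal sketch of the kind of effect I have in mind. It is not PDAF code and makes no claim about how PDAF-OMI interpolates. The function name linear_obs_operator, the level depths, and the exponential test profile are all hypothetical choices for illustration. The sketch builds a linear-interpolation forward operator on a vertical grid whose spacing grows with depth and applies it to a smooth, convex profile.

```python
import numpy as np

def linear_obs_operator(z_model, z_obs):
    """Return H (n_obs x n_model) that linearly interpolates a model column
    from the levels z_model to the observation positions z_obs.

    Hypothetical helper for illustration only; z_model must be increasing.
    Positions outside the model range are linearly extrapolated.
    """
    z_model = np.asarray(z_model, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    H = np.zeros((z_obs.size, z_model.size))
    # Index of the upper bracketing level, clipped so both neighbours exist.
    upper = np.clip(np.searchsorted(z_model, z_obs), 1, z_model.size - 1)
    lower = upper - 1
    # Weight of the upper level within each bracketing interval.
    w = (z_obs - z_model[lower]) / (z_model[upper] - z_model[lower])
    rows = np.arange(z_obs.size)
    H[rows, lower] = 1.0 - w
    H[rows, upper] = w
    return H

def profile(z):
    """Assumed smooth, convex test profile (arbitrary units)."""
    return np.exp(-np.asarray(z, dtype=float) / 100.0)

# Assumed vertical grid with spacing that doubles from level to level.
z_model = np.array([0.0, 10.0, 30.0, 70.0, 150.0, 310.0])
# Observations placed at the midpoint of each layer.
z_obs = 0.5 * (z_model[:-1] + z_model[1:])

H = linear_obs_operator(z_model, z_obs)
# Interpolated model equivalent minus the true profile value at each observation.
bias = H @ profile(z_model) - profile(z_obs)
print(bias)
```

Because the assumed profile is convex, every interpolated value lies above the true value, so the error has the same sign at every observation and does not average out over many observations. This is the kind of systematic bias that I would like the authors to comment on for the default PDAF tools.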