Elucidating the performance of data assimilation neural networks for chaotic dynamics
Abstract. Recent work has shown that the analysis operator in sequential data assimilation, designed to track chaotic dynamics, can be learned with deep learning from the sole knowledge of a true state trajectory and observations thereof. This approach to learning the analysis is computationally more challenging, yet conceptually more fundamental, than approaches that learn a direct mapping from forecasts and observations to the corresponding analysis increments. Such a learned scheme has been demonstrated to achieve accuracy comparable to that of the ensemble Kalman filter when applied to low-order dynamics. Strikingly, the same accuracy can be reached with a single state forecast instead of an ensemble, hence bypassing the need to explicitly represent forecast uncertainty.
In this study, we extend the investigation of such learned analysis operators beyond the preliminary experiments reported so far. First, we analyse the emergence of local patterns encoded in the operator, which accounts for the remarkable scalability of the approach to high-dimensional state spaces. Second, we assess the performance of the learned operators in more strongly nonlinear regimes of the chaotic dynamics. We show that they can match the efficiency of the iterative ensemble Kalman filter, the baseline in this context, while avoiding the need for nonlinear iterative optimisation. Throughout the paper, we seek underlying reasons for the efficiency of the approach, drawing on insights from both machine learning and nonlinear data assimilation.
This paper seeks to elucidate the reasons for the impressive results obtained by Data Assimilation Networks (DANs) in Bocquet et al. 2024 (Boc24 in the manuscript), by leveraging explainability techniques based on the sensitivity of their Jacobian matrices. In particular, the authors provide a convincing demonstration that the scalability of the method to "unseen" higher-dimensional model versions is due to its focus on local patterns.
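To fix ideas on the kind of Jacobian-sensitivity diagnostic at play, here is a minimal sketch, not the authors' actual code: a stand-in "analysis operator" (a hypothetical short circular convolution, chosen purely for illustration) whose finite-difference Jacobian is probed for locality, i.e. near-zero sensitivity of each output component to distant inputs on the periodic domain.

```python
import numpy as np

def analysis_op(x):
    # Hypothetical stand-in for a learned analysis operator:
    # a circular convolution with a short kernel, hence local by construction.
    kernel = np.array([0.1, 0.8, 0.1])
    xp = np.concatenate([x[-1:], x, x[:1]])  # periodic padding
    return np.convolve(xp, kernel, mode="valid")

def jacobian_fd(f, x, eps=1e-6):
    # Finite-difference Jacobian J[i, j] = d f_i / d x_j.
    n = x.size
    f0 = f(x)
    J = np.empty((n, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - f0) / eps
    return J

x = np.random.default_rng(0).standard_normal(40)
J = jacobian_fd(analysis_op, x)

# Locality diagnostic: entries more than 2 grid points from the
# diagonal (modulo the periodic domain) should be negligible.
offsets = (np.arange(40)[None, :] - np.arange(40)[:, None]) % 40
far = np.minimum(offsets, 40 - offsets) > 2
print(np.abs(J[far]).max())  # ~0 for a local operator
```

The same probe applied to a genuinely learned operator would reveal whether its sensitivity pattern is banded, which is the structural property invoked to explain scalability across model dimensions.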
The manuscript is well written and interesting to read, and it achieves its stated goal, providing a much-needed explainability framework, a feature usually absent from most works using machine learning (ML). I therefore recommend its publication once the following comments have been addressed.
Major comments
- My main feeling is that the article goes a bridge too far when it comes to discarding the usefulness of ensembles. For example, lines 30-31: 'This result challenges the long-standing assumption that explicit ensemble representations are indispensable to estimate flow-dependent uncertainties in chaotic systems.' This may be true for Data Assimilation (DA), but it is not certain that it holds for other analyses, where ensemble representations (or probabilities) might still be useful.
- The authors should also comment on the fact that, besides DA, the determination of flow-dependent uncertainties using ML has already been studied, with variable success. This raises the question of why it works so well here. One could conjecture that this is because the information on uncertainties (and instabilities) needed by the DA process is well suited to inference with ML, while determining precise quantities such as the Covariant Lyapunov Vectors (CLVs) and Lyapunov exponents is a more challenging task. A Multiplicative Ergodic Theorem (MET) clearly exists for the underlying systems, but it provides a mapping between the states of a system and its CLVs; it does not mean that the CLVs at a given time can be determined from the state at that time alone. For example, the Ginelli algorithm combines a forward and a backward pass, which take some "time" to converge (see the studies by F. Noethen for an idea of the rates involved). This may explain why ML sometimes struggles to learn this mapping. Here, for DA, it works very well, and the DAN seems able to learn what is useful, even with just one member; somehow this is an easier task than learning the MET mapping. In the end, this is probably connected to the fact that CLVs are non-local (so the dimensional scalability of algorithms computing them is unclear), contrary to the mapping between the forecast state and the analysis error covariance, as shown by your work.
- Lines 67-68: Is the analysis an estimator of the conditional probability density function, or an estimator of its first moment? Please clarify.
- Does the argument on translational invariance also hold because CNNs are known to be shift invariant, meaning the method used here would not be applicable to DNNs without this invariance?
- Basically, the authors show that the CNN DANs can be generalised to "unseen" cases (i.e. unseen higher-dimensional versions of the model at hand) because they actually focus on a subset of features (i.e. the local features) common to most of the model versions. Does that mean that, for models whose local properties change dramatically as the dimensionality is increased (such as a change in the logarithmic slope of the energy spectrum, for example), DAN generalisation would not work?
- The manuscript is not self-contained enough: many details are simply mentioned as being from Boc24, and the reader has to go there to understand the details of the DANs. If Boc24 were ever to become (more) difficult to find, this manuscript would become difficult to understand on its own. Could the authors incorporate a reasonable amount of the Boc24 setup description in this paper as well, for example a figure of the CNN DAN schematics?
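To make concrete the point in the comment above about Lyapunov quantities taking "time" to converge, here is a minimal sketch of the forward, Benettin-style QR pass that Ginelli-type algorithms build on, applied to a small Lorenz-96 system; all parameter choices (dimension, time step, iteration counts) are illustrative, not those of Boc24.

```python
import numpy as np

def lorenz96_step(x, dt=0.01, F=8.0):
    # One RK4 step of the Lorenz-96 model.
    def f(x):
        return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
    k1 = f(x); k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2); k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def tangent_step(x, V, dt=0.01, eps=1e-7):
    # Propagate the perturbation columns of V along the flow (finite differences).
    base = lorenz96_step(x, dt)
    return np.column_stack([(lorenz96_step(x + eps * v, dt) - base) / eps
                            for v in V.T])

n, m = 12, 3                        # state dimension, number of exponents
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
for _ in range(1000):               # spin up onto the attractor
    x = lorenz96_step(x)

V = np.linalg.qr(rng.standard_normal((n, m)))[0]
logs = np.zeros(m)
steps = 5000
for _ in range(steps):              # forward pass: repeated QR re-orthonormalisation
    V = tangent_step(x, V)
    V, R = np.linalg.qr(V)
    logs += np.log(np.abs(np.diag(R)))
    x = lorenz96_step(x)

lyap = logs / (steps * 0.01)        # estimated leading Lyapunov exponents
print(lyap)
```

The many thousands of orthonormalisation steps needed before the estimates settle (and the additional backward pass that Ginelli's method requires for CLVs) illustrate why a state-to-CLV map is hard to learn from instantaneous states alone.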
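Regarding the comment above on translational invariance: strictly speaking, convolutional layers are shift *equivariant* (shifting the input shifts the output), which is the property that makes learned local patterns reusable at every grid point. A framework-free NumPy sketch with an arbitrary illustrative kernel, on a periodic domain as in Lorenz-96-type models:

```python
import numpy as np

def conv1d_periodic(x, kernel):
    # 1-D convolution with circular padding: the building block of a
    # CNN layer on a periodic domain.
    p = len(kernel) // 2
    xp = np.concatenate([x[-p:], x, x[:p]])
    return np.convolve(xp, kernel, mode="valid")

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
k = rng.standard_normal(5)   # illustrative random kernel

shift = 3
a = conv1d_periodic(np.roll(x, shift), k)   # shift, then convolve
b = np.roll(conv1d_periodic(x, k), shift)   # convolve, then shift
print(np.allclose(a, b))                    # True: the two commute
```

A fully connected layer has no such constraint, which is why the scalability argument would indeed not carry over unchanged to a generic DNN.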
Minor comments
- There is a typo in the x-axis label of Fig. 7.
- Line 131: 'memorises' is perhaps a bit too anthropomorphic here.