This work is distributed under the Creative Commons Attribution 4.0 License.
Representation learning with unconditional denoising diffusion models for dynamical systems
Abstract. We propose denoising diffusion models for data-driven representation learning of dynamical systems. In this type of generative deep learning, a neural network is trained to denoise and reverse a diffusion process, where Gaussian noise is added to states from the attractor of a dynamical system. Iteratively applied, the neural network can then map samples from isotropic Gaussian noise to the state distribution. We showcase the potential of such neural networks in experiments with the Lorenz 63 system. Trained for state generation, the neural network can produce samples almost indistinguishable from those on the attractor. The model has thereby learned an internal representation of the system that is applicable to tasks other than state generation. As a first task, we fine-tune the pre-trained neural network for surrogate modelling by retraining its last layer and keeping the remaining network as a fixed feature extractor. In these low-dimensional settings, such fine-tuned models perform similarly to deep neural networks trained from scratch. As a second task, we apply the pre-trained model to generate an ensemble out of a deterministic run. Diffusing the run and then iteratively applying the neural network conditions the state generation, which allows us to sample from the attractor in the run's neighboring region. To control the resulting ensemble spread and Gaussianity, we tune the diffusion time and, thus, the sampled portion of the attractor. While easier to tune, this proposed ensemble sampler can outperform tuned static covariances in ensemble optimal interpolation. These two applications therefore show that denoising diffusion models are a promising route towards representation learning for dynamical systems.
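The two sampling procedures described in the abstract can be sketched in a few lines. The following is a minimal illustration, assuming a variance-preserving diffusion process with a linear beta schedule and a pretrained noise-prediction network `eps_model` (a placeholder, not the authors' actual schedule or architecture); the diffusion step `tau` in `sample_ensemble` plays the role of the tuned diffusion time:

```python
import numpy as np

# Minimal DDPM sketch (assumptions: variance-preserving process, linear
# beta schedule, pretrained noise-prediction network `eps_model(x, t)`).
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def diffuse(x0, t, rng):
    """Forward process: noise a state up to diffusion step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def denoise_from(x_t, t_start, eps_model, rng):
    """Reverse process: iteratively denoise from step t_start down to 0."""
    for t in range(t_start, -1, -1):
        eps_hat = eps_model(x_t, t)
        mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
            / np.sqrt(alphas[t])
        noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
        x_t = mean + np.sqrt(betas[t]) * noise
    return x_t

def sample_states(n, eps_model, rng):
    """Unconditional generation: map isotropic Gaussian noise to states."""
    return denoise_from(rng.standard_normal((n, 3)), T - 1, eps_model, rng)

def sample_ensemble(x_run, n_ens, tau, eps_model, rng):
    """Ensemble generation: diffuse a deterministic run to an intermediate
    step tau, then denoise; tau controls spread and Gaussianity."""
    members = np.repeat(x_run[None, :], n_ens, axis=0)
    return denoise_from(diffuse(members, tau, rng), tau, eps_model, rng)

# Usage with a placeholder network (substitute the trained denoiser):
rng = np.random.default_rng(0)
eps_model = lambda x, t: np.zeros_like(x)
ens = sample_ensemble(np.array([1.0, 1.0, 25.0]), n_ens=16, tau=200,
                      eps_model=eps_model, rng=rng)
```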
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
- Preprint (2606 KB)
- Metadata XML
- BibTeX
- EndNote
- Final revised paper
Journal article(s) based on this preprint
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-2261', Sibo Cheng, 12 Mar 2024
This research paper presents a study on using denoising diffusion models for data-driven representation learning of dynamical systems. The research demonstrates the utility of such networks with the Lorenz 63 system, showing that the trained network can produce samples almost indistinguishable from those on the attractor, indicating the network has learned an internal representation of the system. This representation is then used for surrogate modeling and generating ensembles out of a deterministic run.
Overall, I found this paper very well written, and the contribution of introducing diffusion models into dynamical systems in geoscience is novel and clear. Below are my comments to be addressed before I can recommend acceptance of this manuscript:
Comments:
1. If I understand correctly, the objective of this study is to explore the possibility of using diffusion models for high-dimensional systems in geoscience. The numerical experiments are carried out using a three-dimensional Lorenz model. To enhance the discussion, it would be beneficial if the authors could explain how generalizable their approach is to a high-dimensional spatio-temporal system (e.g., by adding CNN or transformer layers for feature extraction (encoding) and decoding, etc.).
2. As a consequence of the small dimension, the 'latent space' in your diffusion model (256) is much larger than that of the physical space (3). Therefore, you have little risk of losing information when using the denoising network for surrogate modelling. The authors may consider adding a baseline of transfer learning from an untrained (randomly initialized) denoising NN in Fig. 7. The authors have shown the results of an untrained NN in Tab. 3, but only with linear fine-tuning. What happens if you fine-tune an untrained denoising NN with a non-linear head? (See the sketch after this comment.)
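For concreteness, a minimal sketch of the probing setups under discussion, with a frozen feature extractor and either a linear or a non-linear head (PyTorch; the stand-in trunk and all names are hypothetical, not the authors' code):

```python
import torch.nn as nn

state_dim, feature_dim = 3, 256

# Stand-in trunk for the (pretrained or randomly initialized) denoiser's
# feature extractor; hypothetical, the paper's network differs.
features = nn.Sequential(nn.Linear(state_dim, feature_dim), nn.SiLU())

def make_probe(trunk: nn.Module, head: nn.Module) -> nn.Module:
    """Freeze the feature extractor; only the head remains trainable."""
    for p in trunk.parameters():
        p.requires_grad_(False)
    return nn.Sequential(trunk, head)

# Linear fine-tuning: one trainable layer on top of frozen features.
linear_probe = make_probe(features, nn.Linear(feature_dim, state_dim))

# Non-linear fine-tuning (the reviewer's suggestion): a small MLP head,
# applicable equally to a pretrained or an untrained trunk.
mlp_probe = make_probe(
    features,
    nn.Sequential(nn.Linear(feature_dim, 128), nn.SiLU(),
                  nn.Linear(128, state_dim)),
)
```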
Minor questions:
- In Figure 7, it seems that the dense neural network with two layers trained from scratch outperforms your transfer learning from the diffusion model. Is that the case? In fact, the results in Tab. 3 also show that the models trained from scratch (dense ×3 and ResNet) perform similarly to the fine-tuning from your diffusion model. The authors may want to add some comments regarding this.
- Page 3, ‘generative training is rarely used for pre-training and representation learning of high-dimensional systems’. There are some works that have tried to use diffusion models for representation learning, e.g.:
- Yang, X. and Wang, X., 2023. Diffusion model as representation learner. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 18938-18949.
- Mittal, S., Abstreiter, K., Bauer, S., Schölkopf, B., and Mehrjou, A., 2023. Diffusion based representation learning. In: International Conference on Machine Learning, pp. 24963-24982. PMLR.
The authors may want to include these references and discuss the differences/similarities compared to the method used in this paper. This paper is probably the first to propose diffusion-based representation learning for dynamical systems(?)
3. Page 9, ‘show that this representation is entangled’: why is it important for the learned features to be entangled?
4. Page 11, check the sentence ‘As we will see later, the bigger the Because of the state-dependency, the resulting distribution is implicitly represented by the ensemble and could extend beyond a Gaussian assumption’
5. Page 13, it seems that you have used a lot of training samples (1.6×10^7) for your diffusion model for the Lorenz system of dimension 3. I was wondering if a standard surrogate model would require that much. That is, perhaps a standard surrogate model could outperform the diffusion-based one with less training data. I am curious to hear the authors' thoughts.
6. Figs. 5a and 1b: if I understand correctly, the x-axis is the pseudo-time instead of the real time of the dynamical system. If that is the case, it would be beneficial to add an x-axis label to avoid any confusion.
Citation: https://doi.org/10.5194/egusphere-2023-2261-RC1
- AC1: 'Reply on RC1', Tobias Finn, 24 May 2024
-
RC2: 'Comment on egusphere-2023-2261', Anonymous Referee #2, 03 Apr 2024
This is a very interesting and novel study on the use of denoising diffusion models for representation learning. The manuscript is well written, describes very nicely the context and how these approaches (rooted in image applications) can be adapted to geosciences, and illustrates two distinct relevant applications, surrogate modelling and ensemble generation, that are both extremely important in high-dimensional settings.
I think the manuscript can be accepted almost as it is, but I have a few minor comments I would encourage the Authors to look at.
1) While there is little room for doubt, I would strongly suggest the Authors specify that their approach applies to ergodic chaotic dynamics for which an invariant distribution exists that describes the state distribution on the system's attractor, as formalized below. An obvious counterexample would be a stable system having an equilibrium point (or a limit cycle) as attractor.
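For reference, the property the reviewer alludes to is the ergodic theorem (standard statement, not quoted from the manuscript): for an ergodic flow $\varphi_t$ with invariant measure $\mu$ supported on the attractor,
\[
\lim_{T \to \infty} \frac{1}{T} \int_0^T f\bigl(\varphi_t(x_0)\bigr)\,\mathrm{d}t \;=\; \int f\,\mathrm{d}\mu \quad \text{for } \mu\text{-almost every } x_0,
\]
so a single long trajectory samples the same state distribution that the diffusion model is trained to reproduce.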
2) When mentioning the Schrödinger bridge (page 2), you may want to refer to Reich S. 2019 (doi:10.1017/S0962492919000011) as an exemplar study of the same analogy but in the area of data assimilation.
3) Line 27. "..dynamical systemS ..."
4) In the caption of Fig1b, use (left/right) to point the reader.
5) Line 44. I think you should always order references chronologically.
6) Lines 53-59. While I understand and like the Authors' narrative and choice of references, particularly for the readers of NPG it would be appropriate to also mention the large body of work on the generation of ensemble members based on dynamical systems theory and data assimilation. A good recent reference is 10.1029/2021MS002828.
7) I am a bit uncomfortable with the use of the term "latent". On the one hand, I agree with a comment from the other Reviewer. On the other hand, I also see in line 100 that you state z=x, which makes one deduce that the latent and actual state have the same dimension. Finally, while it is true that latent variables are defined in relation to their indirect (often hidden) relation with the observable quantities, with no reference to their number (or space dimension), in many practical applications the latent space is assumed/defined/used as being of smaller dimension.
8) Line 115. I would add ".... prior distribution FOR THE DENOISING PROCESS."
9) Equation (8). Wouldn't it be better to (re)state clearly that we do not have access to x in practice?
10) Line 145. Is that because they do not depend on x?
11) Line 153. I think "Equation" must be written at the beginning of the sentence.
12) Line 176. Instead of "normally", I would suggest "most of the time".
Citation: https://doi.org/10.5194/egusphere-2023-2261-RC2
- AC2: 'Reply on RC2', Tobias Finn, 24 May 2024
Model code and software
cerea-daml/ddm-attractor, Tobias Sebastian Finn, https://doi.org/10.5281/zenodo.8406184
Viewed
- HTML: 591
- PDF: 304
- XML: 40
- Total: 935
- BibTeX: 36
- EndNote: 31
Cited
1 citation as recorded by Crossref:
Tobias Sebastian Finn, Lucas Disson, Alban Farchi, Marc Bocquet, and Charlotte Durand