LFD (v1.0): Latent-Compression-Free Generative Diffusion with Geological Priors and Geophysical Regularization for Implicit Structural Modeling
Abstract. Diffusion models provide a promising way to model the distribution of implicit structural models, potentially improving generalization across surveys. However, existing diffusion Transformer pipelines scale poorly to high-dimensional geophysical data because noise- or velocity-prediction objectives are often unstable at large patch sizes, forcing the use of small patches that lead to long token sequences and high computational cost. To reduce computation, most approaches rely on Variational autoencoders (VAEs) and latent diffusion, but robust pretrained VAEs are scarce in geophysics, and enforcing geological priors in latent space is difficult. To address these scalability bottlenecks and the difficulty of enforcing geological priors in latent space, we propose Latent-Compression-Free Generative Diffusion (LFD) with Geological Priors and Geophysical Regularization for implicit structural modeling. Built on flow matching, LFD generates implicit structural models directly in the data space, enabling efficient large-patch Vision Transformer (ViT) inference and allowing fault/horizon constraints and geophysical regularization to be applied explicitly during generation. To strengthen structural conditioning, we design a structure-enhanced Transformer that injects horizon and fault embeddings at multiple layers. We further introduce two prior-guided losses: a horizon loss to match the generated models to the input horizons, and a fault-aware bending-energy term that regularizes smoothness while ignoring stencils across faults. By enforcing these priors directly in the data space, the model is effectively constrained to generate geologically reasonable structures. Experiments on both synthetic data and real surveys validate the effectiveness of LFD for prior-guided implicit structural modeling. Benefiting from large-patch inference, LFD generates a 512x512 model in 1.56 s on an NVIDIA H20 GPU. 
With relative positional encoding, LFD can be extended to higher resolutions via simple adaptation without retraining. Overall, LFD offers new insights into deploying diffusion models for high-dimensional geophysical data, enabling efficient generation with interpretable, prior-guided constraints.
Dear Editor and Authors,
This manuscript introduces an interesting and novel framework for applying diffusion architectures to implicit structural modeling. It is an innovative and thought-provoking approach; however, I have several major comments that should be addressed before the manuscript is accepted for publication.
1) Geophysical vs. Geological Data: The manuscript frequently refers to "geophysical data," yet the model constraints and inputs appear to consist exclusively of geological interpretations (e.g., horizons and faults). While these are presumably derived from seismic data, they are geological interpretations rather than geophysical data. Please either clarify what is meant by "geophysical" or modify the text to remove references to "geophysical data".
2) Ablation study and uncertainty: Please extend the results or discussion section to include an ablation study showing how the model's predictions change as the horizon constraints become increasingly sparse. A multi-realisation comparison (using metrics such as information entropy) to demonstrate how well the generated realisations cover the plausible, data-consistent model space is also needed. Ideally this uncertainty analysis should be explicitly linked to the sparse horizon ablation study, to illustrate how realisation variance increases as the amount of conditioning data is reduced.
3) Expanded Discussion: The current discussion is quite short. I suggest expanding it significantly, in particular to address the model's generalisation capacity. How close (geologically speaking) do applications need to be to the synthetic training data for the model to perform reliably? How much variability can the model produce, especially when the conditioning data are sparse? Does it (or can it) produce geologically impossible results (e.g., "bubbles")?
4) Reproducibility: I attempted to run the code provided in the Zenodo repository but was unable to do so. To ensure the presented method is FAIR and usable, please update the repository to include a comprehensive README.md file explaining the code structure and how to get started. Specifically, the documentation should explain how to set up the required data, dependencies, and paths. Currently, it is unclear where to download the required ImageNet model, how to set the IMAGENET_PATH, and where to download the pretraining checkpoint (PRETRAIN_CKPT). It is also unclear where to download the training and testing datasets, and hence how to set the associated paths.
5) Figure Placement: Please check that all figures appear after their first mention in the text. Currently, several figures appear very early in the manuscript and are not discussed until much later. Additionally, please clarify in Fig. 1 that the implicit field is predicted as a continuous variable. It would be useful to plot this continuous field alongside the discrete lithology/color visualisations (in at least one figure), as this gives a clearer representation of the model's output.
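To illustrate the kind of multi-realisation metric suggested in comment 2, a per-cell information entropy could be computed roughly as follows. This is only a minimal sketch under my own assumptions, not the authors' method: the function name `cell_entropy` is hypothetical, the realisations are assumed to be stacked as a NumPy array, and the continuous implicit field is assumed to have been discretised into unit labels beforehand (here with arbitrary bin edges).

```python
import numpy as np


def cell_entropy(realisations: np.ndarray, n_classes: int) -> np.ndarray:
    """Per-cell Shannon entropy (in bits) across an ensemble of discretised models.

    realisations: integer array of shape (n_realisations, H, W) holding unit
    labels in the range [0, n_classes). Hypothetical helper for illustration.
    """
    n = realisations.shape[0]
    entropy = np.zeros(realisations.shape[1:], dtype=float)
    for k in range(n_classes):
        # Fraction of realisations that assign class k to each cell.
        p = (realisations == k).sum(axis=0) / n
        nz = p > 0  # skip zero probabilities, since 0 * log(0) is taken as 0
        entropy[nz] -= p[nz] * np.log2(p[nz])
    return entropy


# Stand-in ensemble: random fields in place of generated implicit models.
rng = np.random.default_rng(0)
fields = rng.random((8, 4, 4))
# Assumed discretisation: two arbitrary isovalues give three units.
labels = np.digitize(fields, bins=[0.33, 0.66])
entropy_map = cell_entropy(labels, n_classes=3)
```

Reporting the mean of such an entropy map for each level of horizon sparsity would link the uncertainty analysis directly to the ablation study.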
Minor Points
Abstract (Line 1): The phrase "diffusion models provide a promising way to model the distribution of implicit structural models" is awkward (do the models really model the distribution of implicit structural models? Which implicit structural models?). Consider rewording for clarity.
Abstract (Line 5): Change "Variational" to lowercase.
Abstract (Line 11): Change "Transformer" to lowercase.
Line 116: Please explicitly clarify if the model predicts continuous implicit field values or discrete lithology classes. I assume the former is true, but the text and figure captions should be updated to make this unambiguous.
Line 125: Please define what is meant by "geophysical data" in the context of this study, or remove the term (as the manuscript does not currently feature integration with geophysical datasets).
Figure 3: Please discuss what happens to the framework's performance when the fault mask is also sparse.
Figure 3: There appears to be an absence of unconformities in the demonstration data (and potentially the training set?). Please address this limitation or design choice in the discussion. Was the model able to accurately reproduce unconformities? How were these parameterised in the implicit framework, given that the implicit values above and below an unconformity are not comparable?
Line 210: When creating the structural data, was the timing of the structural events allowed to vary (e.g., scenarios involving faults that are truncated by an unconformity)? Please expand a little on the synthetic modeling logic.
Line 216: Please expand upon the explanation of the random subsampling of horizons. What percentage of the horizons were retained, and how were the removed horizons selected? This more detailed description should ideally feed directly into the sparse-horizon ablation study requested in major comment 2.
I look forward to seeing an updated version of this interesting work.
Kind regards,
Sam Thiele