GeoGen3D 1.0: An LMM-Based Reasoning Agent Framework for 3D Geological Model Generation

Guo, Jiateng; Li, Junkun; Jessell, Mark; Liu, Zhibin; Wang, Luyuan; Wang, Xulei

doi:10.5194/egusphere-2026-1960

Preprints

https://doi.org/10.5194/egusphere-2026-1960

20 May 2026

Status: this preprint is open for discussion and under review for Geoscientific Model Development (GMD).

Abstract. 3D geological models provide conceptual and specific frameworks for a range of geoscience activities, from theoretical research to the search for new resources. Data are often scarce, and researchers must then infer 3D geological structures from textual descriptions or outcrop images alone, a task for which standard 3D modelling approaches are poorly suited. Although current generative artificial intelligence can already generate pictures, videos, and 3D object models on demand, there is still no feasible method to directly convert geologists' ideas or real photos of rock outcrops into 3D geological models. Here we present GeoGen3D, an intelligent agent for 3D geological modelling driven by multimodal text and image input. (1) Building on an improved ReAct agent framework and a comprehensive collection of Noddy-based agent tools, we leverage the text and image understanding capabilities of large multimodal models (LMMs) to generate 3D geological models from textual or visual inputs. (2) We introduce MMGM-Eval, a multimodal 3D geological model generation benchmark, to systematically evaluate the ability of LMMs to generate geological models from multimodal prompts. Our analyses show that GeoGen3D significantly outperforms direct prompt engineering with LMMs on the MMGM-Eval benchmark. GeoGen3D thus provides an efficient modelling paradigm for multimodal-driven 3D geological model generation, especially suited to scenarios lacking sufficient data.
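The ReAct pattern named in the abstract alternates a reasoning step with a tool call and feeds each observation back into the next decision. The sketch below illustrates only that loop. It is not the authors' implementation: the `History` class, the three tool functions, and the keyword-based `scripted_policy` (a stand-in for the LMM) are hypothetical names invented for this example, and none of them corresponds to the actual Noddy or GeoGen3D interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class History:
    """Ordered geological events (hypothetical stand-in for a modelling history)."""
    events: List[Tuple[str, dict]] = field(default_factory=list)


# Hypothetical tools: each appends one event and returns a text observation.
def add_stratigraphy(history: History, layers: int = 3) -> str:
    history.events.append(("stratigraphy", {"layers": layers}))
    return f"added stratigraphy with {layers} layers"


def add_fold(history: History, wavelength: float = 1000.0) -> str:
    history.events.append(("fold", {"wavelength": wavelength}))
    return f"added fold with wavelength {wavelength}"


def add_fault(history: History, dip: float = 60.0) -> str:
    history.events.append(("fault", {"dip": dip}))
    return f"added fault dipping {dip} degrees"


TOOLS: Dict[str, Callable[..., str]] = {
    "add_stratigraphy": add_stratigraphy,
    "add_fold": add_fold,
    "add_fault": add_fault,
}


def scripted_policy(prompt: str, history: History) -> Tuple[str, str, dict]:
    """Stand-in for the LMM: returns (thought, action name, action arguments)."""
    done = [name for name, _ in history.events]
    text = prompt.lower()
    if "stratigraphy" not in done:
        return "A model needs base layers first.", "add_stratigraphy", {"layers": 4}
    if "fold" in text and "fold" not in done:
        return "The description mentions folding.", "add_fold", {"wavelength": 800.0}
    if "fault" in text and "fault" not in done:
        return "The description mentions a fault.", "add_fault", {"dip": 70.0}
    return "All described events are present.", "finish", {}


def react_loop(prompt: str, policy=scripted_policy, max_steps: int = 8):
    """Alternate reasoning (policy) and acting (tool call) until the policy finishes."""
    history = History()
    trace = []
    for _ in range(max_steps):
        thought, action, args = policy(prompt, history)
        if action == "finish":
            trace.append((thought, action, "done"))
            break
        observation = TOOLS[action](history, **args)
        trace.append((thought, action, observation))
    return history, trace


history, trace = react_loop("Folded strata later cut by a normal fault")
for thought, action, observation in trace:
    print(f"Thought: {thought} | Action: {action} | Observation: {observation}")
```

In a real agent the policy would be an LMM call that also receives the accumulated observations, and the tools would drive an actual modelling engine; the `max_steps` cap is a common safeguard against loops that never reach a finish action.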

Received: 07 Apr 2026 – Discussion started: 20 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 4219 KB)

Supplement (14913 KB)

Status: open (extended)

CEC1: 'Comment on egusphere-2026-1960 - No compliance with the policy of the journal', Juan Antonio Añel, 21 Jun 2026

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
In the repository cited in the Code and Data Availability section, the data necessary to replicate your work is missing. The repository currently only contains data for a test case, and in your manuscript you seem to use multiple training cases.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance of replicability of the published papers for years after their publication. Please, therefore, publish your data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.

Later, if the Topical Editor decides to continue with the review or publication process of your manuscript and you are requested to upload a new version of it, then the 'Code and Data Availability' section of your manuscript must also be modified to cite the new repository locations, and the corresponding references must be added to the bibliography.

I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.

Juan A. Añel
Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2026-1960-CEC1
- AC1: 'Reply on CEC1', Jiateng Guo, 23 Jun 2026
  
  Thank you for the reminder.
  We will upload the remaining data and fix this problem soon.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1960-AC1
- AC2: 'Reply on CEC1', Jiateng Guo, 24 Jun 2026
  
  Dear Professor Añel,
  
  Thank you for bringing this matter to our attention. We sincerely apologize for the oversight in our initial submission.
  
  The missing test dataset (the primary input data mentioned in our manuscript) has now been uploaded to a new version of the repository. The dataset, named MMGM-Eval, is available at the following location:
  
  https://doi.org/10.5281/zenodo.20817650
  
  We have ensured that the data necessary to replicate our work is now fully accessible. We will also update the 'Code and Data Availability' section in the revised manuscript to cite this new repository location, along with the corresponding reference in the bibliography, should the Topical Editor decide to continue with the review process.
  
  Thank you for your understanding, and we appreciate your guidance in ensuring compliance with GMD's Code and Data Policy.
  
  Sincerely,
  
  Jiateng Guo
  
  Citation: https://doi.org/10.5194/egusphere-2026-1960-AC2
  - CEC2: 'Reply on AC2', Juan Antonio Añel, 25 Jun 2026
    
    Dear authors,
    Thanks for addressing this issue. I have checked the repositories, and we can now consider the current version of your manuscript to be in compliance with the code policy of the journal.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2026-1960-CEC2
RC1: 'Comment on egusphere-2026-1960', Anonymous Referee #1, 16 Jul 2026
This paper develops an AI-based tool that can create 3D geological models in response to a text or image prompt. The method uses three different AI agents interacting with each other, in accordance with the ReAct (reasoning and acting) framework. The agents also retrieve external knowledge from Wikipedia and generate the models using the Noddy geological modelling tool. The authors show that this multi-agent, ReAct framework can produce better models than simply prompting an LMM in one step. To do so, they create a benchmark set of example prompts, which can be used for testing this and other AI workflows.
The paper contributes to a rapidly growing interest in the use of AI in geological modeling (and everything else). I think its most significant contribution is likely to be in demonstrating the value of a ReAct-type multi-step, iteratively improving approach to this task over a single-step approach. However, I find some aspects of the paper unclear and feel that major revisions are necessary to clarify and strengthen it.
Comments:
Line 31: I don't think deep should be capitalized.
Figure 1: I have multiple comments on this figure:
- Is it possible to show the whole 555 word prompt somewhere, even if in a supplement or appendix?
- It appears that the prompts are different in a and b. To make a direct comparison, the same prompt should be used in both cases.
- In figure a, Hunyuan3D Studio is used, but this tool is not mentioned anywhere else in the manuscript, and it is not one of the LMMs that GeoGen3D is compared to in section 3.2 (or the one underlying the agents in GeoGen3D). Why not show one of the same LMMs that is used in the comparison later?

Line 44: A comma is needed after "prompts".
Lines 71-73: It was not initially clear to me that T, Ioutcrop, His, and M are the variable names for the text descriptions, field outcrop images, set of geological activity events, and three-dimensional geological model, respectively. They should be set off with commas or parentheses to make this clear.
Line 80: Is "Anoddy" the name of the “geological event action space”? Maybe set it off with commas too.
Section 2.2: It should probably be stated in this section (or even before) which LMM is used to build the agents. From Tables 2-4, I do see that it is GPT-4o, but I think that readers will be looking for that information earlier.
Section 2.2.1: Is Wikipedia a sufficient source for geological information? It covers major geological concepts but often doesn’t go very deep. Also, it can change. There should be at least some discussion of the choice to use it.
Figure 3: This flow chart suggests (in the blue box) that the agent looks up a Wikipedia page called “Inverted fold”. But when I search on Wikipedia, there is no such page. Even the “Fold (geology)” page only uses the word “inverted” twice and without any real explanation.
Line 127: Is “Pplan” a typo? Or is that the name of something?
Line 139: Missing r in representation.
Lines 214-220: It is not clear to me how the scoring is done. Line 217 suggests that an LLM does the scoring? Is it the same one doing the generating? Is there any human labeling involved to make sure that the LLM gets it right? Also, what is the misfit function that is used to produce a numerical score?
Section 3.1: If I am looking at the correct files in the Zenodo dataset, the Text2Geo3D-Eval prompts all appear to be in Chinese. Since this is an English-language article, it would be helpful to also see the results from the equivalent prompts in English. Perhaps try them in both languages and report both scores.
Section 3.2: It would be helpful to show examples of the geological models generated by each of the AIs rather than just the scores. Maybe pick one of the evaluation datasets to use as an example. Or one image example and one text example. And refer the reader to the Zenodo repository for the rest.
Tables 2 and 3: What are the three different columns under Score? Three trials? Three different metrics? This should be labeled or explained somewhere.
Figure 5: The prompt asks to “restore” the structure shown in the picture, but what follows seems to be more a forward model than a restoration. Maybe a different term could be used here?

Citation: https://doi.org/10.5194/egusphere-2026-1960-RC1

Supplement

https://doi.org/10.5194/egusphere-2026-1960-supplement

Data sets

Data Sets Jiateng Guo and Junkun Li https://doi.org/10.5281/zenodo.19493634

Model code and software

Source Code Jiateng Guo and Junkun Li https://doi.org/10.5281/zenodo.19493634

Video supplement

Video Demo Jiateng Guo and Junkun Li https://doi.org/10.5281/zenodo.19493634

Viewed

Total article views: 306 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
203	85	18	306	19	10	10

Views and downloads (calculated since 20 May 2026)

Month	HTML	PDF	XML	Total
May 2026	128	48	9	185
Jun 2026	41	14	4	59
Jul 2026	34	23	5	62

Cumulative views and downloads (calculated since 20 May 2026)

Month	HTML	PDF	XML	Total
May 2026	128	48	9	185
Jun 2026	169	62	13	244
Jul 2026	203	85	18	306

Viewed (geographical distribution)

Total article views: 285 (including HTML, PDF, and XML) Thereof 285 with geography defined and 0 with unknown origin.

Latest update: 20 Jul 2026

Short summary

This study proposes a method for generating three-dimensional geological models from text and rock outcrop images. Multiple experimental cases show that the method can generate a three-dimensional geological model matching the described geological structure from geologists' verbal descriptions and real rock images. This method enables geologists to develop geological structural hypotheses directly from a three-dimensional perspective.

