GeoGen3D 1.0: An LMM-Based Reasoning Agent Framework for 3D Geological Model Generation
Abstract. 3D geological models provide both conceptual and concrete frameworks for a range of theoretical and applied geoscience activities, from fundamental research to the exploration of new resources. In many scenarios data are scarce, and researchers must infer 3D geological structures from textual descriptions or outcrop images alone, a task for which standard 3D modeling approaches are poorly suited. Although current generative artificial intelligence can already produce images, videos, and 3D object models on demand, there is still no practical method for directly converting geologists' ideas or photographs of rock outcrops into 3D geological models. Here we present GeoGen3D, an intelligent agent for text-image multimodal-driven 3D geological modeling. (1) Building on an improved ReAct agent framework and a comprehensive collection of Noddy-based agent tools, GeoGen3D leverages the deep text and image understanding capabilities of large multimodal models (LMMs) to generate 3D geological models from textual or visual inputs. (2) We introduce MMGM-Eval, a multimodal 3D geological model generation benchmark, to systematically evaluate the ability of LMMs to generate geological models from multimodal prompts. Our analyses demonstrate that GeoGen3D significantly outperforms direct prompt-engineering approaches with LMMs on the MMGM-Eval benchmark. GeoGen3D thus provides an efficient and intelligent modeling paradigm for multimodal-driven 3D geological model generation, particularly suited to data-scarce scenarios.
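For readers unfamiliar with ReAct-style agents, the following is a minimal, generic sketch of a Thought/Action/Observation loop that calls modeling tools. It is not GeoGen3D's implementation: the "improved" ReAct details are not described here, the LMM is replaced by a hypothetical scripted policy, and the tool names (`add_stratigraphy`, `add_fold`, `add_fault`) are hypothetical stand-ins that only append events to a plain event list, loosely inspired by Noddy's idea of building a model from an ordered geological history. No real Noddy API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One reasoning step proposed by the (hypothetical) model."""
    thought: str
    action: str                      # a tool name, or "finish" to stop
    args: dict = field(default_factory=dict)


# Hypothetical tools: each appends one event to an ordered event history.
def add_stratigraphy(history: List[dict], layers: List[str]) -> str:
    history.append({"event": "stratigraphy", "layers": layers})
    return f"added {len(layers)} layers"


def add_fold(history: List[dict], wavelength: float, amplitude: float) -> str:
    history.append({"event": "fold", "wavelength": wavelength, "amplitude": amplitude})
    return "added fold"


def add_fault(history: List[dict], dip: float, slip: float) -> str:
    history.append({"event": "fault", "dip": dip, "slip": slip})
    return "added fault"


TOOLS: Dict[str, Callable[..., str]] = {
    "add_stratigraphy": add_stratigraphy,
    "add_fold": add_fold,
    "add_fault": add_fault,
}


def react_loop(policy: Callable[[List[str]], Step], prompt: str, max_steps: int = 10):
    """Alternate Thought -> Action -> Observation until the policy says 'finish'."""
    history: List[dict] = []
    transcript = [f"Task: {prompt}"]
    for _ in range(max_steps):
        step = policy(transcript)            # in a real agent, an LMM call
        transcript.append(f"Thought: {step.thought}")
        if step.action == "finish":
            break
        tool = TOOLS.get(step.action)
        if tool is None:
            observation = f"unknown tool {step.action!r}"
        else:
            try:
                observation = tool(history, **step.args)
            except TypeError as exc:         # wrong or missing arguments
                observation = f"bad arguments: {exc}"
        # The observation is fed back so the next step can react to it.
        transcript.append(f"Action: {step.action}({step.args}) -> Observation: {observation}")
    return history, transcript


def scripted_policy(plan: List[Step]) -> Callable[[List[str]], Step]:
    """Hypothetical stand-in for an LMM: replays a fixed plan, then finishes."""
    steps = iter(plan)
    return lambda transcript: next(steps, Step("nothing left to do", "finish"))


# Usage: a layered sequence, folded, then cut by a fault.
plan = [
    Step("lay down the base sequence", "add_stratigraphy",
         {"layers": ["sandstone", "shale", "limestone"]}),
    Step("the outcrop shows an anticline", "add_fold",
         {"wavelength": 4000, "amplitude": 500}),
    Step("the layers are offset after folding", "add_fault",
         {"dip": 60, "slip": 200}),
]
history, transcript = react_loop(scripted_policy(plan), "Folded layers cut by a normal fault")
for line in transcript:
    print(line)
```

Returning tool errors as observations, rather than raising, mirrors the ReAct idea that the model should see the failure and correct its next action.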