InsNet-CRAFTY v1.0: Integrating institutional network dynamics powered by large language models with land use change simulation

Zeng, Yongchao; Brown, Calum; Byari, Mohamed; Raymond, Joanna; Schmitt, Thomas; Rounsevell, Mark

https://doi.org/10.5194/egusphere-2024-2661

Preprints

21 Oct 2024

Abstract. Understanding and modelling environmental policy interventions can contribute to sustainable land use and management but is challenging because of the complex interactions among various decision-making actors. Key challenges include endowing modelled actors with autonomy, accurately representing their relational network structures, and managing the often-unstructured information exchange. Large language models (LLMs) offer new ways to address these challenges through the development of agents that are capable of mimicking reasoning, reflection, planning, and action. We present InsNet-CRAFTY (Institutional Network – Competition for Resources between Agent Functional Types) v1.0, a multi-LLM-agent model with a polycentric institutional framework coupled with an agent-based land system model. The numerical experiments simulate two competing policy priorities: increasing meat production versus expanding protected areas for nature conservation. The model includes a high-level policy-making institution, two lobbyist organisations, two operational institutions, and two advisory agents. Our findings indicate that while the high-level institution tends to avoid extreme budget imbalances and adopts incremental policy goals for the operational institutions, it leaves a budget deficit in one institution and a surplus in another unresolved. This is due to the competing influence of multiple stakeholders, which leads to the emergence of a path-dependent decision-making approach. Despite errors in information and behaviours by the LLM agents, the network maintains overall behavioural believability, demonstrating error tolerance. The results point to both the capabilities and challenges of using LLM agents to simulate policy decision-making processes of bounded rational human actors and complex institutional dynamics, such as LLM agents’ high flexibility and autonomy, alongside the complicatedness of agent workflow design and reliability in coupling with existing programmed land use systems. These insights contribute to advancing land system modelling and the broader field of institutional analysis, providing new tools and methodologies for researchers and policy-makers.

Received: 26 Aug 2024 – Discussion started: 21 Oct 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 5455 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

15 Aug 2025

InsNet-CRAFTY v1.0: integrating institutional network dynamics powered by large language models with land use change simulation

Yongchao Zeng, Calum Brown, Mohamed Byari, Joanna Raymond, Thomas Schmitt, and Mark Rounsevell

Geosci. Model Dev., 18, 4983–5013, https://doi.org/10.5194/gmd-18-4983-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2024-2661', Anonymous Referee #1, 05 Dec 2024

Summary of the review
Zeng et al. developed an innovative LLM model that simulates interactions between institutional agents that can mimic reasoning, planning and action. The model is novel because it addresses key challenges around learning, memory and polycentricity, and because it is linked to an agent-based model that simulates changes in land use and livelihoods. The development of the LLM model described in the paper is ambitious and challenging. Certainly, it cannot be expected that all issues and challenges have yet been addressed or that the model functions completely as intended. The authors describe the challenges they encountered and how they may solve them. It is impressive that the authors ensure that everything is available open access.

I read the paper with pleasure. It is generally well-written, novel and informative. However, a number of things need improving in my opinion, which is why I recommend major revisions. In the attached PDF file, the authors can find detailed comments. Below is a summary of the issues that are, in my opinion, important to address.

The experimental set up:
The intention of the paper is to test the model and simulate institutional actors’ behavior in the land system. Many different types of policy goals can be tested, and different types of actors with different profiles and ways of interacting can be chosen. The choices made in the experimental set up affect the outcome of the experiment. At the moment, the authors provide limited rationale for the experimental set up. No rationale is provided for the choice of the SSP-RCP scenario. Additionally, limited rationale is provided for the choice of starting conditions, the types of policies considered, and the combination of agents in the experiment. As all of this influences the outcomes of the experiment, it is important that such rationales are provided. Limited rationale is also provided for focusing only on the response of institutional agents to EU land system dynamics, without considering the effects that other regions of the world may have had on the results. Additionally, it is important to discuss how the experimental set up could have influenced the outcomes in unintended ways, and which limitations of the model could have been missed because they did not come into play under this particular set up. It would be great if the authors could address this thoroughly in the paper, so that the value of this particular experiment, as well as its limitations, becomes clearer. At the moment, it was difficult for me to judge whether the model has been sufficiently tested through this one experiment to run other types of scenarios with other policy targets, other institutional agents, etc., or whether more tests and sensitivity analyses are necessary for the model to be used more broadly. This is especially so since the outcomes of a budget surplus for PAs and a budget deficit for agriculture are somewhat counter-intuitive in my perception and seem to reveal an overreliance of agents on policy documents.

Errors and robustness:
The authors speak of error proneness, error tolerance and robustness, but these terms are not defined and the process of testing for them is not explained in the methodology. Usually, these terms are used in the modelling literature in the context of quantitative sensitivity analysis, but here they refer to unexpected or undesirable behavior of institutional agents. I personally find this a bit confusing, as I do not see how the error and robustness of the model could be derived from a qualitative assessment of the agents’ behavioral patterns. Therefore, I would recommend either using different terms, such as undesired or unexpected agent behavior, or defining the error-related terms clearly and thoroughly describing in the methodology how the authors assessed the errors. If the authors would really like to emphasize error proneness and robustness in the more traditional modelling sense, I would recommend additional analyses, for example a sensitivity analysis with different starting conditions, different environmental and social goals, or different combinations of agents with different profiles.

Information lacking to interpret results well:
As the LLM model is linked to CRAFTY, it is of course not possible to describe every dynamic of both models in detail in this paper. However, some fundamental modelling assumptions needed to understand the results and discussion were missing from the main paper, such as the way the budget is modelled and how the agents know, for example, how much budget is needed. It would be great if the authors could provide more detailed descriptions of such assumptions and modelling choices, so that the results can be more easily interpreted. Alternatively, the authors could refer very explicitly to appendices that are adjusted so that the reader can easily understand the results and their interpretation after reading them.

Writing:
Although the model is intended to mimic real-life situations, the description of the model and the results, as well as the discussion of the results, remains very high-level and abstract. I would highly recommend including real-life examples of agents in the context of the EU and discussing the results in the context of dynamics at play in the EU. This would make it all a bit more tangible. In particular, I would recommend including a discussion of the model outcomes in the context of what happened in the EU in the past and what has been found in previous studies.

Writing style:
The paper is generally well-written. Yet, in some parts of the paper jargon is used, and quite a few terms that are open to interpretation remain undefined. It would be good to define some of the terms more specifically, so that the model and results are easier to interpret for readers from different disciplines. This is important because the model can be used in interdisciplinary settings and, when linked to other models such as CRAFTY, can influence land use modelling, which is a different field altogether. I have put comments throughout the paper that are hopefully helpful in addressing this.

Citation: https://doi.org/10.5194/egusphere-2024-2661-RC1
- AC1: 'Reply on RC1', Yongchao Zeng, 04 Feb 2025
  
  Thank you for your valuable feedback. Please see the attached file for our detailed, point‐by‐point responses to your comments.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2661-AC1
RC2: 'RC2 Comment on egusphere-2024-2661', Anonymous Referee #2, 07 Jan 2025

In the manuscript (submitted to GMD) “InsNet-CRAFTY v1.0: Integrating institutional network dynamics powered by large language models with land use change simulation”, Yongchao Zeng, together with his collaborators, has developed a very interesting and powerful technique to use multiple institutional agents each with its own large language model (LLM) prompt history, together with a land-use change model (the latter based upon the CRAFTY model). For the entire European region, they can simulate the inter-institutional dynamics, with unstructured text (i.e., bullet-point recommendations) and numerical output being passed from one institutional agent to another, driving both the changing meat production and the changing percent of land that is a protected area. The agents that are defined are as diverse as a lawyer agent that is familiar with European law, to lobbyist agents that take the side either of agriculture or of environmental advocacy, and further to a high-level institution agent that has long-range goals in mind and that integrates the advice of other agents and prompts the other agents to try to achieve its goals. I am particularly impressed with this paper, never having imagined LLM chatbots that talk to each other, and furthermore never having imagined that these LLM chatbots can be defined with the prompt engineering to groom them as specialist institutional chatbots that can drive a land-use simulator. The writing (grammar, structure, etc.) is of very high quality. I only ask for minor revisions, which I enumerate below.
Lines 65-66: Holzhauer et al. (2019): This reference is missing in the list of references.
Line 251: Why not SSP3 or SSP5 for the changing climate? SSP1 has little change climate-wise from the current time.
Line 286: How long is an iteration in days or months or years?
Line 296: What are the differences between the definitions and between performance of Llama-3-70b-8192 and gpt-4o, listed below in Table 1?
Line 301: Table 1: Maybe “Wiring” needs to be defined?
Line 306: Is this amount of output for the whole period of 2016-2076? Or is it per iteration? I'm a bit surprised that the amount of output is so small. If you're simulating land use over all of Europe with a 5-arcminute spatial resolution, I would expect a lot more output, especially if different countries have different policies.
Line 371: What does a “link” between nodes signify in a word graph?
Line 500: If you don't discuss this elsewhere in this paper, it might be useful to know how much your computers or the LLM computers need to work to produce these results, and how long a simulation takes from start to finish.

Also, in addition to the graphs, I would be particularly interested in seeing (for example) a time-ordered list of bullet points that are output by the various institutions. (This is to get more of a flavor of what messages are being passed between agents.)

Citation: https://doi.org/10.5194/egusphere-2024-2661-RC2
- AC2: 'Reply on RC2', Yongchao Zeng, 04 Feb 2025
  
  Thank you very much for reviewing our manuscript. Please see the attached file for our responses.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2661-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Yongchao Zeng on behalf of the Authors (14 Mar 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (17 Mar 2025) by Christoph Müller

RR by Anonymous Referee #2 (27 Mar 2025)

RR by Anonymous Referee #1 (07 Apr 2025)

ED: Publish subject to minor revisions (review by editor) (07 Apr 2025) by Christoph Müller

AR by Yongchao Zeng on behalf of the Authors (17 Apr 2025) Author's response Author's tracked changes Manuscript

ED: Publish subject to technical corrections (22 Apr 2025) by Christoph Müller

AR by Yongchao Zeng on behalf of the Authors (26 May 2025) Manuscript

Data sets

InsNet-CRAFTY v1.0 [data set] Yongchao Zeng, Calum Brown, Mohamed Byari, Joanna Raymond, Thomas Schmitt, and Mark Rounsevell https://doi.org/10.5281/zenodo.13944650

Model code and software

InsNet-CRAFTY v1.0 [code] Yongchao Zeng, Calum Brown, Mohamed Byari, Joanna Raymond, Thomas Schmitt, and Mark Rounsevell https://doi.org/10.5281/zenodo.13356487

Viewed

Total article views: 545 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
379	132	34	545	22	44

Views and downloads (calculated since 21 Oct 2024)

Month	HTML	PDF	XML	Total
Oct 2024	71	16	7	94
Nov 2024	58	10	4	72
Dec 2024	22	11	1	34
Jan 2025	41	8	6	55
Feb 2025	42	16	6	64
Mar 2025	27	8	0	35
Apr 2025	30	34	0	64
May 2025	22	4	2	28
Jun 2025	32	13	7	52
Jul 2025	28	7	0	35
Aug 2025	6	5	1	12

Cumulative views and downloads (calculated since 21 Oct 2024)

Month	HTML	PDF	XML	Total
Oct 2024	71	16	7	94
Nov 2024	129	26	11	166
Dec 2024	151	37	12	200
Jan 2025	192	45	18	255
Feb 2025	234	61	24	319
Mar 2025	261	69	24	354
Apr 2025	291	103	24	418
May 2025	313	107	26	446
Jun 2025	345	120	33	498
Jul 2025	373	127	33	533
Aug 2025	379	132	34	545

Viewed (geographical distribution)

Total article views: 542 (including HTML, PDF, and XML) Thereof 542 with geography defined and 0 with unknown origin.

Cited

Latest update: 15 Aug 2025

Short summary

Understanding environmental policy interventions is challenging due to complex institutional actor interactions. Large language models (LLMs) offer new solutions by mimicking the actors. We present InsNet-CRAFTY v1.0, a multi-LLM-agent model coupled with a land system model, simulating competing policy priorities. The model shows how LLM agents can simulate decision-making in institutional networks, highlighting both their potential and limitations in advancing land system modelling.

1 citation as recorded by Crossref.

