<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" specific-use="SMUR" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">EGUsphere</journal-id>
<journal-title-group>
<journal-title>EGUsphere</journal-title>
<abbrev-journal-title abbrev-type="publisher">EGUsphere</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">EGUsphere</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub"></issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/egusphere-2024-138</article-id>
<title-group>
<article-title>Random forests with spatial proxies for environmental modelling: opportunities and pitfalls</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Milà</surname>
<given-names>Carles</given-names>
<ext-link>https://orcid.org/0000-0003-0470-0760</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ludwig</surname>
<given-names>Marvin</given-names>
<ext-link>https://orcid.org/0000-0002-3010-018X</ext-link>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Pebesma</surname>
<given-names>Edzer</given-names>
<ext-link>https://orcid.org/0000-0001-8049-7069</ext-link>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Tonne</surname>
<given-names>Cathryn</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Meyer</surname>
<given-names>Hanna</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Barcelona Institute for Global Health (ISGlobal), Barcelona, Spain</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Universitat Pompeu Fabra (UPF), Barcelona, Spain</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Institute of Landscape Ecology, University of Münster, Münster, Germany</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Institute of Geoinformatics, University of Münster, Münster, Germany</addr-line>
</aff>
<aff id="aff5">
<label>5</label>
<addr-line>CIBER epidemiología y salud pública (CIBERESP), Madrid, Spain</addr-line>
</aff>
<funding-group>
<award-group id="gs1">
<funding-source>Ministerio de Ciencia e Innovación</funding-source>
<award-id>PRE2020-092303</award-id>
</award-group>
</funding-group>
<pub-date pub-type="epub">
<day>24</day>
<month>01</month>
<year>2024</year>
</pub-date>
<volume>2024</volume>
<fpage>1</fpage>
<lpage>30</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2024 Carles Milà et al.</copyright-statement>
<copyright-year>2024</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://egusphere.copernicus.org/preprints/2024/egusphere-2024-138/">This article is available from https://egusphere.copernicus.org/preprints/2024/egusphere-2024-138/</self-uri>
<self-uri xlink:href="https://egusphere.copernicus.org/preprints/2024/egusphere-2024-138/egusphere-2024-138.pdf">The full text article is available as a PDF file from https://egusphere.copernicus.org/preprints/2024/egusphere-2024-138/egusphere-2024-138.pdf</self-uri>
<abstract>
<p>Spatial proxies such as coordinates and Euclidean distance fields are often added as predictors in random forest models; however, their suitability in different predictive conditions has not yet been thoroughly assessed. We investigated 1) the conditions under which spatial proxies are suitable, 2) the reasons for their suitability, and 3) how proxy suitability can be assessed using cross-validation.</p>
<p>In a simulation and two case studies, we found that adding spatial proxies improved model performance when residual spatial autocorrelation was present and training samples were regularly or randomly distributed. Otherwise, inclusion of proxies was neutral or counterproductive and resulted in feature extrapolation for clustered samples. Random k-fold cross-validation systematically favoured models with spatial proxies even when they were not appropriate.</p>
<p>As the benefits of spatial proxies are not universal, we recommend using spatial exploratory and validation analyses to determine their suitability, and considering inherently spatial alternatives such as RF-GLS models.</p>
</abstract>
<counts><page-count count="30"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>