Preprints
https://doi.org/10.5194/egusphere-2024-138
24 Jan 2024

Random forests with spatial proxies for environmental modelling: opportunities and pitfalls

Carles Milà, Marvin Ludwig, Edzer Pebesma, Cathryn Tonne, and Hanna Meyer

Abstract. Spatial proxies such as coordinates and Euclidean distance fields are often added as predictors in random forest models; however, their suitability under different predictive conditions has not yet been thoroughly assessed. We investigated 1) the conditions under which spatial proxies are suitable, 2) the reasons for their suitability, and 3) how proxy suitability can be assessed using cross-validation.

In a simulation and two case studies, we found that adding spatial proxies improved model performance when residual spatial autocorrelation was present and the training samples were regularly or randomly distributed. Otherwise, including proxies was neutral or counterproductive and, for clustered samples, resulted in feature extrapolation. Random k-fold cross-validation systematically favoured models with spatial proxies, even when they were not appropriate.

As the benefits of spatial proxies are not universal, we recommend using spatial exploratory and validation analyses to determine their suitability, and considering inherently spatial alternatives such as RF-GLS models.
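To make the terms in the abstract concrete, the following minimal Python sketch builds one possible set of spatial proxies for a sample location and assigns the location to a spatial block for blocked cross-validation. It is an illustration under stated assumptions, not the authors' code (which is archived in the Zenodo record listed under "Data sets"). The function names `spatial_proxies` and `block_fold`, the choice of distance fields (distances to the four corners and the centre of the bounding box), and the grid-based blocking scheme are all assumptions made for this example.

```python
import math


def spatial_proxies(x, y, bbox):
    """Return [x, y, d_corner1..d_corner4, d_centre] for one location.

    bbox is (xmin, ymin, xmax, ymax). Using the corners and centre of the
    bounding box as reference points is an assumption of this sketch.
    """
    xmin, ymin, xmax, ymax = bbox
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    centre = ((xmin + xmax) / 2, (ymin + ymax) / 2)
    # Euclidean distance from the location to each reference point
    dists = [math.hypot(x - rx, y - ry) for rx, ry in corners + [centre]]
    return [x, y] + dists


def block_fold(x, y, bbox, n_side):
    """Assign a location to a cell of an n_side x n_side grid over bbox.

    Holding out whole cells, rather than random points, is one generic way
    to keep test locations spatially separated from training locations.
    """
    xmin, ymin, xmax, ymax = bbox
    # Clamp so that points on the upper edges fall into the last cell
    col = min(int((x - xmin) / (xmax - xmin) * n_side), n_side - 1)
    row = min(int((y - ymin) / (ymax - ymin) * n_side), n_side - 1)
    return row * n_side + col


# Hypothetical study area and sample location
bbox = (0.0, 0.0, 100.0, 100.0)
features = spatial_proxies(30.0, 40.0, bbox)  # 2 coordinates + 5 distances
fold = block_fold(30.0, 40.0, bbox, n_side=2)  # index of the 2 x 2 grid cell
```

The proxy values would be appended to the other predictors before fitting a random forest, and the block index would replace random fold assignment when comparing models with and without proxies.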

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

14 Aug 2024
Random forests with spatial proxies for environmental modelling: opportunities and pitfalls
Carles Milà, Marvin Ludwig, Edzer Pebesma, Cathryn Tonne, and Hanna Meyer
Geosci. Model Dev., 17, 6007–6033, https://doi.org/10.5194/gmd-17-6007-2024, 2024

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2024-138', Anonymous Referee #1, 07 Feb 2024
  • RC2: 'Comment on egusphere-2024-138', Carsten F. Dormann, 07 Feb 2024
  • AC1: 'Comment on egusphere-2024-138', Carles Milà, 02 May 2024

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Carles Milà on behalf of the Authors (30 May 2024)
ED: Publish subject to technical corrections (12 Jun 2024) by Danilo Mello
AR by Carles Milà on behalf of the Authors (17 Jun 2024)

Data sets

Code and data for "Random forests with spatial proxies for environmental modelling: opportunities and pitfalls" Carles Milà https://zenodo.org/records/10495235

Viewed

Total article views: 618 (including HTML, PDF, and XML)
  • HTML: 444
  • PDF: 144
  • XML: 30
  • Total: 618
  • BibTeX: 24
  • EndNote: 15
Views and downloads calculated since 24 Jan 2024.

Viewed (geographical distribution)

Total article views: 636 (including HTML, PDF, and XML) Thereof 636 with geography defined and 0 with unknown origin.

Latest update: 03 Sep 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Spatial proxies such as coordinates and distances are often included as predictors in random forest models for predictive mapping. In a simulation and two case studies, we investigated the conditions under which this is appropriate. We found that spatial proxies are not always beneficial, and we therefore conclude that they should not be used as a default approach without careful consideration. We also give insights into the reasons behind their suitability, how to assess it, and potential alternatives.