Random forests with spatial proxies for environmental modelling: opportunities and pitfalls

Milà, Carles; Ludwig, Marvin; Pebesma, Edzer; Tonne, Cathryn; Meyer, Hanna

doi:https://doi.org/10.5194/egusphere-2024-138

Carles Milà, Marvin Ludwig, Edzer Pebesma, Cathryn Tonne, and Hanna Meyer

Abstract. Spatial proxies such as coordinates and Euclidean distance fields are often added as predictors in random forest models; however, their suitability in different predictive conditions has not yet been thoroughly assessed. We investigated 1) the conditions under which spatial proxies are suitable, 2) the reasons for such adequacy, and 3) how proxy suitability can be assessed using cross-validation.

In a simulation and two case studies, we found that adding spatial proxies improved model performance when both residual spatial autocorrelation, and regularly or randomly-distributed training samples, were present. Otherwise, inclusion of proxies was neutral or counterproductive and resulted in feature extrapolation for clustered samples. Random k-fold cross-validation systematically favoured models with spatial proxies even when not appropriate.

As the benefits of spatial proxies are not universal, we recommend using spatial exploratory and validation analyses to determine their suitability, and considering alternative inherently spatial RF-GLS models.

Received: 16 Jan 2024 – Discussion started: 24 Jan 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Preprint (PDF, 6247 KB)

Download & links

Journal article(s) based on this preprint

14 Aug 2024

Random forests with spatial proxies for environmental modelling: opportunities and pitfalls

Carles Milà, Marvin Ludwig, Edzer Pebesma, Cathryn Tonne, and Hanna Meyer

Geosci. Model Dev., 17, 6007–6033, https://doi.org/10.5194/gmd-17-6007-2024,https://doi.org/10.5194/gmd-17-6007-2024, 2024

Short summary

Country	#	Views	%
United States of America	1	201	31
Germany	2	130	20
China	3	45	7
France	4	26	4
Spain	5	18	2


Total:	0
HTML:	0
PDF:	0
XML:	0

Random forests with spatial proxies for environmental modelling: opportunities and pitfalls

Journal article(s) based on this preprint

Interactive discussion

Interactive discussion

Peer review completion

Journal article(s) based on this preprint

Data sets

Viewed

Viewed (geographical distribution)

Cited

1 citations as recorded by crossref.