the Creative Commons Attribution 4.0 License.
kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation
Abstract. Random and spatial Cross-Validation (CV) methods are commonly used to evaluate machine learning-based spatial prediction models, and the obtained performance values are often interpreted as map accuracy estimates. However, the appropriateness of such approaches is currently the subject of controversy. For the common case where no probability sample for validation purposes is available, in Milà et al. (2022) we proposed the Nearest Neighbour Distance Matching (NNDM) Leave-One-Out (LOO) CV method. This method produces a distribution of geographical Nearest Neighbour Distances (NND) between test and train locations during CV that matches the distribution of NND between prediction and training locations. Hence, it creates predictive conditions during CV that are comparable to what is required when predicting a defined area. Although NNDM LOO CV produced largely reliable map accuracy estimates in our analysis, as a LOO-based method, it cannot be applied to large datasets found in many studies.
Here, we propose a novel k-fold CV strategy for map accuracy estimation inspired by the concepts of NNDM LOO CV: the k-fold NNDM (kNNDM) CV. The kNNDM algorithm tries to find a k-fold configuration such that the Empirical Cumulative Distribution Function (ECDF) of NND between test and train locations during CV is matched to the ECDF of NND between prediction and training locations.
We tested kNNDM CV in a simulation study with different sampling distributions and compared it to other CV methods including NNDM LOO CV. We found that kNNDM CV performed similarly to NNDM LOO CV and produced reasonably reliable map accuracy estimates across sampling patterns with strong reductions in computation time for large sample sizes. Furthermore, we found a positive linear association between the quality of the match of the two ECDFs in kNNDM and the reliability of the map accuracy estimates.
kNNDM provided the advantages of our original NNDM LOO CV strategy while bypassing its sample size limitations.
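The core idea of the abstract — matching the ECDF of nearest-neighbour distances (NND) during CV to the ECDF of prediction-to-train NNDs — can be illustrated with a short sketch. This is an illustrative Python sketch, not the authors' R implementation (their code is linked below under "Model code and software"); the function names are made up here, and the mismatch between the two ECDFs is measured as the Wasserstein-1 distance, i.e. the area between the two curves:

```python
import numpy as np

def nnd(from_pts, to_pts):
    # Nearest-neighbour distance from each point in from_pts to to_pts.
    d = np.linalg.norm(from_pts[:, None, :] - to_pts[None, :, :], axis=2)
    return d.min(axis=1)

def nnd_cv(train_pts, folds):
    # For each training point, distance to the nearest point that is
    # NOT in its own fold -- the distances realised during k-fold CV.
    out = np.empty(len(train_pts))
    for f in np.unique(folds):
        test = folds == f
        out[test] = nnd(train_pts[test], train_pts[~test])
    return out

def w_statistic(sample_a, sample_b):
    # Wasserstein-1 distance between two empirical distributions,
    # computed as the area between their ECDFs.
    all_v = np.sort(np.concatenate([sample_a, sample_b]))
    cdf_a = np.searchsorted(np.sort(sample_a), all_v, side="right") / len(sample_a)
    cdf_b = np.searchsorted(np.sort(sample_b), all_v, side="right") / len(sample_b)
    return np.sum(np.abs(cdf_a - cdf_b)[:-1] * np.diff(all_v))
```

Under this reading, a fold configuration is good when `w_statistic(nnd_cv(train, folds), nnd(pred, train))` is small: the CV then probes the model at the same geographical distances at which it will actually have to predict.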
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
CC1: 'Comment on egusphere-2023-1308', Nils Tjaden, 07 Jul 2023
Just a quick hint that https://doi.org/10.1016/j.jag.2023.103364 was just published - may or may not be relevant for your discussion.
Citation: https://doi.org/10.5194/egusphere-2023-1308-CC1
RC1: 'Comment on egusphere-2023-1308', Italo Goncalves, 23 Aug 2023
The manuscript presents a much-needed methodology for cross-validation of spatial data. In my opinion, the strongest point is the use of the W statistic to identify the best CV split. However, there are a few points which I feel should be addressed in the discussion.
- The proposed methodology using clustering algorithms seems valid, but how can we know if it provides the best possible result? An algorithm that optimizes the W statistic directly as a function of the CV fold indices would be more desirable, instead of relying on the clustering algorithm's internal metric as a proxy. As a suggestion for future work, I recommend using a genetic algorithm to assign CV indices to the data points directly.
- The W statistic explained 60% of the variability in map accuracy, but would this be consistent across different datasets? At least one more case study would be needed to verify this.
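The reviewer's suggestion of optimising the W statistic directly over fold indices could be sketched roughly as follows. This is a hypothetical illustration, not the paper's method or a full genetic algorithm: it uses simple greedy point-wise reassignment, SciPy's `wasserstein_distance` as the ECDF mismatch, and made-up function names:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import wasserstein_distance

def cv_nnd(pts, folds):
    """Nearest-neighbour distance from each point to the nearest
    point outside its own fold."""
    d = cdist(pts, pts)
    out = np.empty(len(pts))
    for f in np.unique(folds):
        m = folds == f
        out[m] = d[np.ix_(m, ~m)].min(axis=1)
    return out

def optimise_folds(train, pred, k=5, iters=2000, seed=0):
    """Greedily move single points between folds whenever doing so
    shrinks the Wasserstein distance between the CV NND distribution
    and the prediction-to-train NND distribution."""
    rng = np.random.default_rng(seed)
    target = cdist(pred, train).min(axis=1)   # prediction-point NNDs
    folds = rng.integers(0, k, len(train))
    best = wasserstein_distance(cv_nnd(train, folds), target)
    for _ in range(iters):
        i = int(rng.integers(len(train)))
        old = folds[i]
        if (folds == old).sum() <= 1:         # keep every fold non-empty
            continue
        folds[i] = rng.integers(k)
        w = wasserstein_distance(cv_nnd(train, folds), target)
        if w <= best:
            best = w                           # keep the improving move
        else:
            folds[i] = old                     # revert
    return folds, best
```

A genetic algorithm, as the reviewer proposes, would replace the single-point moves with crossover and mutation over whole fold-index vectors; the objective function would stay the same.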
Minor comment:
Line 90: cross out "the".
Citation: https://doi.org/10.5194/egusphere-2023-1308-RC1
RC2: 'Comment on egusphere-2023-1308', Anonymous Referee #2, 23 Aug 2023
The study proposes a novel cross-validation method for spatial data that aims to deliver more representative measurements of spatial map accuracy than commonly-used methods. This is a relevant concern for GMD readers with the rise in use of machine learning methods for geoscientific modelling. Issues with model evaluation in the spatial setting have been identified in a number of recent studies. The paper is well-written and contributes a practical solution for a common issue.
In my opinion, the most exciting/innovative idea in this work is the concept of defining the evaluation method based on the desired data for which the model is intended to return predictions. This would require researchers to more carefully define the purpose of their models before and during the model creation process, which should be common practice. In reality, this is often not done, or done in a 'standard' way which doesn't accurately reflect the intended use of the model.
The method presented in this paper is a very practical solution to this, where the desired target dataset is an input of the evaluation algorithm and therefore researchers are required to clearly consider and define it. I think this is a significant contribution to model development methodology and should be more clearly emphasised in the manuscript. The possibilities, benefits and disadvantages of this concept could also be discussed - for example, when models are used in production, the prediction area is a moving target; would that require continual re-evaluation?
The paper suggests that kNNDM is, essentially, a computationally-cheaper alternative to the previously-published method by the authors, NNDM LOO. In the article, the only limitation of leave-one-out CV methods described is that of computational time. However, to my knowledge, even if computation is not considered, LOO CV methods may not be the optimum method due to higher variation in the resulting models (due to the bias-variance tradeoff). Could this explain why kNNDM 10-fold seems to perform better in the case of strong clusters (Figure 5)? For me, this would be more convincing than the computation speedup comparison, which is relatively trivial given that LOO CV is the most extreme version of k-fold CV.
Following on from this, it seems likely that the value of k would impact the results. Use of 10 folds is very common; is there theoretical justification for this? It would be useful to see some comparisons of the results with multiple values of k.
In Figure 1, it is shown that the W statistic will also be larger if training points are regularly distributed, as well as when clustered. Does this mean that the null hypothesis might be rejected for regularly distributed datapoints? Does this explain why NNDM LOO performed better for regularly distributed data (Figure 5)?
Finally, I would recommend testing the method on at least one additional dataset, as the results presumably depend on the spatial autocorrelation present in the dataset used.
Minor comment: I assume the hyperparameters of the models are not tuned as it is not mentioned, but this could be stated explicitly.
Citation: https://doi.org/10.5194/egusphere-2023-1308-RC2
AC1: 'Comment on egusphere-2023-1308', Jan Linnenbrink, 19 Oct 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1308/egusphere-2023-1308-AC1-supplement.pdf
Peer review completion
Model code and software
kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation Jan Linnenbrink, Carles Milà, Marvin Ludwig, and Hanna Meyer https://doi.org/10.6084/m9.figshare.23514135.v1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
1,015 | 299 | 46 | 1,360 | 46 | 36
Cited
3 citations as recorded by crossref.
- kNNDM CV: k-fold nearest-neighbour distance matching cross-validation for map accuracy estimation J. Linnenbrink et al. 10.5194/gmd-17-5897-2024
- Random forests with spatial proxies for environmental modelling: opportunities and pitfalls C. Milà et al. 10.5194/gmd-17-6007-2024
- Adopting yield-improving practices to meet maize demand in Sub-Saharan Africa without cropland expansion F. Aramburu-Merlos et al. 10.1038/s41467-024-48859-0
Jan Linnenbrink
Carles Milà
Marvin Ludwig
Hanna Meyer