Preprints
https://doi.org/10.5194/egusphere-2023-1308
05 Jul 2023

kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation

Jan Linnenbrink, Carles Milà, Marvin Ludwig, and Hanna Meyer

Abstract. Random and spatial Cross-Validation (CV) methods are commonly used to evaluate machine learning-based spatial prediction models, and the resulting performance values are often interpreted as map accuracy estimates. However, the appropriateness of such approaches is currently the subject of controversy. For the common case where no probability sample is available for validation, we proposed the Nearest Neighbour Distance Matching (NNDM) Leave-One-Out (LOO) CV method in Milà et al. (2022). This method produces a distribution of geographical Nearest Neighbour Distances (NND) between test and training locations during CV that matches the distribution of NND between prediction and training locations. Hence, it creates predictive conditions during CV that are comparable to those encountered when predicting over a defined area. Although NNDM LOO CV produced largely reliable map accuracy estimates in our analysis, it is a LOO-based method and therefore cannot be applied to the large datasets found in many studies.

Here, we propose a novel k-fold CV strategy for map accuracy estimation inspired by the concepts of NNDM LOO CV: the k-fold NNDM (kNNDM) CV. The kNNDM algorithm tries to find a k-fold configuration such that the Empirical Cumulative Distribution Function (ECDF) of NND between test and train locations during CV is matched to the ECDF of NND between prediction and training locations.
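The matching idea in the paragraph above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic point data, and the two candidate fold assignments are hypothetical, and the area between the two ECDFs (the one-dimensional Wasserstein-1 distance) is assumed here as the measure of mismatch, since the abstract does not name one. Distances are plain Euclidean distances on planar coordinates.

```python
import bisect
import math
import random


def nearest_distances(sources, targets):
    """For each source point, the distance to its nearest target point."""
    return [min(math.dist(s, t) for t in targets) for s in sources]


def cv_nearest_distances(points, folds):
    """For each point, the distance to the nearest point held in a different fold."""
    out = []
    for p, f in zip(points, folds):
        others = [q for q, g in zip(points, folds) if g != f]
        out.append(min(math.dist(p, q) for q in others))
    return out


def ecdf_gap(a, b):
    """Area between the ECDFs of samples a and b (assumed mismatch measure)."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    gap = 0.0
    for lo, hi in zip(grid, grid[1:]):
        # Both ECDFs are constant on [lo, hi), so evaluate them at lo.
        fa = bisect.bisect_right(a, lo) / len(a)
        fb = bisect.bisect_right(b, lo) / len(b)
        gap += abs(fa - fb) * (hi - lo)
    return gap


def demo(seed=0):
    """Score two hypothetical fold configurations on synthetic clustered data."""
    rng = random.Random(seed)
    centres = [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.8, 0.8)]
    # 15 training points around each centre, stored cluster by cluster.
    train = [(cx + rng.gauss(0, 0.03), cy + rng.gauss(0, 0.03))
             for cx, cy in centres for _ in range(15)]
    # Prediction locations: a regular 10 x 10 grid over the unit square.
    pred = [(i / 9, j / 9) for i in range(10) for j in range(10)]
    target = nearest_distances(pred, train)  # prediction-to-training NND
    candidates = {
        "random folds": [rng.randrange(4) for _ in train],
        "one fold per cluster": [i // 15 for i in range(len(train))],
    }
    return {name: ecdf_gap(cv_nearest_distances(train, folds), target)
            for name, folds in candidates.items()}


if __name__ == "__main__":
    for name, gap in demo().items():
        print(f"{name}: ECDF gap = {gap:.3f}")
```

A smaller gap means that the test-to-training distances produced by a fold configuration resemble the prediction-to-training distances more closely. The sketch only scores two fixed candidates; how kNNDM generates and searches candidate configurations is not described in this abstract and is not reproduced here.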

We tested kNNDM CV in a simulation study with different sampling distributions and compared it to other CV methods including NNDM LOO CV. We found that kNNDM CV performed similarly to NNDM LOO CV and produced reasonably reliable map accuracy estimates across sampling patterns with strong reductions in computation time for large sample sizes. Furthermore, we found a positive linear association between the quality of the match of the two ECDFs in kNNDM and the reliability of the map accuracy estimates.

kNNDM provided the advantages of our original NNDM LOO CV strategy while bypassing its sample size limitations.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

07 Aug 2024
kNNDM CV: k-fold nearest-neighbour distance matching cross-validation for map accuracy estimation
Jan Linnenbrink, Carles Milà, Marvin Ludwig, and Hanna Meyer
Geosci. Model Dev., 17, 5897–5912, https://doi.org/10.5194/gmd-17-5897-2024, 2024

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on egusphere-2023-1308', Nils Tjaden, 07 Jul 2023
  • RC1: 'Comment on egusphere-2023-1308', Italo Goncalves, 23 Aug 2023
  • RC2: 'Comment on egusphere-2023-1308', Anonymous Referee #2, 23 Aug 2023
  • AC1: 'Comment on egusphere-2023-1308', Jan Linnenbrink, 19 Oct 2023

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Jan Linnenbrink on behalf of the Authors (07 Nov 2023): author's response, author's tracked changes, manuscript
ED: Referee Nomination & Report Request started (03 Dec 2023) by Rohitash Chandra
RR by Italo Goncalves (05 Dec 2023)
RR by Ute Mueller (07 Jan 2024)
RR by Anonymous Referee #4 (07 Jan 2024)
ED: Reconsider after major revisions (25 Jan 2024) by Rohitash Chandra
AR by Jan Linnenbrink on behalf of the Authors (04 Mar 2024): author's response, author's tracked changes, manuscript
ED: Referee Nomination & Report Request started (08 Apr 2024) by Rohitash Chandra
RR by Ute Mueller (15 Apr 2024)
RR by Wen Luo (17 Apr 2024)
ED: Publish as is (17 Jun 2024) by Rohitash Chandra
AR by Jan Linnenbrink on behalf of the Authors (18 Jun 2024): manuscript

Model code and software

kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation. Jan Linnenbrink, Carles Milà, Marvin Ludwig, and Hanna Meyer. https://doi.org/10.6084/m9.figshare.23514135.v1

Viewed

Total article views: 1,360 (including HTML, PDF, and XML)
  • HTML: 1,015
  • PDF: 299
  • XML: 46
  • Total: 1,360
  • BibTeX: 46
  • EndNote: 36
Views and downloads (calculated since 05 Jul 2023)

Latest update: 03 Sep 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Estimation of map accuracy based on Cross-Validation (CV) in spatial modeling is pervasive but controversial. Here, we build upon our previous work and propose a novel, prediction-oriented k-fold CV strategy for map accuracy estimation in which the distribution of geographical distances between prediction and training points is taken into account when constructing the CV folds. Our method produces more reliable estimates than other CV methods and can be used for large datasets.