08 Jul 2022

Refining data-data and data-model biome comparisons using the Earth Movers' Distance (EMD)

Manuel Chevalier, Anne Dallmeyer, Nils Weitzel, Chenzhi Li, Jean-Philippe Baudouin, Ulrike Herzschuh, Xianyong Cao, and Andreas Hense

Abstract. Biome reconstructions are commonly used in data-data and data-model comparison studies to understand past vegetation dynamics. However, most of these assessments rely on directly comparing the dominant biomes inferred from pollen samples or vegetation simulations. Dominant biomes are deduced from pollen samples using biome affinity scores, which aggregate the pollen percentages of the taxa assigned to each biome. While this approach generates good results over a large range of temporal and spatial scales, reducing pollen assemblages to a single dominant biome can oversimplify the vegetation signal preserved in pollen samples and even bias conclusions, for instance when minimal changes in pollen percentages change the inferred dominant biome. To resolve these issues, we propose using the Earth Movers' Distance (EMD) as a new metric to compare distributions of biome scores. The EMD has two main advantages: 1) the distributions of biome scores do not need to be reduced to their dominant biome, so the full breadth of the data is taken into account, and 2) different weights can be given to different types of disagreement to account for ecological distance (e.g. reconstructing a temperate forest instead of a boreal forest is ecologically less wrong than reconstructing a temperate forest instead of a desert). We also introduce EMD-based statistical tests that determine whether the similarity of two samples is significantly better than a random association. This paper illustrates the use of the EMD across a series of palaeoecological data-data and data-model case studies based on published data and simulations. These applications highlight the diverse types of analysis where the EMD adds value compared to analyses of the dominant biomes only. The EMD and the statistical tests are included in the paleotools R package (
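The two advantages listed in the abstract can be illustrated with a minimal Python sketch. It is not the paleotools implementation: the function names `emd_1d` and `dominant`, the three biomes, their positions on a one-dimensional ecological gradient, and all scores are hypothetical values chosen for illustration. The sketch also assumes that the ecological distance between two biomes is the absolute difference of their gradient positions, a special case in which the EMD reduces to the area between the two cumulative distributions. A general weight matrix would instead require solving a transport problem.

```python
def emd_1d(p, q, positions):
    """EMD between two biome-score vectors whose biomes sit on a 1-D gradient.

    Assumes the ground distance between biomes i and j is
    |positions[i] - positions[j]|. Scores are normalised to sum to 1.
    """
    if not (len(p) == len(q) == len(positions)):
        raise ValueError("p, q and positions must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("score vectors must have a positive sum")

    # Walk along the gradient from the lowest to the highest position.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    total = 0.0
    cumulative_diff = 0.0
    for here, nxt in zip(order, order[1:]):
        # Surplus mass that still has to be carried past this biome.
        cumulative_diff += p[here] / sum_p - q[here] / sum_q
        total += abs(cumulative_diff) * (positions[nxt] - positions[here])
    return total


def dominant(scores, names):
    """Name of the biome with the highest score."""
    return names[max(range(len(scores)), key=lambda i: scores[i])]


# Hypothetical biomes and gradient positions (illustrative values only).
biomes = ["boreal forest", "temperate forest", "desert"]
positions = [0.0, 1.0, 4.0]

# Two nearly identical samples whose dominant biomes differ.
sample_a = [0.49, 0.51, 0.0]
sample_b = [0.51, 0.49, 0.0]
print(dominant(sample_a, biomes), "vs", dominant(sample_b, biomes))
print(round(emd_1d(sample_a, sample_b, positions), 6))  # 0.02

# Pure samples: the cost of a mismatch grows with ecological distance.
boreal, temperate, desert = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(emd_1d(temperate, boreal, positions))  # 1.0
print(emd_1d(temperate, desert, positions))  # 3.0
```

In this toy setting, a dominant-biome comparison reports a mismatch between `sample_a` and `sample_b`, whereas the EMD stays close to zero. The temperate-versus-desert mismatch costs three times as much as the temperate-versus-boreal one, which follows directly from the assumed positions.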

Manuel Chevalier et al.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2022-489', Anonymous Referee #1, 26 Aug 2022
    • AC1: 'Reply on RC1', Manuel Chevalier, 15 Feb 2023
  • RC2: 'Comment on egusphere-2022-489', Louis François, 18 Nov 2022
    • AC2: 'Reply on RC2', Manuel Chevalier, 15 Feb 2023

Total article views: 538 (including HTML, PDF, and XML)
  • HTML: 367
  • PDF: 156
  • XML: 15
  • Total: 538
  • BibTeX: 6
  • EndNote: 7
Views and downloads calculated since 08 Jul 2022.

Viewed (geographical distribution)

Total article views: 511 (including HTML, PDF, and XML), of which 511 have a defined geography and 0 are of unknown origin.
Latest update: 21 Mar 2023
Short summary
Data-data and data-model biome comparisons are commonly based on comparing single biome estimates. While this approach generates good results over large temporal and spatial scales, reducing pollen assemblages to a single biome can oversimplify the vegetation signal preserved in pollen samples. We propose to use a multivariate metric, the Earth Movers' Distance (EMD), to include more details about the vegetation structure when performing such comparisons.