08 Jul 2022

Refining data-data and data-model biome comparisons using the Earth Movers' Distance (EMD)

Manuel Chevalier1,2, Anne Dallmeyer3, Nils Weitzel4,5, Chenzhi Li6,7, Jean-Philippe Baudouin4,5, Ulrike Herzschuh6,7,8, Xianyong Cao9, and Andreas Hense1
  • 1Institute of Geosciences, Sect. Meteorology, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
  • 2Institute of Earth Surface Dynamics, Géopolis, University of Lausanne, Switzerland
  • 3Max Planck Institute for Meteorology, Bundesstrasse 53, 20146 Hamburg, Germany
  • 4Institute of Environmental Physics, Heidelberg University, Im Neuenheimer Feld 229, 69120 Heidelberg, Germany
  • 5Department of Geoscience, University of Tübingen, Schnarrenbergstr. 94-96, 72076 Tübingen, Germany
  • 6Polar Terrestrial Environmental Systems, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Telegrafenberg A45, 14473 Potsdam, Germany
  • 7Institute of Environmental Science and Geography, University of Potsdam, Karl-Liebknecht-Str. 24–25, 14476 Potsdam, Germany
  • 8Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24–25, 14476 Potsdam, Germany
  • 9Alpine Paleoecology and Human Adaptation Group (ALPHA), State Key Laboratory of Tibetan Plateau Earth System, Resources and Environment (TPESRE), Institute of Tibetan Plateau Research, Chinese Academy of Sciences, 100101 Beijing, China

Abstract. Biome reconstructions are commonly used in data-data and data-model comparison studies to understand past vegetation dynamics. However, most of these assessments are based on the direct comparison of dominant biomes inferred from pollen samples or vegetation simulations. Dominant biomes are deduced from pollen samples using biome affinity scores, which aggregate pollen percentages of taxa assigned to the different biomes. While this approach generates good results over a large range of temporal and spatial scales, reducing pollen assemblages to a single dominant biome can oversimplify the vegetation signal preserved in pollen samples and even bias conclusions when, for instance, minimal changes in pollen percentages change the inferred dominant biome. To resolve these issues, we propose to use the Earth Movers' Distance (EMD) as a new metric to compare distributions of biome scores. The EMD has two main advantages: (1) the distributions of biome scores do not need to be reduced to their dominant biome, so the full breadth of the data is taken into account, and (2) different weights can be given to different types of disagreements to account for the ecological distance (e.g. reconstructing a temperate forest instead of a boreal forest is ecologically less wrong than reconstructing a temperate forest instead of a desert). We also introduce EMD-based statistical tests that determine whether the similarity of two samples is significantly better than a random association. This paper illustrates the use of the EMD across a series of palaeoecological data-data and data-model case studies based on published data and simulations. These applications highlight the diverse types of analysis where the EMD adds value compared to analyses of the dominant biomes only. The EMD and the statistical tests are included in the paleotools R package (
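To make the second advantage concrete, the following is a minimal Python sketch of an EMD between two vectors of biome scores, with a ground-distance matrix encoding ecological distance. It is not the paleotools implementation: the function name `emd`, the three biomes, the scores, and the weights in `cost` are all hypothetical values chosen for illustration, and the transport problem is solved with a generic successive-shortest-path scheme using only the standard library.

```python
from math import inf


def emd(p, q, cost, eps=1e-12):
    """Earth Mover's Distance between two biome-score vectors p and q.

    cost[i][j] is the (assumed) ecological distance between biome i and
    biome j. Both vectors are rescaled to sum to 1 before comparison.
    """
    n = len(p)
    total_p, total_q = sum(p), sum(q)
    supply = [x / total_p for x in p]  # mass still to be moved out of biome i
    demand = [x / total_q for x in q]  # mass still missing in biome j
    flow = [[0.0] * n for _ in range(n)]

    while True:
        # Bellman-Ford on the residual graph. Nodes 0..n-1 are the biomes of p
        # (sources), nodes n..2n-1 are the biomes of q (sinks).
        dist = [0.0 if s > eps else inf for s in supply] + [inf] * n
        prev = [None] * (2 * n)
        for _ in range(2 * n):
            changed = False
            for i in range(n):
                for j in range(n):
                    # Forward arc: move more mass from i to j.
                    if dist[i] + cost[i][j] < dist[n + j] - eps:
                        dist[n + j], prev[n + j] = dist[i] + cost[i][j], i
                        changed = True
                    # Backward arc: undo mass already moved from i to j.
                    if flow[i][j] > eps and dist[n + j] - cost[i][j] < dist[i] - eps:
                        dist[i], prev[i] = dist[n + j] - cost[i][j], n + j
                        changed = True
            if not changed:
                break

        # Cheapest reachable sink that still needs mass; stop when none is left.
        targets = [n + j for j in range(n) if demand[j] > eps and dist[n + j] < inf]
        if not targets:
            break
        sink = min(targets, key=lambda t: dist[t])

        # Rebuild the cheapest path from a source with spare mass to that sink.
        path = [sink]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        arcs = list(zip(path, path[1:]))

        # Push as much mass as the path allows, then update the bookkeeping.
        amount = min(supply[path[0]], demand[sink - n])
        for a, b in arcs:
            if a >= n:
                amount = min(amount, flow[b][a - n])
        for a, b in arcs:
            if a < n:
                flow[a][b - n] += amount
            else:
                flow[b][a - n] -= amount
        supply[path[0]] -= amount
        demand[sink - n] -= amount

    return sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(n))


# Hypothetical biomes, weights and scores (not taken from the paper).
biomes = ["boreal forest", "temperate forest", "desert"]
cost = [[0.0, 0.2, 1.0],
        [0.2, 0.0, 0.9],
        [1.0, 0.9, 0.0]]
reference = [0.2, 0.7, 0.1]    # temperate-forest dominated
boreal_like = [0.7, 0.2, 0.1]  # boreal-forest dominated
desert_like = [0.2, 0.1, 0.7]  # desert dominated

d_boreal = emd(reference, boreal_like, cost)
d_desert = emd(reference, desert_like, cost)
print(f"{d_boreal:.2f} {d_desert:.2f}")  # 0.10 0.54
```

With these assumed weights, both candidates have a dominant biome that differs from the reference, yet the boreal-like sample is scored as a much smaller disagreement than the desert-like one. A comparison of dominant biomes alone would treat the two mismatches as equivalent.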

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2022-489', Anonymous Referee #1, 26 Aug 2022
  • RC2: 'Comment on egusphere-2022-489', Louis François, 18 Nov 2022


Total article views: 383 (including HTML, PDF, and XML)
  • HTML: 258
  • PDF: 115
  • XML: 10
  • Total: 383
  • BibTeX: 4
  • EndNote: 4
Views and downloads (calculated since 08 Jul 2022)

Viewed (geographical distribution)

Total article views: 367 (including HTML, PDF, and XML), of which 367 have a defined geography and 0 are of unknown origin.
Latest update: 29 Nov 2022
Short summary
Data-data and data-model biome comparisons are commonly based on comparing single biome estimates. While this approach generates good results over large temporal and spatial scales, reducing pollen assemblages to a single biome can oversimplify the vegetation signal preserved in pollen samples. We propose to use a multivariate metric, the Earth Movers' Distance (EMD), to include more details about the vegetation structure when performing such comparisons.
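The contrast between single-biome and multivariate comparisons can be sketched in a few lines of Python. This example makes a strong simplifying assumption that is not from the paper: the biomes are treated as equally spaced along a single ecological gradient, in which case the EMD reduces to the sum of absolute differences between cumulative scores. The helper names `dominant` and `emd_1d` and all scores are hypothetical.

```python
def dominant(scores, biomes):
    """Return the biome with the highest score."""
    return biomes[max(range(len(scores)), key=scores.__getitem__)]


def emd_1d(p, q):
    """EMD for biomes assumed equally spaced (unit steps) on one gradient.

    Both score vectors are rescaled to sum to 1; the distance is the sum of
    absolute differences between their cumulative distributions.
    """
    total_p, total_q = sum(p), sum(q)
    cumulative_gap = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cumulative_gap += a / total_p - b / total_q
        distance += abs(cumulative_gap)
    return distance


# Hypothetical scores for three biomes ordered along an assumed gradient.
biomes = ["tundra", "boreal forest", "temperate forest"]
sample_a = [0.10, 0.46, 0.44]
sample_b = [0.10, 0.44, 0.46]  # almost identical to sample_a
sample_c = [0.05, 0.90, 0.05]  # very different mix, same dominant biome as a

for name, other in (("b", sample_b), ("c", sample_c)):
    same = dominant(sample_a, biomes) == dominant(other, biomes)
    print(f"a vs {name}: same dominant biome = {same}, "
          f"EMD = {emd_1d(sample_a, other):.2f}")
```

Under these assumptions, samples a and b differ by only two percentage points in two biomes but are classed as a mismatch by their dominant biomes, while a and c share a dominant biome despite very different compositions. The EMD ranks the pairs the other way round (about 0.02 versus 0.44).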