Technical note: Euclidean Distance Score (EDS) for algorithm performance assessment in aquatic remote sensing

de Liz Arcari, Amanda; Tavora, Juliana; van der Wal, Daphne; Salama, Mhd. Suhyb

doi:https://doi.org/10.5194/egusphere-2025-4343

Preprints

https://doi.org/10.5194/egusphere-2025-4343

Preprints

26 Sep 2025

| 26 Sep 2025

Status: this preprint is open for discussion and under review for Biogeosciences (BG).

Technical note: Euclidean Distance Score (EDS) for algorithm performance assessment in aquatic remote sensing

Amanda de Liz Arcari, Juliana Tavora, Daphne van der Wal, and Mhd. Suhyb Salama

Abstract. In the absence of community consensus, there remains a gap in standardized, consistent performance assessment of remote-sensing algorithms for water-quality retrieval. Although the use of multiple metrics is common, whether reported individually or combined into scoring systems, approaches are often constrained by statistical limitations, redundancy, and dataset- and context-dependent normalizations, leading to subjective or inconsistent interpretations. To address this, we propose the Euclidean Distance Score (EDS), which integrates five statistically appropriate and complementary metrics into a composite score. Capturing three core aspects of performance (regression fit, retrieval error, and robustness), EDS is computed as the Euclidean distance from an idealized point of perfect performance, providing a standardized and interpretable measure. We demonstrate the applicability of EDS in three scenarios: assessing a single algorithm for different retrieved variables, comparing two algorithms on shared retrievals, and evaluating performance across contrasting trophic conditions. By offering an objective framework, EDS supports consistent validation of aquatic remote sensing algorithms and transparent comparisons in varied contexts.

Received: 04 Sep 2025 – Discussion started: 26 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Amanda de Liz Arcari, Juliana Tavora, Daphne van der Wal, and Mhd. Suhyb Salama

Status: open (until 20 Nov 2025)

Post a comment Subscribe to comment alert

RC1: 'Comment on egusphere-2025-4343', Richard Stumpf, 20 Oct 2025 reply

The paper proposes a strategy for algorithm comparison/evaluation by designing a single metric to combine multiple metrics. This is a solid progression from previous work (referenced) that looked at metrics for algorithm assessment. The “Euclidan Distance Score” (EDS) is a strong approach to summarize the data. A critical objective of the authors is to identify only the metrics that are relevant, and summarize those, rather than to include lots of (often closely related) metrics and leave it to the reader to make sense of them. I will say that this paper was a pleasure to review, and it will become an excellent paper that should be quite important (and hopefully well used). But it does need revision to make sure it is correct.
A concern with comparing metrics is how to “normalize” those metrics that have quite disparate ranges. This approach addresses it by treating ratios & proportions, and so are unitless. That provides a good approach that is not arbitrary. While it does not force results to be between 0 and 1, it is set up with two strong conditions. An EDS = 1 is “perfect”. Any EDS < 0 is unacceptably poor, and each of the input parameters to the EDS are typically going to be between 0 and 1. The ones that are not (proportional slope deviation, proportional error, and proportional bias), are really unacceptable if the values exceed 1.
I have two large concerns that should be directly solvable. First: the parameters to input. Second is whether the configuration of the equation parameters is correct.

The inputs are R (Pearson correlation coefficient), linear regression slope calculated in log space (m), median ratio error (e ~ epsilon), Median ratio bias (B ~ beta), and valid retrieval ratio (n).
The question is: are these all robust and independent?
Of these, e, B, and n are quite good. It is true that e and B are not actually independent, but as there appears to be no robust means of separating the two (de-biasing the error means calculating mean errors, rather than median errors, which gets into non-robust methods), so we will go with it. As a practical matter a competent product should tend toward a bias ratio of 1. If it does not, then it is punished relatively severely, as e >= B. A biased “low error” model will probably do worse than an unbiased relatively high error model. This should be noted in the paper.
At lines 24-28 the paper notes the problem of using root-mean-square error metrics. This is a critical point. Basically, the paper sets out that robust metrics should be used, which is why the paper proposed median e and B. However, Pearson regression and linear regression slope are least squares solutions. Thiel-Sen slope, or an equivalent, should be used for the slope. This is necessary, as many optical models (or for that matter, many models) often deviate at very low or very high values. That statistical leverage will severely alter a least squares regression slope, but not a robust slope metric.
Regression as a metric has an additional critical flaw: it normalizes to the standard deviation of the data. Therefore, an exact subset of a population that has a smaller range will have a lower R value than the population. (Worse, as observed in Seegers et al., a low error method with a small range of data will have a lower R values than a higher error method with a much larger range of data.) This problem is also seen in Figure 3. Oligotrophic water has the smallest error, but a low R value. The problem is the narrow range of data. Conversely if the range is large enough, R provides no useful information, both good and poor models can have high R values. Because of this problem, including R means that EDS values are not be comparable across the different data sets. (There is a good discussion of the problem of R by a top statistician https://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf ) .
By the way, R and linear regression slope are not independent, slope = S_y/S_x * R.
As to the input metrics, based on appropriate and consistently robust metrics, the appropriate ones would then appear to be
1 median (Thiel=Sen) slope, to capture whether the data generally behaves well across the range. (I will say that I don’t really like slope, but I do not see a better option, as that would involve more complex partitioning alternatives that are difficult to standardize.)
2 median error
3 median bias
4 retrievals n.
Median error and bias do not appear to be correctly specified in EDS equation (7). As these are ratios, shouldn’t they be (e – 1)² and (B-1)²? Both are defined as a ratio of E/O (expected/observed), so a value of 1, is perfect, and should reduce to zero. Equation would be:
EDS = 1 – sqrt [ (m-1)² + (e-1)² + (B-1)² + (n-1)² ]
The authors might ponder thought experiments as examples (suggestion only). I did only one. An algorithm that has all results on an exact line with a slope of 1, but is severely biased. Error (e ~ epsilon) and (B ~ beta) will be equal. If the bias is 2x, which is a low performance, the EDS would return a value of zero.

Reply

Citation: https://doi.org/10.5194/egusphere-2025-4343-RC1

Amanda de Liz Arcari, Juliana Tavora, Daphne van der Wal, and Mhd. Suhyb Salama

Viewed

Total article views: 553 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
497	49	7	553	6	6

HTML: 497
PDF: 49
XML: 7
Total: 553
BibTeX: 6
EndNote: 6

Views and downloads (calculated since 26 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	384	16	3	403
Oct 2025	113	33	4	150

Cumulative views and downloads (calculated since 26 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	384	16	3	403
Oct 2025	113	33	4	150

Viewed (geographical distribution)

Total article views: 546 (including HTML, PDF, and XML) Thereof 546 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 22 Oct 2025

Short summary

We developed a new way to evaluate how well remote sensing-based methods estimate water quality. Instead of relying on many separate indicators, which can give conflicting results, we created a single score that combines them into one objective measure. This approach makes it easier to compare methods across different conditions and helps researchers and managers choose the best tools for understanding and monitoring our aquatic environments.


Total:	0
HTML:	0
PDF:	0
XML:	0