A Bayesian model for quantifying errors in citizen science data: Application to rainfall observations from Nepal

Eisma, Jessica A.; Schoups, Gerrit; Davids, Jeffrey C.; van de Giesen, Nick

doi:https://doi.org/10.5194/egusphere-2023-658


Abstract. High quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well-defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that: (1) each CS observation is subject to a specific error type, each with its own bias and noise; and (2) an observation's error type depends on the error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) number of error types and error communities, (ii) bias and noise for each error type, (iii) error distribution of each error community, and (iv) the error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS’s assigned community, coupled with model-inferred error probabilities, can identify observations that require verification. With such a system, the onus of validating CS data is partially transferred from human effort to machine-learned algorithms.
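The two modelling assumptions in the abstract can be illustrated with a minimal sketch. Assuming each error type is a Gaussian around a linearly biased version of the ground truth, and each community is a probability vector over error types, Bayes' rule gives the probability that a single observation carries each error type. Everything specific here is hypothetical and chosen only for illustration: the names `ERROR_TYPES`, `COMMUNITIES`, and `error_type_posterior`, the linear form of the bias, and all numeric values. None of these are the calibrated estimates or the exact formulation of the study.

```python
import math

# Hypothetical error types: observed ~ Normal(slope * truth + offset, sigma).
# All parameter values are illustrative assumptions, not calibrated results.
ERROR_TYPES = {
    "none":     {"slope": 1.0, "offset": 0.0, "sigma": 1.0},
    "unit":     {"slope": 0.1, "offset": 0.0, "sigma": 1.0},   # e.g. cm entered as mm
    "meniscus": {"slope": 1.0, "offset": 2.0, "sigma": 1.0},   # reading the meniscus top
    "outlier":  {"slope": 1.0, "offset": 0.0, "sigma": 30.0},  # broad, uninformative noise
}

# Hypothetical error communities: probability of each error type per community.
COMMUNITIES = {
    "few_errors": {"none": 0.96, "unit": 0.01, "meniscus": 0.02, "outlier": 0.01},
    "unit_prone": {"none": 0.60, "unit": 0.35, "meniscus": 0.03, "outlier": 0.02},
}


def normal_pdf(x, mean, sigma):
    """Density of a Normal(mean, sigma) distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def error_type_posterior(observed, truth, community):
    """Posterior probability of each error type for one observation.

    Combines the community's error-type probabilities (the prior) with the
    Gaussian likelihood of the observation under each error type.
    """
    prior = COMMUNITIES[community]
    weights = {}
    for name, params in ERROR_TYPES.items():
        mean = params["slope"] * truth + params["offset"]
        weights[name] = prior[name] * normal_pdf(observed, mean, params["sigma"])
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def most_likely_error_type(observed, truth, community):
    """Name of the error type with the highest posterior probability."""
    posterior = error_type_posterior(observed, truth, community)
    return max(posterior, key=posterior.get)
```

With these assumed values, a reading of 5.0 against a ground truth of 50.0 from a CS in the `unit_prone` community is attributed mostly to the `unit` error type, and would be a natural candidate for verification. The sketch takes the parameters as given; in the study they are inferred from CS observations paired with ground-truth values.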

Received: 06 Apr 2023 – Discussion started: 15 May 2023



Journal article(s) based on this preprint

09 Oct 2023

A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal

Jessica A. Eisma, Gerrit Schoups, Jeffrey C. Davids, and Nick van de Giesen

Hydrol. Earth Syst. Sci., 27, 3565–3579, https://doi.org/10.5194/hess-27-3565-2023, 2023



