https://doi.org/10.5194/egusphere-2023-658
15 May 2023

A Bayesian model for quantifying errors in citizen science data: Application to rainfall observations from Nepal

Jessica A. Eisma, Gerrit Schoups, Jeffrey C. Davids, and Nick van de Giesen

Abstract. High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) the number of error types and error communities, (ii) the bias and noise of each error type, (iii) the error distribution of each error community, and (iv) the error community to which each CS belongs. Applied to CS rainfall observations from Nepal, the model identifies five error types and sorts CSs into four model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS’s assigned community, coupled with model-inferred error probabilities, can identify observations that require verification. With such a system, the onus of validating CS data is partially transferred from human effort to machine-learned algorithms.
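The generative structure described in the abstract (community, then error type, then biased and noisy observation) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes each error type acts as an additive bias plus zero-mean Gaussian noise on the ground-truth value, and the error-type parameters, community weights, and the names `ERROR_TYPES`, `COMMUNITIES`, and `simulate_observation` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorType:
    name: str
    bias: float   # additive offset in mm (assumed additive form)
    noise: float  # standard deviation of zero-mean Gaussian noise in mm


# Hypothetical error types; values are illustrative, not inferred in the study.
ERROR_TYPES = [
    ErrorType("none", 0.0, 0.5),
    ErrorType("meniscus", 2.0, 1.0),
    ErrorType("outlier", 0.0, 30.0),
]

# Hypothetical communities: each is a probability distribution over ERROR_TYPES.
COMMUNITIES = {
    "careful": [0.97, 0.02, 0.01],
    "meniscus-prone": [0.60, 0.35, 0.05],
}


def simulate_observation(true_value, community, rng):
    """Sample one CS observation of a known ground-truth value.

    Step 1: draw an error type from the CS's community distribution.
    Step 2: apply that error type's bias and Gaussian noise to the ground truth.
    """
    weights = COMMUNITIES[community]
    k = rng.choices(range(len(ERROR_TYPES)), weights=weights, k=1)[0]
    error_type = ERROR_TYPES[k]
    observed = true_value + error_type.bias + rng.gauss(0.0, error_type.noise)
    return error_type.name, observed


rng = random.Random(42)
for _ in range(3):
    print(simulate_observation(12.0, "meniscus-prone", rng))
```

Calibration runs this process in reverse: given pairs of CS observations and ground-truth values, Bayesian inference recovers the error types, their bias and noise, each community's error distribution, and each CS's community membership.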

Journal article(s) based on this preprint

09 Oct 2023
A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal
Jessica A. Eisma, Gerrit Schoups, Jeffrey C. Davids, and Nick van de Giesen
Hydrol. Earth Syst. Sci., 27, 3565–3579, https://doi.org/10.5194/hess-27-3565-2023, 2023

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2023-658', Jonathan Paul, 22 May 2023
    • AC1: 'Reply on RC1', Jessica Eisma, 13 Jul 2023
  • RC2: 'Comment on egusphere-2023-658', Björn Weeser, 19 Jun 2023
    • AC2: 'Reply on RC2', Jessica Eisma, 13 Jul 2023

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
ED: Publish subject to minor revisions (further review by editor) (28 Jul 2023) by Wouter Buytaert
AR by Jessica Eisma on behalf of the Authors (31 Jul 2023): Author's response, Author's tracked changes, Manuscript
ED: Publish as is (29 Aug 2023) by Wouter Buytaert
AR by Jessica Eisma on behalf of the Authors (29 Aug 2023): Manuscript

Viewed

Total article views: 376 (including HTML, PDF, and XML)
  • HTML: 264
  • PDF: 94
  • XML: 18
  • Total: 376
  • BibTeX: 9
  • EndNote: 9
Views and downloads, cumulative (calculated since 15 May 2023)

Viewed (geographical distribution)

Total article views: 389 (including HTML, PDF, and XML) Thereof 389 with geography defined and 0 with unknown origin.
 
Latest update: 09 Oct 2023

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Citizen scientists often submit high-quality data, but a robust method for assessing data quality is needed. This study develops a semi-automated program that characterizes the mistakes made by citizen scientists, groups citizen scientists with similar mistake tendencies into communities, and flags potentially erroneous data for further review. This work may help citizen science programs assess the quality of their data and can inform training practices.
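The flagging step can be illustrated with Bayes' rule: a CS's community acts as a prior over error types, and each error type's bias and noise define a likelihood, so the posterior probability that an observation carries an error follows directly. This is a minimal sketch, not the study's code. It assumes additive Gaussian error types, a reference value for the observation (for example, from a nearby gauge), and a review threshold; the names `error_type_posterior` and `needs_review` and all numbers are hypothetical.

```python
import math

# Hypothetical error types: name -> (bias in mm, noise SD in mm). Illustrative only.
ERROR_TYPES = {
    "none": (0.0, 0.5),
    "meniscus": (2.0, 1.0),
    "outlier": (0.0, 30.0),
}


def log_normal_pdf(y, mean, sd):
    """Log of the Gaussian density N(y; mean, sd^2)."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def error_type_posterior(y, reference, community_weights):
    """Bayes' rule: p(k | y) is proportional to p(k | community) * N(y; reference + bias_k, noise_k^2).

    Computed in log space with the log-sum-exp trick to avoid underflow
    for observations far from the reference value.
    """
    log_terms = {}
    for name, (bias, noise) in ERROR_TYPES.items():
        weight = community_weights.get(name, 0.0)
        log_prior = math.log(weight) if weight > 0 else float("-inf")
        log_terms[name] = log_prior + log_normal_pdf(y, reference + bias, noise)
    peak = max(log_terms.values())
    unnormalized = {name: math.exp(v - peak) for name, v in log_terms.items()}
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}


def needs_review(y, reference, community_weights, threshold=0.5):
    """Flag the observation if the posterior probability of any error exceeds threshold."""
    posterior = error_type_posterior(y, reference, community_weights)
    return 1.0 - posterior["none"] > threshold


careful = {"none": 0.97, "meniscus": 0.02, "outlier": 0.01}
print(needs_review(10.1, 10.0, careful))  # small deviation -> False
print(needs_review(60.0, 10.0, careful))  # large deviation -> True
```

The `threshold` value trades reviewer effort against missed errors: raising it sends fewer observations to manual verification but lets more erroneous ones through. The 0.5 default here is arbitrary.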