Preprints
https://doi.org/10.5194/egusphere-2024-3779
10 Jun 2025

MeteoSaver v1.0: a machine-learning based software for the transcription of historical weather data

Derrick Muheki, Bas Vercruysse, Krishna Kumar Thirukokaranam Chandrasekar, Christophe Verbruggen, Julie M. Birkholz, Koen Hufkens, Hans Verbeeck, Pascal Boeckx, Seppe Lampe, Ed Hawkins, Peter Thorne, Dominique Kankonde Ntumba, Olivier Kapalay Moulasa, and Wim Thiery

Abstract. Archives of observed weather data present unique opportunities for scientists to obtain long time series of the historical climate for many regions of the world. Unfortunately, most of these observational records are to date available only on paper, and thus require digitization and transcription to facilitate analysis of climatic trends. Here we present new open-source software, MeteoSaver, that uses machine learning (ML) algorithms to transcribe handwritten records of historical weather data. MeteoSaver version 1.0 processes images of tabular sheets alongside user-defined configuration settings, performing transcription through five sequential steps: (i) image pre-processing, (ii) table and cell detection, (iii) transcription, (iv) quality assessment and quality control, and (v) data formatting and upload. As an illustration and evaluation of the software, we apply MeteoSaver to ten pictured sheets of handwritten temperature observations from the Democratic Republic of the Congo. The results show that 95–100 % of the records can be transcribed, of which a median of 74.4 % reached the highest internal quality flag and 74 % matched the manually transcribed record, yielding a median mean absolute error of 0.3 °C. These results illustrate that MeteoSaver can be applied to a range of handwriting styles and varying tabular dimensions, paper sizes, and maintenance conditions, highlighting its potential for transcribing tabular meteorological observations from multiple regions, especially if the sheets have a consistent format. Overall, our open-source software can help address the challenge of limited hydroclimatic data availability in many regions of the world by helping to save millions of handwritten records of historical weather data presently stored in archives, and can expedite research on climate and environmental change in data-scarce regions.
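The evaluation figures quoted in the abstract (share of records matching the manual transcription, and a median of per-sheet mean absolute errors) can be illustrated with a minimal sketch. This is not MeteoSaver's code: the function names `sheet_metrics` and `summarize`, the handling of untranscribed cells as `None`, and the sample values are all assumptions made for illustration, and the paper may define its metrics differently.

```python
from statistics import median

def sheet_metrics(transcribed, manual):
    """Return (match_percent, mean_absolute_error) for one sheet.

    Hypothetical helper: cells where either value is None (for example,
    a cell the software could not transcribe) are skipped.
    """
    pairs = [(t, m) for t, m in zip(transcribed, manual)
             if t is not None and m is not None]
    if not pairs:
        raise ValueError("no comparable cells on this sheet")
    # A cell counts as a match when both values agree (within float tolerance).
    matches = sum(1 for t, m in pairs if abs(t - m) < 1e-9)
    match_percent = 100.0 * matches / len(pairs)
    mae = sum(abs(t - m) for t, m in pairs) / len(pairs)
    return match_percent, mae

def summarize(sheets):
    """Return the median match percentage and median MAE across sheets."""
    results = [sheet_metrics(t, m) for t, m in sheets]
    return median(r[0] for r in results), median(r[1] for r in results)

# Made-up temperatures in °C, purely illustrative (not data from the paper).
example_sheets = [
    ([21.5, 30.0, None, 18.2], [21.5, 31.0, 25.0, 18.2]),
    ([20.0, 22.0], [20.0, 22.5]),
]
median_match, median_mae = summarize(example_sheets)
print(f"median match: {median_match:.1f} %, median MAE: {median_mae:.2f} °C")
```

Taking the median across sheets, rather than pooling all cells, keeps one badly degraded sheet from dominating the summary, which is one plausible reading of the "median mean absolute error" wording.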

Competing interests: At least one of the (co-)authors is a member of the editorial board of Geoscientific Model Development. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Journal article(s) based on this preprint

23 Apr 2026
MeteoSaver v1.0: a machine-learning based software for the transcription of historical weather data
Derrick Muheki, Bas Vercruysse, Krishna Kumar Thirukokaranam Chandrasekar, Christophe Verbruggen, Julie M. Birkholz, Koen Hufkens, Hans Verbeeck, Pascal Boeckx, Seppe Lampe, Ed Hawkins, Peter Thorne, Dominique Kankonde Ntumba, Olivier Kapalay Moulasa, and Wim Thiery
Geosci. Model Dev., 19, 3213–3255, https://doi.org/10.5194/gmd-19-3213-2026, 2026

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2024-3779', Anonymous Referee #1, 15 Feb 2026
    • AC1: 'Reply on RC1', Derrick Muheki, 20 Mar 2026
    • AC3: 'Reply on RC1', Derrick Muheki, 20 Mar 2026
  • RC2: 'Comment on egusphere-2024-3779', Chris Lennard, 20 Feb 2026
    • AC2: 'Reply on RC2', Derrick Muheki, 20 Mar 2026

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Derrick Muheki on behalf of the Authors (20 Mar 2026): Author's response, Author's tracked changes, Manuscript
ED: Publish as is (27 Mar 2026) by Taesam Lee
AR by Derrick Muheki on behalf of the Authors (02 Apr 2026): Manuscript

Model code and software

MeteoSaver v1.0 Derrick Muheki, Bas Vercruysse, Krishna Kumar Thirukokaranam Chandrasekar, Koen Hufkens, and Wim Thiery https://doi.org/10.5281/zenodo.14246037

Viewed

Total article views: 8,440 (including HTML, PDF, and XML)
  • HTML: 7,418
  • PDF: 896
  • XML: 126
  • Total: 8,440
  • Supplement: 94
  • BibTeX: 98
  • EndNote: 130
Views and downloads (calculated since 10 Jun 2025)

Viewed (geographical distribution)

Total article views: 8,441 (including HTML, PDF, and XML), of which 8,441 have a defined geography and 0 are of unknown origin.

Latest update: 05 May 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Archives worldwide host vast records of observed weather data that are crucial for understanding climate variability. However, most of these records are still in paper form, which limits their use. To address this, we developed MeteoSaver, an open-source tool that transcribes these records into a machine-readable format. Applied to ten handwritten temperature sheets, it achieved a median accuracy of 74 %. This tool offers a promising way to preserve archival records and unlock historical weather insights.