https://doi.org/10.5194/egusphere-2026-619
10 Feb 2026
Status: this preprint is open for discussion and under review for Earth Observation (EO).

Scalable Earth Observation Data Cubes for Advanced Analytics of Dynamic Earth Surface Processes: An Open-Source Package for Customized Processing of Sentinel-2 Data on HPCs and Beyond

Baturalp Arisoy, Florian Betz, Georg Stauch, Doris Klein, Stefan Dech, and Tobias Ullmann

Abstract. Earth Observation archives now encompass petabytes of multispectral imagery, yet transforming these heterogeneous collections into analysis-ready data (ARD) cubes remains a critical bottleneck. We present an open-source Python package that unifies cloud masking, co-registration, and super-resolution in a single Xarray-based workflow, tailored specifically to close practical gaps in ARD cube generation. Leveraging scalable high-performance computing (HPC) infrastructure, our framework delivers rapid, reproducible cube construction and incremental updates, enabling users to build or extend large time-series data cubes without reprocessing historical scenes. Beyond HPC systems, the package is also suitable for local processing of Sentinel-2 data. Our approach integrates (1) s2cloudless, a probabilistic cloud-masking algorithm offering user-defined thresholds to overcome the rigid limitations of the Sentinel-2 Scene Classification Layer (SCL) and STAC metadata; (2) AROSICS, a sliding-window co-registration routine that ensures sub-pixel alignment over complex, dynamic landscapes to produce smoother temporal metrics and more consistent change detection; and (3) SEN2SR, a deep-learning super-resolution model that refines all bands to 2.5 m, revealing fine geomorphic and ecological features previously obscured at native resolutions. Together, these components address three recurring ARD cube gaps in existing Xarray-based toolkits: adaptive cloud filtering, robust time-series alignment, and integrated spatial enhancement within a single, reproducible pipeline. To maximize accessibility and reuse, the package is accompanied by well-documented, interactive Python notebooks that guide users through configuration and end-to-end cube generation. Validated on the German Aerospace Center’s terrabyte HPDA clusters, the pipeline runs equally well on local workstations and can be accessed at https://github.com/BaturalpArisoy/stac2cube.
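
The user-defined cloud threshold mentioned in the abstract can be illustrated with a minimal sketch. It does not use the package's or s2cloudless's actual API: the function names `mask_clouds` and `clear_fraction`, the array layout, and the threshold values are assumptions chosen for illustration, and plain NumPy arrays stand in for the Xarray cube.

```python
import numpy as np

def mask_clouds(reflectance, cloud_prob, threshold=0.4):
    """Set pixels whose cloud probability exceeds `threshold` to NaN.

    reflectance: float array of shape (time, band, y, x)
    cloud_prob:  float array of shape (time, y, x), values in [0, 1]
    Both the layout and the default threshold are illustrative assumptions.
    """
    cloudy = cloud_prob > threshold  # boolean mask, shape (time, y, x)
    # Broadcast the per-scene mask across the band axis.
    return np.where(cloudy[:, None, :, :], np.nan, reflectance)

def clear_fraction(cloud_prob, threshold=0.4):
    """Fraction of pixels per scene at or below the threshold."""
    return (cloud_prob <= threshold).mean(axis=(1, 2))

# Toy cube: 2 scenes, 3 bands, 2 x 2 pixels.
reflectance = np.ones((2, 3, 2, 2), dtype=float)
cloud_prob = np.array([
    [[0.1, 0.9], [0.2, 0.3]],   # mostly clear scene
    [[0.5, 0.6], [0.7, 0.8]],   # mostly cloudy scene
])

strict = mask_clouds(reflectance, cloud_prob, threshold=0.4)
lenient = mask_clouds(reflectance, cloud_prob, threshold=0.75)
```

Raising the threshold keeps more pixels at the risk of retaining thin cloud, which is the trade-off a fixed classification layer does not expose to the user.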

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 05 May 2026)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2026-619', Anonymous Referee #1, 31 Mar 2026

Viewed

Total article views: 324 (including HTML, PDF, and XML)
  • HTML: 210
  • PDF: 100
  • XML: 14
  • Total: 324
  • BibTeX: 38
  • EndNote: 52
Views and downloads are calculated since 10 Feb 2026.

Viewed (geographical distribution)

Total article views: 316 (including HTML, PDF, and XML), thereof 316 with geography defined and 0 with unknown origin.
Latest update: 04 Apr 2026
Short summary
Earth Observation archives now encompass petabytes of multispectral imagery, yet producing analysis-ready Sentinel-2 time series remains a critical bottleneck. We present an open-source Python package that builds data cubes from SpatioTemporal Asset Catalog (STAC) catalogues and integrates cloud masking, co-registration, and super-resolution in one workflow. Validated on high-performance computing infrastructure, it enables rapid, reproducible cube construction and updates, improving temporal consistency in a braided river.