https://doi.org/10.5194/egusphere-2025-6079
17 Dec 2025
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

A guide to optimised spatiotemporal data co-location by mutual information maximisation

Andrew Steven Martin, Heather Guy, Michael Ray Gallagher, and Ryan Reynolds Neely III

Abstract. Spatiotemporal co-location, the matching of data from multiple sources that are described on different coordinate systems, is a necessary and crucial step in geospatial data synthesis and validation. The choice of co-location scheme, and of the parameters applied to it, decides which subsets of the original datasets are included in downstream analyses, and so affects the quantitative outputs of comparison studies and of datasets synthesised from multiple retrievals. Until now, no generalised framework has existed for deciding how best to co-locate data. We outline a domain- and data-agnostic framework that generalises the process of selecting an optimised co-location parametrisation for a given co-location scheme, by maximising the mutual information encoded between the data included in the subsequent analyses. We demonstrate the framework by applying it to a comparison of vertical cloud fraction profiles retrieved from the polar-orbiting ICESat-2 satellite's ATL09 data product with surface-based observations at four Cloudnet observatories. We evaluate per-site optimised co-location parametrisations and find that using them quantitatively improves the comparison between the datasets relative to naive choices of co-location parameters. This work has implications across almost all remote sensing data products, especially for satellite validations, and will facilitate deep learning methodologies by producing paired datasets that make the maximal information about the structure between datasets available to be learned.
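The core idea of the abstract, choosing the co-location parameter that maximises the mutual information between the paired data, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the preprint: the function names `mutual_information` and `best_colocation_radius`, the use of a single distance threshold as the co-location parameter, the histogram (plug-in) estimator, and the synthetic data. The authors' actual implementation is in the linked interactive computing environment.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in nats for paired samples.

    Assumption: a simple 2D histogram estimator, chosen for brevity.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()             # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = p_xy > 0                     # skip empty cells (0 * log 0 = 0)
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))


def best_colocation_radius(sat, ground, distance_km, candidate_radii_km,
                           min_pairs=50):
    """Score each candidate radius by the MI of the pairs it retains.

    Hypothetical helper: radii that keep fewer than `min_pairs` pairs are
    skipped, because the plug-in estimate is unreliable for small samples.
    """
    scores = {}
    for radius in candidate_radii_km:
        keep = distance_km <= radius
        if keep.sum() < min_pairs:
            continue
        scores[radius] = mutual_information(sat[keep], ground[keep])
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic example (not real ICESat-2 or Cloudnet data): the satellite value
# tracks the ground value, with noise that grows with separation distance.
rng = np.random.default_rng(0)
n = 5000
distance_km = rng.uniform(0.0, 200.0, n)
ground = rng.uniform(0.0, 1.0, n)
noise = rng.normal(0.0, 0.02 + 0.002 * distance_km)
sat = np.clip(ground + noise, 0.0, 1.0)

radii = [10.0, 25.0, 50.0, 100.0, 200.0]
best, scores = best_colocation_radius(sat, ground, distance_km, radii)
for radius, score in scores.items():
    print(f"radius {radius:6.1f} km: MI = {score:.3f} nats")
print("selected radius:", best)
```

The plug-in estimator is biased upwards for small samples, so comparing subsets of very different sizes in this way is only indicative. A more careful treatment would use a bias-corrected or nearest-neighbour estimator.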

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 22 Jan 2026)


Data sets

Mutual information maximisation for spatiotemporal co-location: ICESat-2 ATL09 and Cloudnet categorize. Andrew Martin. https://doi.org/10.5281/zenodo.17817304

Interactive computing environment

DAndrewA/a-guide-to-optimised-spatiotemporal-data-co-location-by-mutual-information-maximisation: v1.0.1. Andrew Martin. https://doi.org/10.5281/zenodo.17830442

Latest update: 17 Dec 2025
Short summary
Matching geospatial data between datasets recorded on different coordinate systems requires choosing parameters that impact the subset of data in downstream analyses. We developed a framework to optimise the choice of parameters by maximising the mutual information between the data being compared. The optimised parameters vary spatially, and using the optimised parameters results in better comparisons between data than using fixed choices of parameters.