the Creative Commons Attribution 4.0 License.
A guide to optimised spatiotemporal data co-location by mutual information maximisation
Abstract. The matching of data described on different coordinate systems between multiple data sources – spatiotemporal co-location – is a necessary and crucial step in geospatial data synthesis and validation. The particular choice of co-location scheme, and the choice of parameters applied to it, decide what subsets of the original datasets are included in downstream analyses, affecting the quantitative outputs of comparison studies and multi-retrieval synthesised datasets. Previously, no generalised framework for deciding how best to co-locate data has existed. We outline a domain- and data-agnostic framework that generalises the process of selecting an optimised co-location parametrisation for a given co-location scheme, by maximising the mutual information encoded between the data included in the subsequent analyses. We demonstrate the framework by applying it to a comparison of vertical cloud fraction profiles retrieved from the polar-orbiting ICESat-2 satellite's ATL09 data product, and surface-based observations at four Cloudnet observatories. We evaluate per-site optimised co-location parametrisations and find that using the optimised co-location parametrisations quantitatively improves the comparison between the datasets over naive choices of co-location parameters. This work has implications across almost all remote sensing data products – especially for satellite validations – and will facilitate deep learning methodologies by producing paired datasets with the maximal information about the structure between datasets available to be learned.
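For reference, the quantity being maximised can be written using the standard definition of mutual information between two discrete variables. Here $X$ and $Y$ stand for the co-located values from the two data sources and $\theta$ for the co-location parameters; these symbols are chosen for illustration and are not taken from the preprint, whose exact estimator and any normalisation are not given in this abstract.

$$
I_\theta(X;Y) = \sum_{x}\sum_{y} p_\theta(x,y)\,\log\frac{p_\theta(x,y)}{p_\theta(x)\,p_\theta(y)},
\qquad
\theta^{*} = \arg\max_{\theta} I_\theta(X;Y)
$$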
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-6079', Anonymous Referee #1, 23 Jan 2026
- RC2: 'Comment on egusphere-2025-6079', Anonymous Referee #2, 25 Jan 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-6079/egusphere-2025-6079-RC2-supplement.pdf
Data sets
Mutual information maximisation for spatiotemporal co-location: ICESat-2 ATL09 and Cloudnet categorize Andrew Martin https://doi.org/10.5281/zenodo.17817304
Interactive computing environment
DAndrewA/a-guide-to-optimised-spatiotemporal-data-co-location-by-mutual-information-maximisation: v1.0.1 Andrew Martin https://doi.org/10.5281/zenodo.17830442
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 158 | 92 | 16 | 266 | 9 | 11 |
The paper outlines an objective method for identifying the parameters of a collocation scheme, illustrated by a comparison of ICESat-2 and Cloudnet cloud-mask profiles in which a radial separation for the satellite track and a temporal window for the ground-based field are selected. The algorithm optimises the mutual information content of the paired collocated observations, arguing that this value increases with the volume of data considered until uncorrelated observations begin to contaminate the set.
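The selection step summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `mutual_information` is a simple plug-in (histogram) estimator for paired discrete samples such as binary cloud masks, and `best_parameters` and the `colocate` callback are hypothetical names introduced here. The preprint may use a different estimator or objective.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    if x.size != y.size or x.size == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    # Map each sample to an integer category index.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Build the joint histogram and normalise it to a probability table.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, ny)
    nonzero = joint > 0  # empty cells contribute nothing to the sum
    ratio = joint[nonzero] / (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))


def best_parameters(candidates, colocate):
    """Return the (params, score) pair with the highest estimated MI.

    `colocate` is a hypothetical user-supplied function that maps one
    candidate, e.g. (radius_km, window_minutes), to the paired samples
    (satellite_values, ground_values) selected by that parametrisation.
    """
    scored = [(p, mutual_information(*colocate(p))) for p in candidates]
    return max(scored, key=lambda item: item[1])
```

In use, `candidates` would be a grid of (radius, time window) pairs and `colocate` would apply the chosen co-location scheme to the two datasets. The plug-in estimator is biased upward for small samples, so a practical implementation would likely need a bias correction or a minimum sample size before comparing candidates.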
I cannot more strongly recommend this paper for publication. It was an absolute delight to read and an astonishingly good document for a junior researcher. I have some minor comments and corrections that may assist in the uptake of this method by the atmospheric science community (who are generally unfamiliar with formal mathematics or statistics), but mostly wish to thank the authors for providing me with a rewarding read. I look forward to applying the technique when I next need to run a validation study.
Minor comments:
Technical corrections: