A guide to optimised spatiotemporal data co-location by mutual information maximisation
Abstract. Spatiotemporal co-location, the matching of data from multiple sources that are described on different coordinate systems, is a necessary step in geospatial data synthesis and validation. The choice of co-location scheme, and of the parameters applied to it, determines which subsets of the original datasets are included in downstream analyses, and so affects the quantitative outputs of comparison studies and multi-retrieval synthesised datasets. Until now, no generalised framework for deciding how best to co-locate data has existed. We outline a domain- and data-agnostic framework that generalises the selection of an optimised co-location parametrisation for a given co-location scheme by maximising the mutual information between the data included in the subsequent analyses. We demonstrate the framework by applying it to a comparison of vertical cloud fraction profiles retrieved from the polar-orbiting ICESat-2 satellite's ATL09 data product with surface-based observations at four Cloudnet observatories. We evaluate per-site optimised co-location parametrisations and find that they quantitatively improve the comparison between the datasets relative to naive choices of co-location parameters. This work has implications across almost all remote sensing data products, especially for satellite validation, and will facilitate deep learning methodologies by producing paired datasets in which the maximal information about the structure between the datasets is available to be learned.
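As an illustration of the idea only, the selection step can be sketched as scoring each candidate co-location parameter by the mutual information of the paired data it produces and keeping the highest-scoring one. The sketch below is not the implementation used in this work: the histogram (plug-in) estimator, the bin count, and the names `mutual_information`, `best_parameter` and `colocate` are all assumptions introduced here. Histogram estimates are also biased upwards for small samples, so scores from subsets of very different sizes are not directly comparable without a correction.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of mutual information, in nats.

    Assumption: a simple equal-width 2D histogram; other estimators exist.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def best_parameter(candidates, colocate, bins=8):
    """Return the candidate with the highest mutual information, plus all scores.

    `colocate` is a hypothetical user-supplied function: given one candidate
    parameter, it returns the two paired 1D arrays selected by that choice.
    """
    scores = {p: mutual_information(*colocate(p), bins=bins) for p in candidates}
    return max(scores, key=scores.get), scores
```

In use, `colocate` might select satellite profiles within a given distance and time window of a ground site and return them alongside the matching ground-based values; `candidates` would then be the set of distance and time thresholds to compare.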