This work is distributed under the Creative Commons Attribution 4.0 License.
ExoCcycle v1.0.0: A Generalized Framework for Spherical Community Detection and its Application to Defining Global Ocean Basins from Multi-Field Data
Abstract. Ocean basins are fundamental units for modeling the Earth system, paleoceanography, and the global carbon cycle. However, their boundaries are often defined heuristically, limiting the robustness of reduced-order models and the interpretation of paleoproxy data, especially in data-limited paleo- or planetary contexts. We present ExoCcycle, an open-source Python library for objective, automated community detection on spherical grids. This framework implements novel composite algorithms (e.g., SB-Reduction) that couple efficient partitioning (Leiden/Louvain) with ensemble-based agglomerative clustering for robust boundary detection. A key technical innovation is our Difference Quantile Transformation Cumulative Density Function (DQT-CDF) edge-weighting scheme, enabling the principled analysis of single or multiple, non-normally distributed scalar fields in a large spherical domain. We validate the method using modern bathymetry and temperature/salinity fields, demonstrating that (1) a spatial resolution of 1–2 degrees is necessary to capture critical basin-defining features such as ridges and plateaus; (2) basin boundaries evolve significantly over geological time, underscoring the inadequacy of using static, modern boundaries for past climate simulations; and (3) the ocean's community structure is fundamentally layered – deep basins (defined by bathymetry) are distinct from shallow shelf partitions (shaped by sedimentation, sea-level changes, and riverine fluxes), and surface basins (driven by wind and temperature/precipitation). ExoCcycle provides a systematic tool for generating physically-grounded, time-evolving basin definitions, enabling the development of next-generation modular intermediate-complexity models for Earth and exoplanet habitability. 
As a generalized spherical community detection tool, our new framework is also broadly applicable to other non-ocean related domains, including ecology and land processes, atmospheric science, solid-Earth geophysics, and planetary science.
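The DQT-CDF edge weighting described in the abstract can be illustrated with a minimal, hypothetical sketch. This is our own simplified reading of a difference-quantile transform, not ExoCcycle's actual API: the function name `dqt_cdf_weights`, the toy edge list, and the scalar field values are all illustrative assumptions. The idea is that each edge between neighboring grid cells is weighted by where its absolute field difference falls in the empirical distribution (CDF) of all edge differences, so the weighting is insensitive to the field's possibly non-normal marginal distribution:

```python
import bisect

def dqt_cdf_weights(edges, field):
    """Toy sketch of a difference-quantile edge weighting.

    For each edge (i, j) between grid cells, compute the absolute
    difference of a scalar field, rank it against the empirical
    distribution of all edge differences (a quantile/CDF transform),
    and assign weight = 1 - quantile, so that similar cells get
    strong edges regardless of the field's marginal distribution.
    """
    diffs = [abs(field[i] - field[j]) for i, j in edges]
    sorted_diffs = sorted(diffs)
    n = len(sorted_diffs)
    weights = {}
    for (i, j), d in zip(edges, diffs):
        # empirical CDF evaluated at d: fraction of differences <= d
        rank = bisect.bisect_right(sorted_diffs, d)
        weights[(i, j)] = 1.0 - rank / n
    return weights

# Four cells on a line: cells 0-1 are similar, 2-3 are similar,
# and the 1-2 edge crosses a sharp gradient (a candidate boundary).
field = {0: 1.0, 1: 1.1, 2: 5.0, 3: 5.2}
edges = [(0, 1), (1, 2), (2, 3)]
w = dqt_cdf_weights(edges, field)
```

Because only ranks enter the weights, any monotone rescaling of the field leaves them unchanged, which is what makes a quantile-based scheme attractive for skewed fields such as bathymetry. In this toy example the boundary-crossing edge (1, 2) receives the lowest weight, so a partitioner such as Leiden or Louvain would preferentially cut there.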
Status: open (until 22 Apr 2026)
- CEC1: 'Comment on egusphere-2025-5581 - No compliance with the policy of the journal', Juan Antonio Añel, 11 Feb 2026
- AC1: 'Reply on CEC1', Matthew Bogumil, 18 Feb 2026
Dear Juan A. Añel,
Re: Edits of manuscript reference No. EGUSPHERE-2025-5581
Thank you for your comments on clarifying the GMD journal's policies. To address your comments, we have done the following:
- Opened to the public the Zenodo repository which contains the model code and results from the manuscript (10.5281/zenodo.18675443). The model was previously open to the public through GitHub, but a non-DOI link was not previously included in the manuscript since it is not guaranteed to exist in perpetuity. However, a clearly defined GMD-release branch of the ExoCcycle repo has been produced and will only undergo edits in association with discussion and review (https://github.com/Bogumil-Matthew/ExoCcycle/tree/GMD-release).
- The above repository's DOI and GitHub link will also be included in the manuscript under the "Code and Data Availability" section.
- Opened to the public a new Zenodo repository which contains a redistribution (as requested) of data/models that, in an effort to maintain transparency, we had no part in producing, modifying, or contributing to (i.e., the ETOPO and Copernicus datasets).
- The above repository's DOI will be included in the manuscript under the "Code and Data Availability" section and here: 10.5281/zenodo.18664604.
We thank you for your comments and have made the necessary changes, striving to comply with EGUsphere's Code and Data Policy. However, we understand that it is possible that there are components of the data policy or best practices that might have been overlooked. We now assume that we are in compliance with the "Code and Data Policy" unless otherwise stated by a referee, community member, editor, or chief editor.
Thank you for your comments; we look forward to future community engagement on this manuscript,
Matthew Bogumil
Citation: https://doi.org/10.5194/egusphere-2025-5581-AC1
- CEC2: 'Reply on AC1', Juan Antonio Añel, 18 Feb 2026
Dear authors,
Thanks for your reply, and for addressing the outstanding issues. We can now consider the current version of your manuscript in compliance with the policy of the journal. Please do not include the link to the GitHub site in the Code and Data Availability section. GitHub sites are not suitable for publication of scientific assets, and GitHub itself recommends using repositories such as Zenodo for scientific purposes. Therefore, as it does not serve the purpose of the Code and Data policy of the journal, it is better to avoid including it, so that readers do not misunderstand where the correct version of the code and data used in this article is stored.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-5581-CEC2
- RC1: 'Comment on egusphere-2025-5581', Anonymous Referee #1, 11 Mar 2026
This paper presents a new framework for community detection on spherical grids and shows applications mapping global ocean basins based on bathymetry and oceanographic/climatological datasets. I find this study interesting and agree that such detection algorithms could be very useful and applied to various geophysical problems. However, I have several concerns about its current form and presentation, and I recommend major revisions. My main concerns are listed below:
The discussion is too long and very broad in stating potential applications of the detection algorithm. It could be significantly shorter and more specific, focusing on a few strong applications (see below). I also miss a comparison of the workflow to other similar studies. It looks like the algorithm is very good at defining basins based on physical properties of the ocean and bathymetry. It can analyze different datasets and define ocean basins in a quantitative way, but beyond that the manuscript is not clear on how it provides any information that is not already known or that could be detected using other methods. There are also similar frameworks, especially when considering climatological/oceanographic data, and I miss a more complete discussion comparing your approach to other models and explaining its possible advantages and limitations. Where I can see this manuscript being useful is at the crossing between solid-Earth and ocean/atmosphere classifications of basins, especially for paleo purposes. This needs to be explored more.
When using this approach to define basins on paleobathymetric grids, you are analyzing something that is already, to a large degree, pre-defined in the plate models. E.g., from the plate boundaries in the plate models you can define all the oceanic basins, perhaps with some adjustments to gateways and oceanic plateaus. It is still nice to have a quantitative way of defining what is a basin, but for an individual paleobathymetry model it does not seem very useful. I would rather think that if you were to run every 10 Myr of the last 200 Ma, an algorithm like this (fast, transparent, reproducible) would be a nice addition to the plate models, though perhaps mainly a time-saving one. To me there is a bit too much focus on how it can detect basins from paleo-bathymetry alone. Where I see this could become a good contribution to the field would be in combining the definitions from plate models (paleobathymetry) with proxy data to predict the distribution of basins (water-mass dependent and topography dependent). That could be a nice guide for predicting the distribution of water masses, circulation regimes, etc.
Box models. There are many different box models in oceanography, climate science, paleoclimate/geochemistry, etc., designed for a specific problem or to be light and fast in order to run many different configurations (e.g., different boundary conditions for paleo purposes). They have well-known limitations. The paper seems to frame it as if the basin detection algorithm shows that basin properties are very different, and that it is therefore not ideal to use 1, 2, 3… boxes for ocean/geochemical/climate models in general. Ocean basin heterogeneity is already very well known (as confirmed by a very large number of oceanographic measurements, GCM/ESM runs, deep-time ESM simulations, and proxy data). Here it is presented as something new. Instead of stating that basins have different properties at present and in the past, can you try to say something about how one would need to change a specific box model (you could show a few examples) based on what this workflow detects? Can you, for example, give a minimum number of boxes needed for a consistent model of the Cenozoic? Or would you need to constantly change the number of boxes?
A research question could be something like: how many boxes are needed through time to represent different aspects of oceanography/climatology/geochemistry? E.g., for a specific model of your choice, or one for ocean circulation, one for geochemistry, and one for climate. You could use paleo-bathymetry combined with output from GCMs (e.g., DeepMIP or Valdes et al.), but ideally proxies.
I provide more detailed comments here:
Lines 10-15: Points 2 & 3 seem obvious.
Lines 35-40: Yes, we still need reduced-order models. That basins change in time is well known; better classification does not directly answer this question. One could maybe explore at what times the basin properties are so fundamentally different that one must change assumptions in specific box models. Also, are more boxes better? Take a six-box ocean model, for example (e.g., Gnanadesikan et al., 2024), designed to model tipping points in the overturning circulation. If the Southern Ocean gateways are closed, one might need to make changes to the Southern Ocean box, maybe add another box. The model (and the way the boxes are tied together) would need to be revised. That is just one (or two) small paleogeographic changes. The model is still highly idealized; it could get better or worse. For a climate or geochemical model running through millions (or billions) of years, one generalized box might still be best. I wonder whether there is a way to use your workflow, taking a specific box model, to determine through time the least number of boxes needed for the specific problem the model is trying to solve? That would be more interesting.
Lines 40-45: Very general; please be more specific. How will the tools analyze/help quantify conditions for planetary habitability?
Figure 1: I do not see any question marks
Lines 73-76: This touches upon something important that I do not see the authors address later. It could, for example, be very relevant for paleo work. Considering the last glacial, or further back in time, e.g. the EOT, there are proxy indications and model experiments showing fundamental differences in water masses under the same solid-Earth (bathymetric) basin structure. For example, you could have completely different deep water in the major ocean basins depending on whether the deep water is sourced from the south or from the North Atlantic (or North Pacific). This could be related to orbital cycles, temperatures, ice sheets/sea ice, or salinity differences. Basin structure will always, to some degree, depend on bathymetry, but one can have quite different structures under the same bathymetric boundary conditions.
Lines 112-117: Interesting question, but the manuscript does not go much into other or similar approaches to trace/group water masses at basin scale. Also, box models generally have vertical walls, no? And does the geometry of the walls of the model even matter for most box models? To properly resolve e.g. bathymetry, GCMs might be the option.
The basin boundaries are controlled mostly by the specific plate model; you can see this directly from the plate boundaries. And no, it is not reasonable to use modern definitions, but I think that is established.
Line 119: Pelagic and neritic describe water-column environments, not the seafloor. There is no such thing as pelagic bathymetry. Also, "paleo-reconstructed bathymetry" -> "paleo-bathymetry".
Line 487: "continental choke points and gateways". What is the difference? What do you mean by continental choke points?
Lines 549-551: Not very specific; please elaborate on how ExoCcycle is able to solve this.
Line 567: Fig. 17? I cannot see this in Fig. 16.
Lines 575-576: Is it assumed that the box structure in the ocean should be static?
Lines 578-580: Please be more specific and elaborate on how this is/can be done.
Section 4.4: On the choice of what is deep and shallow ocean, I would expect some rationale for the choices. E.g., why exclude intermediate waters? The upper part of the deep field would be quite different from the deeper part, having different water masses in different basins, but also the same water masses in several basins.
Lines 602-610: E.g., "is it appropriate to horizontally average deep-ocean interactions at scales that force the global system into just 3 or 4 basins?" Of course not, and I assume the makers of such box models are aware of these limitations; they are highly simplified to solve specific problems. I think the key here, where such an approach might be able to assist, is: "How many boxes are needed through time in order to gain something for specific models?"
You mention mixing throughout the manuscript but say little on why it is important and how you can improve estimations. Mixing is parametrized in nearly all models, and if you can determine something about it from your workflow, that would be good to properly describe. Maybe you can get that, as you mention, from SST, bathymetry, salinity, etc., but those are all unknown for paleo.
Fig. 19: I would like to see a more detailed description of this figure in the text. This goes back to my comments on the boxes; you have an opportunity here to show how your framework can determine the number of boxes needed. More boxes are not necessarily better; it depends a lot on the type of box model. Fig. 19 would, for example, likely need a different number of boxes depending on the type of model and the geological era (Cenozoic, Mesozoic, Paleozoic), and this can change often/fast on million-year timescales. How would the models then be comparable? Can you find a minimum number of boxes needed for GEOCLIM7, for example?
Section 5:
The authors write what the workflow can potentially be used for, but do not mention how the methodology/workflow compares to other studies. I think this is needed in the discussion.
5.1.1: This section is very vague in general. For example, lines like "This concept of basin stability is also central to assessing 'continued habitability' on a planetary scale." How is this concept central to habitability? And: "Many organisms, from phytoplankton assemblages to coral reef ecosystems, are adapted to specific environmental conditions localized within these basins (e.g., Schoepf et al. (2023))." Why would basin change matter? Are you talking about continued habitability of ecosystems within specific basins?
5.1.2: How does "ExoCcycle's workflow provide a systematic and objective method for grouping seafloor sediment core data based on the integrated water column chemistry that influences carbonate solubility"? You have not shown any of this. How does it work with proxies, drill sites, geological information, sediments, etc.?
Figure 16c: "corresponding to present-day and past river sediment supply from now-submerged river systems that span the Sunda Shelf". Please show this correlation with a figure and provide references!
5.1.3: Alongside discussing the performance, novelties, and discrepancies of your model, I think this could be the most important section in your discussion. Yet (although the title suggests so), it says nothing about proxies! Output from deep-time GCM simulations is not proxy data.
You might be missing what could be the key contribution of your algorithm: quantifiable/reproducible basin definitions through time based on paleogeography and paleo proxies. Definitions based on paleobathymetry would be very close to what is already defined by the plate models, but if you can combine that with paleo ocean proxy reconstructions (e.g., temperature, salinity, or isotopes) and make new definitions, you might do something quite novel with wide applications for paleo-oceanography/climatology.
Citation: https://doi.org/10.5194/egusphere-2025-5581-RC1
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 306 | 230 | 26 | 562 | 15 | 13 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
In your Code and Data Availability section you state that the code is restricted to editors and reviewers. This is not acceptable according to the policy of the journal, and reflects a misunderstanding of the Discussions process of GMD. In GMD, the Discussions stage is designed so that the entire research community can openly review a manuscript. As you can understand, for this it is necessary that all the code and datasets are available to anyone anonymously. Also, for some of the data you link sites (NCEI and Copernicus) where generic databases are posted, whereas what is necessary is that you store in a suitable repository the exact data that you have used.
Therefore, the current situation with your manuscript is irregular, as it should never have been accepted for peer review and Discussions in the journal given the above-mentioned issues. As I said, the GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
The 'Code and Data Availability' section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor