Detection and Tracking of Carbon Biomes via Integrated Machine Learning

Mohanty, Sweety; Patara, Lavinia; Kazempour, Daniyal; Kröger, Peer

doi:https://doi.org/10.5194/egusphere-2024-1369

Preprints

23 May 2024

Abstract. In the framework of a changing climate, it is useful to devise methods capable of effectively assessing and monitoring the changing landscape of air-sea CO₂ fluxes. In this study, we developed an integrated machine learning tool to objectively classify and track marine carbon biomes under seasonally and interannually changing environmental conditions. The tool was applied to the monthly output of a global ocean biogeochemistry model at 0.25° resolution run under atmospheric forcing for the period 1958–2018. Carbon biomes are defined as regions having consistent relations between surface CO₂ fugacity (fCO₂) and its main drivers (temperature, dissolved inorganic carbon, and alkalinity). We detected carbon biomes by using an agglomerative hierarchical clustering (HC) methodology applied to spatial target-driver relationships, whereby a novel adaptive approach to cut the HC dendrogram based on the compactness and similarity of the clusters was employed. Based only on the spatial variability of the target-driver relationships and with no prior knowledge of the cluster locations, we were able to detect well-defined and geographically meaningful carbon biomes. A deep learning model was constructed to track the seasonal and interannual evolution of the carbon biomes, wherein a feed-forward neural network was trained to assign labels to detected biomes. We find that the area covered by the carbon biomes responds robustly to seasonal variations in environmental conditions. A seasonal alternation between different biomes is observed over the North Atlantic and Southern Ocean. Long-term trends in biome coverage over the 1958–2018 period, namely a 10 % expansion of the subtropical biome in the North Atlantic and a 10 % expansion of the subpolar biome in the Southern Ocean, are suggestive of long-term climate shifts. Our approach thus provides a framework that can facilitate the monitoring of the impacts of climate change on the ocean carbon cycle and the evaluation of carbon cycle projections across Earth System Models.
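
As a rough illustration of the detection step described in the abstract, the sketch below fits a multiple linear regression of a target on three drivers for each of a few synthetic "grid cells" and then groups the cells by their regression slopes using agglomerative clustering with Ward linkage. It is not the authors' implementation. The synthetic data, the slope values, the function names (fit_mlr, ward_clusters, synthetic_cell) and the fixed cut at k = 2 clusters are all assumptions made for illustration, and the adaptive dendrogram cut mentioned in the abstract is not reproduced.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mlr(drivers, target):
    """Ordinary least squares with an intercept; returns the driver slopes only."""
    X = [[1.0] + list(row) for row in drivers]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, target)) for i in range(p)]
    return solve(XtX, Xty)[1:]


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward linkage, stopped at k clusters."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                d2 = sum((x - y) ** 2 for x, y in zip(ca["centroid"], cb["centroid"]))
                # Ward cost: increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    labels = [0] * len(points)
    for lab, c in enumerate(clusters):
        for i in c["members"]:
            labels[i] = lab
    return labels


def synthetic_cell(slopes, rng, n=40):
    """Hypothetical cell: standardized drivers and a noisy linear target."""
    drivers = [[rng.gauss(0, 1) for _ in slopes] for _ in range(n)]
    target = [sum(s * d for s, d in zip(slopes, row)) + rng.gauss(0, 0.1)
              for row in drivers]
    return drivers, target


rng = random.Random(0)
# Two made-up regimes with different target-driver slopes (temperature, DIC, alkalinity).
regimes = [(10.0, 1.0, -1.0)] * 6 + [(2.0, 3.0, -2.5)] * 6
coeffs = [fit_mlr(*synthetic_cell(s, rng)) for s in regimes]
labels = ward_clusters(coeffs, k=2)
print(labels)
```

Clustering the regression slopes rather than the raw driver values means that cells are grouped by how the target responds to its drivers, which is the sense in which the abstract defines a carbon biome. The cluster label numbers themselves are arbitrary; only the grouping is meaningful.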

Received: 16 May 2024 – Discussion started: 23 May 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

13 Mar 2025

Detection and tracking of carbon biomes via integrated machine learning

Sweety Mohanty, Lavinia Patara, Daniyal Kazempour, and Peer Kröger

Ocean Sci., 21, 587–617, https://doi.org/10.5194/os-21-587-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-1369', Anonymous Referee #1, 25 Jul 2024

The authors employ machine learning techniques in order to identify and analyze marine carbon biomes over space and time. They apply the tool using a global biogeochemistry model and identify 7 unique biomes globally. Analysis of these biomes and the drivers of each allows for conclusions about seasonal variation at regional scales and how these climatic patterns are shifting over time. This tool will be publicly accessible, providing an important resource for future research to improve analysis of ocean carbon and carbon cycle projections. The paper provides an important scientific tool, but I recommend some edits before it is publication-ready.
My largest comment is that in the abstract, the authors mention observation of a 10% expansion of the subtropical biome in the North Atlantic over time, and a 10% expansion of the subpolar biome in the Southern Ocean. These are very interesting results, worthy of highlighting in the abstract, but I felt they weren’t actually expounded upon enough in the results section. We have one paragraph at the end discussing it, but I was curious about putting it in context a bit more with the impacts of climate change and what these changing biomes could imply for the future. Additionally, I felt there was no real visual representation of these shifts. Is there a way to emphasize or include it more clearly in figure 6, or perhaps even mention it in the figure caption, to allow the reader to absorb this information better?
My second comment has to do with clarification of the input data: this was all done using one single model and its output, correct? I think some supplementary discussion of the model's own strengths and weaknesses could be included; I know, for example, that some models have unrealistic mixed layer depths when compared with observations. How would something like this impact these biome patterns? Could there even be a supplementary figure comparing some of this with observations? For example, take figure 7, which shows the SST, SSS, and MLD for each biome: how well does this match observations for roughly the same geographical region as defined by the machine learning biomes? I believe the paper could benefit from a little added discussion about how this method is employed within a model, and how that applies to future research. Does it need to be regenerated with selected observations (if so, what are the base requirements for the obs) or with someone's own model to usefully apply the biomes, or can they use your defined biomes explored here, and how does that affect research decisions?
Overall, I do recommend this paper for publication, once these edits have been addressed. I believe it is of scientific importance and a useful contribution to the ocean carbon research community, with potential for serving as a baseline tool for future research, and therefore is an important contribution to the field.

Specific notes:
Line 146+: The authors mention that, for both fCO2 and DIC, they use natural components rather than contemporary. How are these separated? Also, the note “they are substantially similar when using contemporary DIC/fCO2”… does this imply that the influence of anthropogenic carbon is not impacting the biomes? I feel this could be explored with a sentence or two here.
Line 151: The authors note they decided to build biomes on target-driver relationships rather than drivers themselves, because it’s better for the methodology. Did they test this, or how do they know this is better?
Line 249+: The authors select January 2009 as the training date for the FNN. They do address the sensitivity of this month selection, and acknowledge the caveats in the discussion, which is both good and necessary. However, they don’t really explain why January 2009 is chosen. What about the year 2009—it’s not in the middle of the analyzed time range, in fact it’s near the end. In addition, why the month of January? I think in the methods, this could be explained with more detail and justification.
Line 370: “Only a couple of years were found to be inconsistent with overall pattern” while looking at the figures, those years were pretty significantly outside the expected pattern. Any theories on why that might be? What was going on in those years? How did it bounce back so quickly, with no longer-term shifts on the biomes?
Figures 6&7: While I know the white box was labeled in a figure, I’d appreciate latitudinal/longitudinal boundaries for the NA and SO regions in both these figure captions
Line 438: “instead of directly environmental parameters,” I believe might be missing a word in this line
Line 484: Should be an extra line space between paragraphs

Citation: https://doi.org/10.5194/egusphere-2024-1369-RC1
- AC1: 'Reply on RC1', Sweety Mohanty, 14 Sep 2024
  
  Dear Reviewer,
  Thank you for your time and effort in reviewing our work. We have attached two documents in the author_response_1.zip: i) author_response_1.pdf contains our detailed response to your comments, and ii) OceanScience_2024_CarbonBiomes_figures_tables.pdf includes a list of new/revised figures and tables.
  Yours sincerely and on behalf of all co-authors,
  
  Sweety Mohanty
  
  Citation: https://doi.org/10.5194/egusphere-2024-1369-AC1
RC2:
'Comment on egusphere-2024-1369', Anonymous Referee #2, 26 Jul 2024

Mohanty and coauthors present a novel approach using an ocean biogeochemistry model and machine learning algorithms to detect ocean biogeochemical provinces (or biomes) based on the relationship between the sea surface fugacity of CO2 and its environmental drivers. The authors further investigate the temporal evolution of the biomes to detect changes in the fCO2.
I very much enjoyed reading the manuscript and I believe it provides a clever way to simplify a non-trivial question: How does the air-sea CO2 flux (represented here by the fCO2) change over time and what controls this change? I believe there are many applications for this approach; thus, I recommend publication.
I do have, however, a couple of questions and comments, that I believe would strengthen the manuscript:
1) The methods section, and in particular section 2.3 onward, is difficult to read, especially for folks who are not familiar with machine learning. This is the result of the many specific terms used (e.g. "merging at a higher height", "Ward variance", "Euclidean Distance", "Ward Linkage", ...). To make the methods section more accessible to the wider audience of the journal, I would suggest providing less technical text in the main section and adding the required detail and terminology to the appendix.
2) I don't find the argument about the choice of an MLR that convincing. Figure A1 clearly shows that the relationships are not "perfectly linear". Furthermore, the arguments provided on lines 178-180, that the MLR is faster and more easily interpretable, are only true to a certain extent. Using, e.g., a simple single-layer FFN instead of the MLR could account for the slight divergence from linearity without compromising on speed. For me, the main argument is interpretability. The single weights of the MLR are easier to interpret and process than the more complex weight matrices of an FFN.
3) This may be a misunderstanding on my end, but I am still puzzled why you need an FFN for the time variation in the biomes. I fully understand the approach and I endorse it, but would you not also get changing biomes by doing the MLR followed by the hierarchical clustering for each month/year separately? Through the changing HC relationships, you would also get changes in the biomes, no?
4) A more general question I had that was not answered in the paper: is your approach, which was designed from a single model, easily adaptable to other models?
And a couple of smaller things:
.) line 20: please add "annual" to the 25% (the number refers to the present day uptake rate - historically, over the industrial period, the ocean uptake was larger)
.) line 130: Please provide more detail how the outlier removal was done
.) lines 255-270: The architecture of the NN is provided with no justification as to why. Have you done some optimization testing (e.g. on the optimal number of neurons), or are these subjective choices?
.) line 471: "personality" is an odd choice of wording
.) lines 485-490 are a repeat from the introduction and can be removed in my view

Citation: https://doi.org/10.5194/egusphere-2024-1369-RC2
- AC2: 'Reply on RC2', Sweety Mohanty, 14 Sep 2024
  
  Dear Reviewer,
  Thank you for your time and effort in reviewing our work. We have attached two documents in the author_response_2.zip - i) author_response_2.pdf contains our detailed response to your comments and ii) OceanScience_2024_CarbonBiomes_figures_tables.pdf includes a list of new/revised figures and tables.
  Yours sincerely and on behalf of all co-authors,
  
  Sweety Mohanty
  
  Citation: https://doi.org/10.5194/egusphere-2024-1369-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Sweety Mohanty on behalf of the Authors (18 Oct 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (21 Oct 2024) by Aida Alvera-Azcárate

RR by Anonymous Referee #2 (07 Nov 2024)

RR by Anonymous Referee #1 (21 Nov 2024)

ED: Publish subject to minor revisions (review by editor) (27 Nov 2024) by Aida Alvera-Azcárate

AR by Sweety Mohanty on behalf of the Authors (05 Dec 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (12 Dec 2024) by Aida Alvera-Azcárate

AR by Sweety Mohanty on behalf of the Authors (21 Dec 2024) Manuscript

Viewed

Total article views: 527 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
357	140	30	527	22	29

Views and downloads (calculated since 23 May 2024)

Month	HTML	PDF	XML	Total
May 2024	70	16	5	91
Jun 2024	52	12	3	67
Jul 2024	87	43	10	140
Aug 2024	28	9	3	40
Sep 2024	45	32	5	82
Oct 2024	12	3	0	15
Nov 2024	11	3	2	16
Dec 2024	12	9	0	21
Jan 2025	29	9	2	40
Feb 2025	9	3	0	12
Mar 2025	2	1	0	3

Viewed (geographical distribution)

Total article views: 515 (including HTML, PDF, and XML) Thereof 515 with geography defined and 0 with unknown origin.

Latest update: 13 Mar 2025

Short summary

Climate change vastly affects the ocean carbon cycle, demanding methods to assess and monitor ocean carbon uptake. In this study, we devised a machine learning tool to detect and track ocean carbon biomes from 1958 to 2018. These biomes show consistent relationships between surface CO₂ fugacity and its drivers. Using ML methods, we identified and monitored carbon biomes over time, displaying meaningful responses to seasonal and long-term shifts and providing insights into climate change impacts.

