the Creative Commons Attribution 4.0 License.
Global pattern of nitrogen metabolism in marine prokaryotes
Abstract. The ocean nitrogen cycle is driven by an ensemble of metabolic processes sustaining marine ecosystems and ocean productivity. However, the spatial distribution and environmental drivers of its major pathways, i.e., nitrogen fixation, denitrification, assimilatory and dissimilatory nitrate reduction to ammonium (ANRA, DNRA), and nitrification, are not well known. Furthermore, the taxonomic composition of the prokaryotes supporting each pathway remains incompletely understood. Leveraging newly assembled global marine metagenomic datasets and a state-of-the-art machine learning framework, we inferred the global biogeography of the genomic potential for key metabolic pathways of the marine nitrogen cycle. This was achieved using a multi-output regression of gene read counts against environmental climatologies. Our results reveal distinct biogeographic patterns of genomic potential: anaerobic or light-inhibited pathways are enriched in high-latitude regions, eastern boundary upwelling systems, and deeper ocean layers, while nitrogen fixation and ANRA dominate in oligotrophic gyres. These patterns are consistent with known metabolic strategies, model-based estimates, and underlying taxonomy. Indeed, we find that Cyanobacteria associate primarily with aerobic, biosynthetic pathways, while Gammaproteobacteria and Nitrososphaeria encode nitrogen transformations related to energy requirements. By coupling microbial community composition with genome-level information, our approach advances understanding of the microbial foundations of nitrogen transformation pathways and offers new insights for integrating underrepresented processes into biogeochemical models. We highlight the growing value of omic data to better understand marine ecosystem function in relation to environmental gradients and community composition, and their use as a potential observation-based alternative or complement to biogeochemical models.
Status: open (until 06 May 2026)
- RC1: 'Comment on egusphere-2026-1459', Anonymous Referee #1, 25 Apr 2026
- RC2: 'Comment on egusphere-2026-1459', Anonymous Referee #2, 06 May 2026
General comments:
This work uses correlations between global genomic datasets and biogeochemical environmental data to validate and extrapolate on trends in prokaryotic nitrogen cycling predicted by the CEPHALOPOD statistical model. This is a proof of concept for using metagenomics to predict what the authors refer to as “genomic potential”, or the possibility that cells are expressing the proteins encoded by the genes sequenced in global metagenomic efforts. This work benefits from validation via comparison with known nitrogen cycling trends, and goes on to use the tool for prediction of unknown prokaryotic trends. It is most effective as a source of validation, where the CEPHALOPOD model successfully predicts known trends like increased assimilation of N in light, oxic oligotrophic equatorial waters. However, this doesn’t hold up when predicting N cycling in novel areas. There are base assumptions fed into the model that I think have produced spurious results. As the first reviewer noted, I have doubts that DNRA and other anaerobic processes are comparatively more prevalent in well oxygenated upwelling systems and polar regions. These analyses hinge on the identification of specific genes to serve as a proxy for processes, which is a difficult thing to do even with metatranscriptomics. I think this paper would be well served by a closer look at specific N gene function in eukaryotes, and by comparing it to the known geography of prevalent N cycling processes, to use as further validation of the CEPHALOPOD model.
Specific scientific comments:
Light inhibition is not necessarily correlated with anaerobic conditions, especially in high-latitude regions. Those areas are often characterized as oxygen rich, high nutrient, low chlorophyll. This would make oxygen available as an electron acceptor during photosynthesis, and eliminate the need for DNRA and other redox reactions. Deeper ocean layers are often oxygen rich while being nutrient poor, depending on depth below the euphotic zone, where oxygen tends to rebound as there are fewer organisms to consume it.
I would consider adjusting your depth-based strata based on region. In oligotrophic gyres, the chlorophyll maximum will often occur at 150m, meaning that the region above that will be well lit and nutrient depleted. Coastal, upwelling regions can have a chlorophyll maximum of 5-10m. Adjusting this will make your environmental data averages more accurate.
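The region-dependent stratification suggested above can be sketched as follows. This is only an illustration of the idea, not the manuscript's binning scheme: the function name `assign_stratum`, the three labels, and the `margin` multiplier are all assumptions, and the depths are the illustrative values from the comment.

```python
def assign_stratum(sample_depth_m, dcm_depth_m, margin=1.5):
    """Label a sample relative to the local deep chlorophyll maximum (DCM).

    Hypothetical scheme: shallower than the DCM -> 'above_dcm';
    between the DCM and margin * DCM depth -> 'dcm_layer';
    anything deeper -> 'below_dcm'.
    """
    if sample_depth_m < 0 or dcm_depth_m <= 0:
        raise ValueError("depths must be positive")
    if sample_depth_m < dcm_depth_m:
        return "above_dcm"
    if sample_depth_m <= margin * dcm_depth_m:
        return "dcm_layer"
    return "below_dcm"


# The same 100 m sample falls in different strata depending on the region.
gyre_label = assign_stratum(100, dcm_depth_m=150)   # oligotrophic gyre
coast_label = assign_stratum(100, dcm_depth_m=10)   # coastal upwelling
print(gyre_label, coast_label)
```

With a fixed global depth cutoff, both samples would be averaged into the same stratum even though one is well lit and the other sits far below the chlorophyll maximum.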
Has anyone shown that “genomic potential” has predictive power in prokaryotes? I don’t work with prokaryotes, so I’m not sure if there are studies tying metagenomics to metatranscriptomics on this scale in these organisms. If so, I would prominently include them to bolster your case for DNA as a proxy for likely expression.
I am slightly confused about the choice of nirB and nirD as a proxy for DNRA activity. As far as I know, nirB is also used for nitrate assimilation (ANRA), and some organisms use it concurrently with other forms of nitrite reductase in order to bolster N assimilation when “fresh” N is available. Its use alongside nirD, in the form sometimes referred to as nirBD, is documented as capable of DNRA alongside ANRA. The ratio of nirBD to nirA could be potentially used as evidence of DNRA activity, but I would include evidence that nirBD alone can be used as a DNRA proxy if you are convinced. In prokaryotes, nrfA is often used as the diagnostic gene for DNRA.
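A minimal sketch of the nirBD-to-nirA comparison mentioned above, assuming per-sample read counts are available for both markers. The function name, the pseudocount, and the station counts are hypothetical, and a ratio of gene abundances would at most be suggestive; it does not by itself demonstrate DNRA activity.

```python
import math


def log_ratio(nirbd_reads, nira_reads, pseudocount=1.0):
    """Return log2((nirBD + p) / (nirA + p)) for one sample.

    The pseudocount p (an assumption) avoids division by zero when a
    marker is absent from a sample.
    """
    if nirbd_reads < 0 or nira_reads < 0:
        raise ValueError("read counts must be non-negative")
    return math.log2((nirbd_reads + pseudocount) / (nira_reads + pseudocount))


# Hypothetical read counts per station: (nirBD, nirA).
samples = {"station_A": (120, 30), "station_B": (15, 240)}
ratios = {name: log_ratio(b, a) for name, (b, a) in samples.items()}

for name, value in ratios.items():
    # Positive values mean nirBD outnumbers nirA in that sample.
    print(f"{name}: log2(nirBD/nirA) = {value:.2f}")
```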
L58 -- generally, can you clarify your rates and put Tg N/year in context? It would be helpful to be able to meaningfully compare these parts of the nitrogen cycle.
L94 -- denitrification has been shown to happen at some level in all oceans.
L106 -- can you be clearer about what a “taxonomic unit” is? A whole genome? A gene? What is a metagenomic “in situ” observation?
L125 -- it would be helpful for the reader if you enumerated examples where this has been successful.
L130 -- can you define the difference between species presence and genomic potential?
L169 -- 50 individual observations per sample? Per station?
L176 -- regarding enzyme selection: this would benefit from a negative control. Where you don’t see the enzyme, do you not see the activity? This would rule out muddled results caused by enzymes being used for concurrent N cycling activities. It would also help to cite papers where these genes are shown to be responsible for these processes in eukaryotes.
L257 -- can you ground this 0.25 association in anything other than this past publication?
L277 -- this is unclear; what makes high latitudes different?
L305 -- as I said before, I suspect this DNRA pattern has to do with the boundaries of the euphotic zone being vastly different across these latitudes, where the defined “epipelagic” includes a fair amount of darkness and anoxia in productive waters.
L309 -- enhanced nitrogen?
L310 -- is the conclusion that more cells are in these regions expressing these genes, or are these genes being comparatively overexpressed? I know you can’t speak to expression data specifically, but it is difficult to grasp the meaning of “genomic potential” without acknowledging it.
L464 -- the eastern Pacific upwelling zone, or the California Current Ecosystem, is a high productivity, highly oxygenated system. Maybe you meant the Eastern Tropical North Pacific ODZ?
L470 -- how does Figure 6B refer to diatoms?
L471 -- These papers on the potential use of DNRA by diatoms (no capitalization) do not describe high-latitude conditions. Kamp and Stief have exclusively shown diatom DNRA to be associated with benthic conditions: complete anoxia, and induced nitrate storage. It is unlikely that these conditions are met in epipelagic, high-latitude water.
L480 -- this argument, that the genes present don’t necessarily represent expression well, seems to undercut the base assumptions of this paper.
Summary: I like the idea of putting the ample amount of metagenomic data we have to use in models like CEPHALOPOD, and I’m glad to see efforts being put toward this. Novel results will only be credible if they are based on fine-tuned inputs developed from real world biogeochemistry and nitrogen cycling patterns, which I think are a bit lacking in this paper. If the paper pivoted toward using known patterns to validate the model input, I would support this. I would also like to note that some of the conclusions/introductions read as extremely text-generated, and I would encourage the authors to use the results to drive their own conclusions in correlation with existing literature.
Citation: https://doi.org/10.5194/egusphere-2026-1459-RC2
Interactive computing environment
Data and code to map the genomic potential for nitrogen metabolism in prokaryotes, A. Schickele et al., https://doi.org/10.5281/zenodo.17407277
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 413 | 0 | 1 | 414 | 0 | 0 |
RC1: 'Comment on egusphere-2026-1459', Anonymous Referee #1, 25 Apr 2026
General comments:
The authors examine global patterns in key nitrogen cycling pathways in the ocean using genetic datasets. The tool used -- the statistical model CEPHALOPOD -- has been previously published and provides a thorough and statistically rigorous analysis of how the key genes vary with environmental factors. This is a step up from simple DIY correlations between datasets. Most of the results themselves are not novel -- we know that assimilation of N by cyanobacteria should dominate in oligotrophic gyres, and nitrification at depth, for example -- but it is still useful to see how these known patterns emerge using this tool. It is very interesting to see the widespread but low abundance coverage of the denitrification genes. However, the conclusion of high relative abundance of DNRA across the ocean, especially at high latitudes in the surface, needs more work to be convincing. Not only is it very surprising (why would an anaerobic process show up in the most oxygenated waters at high latitudes?), but the genetic marker may not be appropriate here. I am not a geneticist or omics expert, but my understanding is that nirB and nirD can be involved in ANRA as well as DNRA (see for example this review: https://doi.org/10.1016/j.synbio.2025.12.017), and ANRA is actually what we expect in high latitude surface waters. I suggest the authors look for nrfA instead and reevaluate their conclusions about DNRA.
Specific scientific comments:
Use of "aerobic" (ex: L. 17): I would not use the term aerobic for autotrophic processes/photosynthesis. I realize that ANRA may be associated with aerobic heterotrophy as well in the surface, which may explain why a small section of the pie in Fig. 6c for ANRA is not cyanobacteria. I and many others consider aerobic to mean O2-consuming, not O2-producing, so this usage may be confusing.
Genes for nitrification: Again, I am not an expert, but why not look for amoA and nxr genes for ammonia and nitrite oxidation? This is what is used for this database on nitrifiers: Tang et al., "Database of nitrification and nitrifiers in the global ocean," https://doi.org/10.5194/essd-15-5039-2023 -- which should be cited and with which results here could be compared.
Nitrification taxonomy: It is reassuring that the taxonomy associated with nitrification is Nitros*, but it is perhaps a bit circular to pitch it as a conclusion (considering the literature) since for nitrification, taxonomy and function are so related. It would make the paper stronger overall to pitch this more as a "validation" that the model can find known patterns such as this one.
Fig. 1: There are many wide black arrows, but then you say you did not consider them in this study. Perhaps there is another way to say this that does not undersell the work? You are using methods that just identify correlations among datapoints, so it is probably OK to not even mention that you cannot consider transport in the model. Same for anammox: you DID consider it, but it was just not sufficiently abundant to include in the statistical analysis.
L. 120: "Habitat modeling" seems like it should represent a much broader category of modeling, rather than just simply this statistical approach. I would assume that any ecosystem model that aims to identify the ecological niche of a population or functional type is a habitat model.
Cutoff (50 observations; line 169): You say that a gene must be present in at least 50 observations "to ensure sufficient spatial coverage" -- this does seem incompatible with many of the anaerobic N cycling transformations in anoxic zones unless you have 50 samples IN anoxic zones. This meant that you couldn't show anammox patterns. Perhaps for these types that didn't make that cut (anammox, and perhaps nrf as a replacement for nirBD for DNRA), it's useful to explain that they do "show up where they should," i.e., the small volumes of anoxic zones, rather than saying "we didn't consider them." You did consider them, but they are just not candidates for global-scale representation which is reasonable!
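To make the effect of such a cutoff concrete, here is a minimal sketch that separates genes passing a prevalence threshold from genes that would be reported descriptively instead. The 50-observation threshold comes from the comment above; the function name, the data layout, and the counts are hypothetical, and the gene names are used only as familiar examples rather than as the manuscript's marker list.

```python
def split_by_prevalence(counts_by_gene, min_obs=50):
    """Split genes into (modelled, reported_only) by number of observations
    in which the gene is present (count > 0)."""
    modelled, reported_only = [], []
    for gene, counts in counts_by_gene.items():
        n_present = sum(1 for c in counts if c > 0)
        if n_present >= min_obs:
            modelled.append(gene)
        else:
            reported_only.append(gene)
    return sorted(modelled), sorted(reported_only)


# Hypothetical counts over 100 observations.
counts_by_gene = {
    "nifH": [3] * 60 + [0] * 40,   # present in 60 observations
    "hzsA": [5] * 12 + [0] * 88,   # present in 12 observations only
}
modelled, reported_only = split_by_prevalence(counts_by_gene)
print("modelled:", modelled)
print("reported only:", reported_only)
```

Genes in the second list are still observed; listing where they occur, rather than dropping them silently, is the kind of reporting the comment asks for.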
L. 225: "outliers in the biological inputs were removed." Is this reasonable, given that we do think there is much heterogeneity in microbial communities and BGC fluxes in the environment? What are the implications of this? Is it more like you are finding "average" patterns and ignoring the known deviations? (This seems reasonable to do.)
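One way to assess the implications is to report what was discarded alongside what was kept. The sketch below assumes an interquartile-range rule purely for illustration; the criterion actually used in the manuscript is not stated in this comment, and the function name and values are hypothetical.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Split values into (kept, removed) using an IQR fence.

    Assumed rule: a value is an outlier if it lies more than k * IQR
    below the first quartile or above the third quartile.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if v < low or v > high]
    return kept, removed


# Hypothetical gene read counts with one extreme observation.
values = [2, 3, 3, 4, 4, 5, 5, 6, 90]
kept, removed = iqr_filter(values)
print(f"kept {len(kept)} values, removed {removed}")
```

Listing the removed observations with their locations would show whether they are noise or real hotspots that the "average" pattern then leaves out.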
DNRA results: As I state above, I'm deeply skeptical that DNRA is actually this abundant in oxygenated surface waters. It would be really amazing if these results WERE true, and so, you need to convince me and the other readers that I can believe this. See above major comment about nrf vs. nirBD.
L. 497: Levine et al. (2025) is a review, not a model. Cite the models you are referring to directly.
Smaller presentation comments:
L. 85: "ammonium that is being released" -- note that this may also be considered a key N transformation process! Perhaps also mention urea somewhere, as urea cycling is becoming more recognized currently.
L. 107: do you mean "such genes" instead of "such enzymes"?
L. 113: "the one[s] associated" (not "the one")
L. 145: "Doing so, [we] explore"
L. 272: "The three features" -- could you name them here? I am curious at this stage!
L. 302: Could you clarify/specify that "bio-unavailable forms" refers to N fixation and denitrification? This makes sense to look at these two separately from the others, but just took me a second to realize what was going on.
Figs. 4 and 5: Could you just label each of the maps with the process for much easier reference?
L. 408: Again, I would not call cyanobacteria "aerobic."
L. 410/11: Nitros* are named the way they are because they are associated with nitrification, so seems naive to state a result this way (see above comment).
L. 415: "In the lat[t]er"?
L. 416: Is it known that these are diazotrophs? Perhaps more fair to just say non-cyanobacterial groups?