the Creative Commons Attribution 4.0 License.
Onsite microbiome analysis of stromatolite-like silica structures in a remote subterranean analog martian environments
Abstract. Amorphous silica deposits found in orthoquartzite caves offer valuable analogues for understanding early life on Earth and potential biosignatures on Mars. This study presents the fully on-site microbial community analysis of silica stromatolite-like structures in the ancient and remote orthoquartzite cave Imawarí Yeutá (Auyan Tepui, Venezuela). Using a portable laboratory setup, we performed ATP-based microbial activity assessments and the full DNA-based analysis workflow directly in the cave, without internet access or high computational resources. The data obtained in the cave were then validated in the laboratory using a standard bioinformatics pipeline, qPCR and Biolog EcoPlate assays. The sequencing results revealed that the microbial communities in the stromatolite differ from other biofilms on the cave floor for the higher abundance of Actinobacteriota (particularly the genus Crossiella) and members of Subgroup 13 (Acidobacteriota) suggesting a possible role in the stromatolite formation/development. The ATP-based and Biolog results indicated that the most metabolically active microorganisms are localized in the white layer/colonies at basis of the stromatolite suggesting that the stromatolite development occurs at the interface of this structure with the quartz rock. These findings validate the feasibility of real-time microbial analyses in remote caves with astrobiological interest and provide novel understanding on the microbiological aspects involved in the formation of the silica stromatolites in non-thermal and aphotic environments.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-2224', Anonymous Referee #1, 26 Jun 2025
EGU Review
June 2025
Cappelletti et al. proffer a method for in-cave molecular, biochemical, and metabolic analysis of microbial communities associated with stromatolite-like structures in an orthoquartzite cave. Their in-cave approach to microbial community sequencing involved thoughtful amendments to otherwise standard DNA sequencing pipelines to enable “off-grid” (no internet connection and limited computational and electrical resources) work. Results from this amended pipeline are reported as comparable to results from laboratory-based full-computational runs on the same samples as processed in the field. In addition to this methodological assessment, the authors show that their modified pipeline resolves differences in microbial communities sampled from different parts of the stromatolite-like structures, molecular-based observations that were subsequently corroborated with independent molecular and cultivation-based approaches. This work is valuable for real-time microbiological analyses often performed in remote or hard-to-access locations usually representing astrobiological analogues on Earth.
The manuscript requires substantial revisions. Perhaps the most detrimental issue for adequate reproducibility is the complete absence of the sequence analysis software, statistical approaches, and method-specific parameters used to generate the microbiome data in various manuscript figures. Additional suggestions to improve the manuscript are provided below.
Suggestions:
L34: Please revise this long sentence for clarity.
L53: Please revise this sentence for clarity.
L54: Do the authors mean the informed selection of samples?
L68: Explicitly define the a.s.l. acronym (above sea level, I suppose).
L73: Revise “hundreds of thousands of millions of years” since this reads as hundreds of billions of years which is more than the age of the Universe.
L74: A comparative description of the oligotrophic description here may help the readers: How much more oligotrophic are the caves relative to the world immediately outside of the cave and/or for example hyper arid deserts in the hemisphere?
L120: How far away is the field laboratory from the sampled deposits?
L136: ATP is directly proportional to the activity of cells rather than abundance, particularly in oligotrophic (low-metabolic activity) conditions. It would be helpful to the readers if the authors made this disambiguation as this point is presented in the next line (L137).
L146: Do the authors mean ¼ of the volume of the collected samples? Please revise.
L164: A run-on sentence, please revise.
L179: Can the authors specify the “standard packages for the visualization” and, most importantly, details on software, statistical test, and hypothesis testing parameters for the “analysis of 16S rRNA sequencing data”. These details, although often “standard”, are imperative for reproducibility.
L187: The authors should specify that the chi-squared is specifically assessing the goodness of fit for the distribution of taxa, at various taxonomic levels, across the lab and cave pipelines.
L207: This sentence looks incomplete, with multiple commas and periods. Please revise.
L212: Is a 14d incubation a drastic deviation from the manufacturer’s suggested incubation time? It seems like a long time to incubate 150ul of inoculant. I’d expect volume loss even with a lid. Please state this incubation time as a modification from the standard protocol if it deviates at all.
L244: Figure 5: Explicitly state the bar plot label abbreviations.
L250: Figure 6: Explicitly state the plot label abbreviations. Was a t-test performed here to formally test the difference in the means?
L351: …hard accessibility?
L372: Please revise this sentence for clarity.
L420: Perhaps change activity/amount to activity/quantification?
L420: Breaking up the sentence in L420-423 into two separate statements will improve it.
L423-427: Please break up this run-on into separate sentences highlighting i) the usefulness of the site as a Mars analogue and ii) the utility of the site for the study of microbe-mineral interactions.
L438: Please hyphenate the compound adjectives in-lab and in-cave.
Revise this same sentence for clarity, for example:
In cave and in lab analyses allowed to provide indications on the active role of microbial cells in the process of stromatolite formation and indicated a possible role of Actinobacteriota and Acidobacteriota members in stromatolite development.
Could read, for example:
In-cave and in-lab analyses allowed assessment of the active role of microbial cells in the process of stromatolite formation and indicated a possible role of Actinobacteriota and Acidobacteriota members in stromatolite development.
Citation: https://doi.org/10.5194/egusphere-2025-2224-RC1 -
AC1: 'Reply on RC1', Martina Cappelletti, 25 Aug 2025
We thank the Reviewer for the valuable comments and suggestions. Below we provide our responses to each point.
L34: Please revise this long sentence for clarity.
Answer: Done
L53: Please revise this sentence for clarity.
Answer: Done
L54: Do the authors mean the informed selection of samples?
Answer: Yes, we changed the sentence accordingly (see the sentence: “The advantages of in situ DNA analyses include to avoid sample degradation, to limit problems with transportation and to allow field-based pre-screening for informed sample selection.”)
L68: Explicitly define the a.s.l. acronym (above sea level, I suppose).
Answer: Done
L73: Revise “hundreds of thousands of millions of years” since this reads as hundreds of billions of years which is more than the age of the Universe.
Answer: We corrected the sentence by indicating “millions of years”
L74: A comparative description of the oligotrophic description here may help the readers: How much more oligotrophic are the caves relative to the world immediately outside of the cave and/or for example hyper arid deserts in the hemisphere?
Answer: Deep zones of the Tepui quartzite caves are recognized as oligotrophic ecosystems, with organic carbon and nutrient concentrations reported at undetectable levels (Mecchia et al., 2014; Sauro et al., 2018; Ghezzi et al., 2022). Unlike the surrounding tepui surface, which receives constant inputs from soils and vegetation, the cave interior is permanently deprived of external nutrient influx and lacks primary production driven by photosynthesis. To address the reviewer’s comment, we have added a sentence citing previous studies that describe the oligotrophy of tepui caves also in comparison with the areas immediately outside of the cave (lines 75 – 79: “In this regard, the innermost zones of quartzite caves in the tepui mountains are considered oligotrophic ecosystems, with organic nutrient concentrations close to undetectable levels (as reported in Barton et al. 2014; Mecchia et al., 2014; Sauro et al., 2018; Ghezzi et al., 2022). Unlike the external surface, which receives constant inputs from soils and vegetation, the cave interior is deprived of external nutrient sources and lack primary production associated to light-driven photosynthesis.”). However, significant gaps remain in the quantitative characterization of TOC and DOC in these systems, as they are difficult to access and permits for sample collection and transport are difficult to obtain. For this reason, direct quantitative comparisons with other oligotrophic environments, such as hyper-arid deserts, remain challenging.
L120: How far away is the field laboratory from the sampled deposits?
Answer: Samples were collected in areas of the cave located approximately 350 meters in a straight line from the tent laboratory, corresponding to about 650 meters along the actual path. A specific sentence was added in the Methods section.
L136: ATP is directly proportional to the activity of cells rather than abundance, particularly in oligotrophic (low-metabolic activity) conditions. It would be helpful to the readers if the authors made this disambiguation as this point is presented in the next line (L137).
Answer: We modified the text to indicate "metabolic activity" instead of "abundance".
L146: Do the authors mean ¼ of the volume of the collected samples? Please revise.
Answer: We have corrected the sentence (“we arbitrarily added an amount of sample that filled approximately one quarter of the volume of the Qiagen tube with the PowerBead solution”).
L164: A run-on sentence, please revise.
Answer: We modified the sentence to clarify the meaning (“a sequencing run of approximately 2 hours was performed, with real time basecalling enabled, and carried out directly in situ using the MinKNOW software.”)
L179: Can the authors specify the “standard packages for the visualization” and, most importantly, details on software, statistical test, and hypothesis testing parameters for the “analysis of 16S rRNA sequencing data”. These details, although often “standard”, are imperative for reproducibility.
Answer: The shell script used to handle ONT sequencing data and generate the EMU taxonomy abundance tables, along with the R code to replicate the statistical analyses and figures, has been uploaded to Figshare under the DOI: 10.6084/m9.figshare.29514197. We also improved the Methods section, specifying the software versions and R packages used at the time of analysis, as well as the pre-processing of EMU abundance data for the calculation of alpha and beta diversity, chi-square tests, and plotting compositional data (Figures 10-11, Figures S1-2).
L187: The authors should specify that the chi-squared is specifically assessing the goodness of fit for the distribution of taxa, at various taxonomic levels, across the lab and cave pipelines.
Answer: We have specified this aspect in lines 216-218 of the revised version of the paper (added sentence: “The goodness-of-fit of taxonomic distributions at different levels between the lab and cave pipelines was assessed using a Chi-square test, implemented through the chisq.test function of the R stats package (v3.6.2).”)
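As an illustration of the comparison described in this answer, the following minimal Python sketch computes a Pearson chi-square statistic for a table of taxon counts from two pipelines. It is not the authors' R code: the function name, the two-row table layout (one row per pipeline, one column per taxon), and the counts are assumptions made for the example, and the p-value lookup is left out.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Each row is one pipeline, each column one taxon (assumed layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("drop empty rows/columns before testing")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if taxa were distributed identically in both rows
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical read counts per phylum (illustrative only, not study data)
cave_counts = [50, 30, 20]
lab_counts = [50, 30, 20]
stat, dof = chi_square_statistic([cave_counts, lab_counts])
print(stat, dof)
```

A statistic close to zero relative to its degrees of freedom indicates that the two pipelines return similar taxonomic distributions; converting it to a p-value requires the chi-square distribution, which R's chisq.test handles internally.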
L207: This sentence looks incomplete, with multiple commas and periods. Please revise.
Answer: The sentence was corrected (“This technology relies on active cell metabolism, where the dye is reduced by NADH produced during respiration, leading to the formation of the purple compound formazan.”)
L212: Is a 14d incubation a drastic deviation from the manufacturer’s suggested incubation time? It seems like a long time to incubate 150ul of inoculant. I’d expect volume loss even with a lid. Please state this incubation time as a modification from the standard protocol if it deviates at all.
Answer: We agree with the reviewer that this point required further clarification. Metabolic analyses were initially performed after two incubation periods, at both 5 and 14 days. Only slight color changes were observed after 5 days, whereas more pronounced changes became evident after 14 days. We have now specified in the Methods section that both incubation times were tested and that specific methods were applied to avoid volume loss (see the Methods section, lines 268 – 272: “Two incubation periods were selected, i.e., 5 and 14 days. The first one corresponds to the time that is typically used in Biolog experiments, while the second one was selected because previously adopted in studies analysing microbial metabolic activities in cave samples using Ecoplates (O’Connor et al. 2021). After inoculation, the Ecoplates were sealed with Parafilm and placed inside sealed containers containing water-soaked absorbent paper to maintain humidity to avoid volume loss due to dryness.”). Furthermore, in the Results section we have specified that only 14-day results were informative: “Two incubation periods were selected, i.e., 5 and 14 days. The longer incubation period (14 days) yielded the same patterns observed at 5 days, but with higher levels of metabolic activity; therefore, only the 14-day results are reported in Figure 7.”
L244: Figure 5: Explicitly state the bar plot label abbreviations.
Answer: The explanation of the abbreviations has been added to the figure caption.
L250: Figure 6: Explicitly state the plot label abbreviations. Was a t-test performed here to formally test the difference in the means?
Answer: We have added the explanation of the abbreviations in the figure caption in the revised version of the paper. Regarding the analysis of the difference, a t-test was performed and the results indicated that the difference was not statistically significant. This aspect was clarified in the figure caption.
L351: …hard accessibility?
Answer: We have replaced “reachability” with “accessibility”.
L372: Please revise this sentence for clarity.
Answer: We have clarified the sentence (line 437: “The standard pipeline employed the ‘super accuracy’ basecalling mode on HPC nodes equipped with dedicated GPUs and sufficient RAM/CPUs to process the full sequencing output.”)
L420: Perhaps change activity/amount to activity/quantification?
Answer: We changed the text accordingly.
L420: Breaking up the sentence in L420-423 into two separate statements will improve it.
Answer: We divided the sentence into two sentences (“This work describes the development and validation of procedures to carry out microbial activity/quantification analysis and DNA sequencing in a remote subterranean environment. The results from this study also provide novel insights into the microbiology of silica deposits in orthoquartzite caves, which are considered promising Mars analogues for subsurface and silica-rich environments.”)
L423-427: Please break up this run-on into separate sentences highlighting i) the usefulness of the site as a Mars analogue and ii) the utility of the site for the study of microbe-mineral interactions.
Answer: We broke up the sentence into two as suggested by the Reviewer (the new version reports: “The use of Imawarí Yeutá as an environmental setting for these procedures was functional to highlight its potential as an extraterrestrial analogue on Earth, given its extreme remoteness, isolation, and morphological analogies with silica structures detected on Mars. At the same time, it offered scientific interest for studying the microbial communities colonizing this oligotrophic cave and inhabiting the peculiar silica stromatolite-like structures that likely contribute to their formation.”)
L438: Please hyphenate the compound adjectives in-lab and in-cave.
Answer: Done
Revise this same sentence for clarity,
Answer: We corrected/modified the sentence, thank you for the suggestion.
Citation: https://doi.org/10.5194/egusphere-2025-2224-AC1 -
AC3: 'Addition to the answer to the comment L179', Martina Cappelletti, 25 Aug 2025
Here we report an addition to the answer to the Reviewer's comment "L179: Can the authors specify the “standard packages for the visualization” and, most importantly, details on software, statistical test, and hypothesis testing parameters for the “analysis of 16S rRNA sequencing data”. These details, although often “standard”, are imperative for reproducibility."
Answer:
The Methods section now reads as follows: “The pipeline consists of two modules, i.e., a pre-processing module and a classify module. In the pre-processing module, the raw reads are processed through the following steps: i) random subsampling at 25% with reformat.sh from BBMap package v38.98 (https://sourceforge.net/projects/bbmap/), ii) adapter removal by Porechop v0.2.4 (https://github.com/rrwick/Porechop), iii) length (1200 – 1800 bp) and quality (≥9) filtering using Nanofilt v2.6.0 (https://github.com/wdecoster/nanofilt), iv) chimera removal through yacrd vIvysaur using recommended settings for ONT data (https://github.com/natir/yacrd). The second module enabled taxonomy classification and calculation of taxa abundances of the filtered reads against the prebuilt SILVA database v138.1 using the EMU classifier v3.4.5 (Curry et al., 2022). The time requirement for the entire workflow was under 15 minutes, considering approximately 25,000 to 30,000 reads processed for each sample after the initial subsampling at 25%.
For the lab procedure, the same data analysis workflow was followed except for the base calling modality and the subsampling. The reads processed by the lab procedure were base-called using “super accurate” mode with Guppy v6.11. This mode requires access to a machine with a GPU to complete the basecalling within an acceptable timeframe. Such GPUs are often available on HPC nodes whose access requires an internet connection or in high-performing laptops that still would require several hours to complete the basecalling. The direct comparison between the two pipelines is indicated in Table S2.
The abundance data generated from EMU were combined to produce sample-wise taxon-specific abundance tables at the Phylum, Class, Order, Family, and Genus levels, with taxa showing abundances below 1% grouped under the category ‘others’. Visualisation of compositional data through bubble plots (Figure 11; Figure S2) was performed using ggplot.
Alpha and beta diversity metrics were calculated using the R package Vegan v2.6.4. The R packages ComplexHeatmap and ggplot2 were used to visualise Genus-level Bray-Curtis dissimilarities between stromatolite (ST-w and ST-b) and biofilm (D-w, P-w, P-y) samples (Figure 10; Figure S1).
The goodness-of-fit of taxonomic distributions at different levels between the lab and cave pipelines was assessed using a Chi-square test, implemented through the chisq.test function of the R stats package (v3.6.2). This statistical test allows the assessment of whether the distribution of taxa in the cave_pipeline deviated significantly from that of the lab_pipeline. The shell scripts used to process sequencing data into EMU abundances, as well as the R scripts used for statistical analyses, beta and alpha diversity analyses, and visualisation of the results, are available on Figshare doi: 10.6084/m9.figshare.29514197.”
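Two of the steps in the quoted Methods text, grouping taxa below 1% under ‘others’ and computing Bray-Curtis dissimilarities between samples, can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' R code (which uses Vegan): the function names, the dictionary layout, and the abundance values are assumptions, and whether the authors computed dissimilarities before or after grouping is not stated in the text.

```python
def group_rare_taxa(abundances, threshold=0.01):
    """Collapse taxa with relative abundance below `threshold` into 'others'."""
    grouped = {}
    others = 0.0
    for taxon, value in abundances.items():
        if value < threshold:
            others += value
        else:
            grouped[taxon] = value
    if others > 0:
        grouped["others"] = others
    return grouped


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(t, 0.0) - sample_b.get(t, 0.0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0.0) + sample_b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical genus-level relative abundances (illustrative only)
stromatolite = {"GenusA": 0.5, "GenusB": 0.5}
biofilm = {"GenusA": 1.0}
print(group_rare_taxa({"GenusA": 0.6, "GenusB": 0.395, "GenusC": 0.005}))
print(bray_curtis(stromatolite, biofilm))
```

The dissimilarity ranges from 0 (identical composition) to 1 (no shared taxa), which is the scale visualised in heatmaps of the kind mentioned for Figure 10.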
Citation: https://doi.org/10.5194/egusphere-2025-2224-AC3
-
RC2: 'Comment on egusphere-2025-2224', Anonymous Referee #2, 11 Aug 2025
Cappelletti et al. report an interesting study on the detection of microbial activity and DNA sequences in a cave environment. They demonstrate the feasibility of conducting all stages of the analysis from sample collection to DNA sequencing within a field lab set up in the cave, without internet access.
The results are necessarily somewhat coarse and represent only the first stages of exploring a microbial ecosystem, but the methods demonstrated in this study could be helpful in accelerating these first steps for future research projects in other systems. For example, if initial taxonomic and metabolic activity results could be generated during an initial field expedition, then follow-up studies with the potential to generate more specific and more thorough data could be executed immediately during the same field trip, without needing to wait months or years for the next field trips, as is often typical with this type of research. Therefore, I think the manuscript should be important and interesting enough to the community that it should be published with minor revisions, even though the actual results reported here are limited to a general characterization of the resident community and lack any key insights.
Specific comments:
Grammar: The text contains some grammatical errors and some confusing sentences that could benefit from some careful editing, but in general, the manuscript is certainly comprehensible.
Software: The data analyses are described as a "pipeline". If this is a software pipeline - meaning that it can be run with a single command from start to finish in an automated way - then it should be shared via github or some other method. If it is actually more of a "workflow" - meaning that the user manually executes each step of the pipeline, one at a time - then this should be clarified. I recommend the use of the word "workflow" in this case. Note that pipelines are not necessarily better than workflows. I am personally quite skeptical of pipelines, but in any case, this point should be clarified. In a study like this, the availability of a portable pipeline could be helpful to other researchers.
I am also a bit confused about what, exactly, the lab pipeline consisted of. I think more details could be provided here, and also a clarification of whether it is an automated pipeline or a manual step-by-step workflow.
line 226: the use of the word "confirm" here is too strong for the description of an initial result with respect to a biosignature. Please reword with suitable uncertainty.
Figure captions should include explanations of sample name abbreviations.
Citation: https://doi.org/10.5194/egusphere-2025-2224-RC2 -
AC2: 'Reply on RC2', Martina Cappelletti, 25 Aug 2025
We thank the Reviewer for the general positive comments about the manuscript.
Below we provide our responses to the Reviewer's comment:
Specific comments:
Grammar: The text contains some grammatical errors and some confusing sentences that could benefit from some careful editing, but in general, the manuscript is certainly comprehensible.
Answer: We have thoroughly checked the text for grammatical errors and typos.
Software: The data analyses are described as a "pipeline". If this is a software pipeline - meaning that it can be run with a single command from start to finish in an automated way - then it should be shared via github or some other method. If it is actually more of a "workflow" - meaning that the user manually executes each step of the pipeline, one at a time - then this should be clarified. I recommend the use of the word "workflow" in this case. Note that pipelines are not necessarily better than workflows. I am personally quite skeptical of pipelines, but in any case, this point should be clarified. In a study like this, the availability of a portable pipeline could be helpful to other researchers. I am also a bit confused about what, exactly, the lab pipeline consisted of. I think more details could be provided here, and also a clarification of whether it is an automated pipeline or a manual step-by-step workflow.
Answer: This is an automated pipeline (not a manual step-by-step workflow) in which the user only decides the subsampling ratio. Other parameters, such as the read length and minimum quality of ONT reads, are coded in the shell script and can be changed by modifying the script accordingly. Regarding the need to share the pipeline, we have uploaded to Figshare (DOI: 10.6084/m9.figshare.29514197) the pipeline used to handle ONT sequencing data and generate the EMU taxonomy abundance tables, along with the R code to replicate the statistical analyses and figures.
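To make the roles of these parameters concrete, the sketch below mimics in plain Python the logic of a user-chosen subsampling ratio combined with fixed length and quality thresholds. It is not the authors' shell script and does not call BBMap or NanoFilt: all function names are hypothetical, the defaults (1200 to 1800 bp, quality of at least 9, 25% subsampling) are taken from the Methods text quoted in AC3, and averaging quality over error probabilities is an assumption about how the mean read quality is defined.

```python
import math
import random


def mean_phred(qualities):
    """Mean read quality, averaged over error probabilities (assumed convention)."""
    error_probs = [10 ** (-q / 10) for q in qualities]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def keep_read(length, qualities, min_len=1200, max_len=1800, min_q=9):
    """Apply the fixed length and quality thresholds to a single read."""
    return min_len <= length <= max_len and mean_phred(qualities) >= min_q


def subsample(reads, fraction=0.25, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


# Hypothetical reads as (length, per-base Phred scores); illustrative only
reads = [(1500, [20] * 1500), (1000, [20] * 1000), (1500, [5] * 1500)]
filtered = [r for r in subsample(reads, fraction=1.0) if keep_read(*r)]
print(len(filtered))
```

Exposing only the subsampling fraction as an argument while keeping the thresholds as defaults mirrors the design described in the answer, where the other parameters are changed by editing the script.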
line 226: the use of the word "confirm" here is too strong for the description of an initial result with respect to a biosignature. Please reword with suitable uncertainty.
Answer: We have replaced “confirm” with “suggest”
Figure captions should include explanations of sample name abbreviations.
Answer: We have added description of abbreviations in the figure captions (see the edited captions of Figure 1, Figures 5, 6, 10 and 11)
Citation: https://doi.org/10.5194/egusphere-2025-2224-AC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
530 | 77 | 13 | 620 | 25 | 6 | 17 |