the Creative Commons Attribution 4.0 License.
Combining BGC-Argo floats and satellite observations for water column estimations of particulate backscattering coefficient
Abstract. Monitoring carbon cycle processes is key to understanding the climate system. As the second largest carbon reservoir on Earth, the ocean regulates the carbon balance through Particulate Organic Carbon (POC), which links surface biomass production, the deep ocean, and sedimentation. The degradation of POC in the deep ocean notably impacts atmospheric CO2 levels. POC can be estimated from proxies such as the particulate backscattering coefficient (bbp), which is obtained from satellite observations and from in situ sensors such as those on BioGeoChemical-Argo (BGC-Argo) floats. These floats provide global-scale profiles of ocean biogeochemical properties. Previous research has combined data from BGC-Argo floats and satellite sensors, demonstrating the potential of machine learning models to infer vertical bio-optical properties in the water column. By bridging the gap between surface optical properties and deep ocean processes, this approach improves bbp estimation within the top 250 m of the water column. This study focuses on such estimations and incorporates remote sensing data from the Sentinel-3 Ocean and Land Colour Instrument (OLCI). Adding optical information on absorption and scattering processes improved the accuracy of the Random Forest models, which show promising results, especially within the first 50 m in the subtropical gyres. However, in dynamic regions such as the North Atlantic, results are less consistent, suggesting that further research is needed to understand how the physical complexity of the water column modifies bbp vertical fluxes.
Status: open (until 04 Mar 2025)
RC1: 'Comment on egusphere-2024-3942', Anonymous Referee #1, 11 Jan 2025
The current manuscript builds on existing research to explore the use of multi-output random forest models for retrieving the particulate backscattering coefficient (bbp) at various depths from different input datasets. These inputs introduce higher spatial resolution and a wider diversity of data types. However, the draft in its present form is somewhat rudimentary. The work is not effectively summarized in the abstract and discussion sections, and the study's highlights and innovations are not prominently featured. I recommend that the authors address these issues by clearly outlining the study's contributions and innovations at the end of the introduction. Below are specific suggestions for improvement:
- The introduction initially mentions Particulate Organic Carbon (POC) using its abbreviation without first presenting its full name and explaining its significance within the study’s context. This may confuse readers unfamiliar with the term. Additionally, the discussion on why profiling POC is challenging is insufficiently developed. A more detailed explanation of these measurement difficulties is necessary to establish the research problem’s significance and to clearly justify the study’s objectives. Providing a comprehensive background on POC and elaborating on the challenges in measuring it will better prepare readers for the research presented.
- The second paragraph of the introduction discusses Apparent Optical Properties (AOP), which does not appear to be directly related to the paper’s main focus. This section may detract from the introduction’s clarity and coherence by introducing a topic that is not central to the study’s objectives. It is important to ensure that the introduction remains focused on the key themes and research questions. If AOP is not essential to the main argument, consider removing this section or significantly condensing it to maintain the introduction’s focus and engage readers with the paper’s central themes.
- The logical flow between the introduction’s first two paragraphs is somewhat disjointed, potentially hindering the reader’s understanding of the paper’s overall direction. Furthermore, the latter paragraphs lack a detailed analysis of the current research landscape. The discussion of existing studies is limited and does not clearly identify the knowledge gaps this paper aims to address. To enhance the introduction, revise the first two paragraphs to improve their logical structure and coherence. Additionally, include a more comprehensive review of the current research, highlighting specific gaps in the literature and the problems this study seeks to solve. Incorporating more examples of relevant previous research will strengthen the context and rationale for the study, providing a clearer foundation for the paper’s contributions.
- The introduction should underscore the importance of bbp for POC estimation, as well as the deficiencies of, and areas for improvement in, current bbp products. While the introduction currently highlights the significance of POC, it does not adequately stress the critical role of bbp. Clarify whether POC estimation relies solely on bbp and discuss its specific importance in this context. Additionally, expand on the current state of bbp data by discussing the limitations of existing bbp products and the shortcomings of related algorithms. For instance, accurately deriving inherent optical properties (IOPs) from apparent optical properties (AOPs) is crucial for IOP-based POC retrieval models, but this inversion can be challenging. Furthermore, in optically complex coastal waters, POC distribution can be highly heterogeneous in space, which introduces uncertainty into POC estimation even when advanced methods are used. Addressing these points will provide clearer context for the study's objectives and the need for improved bbp products.
- It is crucial to provide specific details about the data collected from each dataset, including the exact variables used, the time range of data collection, website links for accessing the data, and the dates when the data were accessed. Currently, Table 1 lacks sufficient information, and the time frames for the BGC-Argo data and other datasets are not clearly stated. To improve clarity and completeness, ensure that all necessary details are included in the data section, allowing readers to understand the scope and sources of the data used in this study.
- In the methods section, the use of Principal Component Analysis (PCA) for dimensionality reduction of high-dimensional features is mentioned, stating that “After this feature reduction on the high-dimensional variables, the 250 m and 50 m measurements with 126 and 26 inputs are reduced to 5 components for each variable, resulting in a total of 20 features. This method still retains 99% of the information.” However, this section lacks supporting data and visualizations to illustrate the PCA results. To enhance clarity and effectiveness, include data tables or figures that demonstrate the specific components selected and their contributions to the overall variance. This will help readers better understand the impact of PCA on the feature set and validate the claim that 99% of the information is retained.
- In Section 2.5, the discussion is somewhat disorganized. The introduction of the Random Forest Regression model should precede the discussion of existing studies based on random forest models. Additionally, such content seems more appropriate for the introduction section, as it pertains to a review of existing research rather than the methods section. Moreover, the authors state, “All the previously mentioned algorithms, along with others such as Linear Regressor (LR), Ridge Linear Regressor (RLR), Random Forest Regressor (RFR), and Multi-Layer Perceptron (MLP), were tested for estimating bbp during the dataset preparation phase. Based on these results, the Random Forest Regressor (RFR) was selected as the most suitable algorithm for this multi-input/multi-output problem.” Comparative results should also be presented to illustrate the differences in inversion results and the stability of various models. This will help substantiate the choice of the Random Forest Regressor as the most suitable algorithm for the problem at hand.
- In the initial paragraphs of Section 3, “Performance of the Random Forest Regressor,” the authors refer to the content of Table 1, including the specific datasets corresponding to each abbreviation. However, this information should have been presented in the data introduction section. Instead, this section should provide details on the data volume obtained after feature engineering and data filtering, specifically how much data is used for training and how much for the independent validation set. This will give readers a clearer understanding of the data used in the study and its distribution between training and validation.
- In the section “3 Performance of the Random Forest Regressor,” the authors discuss the differential contribution of various features within the model. It would be beneficial to clarify the source of this feature importance data. Is it derived from the inherent parameters of the random forest model, or does it rely on additional algorithms? While the random forest, as an ensemble learning method, can assess feature importance through multiple decision trees, providing a measure of each feature’s contribution to the predictive outcome, employing SHAP (SHapley Additive exPlanations) values could offer a more detailed and accurate attribution of feature importance. SHAP values provide a robust approach to explaining machine learning model outputs by assigning each feature an importance value for a particular prediction. Incorporating SHAP could enhance the transparency and depth of the analysis regarding each feature’s influence on the model’s performance.
- In the same section, the authors depict the contribution of various features within the model. However, there are concerns regarding the clarity and utility of the presented feature importance data. Specifically, it should be clarified whether features with low contribution are consistently negligible across all depths. If these features do not significantly contribute to the model’s performance at any depth, it might be beneficial to consider their removal to further reduce dimensionality and enhance the model’s efficiency.
- Additionally, some features are derived from PCA processing, and with the multitude of features used, it is challenging to distinguish between those originating from different datasets or subjected to various treatments in the bar chart. To enhance the richness and readability of the visual information, it is suggested that the authors use distinct colors to represent bars corresponding to different types of features. This would allow for a clearer distinction between features from different datasets or processing methods, thereby providing a more informative and accessible visualization of the data. It is also worth noting that while random forest models can provide feature importances based on the model’s internal assessment, these may not always reflect the true importance of features. The authors might also consider using alternative methods such as SHAP (SHapley Additive exPlanations) to calculate feature importances, which could offer a more nuanced understanding of each feature’s contribution to the model’s predictions.
- In the concluding part of the introduction, the authors outline the main content of the research, focusing on a detailed analysis of estimating bbp in the upper layers of the ocean surface using Sentinel-3 Ocean and Land Colour Instrument (S3OLCI) data. The study aims to enhance spatial resolution from the 4 km resolution of GlobColour level-3 merged products to the 300 m Full Resolution (FR) of Sentinel-3 OLCI. Additionally, the research evaluates model performance after incorporating OLCI spectral wavelengths as features for bbp estimation and compares these results with those obtained using GlobColour. The study also explores whether the inclusion of Inherent Optical Properties (IOPs) derived from satellite data can improve the accuracy of bbp estimation compared to using reflectances alone. These IOPs, provided by the Sentinel-3 OLCI processor, are hypothesized to significantly enhance regression models. The comparison is made between BGC-Argo data and various satellite datasets for two depth layers: from the surface to either 50 m or 250 m. However, the abstract does not provide a comprehensive and concise summary of the work and its innovative aspects. After reading the abstract, it remains unclear what the specific contributions and novelties of this research are. I recommend that the authors revise the abstract to include a brief but complete overview of the study’s objectives, methods, and key findings. The abstract should clearly communicate the innovative aspects of the research, such as the use of higher resolution data, the incorporation of IOPs, and the comparison of model performances, to give readers a clear understanding of the study’s significance and contributions to the field.
- The section “2.5 Multi-output Machine Learning Models” in the methods part of the paper should be clarified to determine whether it represents one of the study’s innovative aspects. If this section indeed constitutes an innovation, it is essential to highlight it appropriately throughout the paper to ensure that readers recognize its significance. In the abstract, include a brief mention of the multi-output machine learning approach and its novelty to pique the interest of potential readers and set the stage for the detailed methodology presented later. In the introduction, provide a clear and concise explanation of what multi-output machine learning models are and how they are applied in this study. Emphasize the innovative nature of using these models, perhaps by comparing them to traditional single-output models or by discussing the advantages they offer in the research context. During the discussion, reflect on the implications of using multi-output machine learning models, including a comparison of their performance with other models, the benefits they provide in terms of accuracy or efficiency, and their potential applications in similar research endeavors. To ensure consistency and clarity, make sure that the term “multi-output” is consistently defined and used throughout the paper, and that its implications for the research are clearly articulated. If the multi-output approach is a key innovation, it should be a central theme in the narrative of the paper, guiding the reader through the methodology, results, and implications of the study.
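Regarding the PCA comment, a per-component variance table would substantiate the "99% of the information" claim. The sketch below shows one way to produce it, assuming that "information" means cumulative explained variance ratio and using scikit-learn's `PCA`; the 500 synthetic profiles on 126 depth levels are hypothetical stand-ins, not the authors' data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for one high-dimensional input variable:
# 500 profiles sampled at 126 depth levels between 0 and 250 m (2 m spacing).
n_profiles, n_levels = 500, 126
depth = np.linspace(0.0, 250.0, n_levels)

# Smooth synthetic profiles: random mixtures of four exponential shapes plus
# small noise, so a few components carry almost all of the variance.
shapes = np.stack([np.exp(-depth / scale) for scale in (20.0, 50.0, 100.0, 200.0)])
weights = rng.normal(size=(n_profiles, shapes.shape[0]))
X = weights @ shapes + 0.01 * rng.normal(size=(n_profiles, n_levels))

pca = PCA(n_components=5).fit(X)
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

# One row per retained component: the kind of table the comment asks for.
for k, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{k}: {r:7.3%} of variance, cumulative {c:7.3%}")

# The 126 original inputs are replaced by these 5 component scores.
scores = pca.transform(X)
print(scores.shape)
```

Reporting the same table (or a cumulative-variance curve) for each PCA-reduced variable would let readers check the retained fraction directly.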
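For the Section 2.5 comment on model selection, the requested comparison could be reported as cross-validated scores on identical folds. The sketch below is a minimal example under stated assumptions: scikit-learn estimators named after the algorithms in the quoted passage (LR, RLR, RFR, MLP), with hypothetical hyperparameters and synthetic multi-output data (10 hypothetical depth bins).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical data: 400 samples, 20 input features (e.g. PCA scores plus
# satellite variables) and 10 outputs (e.g. bbp in 10 depth bins).
X = rng.normal(size=(400, 20))
W = rng.normal(size=(20, 10))
Y = np.tanh(X @ W / 4.0) + 0.05 * rng.normal(size=(400, 10))

models = {
    "LR": LinearRegression(),
    "RLR": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    ),
}

# Same folds for every model so the scores are directly comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # "r2" averages R^2 uniformly over the 10 outputs within each fold.
    scores = cross_val_score(model, X, Y, cv=cv, scoring="r2")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:>4}: R2 = {scores.mean():.3f} +/- {scores.std():.3f} across folds")
```

The fold-to-fold standard deviation speaks to the "stability" point; the ranking on synthetic data says nothing about bbp and would have to come from the authors' own matchups.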
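On the feature-importance comments, the two questions (where the importances come from, and whether low-importance features are negligible at every depth) can be separated as in the sketch below. It is an illustration on synthetic data, not the authors' pipeline: it contrasts scikit-learn's built-in impurity-based `feature_importances_` with per-depth permutation importance on held-out data. Permutation importance is a model-agnostic alternative named here in place of SHAP, and the 0.05 threshold and three depth bins are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600
feature_names = [f"f{i}" for i in range(6)]
X = rng.normal(size=(n, 6))

# Hypothetical targets for three depth bins: f0 drives the shallow bin, f1 the
# middle bin and f2 the deep bin; f3-f5 are pure noise.
Y = np.column_stack([2.0 * X[:, 0], 1.5 * X[:, 1], 1.0 * X[:, 2]])
Y += 0.1 * rng.normal(size=Y.shape)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, Y_tr)

# (1) Impurity-based importance: a by-product of training, pooled over all outputs.
impurity = rf.feature_importances_

# (2) Permutation importance per depth bin on held-out data: the drop in that
#     bin's R^2 when one feature column is shuffled.
def column_r2(j):
    def scorer(estimator, X_eval, y_eval):
        return r2_score(y_eval[:, j], estimator.predict(X_eval)[:, j])
    return scorer

per_depth = np.column_stack([
    permutation_importance(rf, X_te, Y_te, scoring=column_r2(j),
                           n_repeats=5, random_state=0).importances_mean
    for j in range(Y.shape[1])
])  # shape: (n_features, n_depth_bins)

# A feature is a removal candidate only if it is negligible at every depth.
threshold = 0.05  # hypothetical cut-off, in R^2 units
negligible = [name for name, row in zip(feature_names, per_depth)
              if np.all(row < threshold)]

for name, imp, row in zip(feature_names, impurity, per_depth):
    print(f"{name}: impurity={imp:.3f}  permutation by depth={np.round(row, 3)}")
print("negligible at all depths:", negligible)
```

Because a single multi-output forest pools impurity over all outputs, a feature that matters only at depth can look unimportant in `feature_importances_`; the per-depth matrix guards against removing it.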
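The suggestion to colour bars by feature origin is straightforward with matplotlib. In this sketch the feature names, groups and importance values are hypothetical placeholders, not results from the manuscript.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical feature names, groups and importance values: placeholders for
# illustration only, not results from the manuscript.
features = [
    ("temp_PC1", "BGC-Argo (PCA)", 0.21),
    ("temp_PC2", "BGC-Argo (PCA)", 0.08),
    ("sal_PC1", "BGC-Argo (PCA)", 0.12),
    ("Rrs_443", "Satellite reflectance", 0.10),
    ("Rrs_560", "Satellite reflectance", 0.07),
    ("iop_1", "Satellite IOP", 0.15),
    ("iop_2", "Satellite IOP", 0.18),
    ("lat", "Position/time", 0.05),
    ("doy", "Position/time", 0.04),
]
palette = {
    "BGC-Argo (PCA)": "tab:blue",
    "Satellite reflectance": "tab:orange",
    "Satellite IOP": "tab:green",
    "Position/time": "tab:gray",
}

names = [name for name, _, _ in features]
values = [value for _, _, value in features]
colors = [palette[group] for _, group, _ in features]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(names, values, color=colors)
ax.set_ylabel("Feature importance")
ax.tick_params(axis="x", labelrotation=45)
# One legend entry per feature group rather than per bar.
ax.legend(handles=[Patch(facecolor=c, label=g) for g, c in palette.items()],
          title="Feature group")
fig.tight_layout()
```

Calling `fig.savefig(...)` writes the figure to disk; the same grouping could be reused across depth layers so colours stay consistent between panels.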
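If the multi-output approach is presented as an innovation, a direct comparison with single-output models would support that claim. A minimal sketch, assuming scikit-learn: a native multi-output `RandomForestRegressor` (one forest predicting all depth bins jointly) against `MultiOutputRegressor`, which fits one independent forest per bin. The synthetic, depth-correlated profiles and the eight bins are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(3)
n, n_bins = 500, 8
X = rng.normal(size=(n, 5))

# Hypothetical profiles: a surface value that decays smoothly with depth, so
# neighbouring depth bins are strongly correlated with each other.
surface = np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1])
decay = 0.2 + 0.1 * np.abs(X[:, 2])
bins = np.arange(n_bins)
Y = surface[:, None] * np.exp(-decay[:, None] * bins[None, :])
Y += 0.02 * rng.normal(size=Y.shape)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

# (a) Multi-output: one forest whose trees predict all depth bins jointly.
joint = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, Y_tr)
# (b) Single-output baseline: one independent forest per depth bin.
separate = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=100, random_state=0)
).fit(X_tr, Y_tr)

per_bin = {}
for label, model in [("multi-output", joint), ("one model per bin", separate)]:
    per_bin[label] = r2_score(Y_te, model.predict(X_te), multioutput="raw_values")
    print(f"{label:>18}: R2 per depth bin = {np.round(per_bin[label], 3)}")

print("forests trained:", 1, "vs", len(separate.estimators_))
```

The joint model shares one tree structure across depths, which tends to keep predicted profiles coherent and trains a single model instead of one per bin; whether it is also more accurate has to be established on the actual BGC-Argo matchups.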
Overall, addressing these suggestions will significantly enhance the manuscript’s clarity, coherence, and professionalism, thereby strengthening its contribution to the field of ocean physical remote sensing.
Citation: https://doi.org/10.5194/egusphere-2024-3942-RC1