Phytoplankton Community Composition in the Eastern Subarctic Pacific Derived from Hyperspectral Optics

Pillai, Sacchidanandan Viruthasalam; Peña, M. Angelica; McNabb, Brandon J.; Burt, William J.; Tortell, Philippe D.

doi:10.5194/egusphere-2023-2851

08 Dec 2023

Abstract. We evaluate the utility of hyperspectral particulate absorption data for characterizing phytoplankton community structure in the eastern Subarctic Pacific Ocean. Relative to existing algorithms based solely on chlorophyll-a concentration (Chla), taxonomic classification (validated against pigment-based data) improved when Principal Components Analysis of hyperspectral absorption data was included. Multiple linear regression of hyperspectral absorption data yielded better taxonomic classification, particularly for estimates of haptophyte biomass. In addition, size-fractionated hyperspectral measurements were used to determine the dominant size class of the phytoplankton community. Using high-frequency shipboard optical data, we examined spatial patterns in phytoplankton taxonomic abundance in coastal and offshore waters around Vancouver Island, British Columbia. The results were consistent with expectations based on previous low-resolution sampling, showing the expected seasonal succession of phytoplankton groups and significant variability in coastal phytoplankton taxonomy associated with dominant hydrographic features. In contrast, much less spatial and temporal variability was observed in offshore waters. Derived patterns in phytoplankton taxonomy were linked to observed patterns in surface water biogeochemical properties, notably the distributions of dimethyl sulfide (DMS) to Chla and dimethylsulfoniopropionate (DMSP) to Chla ratios. Our results highlight the potential of shipboard hyperspectral absorption data to describe phytoplankton community composition and ancillary biogeochemical variables.

Received: 30 Nov 2023 – Discussion started: 08 Dec 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on egusphere-2023-2851', Anonymous Referee #1, 19 Jan 2024

The manuscript by Pillai et al. investigates phytoplankton community structure derived from HPLC pigment data and from hyperspectral optical data based on underway spectrophotometric measurements in the eastern Subarctic Pacific. The data are potentially interesting, but there are several concerns that the authors need to clarify. Major and specific comments are provided below:

Major comments
1. The authors claim that the MLR approach using spectral decomposition outperformed the Chla-based approach (Lines 276-277 and 316-319). However, the logic is difficult to follow. Because the MLR approach for estimating diatom and haptophyte groups was developed using CHEMTAX and DPA estimates as a reference for phytoplankton community composition (Table 2), I think the MLR approach will inevitably have a higher R² value than the Chla-based approach (Table 5). I am not convinced that the central claim of the manuscript is correct. More appropriate analysis and explanations are required.

2. I think that the authors classified the data into three categories, i.e., the coastal, offshore, and all datasets, based on the 200 m depth contour (Line 100). However, the authors show the best-fit MLR equations for diatoms using the coastal and all datasets, whereas they show the best-fit MLR equations for haptophytes using the offshore and all datasets (sometimes offshore only). For example, Table 2 shows the best-fit MLR equations for haptophytes based on the offshore dataset only, but Table 5 shows haptophyte biomass derived from MLR using the offshore and all datasets. It is difficult to understand how the authors estimate haptophyte biomass using the MLR approach based on all datasets. Why were the three equations for each dataset (coastal, offshore, and all) not shown? These points should be clearly stated.

3. Diatoms and haptophytes estimated from different algorithms (CHEMTAX, DPA, and optics-based) were well documented, but no details were provided for the other phytoplankton functional types. Focusing on these two groups and estimating them from the optical hyperspectral data is fine. Still, it would be a good idea if the authors could first show an overall description and figures of the CHEMTAX and DPA outputs, e.g., like Taylor et al. (2011).

Taylor et al. (2011) Biogeosciences

https://doi.org/10.5194/bg-8-3609-2011

4. In relation to major comment No. 3: in Figure 10, the proportion of diatoms at some offshore stations was zero; in other words, haptophytes made up 100% of the community. Were these proportions estimated from the MLR algorithms? Additionally, the authors state that these proportions agree with previous work (Lines 297-298). However, the previous work cited did not present information on haptophytes, so it is impossible to judge the results.

5. To evaluate the model performance, other statistical metrics (e.g., root mean squared error, RMSE) should be calculated. I think that this would be a great addition to the paper. For your reference, please refer to the following papers:

Brewin et al. (2016) Remote Sensing of Environment

http://dx.doi.org/10.1016/j.rse.2016.05.005

Tilstone et al. (2021) Remote Sensing of Environment

https://doi.org/10.1016/j.rse.2021.112444

6. Although the authors made a considerable effort to collect the data, and I do believe this is a valuable dataset, a major shortcoming is the lack of real focus. For example, three-component models for phytoplankton size classes (micro-, nano-, and picoplankton) are common, whereas this paper shows only two size classes. In addition, I think that the DMS and DMSP data are important, but they seem off topic in the current story, and the spatial changes in DMS and DMSP were not fully discussed. The paper needs to be reorganized to better structure the introduction, objectives, and results and discussion.

Specific comments
L90: Pigment measurement

Were the HPLC pigment data quality controlled against the DHI phytoplankton pigment standards?

L97: 2.3 CHEMTAX and DPA analysis

Which method, the successive runs of Latasa (2007) or the multiple starting points of Wright et al. (2009), was used to obtain the optimized pigment:Chla ratios? This information would help readers understand the results more easily, even if it is described in Pena et al. (2019a).

Line 125 – 127: Following the methods of Kramer and Siegel (2019)…

Kramer and Siegel (2019) defined total chlorophyll-a (TChla) as the sum of monovinyl chlorophyll a, divinyl chlorophyll a, chlorophyllide a, and chlorophyll a allomers and epimers, and ultimately excluded chlorophyllide from their statistical analysis. Please confirm that your dataset is consistent with Kramer and Siegel (2019).
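To make the comparison concrete, the TChla definition quoted above can be written as a sum over pigment columns. This is a hypothetical sketch: the pigment keys are illustrative names, not the labels in the authors' HPLC files, and whether chlorophyllide a is included is left as a switch.

```python
# Illustrative pigment keys; real HPLC output columns will be named differently.
TCHLA_BASE = ("mv_chla", "dv_chla", "chla_allomers", "chla_epimers")


def total_chla(sample, include_chlide=False):
    """Total chlorophyll a (mg m^-3) for one HPLC sample.

    sample: dict mapping pigment key -> concentration. Components that were
    not measured are treated as zero. Chlorophyllide a is added only when
    include_chlide is True; the reviewer notes that Kramer and Siegel (2019)
    excluded it from their statistical analysis.
    """
    keys = list(TCHLA_BASE)
    if include_chlide:
        keys.append("chlide_a")
    return sum(sample.get(k, 0.0) for k in keys)


sample = {"mv_chla": 1.0, "dv_chla": 0.1, "chlide_a": 0.05}
print(total_chla(sample), total_chla(sample, include_chlide=True))
```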

Line 162: Chla concentrations from the Absorption Line Height method

Were Chla concentrations derived from AC-S validated with Chla concentrations derived from HPLC?

Table 1.

Details of the measurements conducted on each cruise and the total number of samples (coastal or offshore) should be provided.

Table 2.

The letter “l” is missing from the word All in diatom derived from DPA.

Figure 1.

It would be helpful to add the 200 m depth contour, which would help readers distinguish coastal from offshore stations. Additionally, please check the legend for the solid line: it shows 2016 Feb Line P, but no such cruise appears in Table 1.

Figure 4.

Please specify the percentage of variance explained by the first and second axes.

Figure 9.

Please label as a, b, c, and d.

Figure 10.

The panels are arranged vertically, not left and right as stated in the figure caption.

Citation: https://doi.org/10.5194/egusphere-2023-2851-RC1
- AC1: 'Reply on RC1', Sacchidanandan Pillai, 15 Feb 2024
  
  Based on the reviewer’s comments, we propose the following changes to the paper.
  Abstract: Change lines 2-5 to state that we used Principal Components Analysis and multiple linear regression of hyperspectral absorption data to derive phytoplankton community composition, validated with pigment-based data.
  Key aims of the paper: (1) describe the spatial distribution of phytoplankton community composition in the coastal and offshore eastern Subarctic Pacific at higher spatiotemporal resolution than previously possible;
  (2) use the hyperspectral-derived community composition to explain the distribution and oceanographic controls of phytoplankton groups, as well as DMS/DMSP concentrations.
  Introduction: Lines 36-44 will be deleted.
    Lines 62-70: Provide justification for the need for phytoplankton community composition data in the region, and a better description of existing DMS/DMSP data and distributions.
  Methods: Table 1: Add the number of HPLC samples/measurements taken.
    Figure 2: Move to the supplementary information.
    Section 2.13: Move to the supplementary information.
  Results: Sections 3.1 and 3.2 will describe the approaches used to derive inputs for the analysis of the seasonality, distribution, and controls of phytoplankton groups, as well as of DMS/DMSP concentrations. Additional validation based on the size information will also be presented. The comparison with the Chla algorithm will be moved to the supplementary information.
    Section 3.4 will include a description of the spatial distribution of DMS/DMSP in the region.
    Figures 7c and 8c will be combined into a single figure, with the aim of showing how we can capture high-resolution information about the biomass of diatoms and haptophytes. Figures 7a, 7b, 8a, and 8b will be moved to the supplementary information.
  Discussion: Lines 321-325 will be moved to the supplementary information.
    Lines 339-352 will be combined into one paragraph focusing on how to improve the estimates of phytoplankton community composition algorithms.
    Lines 353-362 will be expanded to highlight how hyperspectrally derived phytoplankton community composition expands our ability to understand key biogeochemical processes, as well as potential flaws in the analysis.
  Our responses to the reviewer's comments are given below.
  The manuscript by Pillai et al. investigates phytoplankton community structure derived from HPLC pigment data and from hyperspectral optical data based on underway spectrophotometric measurements in the eastern Subarctic Pacific. The data are potentially interesting, but there are several concerns that the authors need to clarify. Major and specific comments are provided below:
  Thank you for your review of the submitted paper. Based on both reviews submitted, we have decided to restructure the paper to focus less on the methodological aspects (including inter-comparisons), and more on demonstrating how hyperspectral data can be used to derive phytoplankton community composition, and help understand the oceanographic controls on phytoplankton groups and associated biogeochemical variables.
  Major comments
  1. The authors claim that the MLR approach using spectral decomposition outperformed the Chla-based approach (Lines 276-277 and 316-319). However, the logic is difficult to follow. Because the MLR approach for estimating diatom and haptophyte groups was developed using CHEMTAX and DPA estimates as a reference for phytoplankton community composition (Table 2), I think the MLR approach will inevitably have a higher R² value than the Chla-based approach (Table 5). I am not convinced that the central claim of the manuscript is correct. More appropriate analysis and explanations are required.
  Both our MLR method and the Chla-based approach of Zeng et al. (2018) were tuned using DPA estimates as a reference. We therefore believe it is reasonable to compare the accuracy of the two approaches, and we see no reason why the MLR method should inherently achieve better results, as the reviewer suggests.
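The disagreement in comment 1 is partly about circularity: a model fitted to the same reference values it is scored against will tend to look good. One way to make such a comparison fairer, offered here only as a sketch and not as the authors' procedure, is to score each candidate on held-out samples. The code below computes leave-one-out RMSE for an ordinary least-squares model; it assumes each approach can be cast as a linear fit on its own predictors, which is a simplification for a Chla-based abundance model such as Zeng et al. (2018). The synthetic data are made up.

```python
import numpy as np


def loocv_rmse(X, y):
    """Leave-one-out RMSE of an ordinary least-squares fit with an intercept.

    X: (n_samples, n_predictors) predictors, e.g. Chla alone or spectral terms.
    y: (n_samples,) reference values, e.g. DPA-derived diatom Chla.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.asarray(X, dtype=float).reshape(n, -1)
    A = np.column_stack([np.ones(n), X])  # intercept column + predictors
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errors[i] = A[i] @ coef - y[i]  # error on the unseen sample
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic example: the target depends on Chla and on one extra spectral term
rng = np.random.default_rng(0)
chla = rng.uniform(0.1, 3.0, 40)
spectral = np.column_stack([chla, rng.normal(size=40)])
diatom = 0.6 * chla + 0.3 * spectral[:, 1] + rng.normal(0.0, 0.05, 40)
print(loocv_rmse(chla, diatom), loocv_rmse(spectral, diatom))
```

Because every prediction is made for a sample the fit never saw, a model only wins if its extra predictors carry information beyond the reference data it was tuned on.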
  
  2. I think that the authors classified the data into three categories, i.e., the coastal, offshore, and all datasets, based on the 200 m depth contour (Line 100). However, the authors show the best-fit MLR equations for diatoms using the coastal and all datasets, whereas they show the best-fit MLR equations for haptophytes using the offshore and all datasets (sometimes offshore only). For example, Table 2 shows the best-fit MLR equations for haptophytes based on the offshore dataset only, but Table 5 shows haptophyte biomass derived from MLR using the offshore and all datasets. It is difficult to understand how the authors estimate haptophyte biomass using the MLR approach based on all datasets. Why were the three equations for each dataset (coastal, offshore, and all) not shown? These points should be clearly stated.
  Most of the diatom variability occurs in the coastal dataset, which is why the offshore data subset was not used to fit the diatom signal. Conversely, haptophytes dominate offshore, so this subset was used to fit haptophyte abundance. This is now further explained and justified in the revised Methods section.
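The subset logic in this reply, and the reviewer's request to see all three equations, can be illustrated as follows. This is a hypothetical sketch: it assumes a station is coastal when its bottom depth is shallower than 200 m and that each MLR is an ordinary least-squares fit. Fitting every subset and tabulating all coefficients would let readers compare the three equations side by side.

```python
import numpy as np


def bathymetry_masks(bottom_depth_m, threshold_m=200.0):
    """Boolean masks for coastal (< threshold, assumed), offshore, and all stations."""
    depth = np.asarray(bottom_depth_m, dtype=float)
    coastal = depth < threshold_m
    return {"coastal": coastal, "offshore": ~coastal,
            "all": np.ones(depth.shape, dtype=bool)}


def fit_mlr(X, y):
    """OLS coefficients [intercept, b1, ..., bk] for y ~ X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_all_subsets(X, y, bottom_depth_m, min_samples=5):
    """Fit one MLR per subset; subsets with too few samples are reported as None."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for name, mask in bathymetry_masks(bottom_depth_m).items():
        results[name] = fit_mlr(X[mask], y[mask]) if mask.sum() >= min_samples else None
    return results


# Synthetic stations: 20 shelf (< 200 m) and 20 deep-water samples
rng = np.random.default_rng(1)
depth = np.r_[rng.uniform(20, 180, 20), rng.uniform(500, 3000, 20)]
X = rng.normal(size=(40, 2))
y = 0.5 + X @ np.array([1.5, -0.7])
for subset, coef in fit_all_subsets(X, y, depth).items():
    print(subset, coef)
```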
  
  3. Diatoms and haptophytes estimated from different algorithms (CHEMTAX, DPA, and optics-based) were well documented, but no details were provided for the other phytoplankton functional types. Focusing on these two groups and estimating them from the optical hyperspectral data is fine. Still, it would be a good idea if the authors could first show an overall description and figures of the CHEMTAX and DPA outputs, e.g., like Taylor et al. (2011).
  Diatoms and haptophytes are the dominant groups in our study area (Lines 264-266). To address the reviewer's comment, however, we have included figures showing additional details of the CHEMTAX and DPA outputs in the supporting information.
  4. In relation to major comment No. 3: in Figure 10, the proportion of diatoms at some offshore stations was zero; in other words, haptophytes made up 100% of the community. Were these proportions estimated from the MLR algorithms? Additionally, the authors state that these proportions agree with previous work (Lines 297-298). However, the previous work cited did not present information on haptophytes, so it is impossible to judge the results.
  The proportions shown in Figure 10 are based on the prevalence of the different communities as determined from Linear Discriminant Analysis (Methods Section 2.11, Results Section 3.1.2). While Pena and Varela (2007) may not have considered haptophytes explicitly, their results indicated that phytoplankton biomass was dominated by cells <5 µm, a size class that includes haptophytes. We now also cite Pena et al. (2018), who showed that haptophytes are generally dominant offshore, with minimal seasonal changes. To make this clearer, we will also change the y-axis labels in Fig. 10 to "Proportion of optically derived communities".
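The reply attributes the Figure 10 proportions to Linear Discriminant Analysis. A minimal NumPy sketch of that step is given below. It assumes a standard LDA with a pooled within-class covariance matrix, trained on samples whose community labels come from the pigment-based analysis; the feature matrix (for example, PCA scores of absorption spectra) and the labels are placeholders, not the authors' actual inputs.

```python
import numpy as np


def lda_fit(X, labels):
    """Fit LDA: class means, class priors and the inverse pooled covariance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted, so searchsorted below is valid
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(labels == c) for c in classes])
    # Subtract each sample's own class mean, then pool the scatter
    centered = X - means[np.searchsorted(classes, labels)]
    pooled_cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, priors, np.linalg.pinv(pooled_cov)


def lda_predict(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, means, priors, cov_inv = model
    X = np.asarray(X, dtype=float)
    # delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k)
    scores = (X @ cov_inv @ means.T
              - 0.5 * np.sum((means @ cov_inv) * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]


def community_proportions(predicted, classes):
    """Fraction of samples (e.g. underway points in one region) assigned to each community."""
    predicted = np.asarray(predicted)
    return {str(c): float(np.mean(predicted == c)) for c in classes}


# Hypothetical two-feature training set with pigment-derived labels
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1], [0.2, -0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [5.2, 5.1]])
labels = ["diatom"] * 4 + ["haptophyte"] * 4
model = lda_fit(X_train, labels)
pred = lda_predict(model, [[0.0, 0.1], [4.9, 5.0], [5.1, 5.1]])
print(pred, community_proportions(pred, model[0]))
```

In this reading, a station group with 100% haptophyte-type assignments would plot as a zero diatom proportion, which is the pattern the reviewer asks about.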
  5. To evaluate the model performance, other statistical metrics (e.g., root mean squared error, RMSE) should be calculated. I think that this would be a great addition to the paper. For your reference, please refer to the following papers:
  As suggested, we have included calculations of RMSE in the revised paper.
  6. Although the authors made a considerable effort to collect the data, and I do believe this is a valuable dataset, a major shortcoming is the lack of real focus. For example, three-component models for phytoplankton size classes (micro-, nano-, and picoplankton) are common, whereas this paper shows only two size classes. In addition, I think that the DMS and DMSP data are important, but they seem off topic in the current story, and the spatial changes in DMS and DMSP were not fully discussed. The paper needs to be reorganized to better structure the introduction, objectives, and results and discussion.
  Because of technical difficulties in obtaining size-fractionated optical properties for three phytoplankton size classes, we were only able to distinguish between two size classes. However, based on DPA, CHEMTAX, and previous observations in the region, we found that the variability of pico- and nanoplankton was much smaller than that of microplankton. Two size classes, while perhaps not optimal, still allowed us to achieve our goal of validating our optical method for characterizing phytoplankton assemblages.
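For readers unfamiliar with size-fractionated absorption, a two-class partition can be expressed as the share of absorption removed by a size-selective filter. The sketch below is hypothetical: the filter cutoff, the wavelength used (for example, the 676 nm line height), and the 0.5 threshold for calling a sample large-cell dominated are assumptions, not values taken from the manuscript.

```python
def large_cell_fraction(a_total, a_small):
    """Fraction of phytoplankton absorption due to cells retained by the filter.

    a_total: absorption of the unfiltered sample (m^-1) at a chosen wavelength.
    a_small: absorption of the filtrate that passed the size-fractionation filter (m^-1).
    """
    if a_total <= 0:
        raise ValueError("total absorption must be positive")
    frac = (a_total - a_small) / a_total
    # Clamp to [0, 1]: noise can make the filtrate read slightly above the total
    return min(1.0, max(0.0, frac))


def dominant_size_class(a_total, a_small, threshold=0.5):
    """Label a sample 'large' or 'small' by comparing the large-cell fraction to a threshold."""
    return "large" if large_cell_fraction(a_total, a_small) > threshold else "small"


print(large_cell_fraction(0.020, 0.005), dominant_size_class(0.020, 0.015))
```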
  Regarding the DMS and DMSP data, we have restructured the paper (as noted above) to place more focus on the oceanographic drivers of phytoplankton taxonomic shifts and their impact on other key biogeochemical variables. In this respect, DMS/P is both interesting and important. Both compounds play a significant role in microbial carbon and sulfur cycling, and DMS can also affect regional climate. DMS concentrations are well known to correlate poorly with bulk chlorophyll concentrations, owing to strong taxonomic differences in DMS/P production among phytoplankton groups. Our results are therefore important and novel in showing how high-resolution taxonomic data derived from optical sensors can help explain variability in DMS/P. Such high-resolution taxonomic estimates are particularly valuable given the increasing availability of underway DMS measurement systems that provide high-frequency measurements along a ship track. To address the reviewer's concern, we have provided more data and discussion of the spatial trends in DMS/P across our survey region.
  
  Specific comments
  L90: Pigment measurement
  
  Were the HPLC pigment data quality controlled against the DHI phytoplankton pigment standards?
  Yes, we have clarified this, and added a citation to Nemcek and Pena (2014), where the standards and more specific procedures are described in detail.
  L97: 2.3 CHEMTAX and DPA analysis
  
  Which method, the successive runs of Latasa (2007) or the multiple starting points of Wright et al. (2009), was used to obtain the optimized pigment:Chla ratios? This information would help readers understand the results more easily, even if it is described in Pena et al. (2019a).
  We used the multiple starting points method of Wright et al. (2009), and have now clarified this.
  Line 125 – 127: Following the methods of Kramer and Siegel (2019)…
  
  Kramer and Siegel (2019) defined total chlorophyll-a (TChla) as the sum of monovinyl chlorophyll a, divinyl chlorophyll a, chlorophyllide a, and chlorophyll a allomers and epimers, and ultimately excluded chlorophyllide from their statistical analysis. Please confirm that your dataset is consistent with Kramer and Siegel (2019).
  We were specifically referring to the removal of degradation pigments and redundant pigments from the analysis. While we may not have measured exactly the same allomers and epimers as Kramer and Siegel (2019), our calculation of TChla should include most of the same pigments.
  Line 162: Chla concentrations from the Absorption Line Height method
  
  Were Chla concentrations derived from AC-S validated with Chla concentrations derived from HPLC?
  Poor wording on our part. It will be rephrased to:
  Following Boss et al. (2007), particulate absorption at 676 nm ($a_{ph}(676)$) was used to estimate Chla concentration. A linear equation, $a_{ph}(676) = a_p(676) - \frac{39}{65}\,a_p(650) - \frac{26}{65}\,a_p(715)$, was used to subtract a baseline absorption (between 650 and 715 nm). We calibrated $a_{ph}(676)$ against HPLC-measured Chla (Chlorophyll-a + Divinyl Chlorophyll-a + Chlorophyllide-a).
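The baseline correction in this reply is a straight line between 650 and 715 nm evaluated at 676 nm, so the two weights are the fractional distances 39/65 and 26/65. A plain-Python sketch of the correction and of a simple linear calibration against HPLC Chla follows. The least-squares calibration form is an assumption (the reply does not state whether the fit is linear or log-linear), and the example values are made up.

```python
def line_height_676(ap650, ap676, ap715):
    """Absorption line height at 676 nm (m^-1), after Boss et al. (2007).

    The baseline is linearly interpolated between 650 and 715 nm:
    676 nm lies 26 nm above 650 and 39 nm below 715 (a 65 nm span), so the
    650 nm value gets weight 39/65 and the 715 nm value weight 26/65.
    """
    baseline = (39.0 / 65.0) * ap650 + (26.0 / 65.0) * ap715
    return ap676 - baseline


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical calibration: line heights (m^-1) vs HPLC Chla (mg m^-3)
heights = [0.005, 0.010, 0.020]
hplc_chla = [0.3, 0.6, 1.2]
slope, intercept = fit_line(heights, hplc_chla)
print(slope * line_height_676(0.004, 0.015, 0.002) + intercept)
```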
  Table 1.
  
  Detailed contents of measurements conducted for each cruise and the total number of samples (coastal or offshore) should be provided.
  This information was mostly provided in the text of our original submission, but has now been added to the revised table, as requested.
  
  Table 2.
  
  The letter “l” is missing from the word All in diatom derived from DPA.
  Corrected – thanks for catching that.
  Figure 1.
  
  It would be helpful to add the 200 m depth contour, which would help readers distinguish coastal from offshore stations. Additionally, please check the legend for the solid line: it shows 2016 Feb Line P, but no such cruise appears in Table 1.
  That was a mistake; the figure shows the February 2017 cruise. As requested, we will include the 200 m isobath.
  Figure 4.
  
  Please specify the percentage of variance explained by the first and second axes.
  We have now included this (27% and 24%, respectively).
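The percentages requested for Figure 4 are the share of total variance carried by each principal component. A NumPy sketch is given below. It assumes the PCA was run on standardized (z-scored) variables, which may differ from the authors' preprocessing; variables with zero variance must be dropped first to avoid division by zero.

```python
import numpy as np


def explained_variance_percent(data):
    """Percent of total variance carried by each principal component.

    data: (n_samples, n_variables). Columns are z-scored, so this is a
    correlation-matrix PCA; every column must have non-zero variance.
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the centred matrix are proportional to component variances
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return 100.0 * var / var.sum()


def axis_labels(percent, n=2):
    """Axis labels such as 'PC1 (27%)' for the first n components."""
    return [f"PC{i + 1} ({p:.0f}%)" for i, p in enumerate(percent[:n])]
```

Usage: `axis_labels(explained_variance_percent(pigment_matrix))`, where `pigment_matrix` is a hypothetical samples-by-pigments array, yields labels for the first two PCA axes.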
  Figure 9.
  
  Please label as a, b, c, and d.
  Done
  Figure 10.
  
  The panels are arranged vertically, not left and right as stated in the figure caption.
  Changed, as requested.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2851-AC1
RC2:
'Comment on egusphere-2023-2851', Anonymous Referee #2, 25 Jan 2024
Major comments:
This paper characterizes phytoplankton community composition in the eastern Subarctic Pacific Ocean using continuously collected hyperspectral absorption data in comparison to phytoplankton pigment data. The dataset appears to have a lot of value, but there are some major concerns that need to be addressed before publication.
My biggest area of concern is the lack of focus within the manuscript - the authors try to cover too much. I would suggest either making it a dedicated methods comparison paper or choosing a phytoplankton community composition characterization method (this could be the author’s MLR model) and focusing on the ecology of the system.
If the authors made it a methods comparison paper, I would envision it comparing the taxonomic output from the Hirata chlorophyll algorithm, HPLC CHEMTAX and DPA methods, Gaussian decomposition of absorption spectra, and the MLR that uses HPLC and Gaussian decomposition data. The results could focus on the comparisons between different methods and why those relationships may exist. Based on the results, the authors could suggest the best method for the region, which could be applied in subsequent papers on phytoplankton ecology, etc. I would suggest leaving out the other environmental variables (nutrients, salinity, SST, DMS, DMSP) unless the authors wanted to do methods comparisons for offshore vs. inshore, high vs. low nutrient environments, etc. to see if certain methods were better fits for different environments.
If the authors made it a phytoplankton ecology paper:
The introduction could highlight WHY there is a need to develop a new taxonomic classification method – for example, state (1) the shortcomings of the chl-based Zeng algorithm, discretely collected HPLC, and the Gaussian decomposition of absorption spectra, and (2) why the new algorithm is necessary and superior to these methods. The introduction would also need to dive deeper into why understanding phytoplankton community composition in this region is important – for example, the authors could highlight a gap in understanding that the manuscript analysis would fill.

The methods could focus solely on the MLR model, with a section on HPLC methods, a section on Gaussian decomposition, and a section on building the MLR from the two.
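The pipeline outlined here (Gaussian decomposition of absorption spectra, then an MLR trained on pigment-derived group biomass) can be sketched as follows. This is an illustrative NumPy sketch only: the Gaussian centers and the common width are hypothetical placeholders, amplitudes are fitted by unconstrained linear least squares (a published decomposition may use different widths or enforce non-negativity), and the regression is plain OLS.

```python
import numpy as np

# Hypothetical peak centers (nm) and a single shared width; not the study's values.
CENTERS_NM = np.array([406.0, 434.0, 453.0, 470.0, 492.0, 523.0,
                       550.0, 584.0, 617.0, 638.0, 660.0, 675.0])
SIGMA_NM = 14.0


def gaussian_basis(wavelengths_nm, centers_nm=CENTERS_NM, sigma_nm=SIGMA_NM):
    """Matrix (n_wavelengths, n_peaks) of unit-height Gaussian bands."""
    wl = np.asarray(wavelengths_nm, dtype=float)[:, None]
    return np.exp(-0.5 * ((wl - centers_nm[None, :]) / sigma_nm) ** 2)


def gaussian_amplitudes(wavelengths_nm, a_ph):
    """Least-squares amplitudes of each Gaussian band for one absorption spectrum."""
    basis = gaussian_basis(wavelengths_nm)
    amps, *_ = np.linalg.lstsq(basis, np.asarray(a_ph, dtype=float), rcond=None)
    return amps


def fit_group_mlr(amplitudes, group_chla):
    """OLS of pigment-derived group Chla on Gaussian amplitudes: [intercept, b1..bk]."""
    A = np.column_stack([np.ones(len(group_chla)), amplitudes])
    coef, *_ = np.linalg.lstsq(A, np.asarray(group_chla, dtype=float), rcond=None)
    return coef


def predict_group_chla(coef, amplitudes):
    """Apply fitted MLR coefficients to new amplitude vectors (one per row)."""
    amplitudes = np.atleast_2d(amplitudes)
    return coef[0] + amplitudes @ coef[1:]
```

In this arrangement the HPLC-derived group biomass serves only as the regression target, while new underway spectra need nothing but `gaussian_amplitudes` followed by `predict_group_chla`.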

In the results, the authors wouldn’t need to discuss methods comparisons because the advantages and motivations for developing the new model were already discussed in the introduction. Instead, the results could focus on applying the community composition outputs from the MLR model to study the ecology – namely to fill the gap in understanding that was introduced in the introduction.

Specific comments:
The paragraph on lines 27-35 implies that the current study will help overcome the limitations of discrete sampling, but we don’t learn how this will be done until two paragraphs later. I would suggest reformatting this so that the link (that hyperspectral data can be collected continuously, increasing spatial and temporal resolution) is made more explicit either by combining these two paragraphs or at least putting them back-to-back.
Along these lines, it is not clear to me what the shortcomings of the Gaussian decomposition of absorption spectra are – it would be helpful to elaborate on this somewhere in your introduction to further explain the motivation for creating a new method.

Lines 36-44: This paragraph is confusing to me. It appears to me that this study is using different methods to characterize phytoplankton community composition. I’m not sure that classifying qualitative vs. quantitative methods clarifies any forthcoming methodologies or adds anything meaningful to the paper. I would suggest deleting it.

Lines 69-70: This paragraph seems linked to the paragraph on lines 27-35 – both are describing limitations in current methodology and how hyperspectral data can overcome them. I would suggest rearranging the introduction to better highlight this point.

Line 71: I would suggest changing “phytoplankton communities and composition” to “phytoplankton community composition.”

Lines 123-134: Maybe I’m missing something, but it is unclear to me what the purpose of the PCA is. Why not just use the output groups from CHEMTAX? (see my comment from lines 337-338 below)

Lines 172-182: I would suggest only including cell size in this analysis if the paper is going to focus on ecology. If the paper is just focusing on the methodological differences for deriving phytoplankton community composition, then it would only be valuable if you were going to compare methods for different size ranges.

Lines 204-214: Again, it is unclear to me why a PCA needs to be included. Why not just use the output groups from the Gaussian decomposition?

Table 2: It would be interesting to see the MLR models developed for the other taxonomic groups derived from CHEMTAX, in addition to diatoms and haptophytes. If the statistical relationships for diatoms and haptophytes are the strongest, this would provide a concrete reason for the authors' focus on those groups throughout the remainder of the paper.

Lines 258-261: This size information is pretty basic and reflects known cell size differences between taxonomic groups. It would be more interesting to explore how cell size of the community as a whole or of a particular group (e.g. diatoms) varies spatially or temporally, especially if the authors decided to take a more ecological focus within the paper.

Figure 7 & 8: These comparisons are a little misleading because DPA and CHEMTAX outputs are used in the MLR, so therefore it would be expected that they would be correlated.

Lines 322-327: Discussing the merits of assessing multiple phytoplankton groups rather than just one seems unnecessary. I would suggest cutting this paragraph.

Lines 337-338: The fact that the resulting pigment clusters and the CHEMTAX-derived composition are correlated suggests that CHEMTAX is an accurate reflection of the community in this region. I would suggest moving the PCA and correlation between the two to an appendix and just using CHEMTAX data in the methods and analysis (with reference to the appendix for more information).

Technical corrections:
In line 63 the authors specify that they will reference “Chlorophyll-a” as “Chla,” but they have already referenced “chlorophyll” and “chlorophyll-a” previously, e.g. lines 31, 41, 47, 48, 50, 57. Please define the use of “Chla” at the first mention of “chlorophyll” to make terminology consistent.

Line 105: DPA was previously defined in line 42, don’t need to define again, just use “DPA” here.

Line 284: Missing an “s” in “diatoms”
Citation: https://doi.org/10.5194/egusphere-2023-2851-RC2
- AC2:
  'Reply on RC2', Sacchidanandan Pillai, 15 Feb 2024
  Based on the reviewer’s comments, we propose the following changes to the paper.
  Abstract: Change lines 2-5 to state that we used Principal Components Analysis and multiple linear regression of hyperspectral absorption data to derive phytoplankton community composition, validated with pigment-based data.
  Key aims of the paper: (1) describe the spatial distribution of phytoplankton community composition in the coastal and offshore eastern Subarctic Pacific at higher spatiotemporal resolution than previously possible;
  (2) use the hyperspectral-derived community composition to explain the distribution and oceanographic controls of phytoplankton groups, as well as DMS/DMSP concentrations.
  Introduction: Lines 36-44 will be deleted.
    Lines 62-70: Provide justification for the need for phytoplankton community composition data in the region, and a better description of existing DMS/DMSP data and distributions.
  Methods: Table 1: Add the number of HPLC samples/measurements taken.
    Figure 2: Move to the supplementary information.
    Section 2.13: Move to the supplementary information.
  Results: Sections 3.1 and 3.2 will describe the approaches used to derive inputs for the analysis of the seasonality, distribution, and controls of phytoplankton groups, as well as of DMS/DMSP concentrations. Additional validation based on the size information will also be presented. The comparison with the Chla algorithm will be moved to the supplementary information.
    Section 3.4 will include a description of the spatial distribution of DMS/DMSP in the region.
    Figures 7c and 8c will be combined into a single figure, with the aim of showing how we can capture high-resolution information about the biomass of diatoms and haptophytes. Figures 7a, 7b, 8a, and 8b will be moved to the supplementary information.
  Discussion: Lines 321-325 will be moved to the supplementary information.
    Lines 339-352 will be combined into one paragraph focusing on how to improve the estimates of phytoplankton community composition algorithms.
    Lines 353-362 will be expanded to highlight how hyperspectrally derived phytoplankton community composition expands our ability to understand key biogeochemical processes, as well as potential flaws in the analysis.
  Major comments:
  This paper characterizes phytoplankton community composition in the eastern Subarctic Pacific Ocean using continuously collected hyperspectral absorption data in comparison to phytoplankton pigment data. The dataset appears to have a lot of value, but there are some major concerns that need to be addressed before publication.
  My biggest area of concern is the lack of focus within the manuscript - the authors try to cover too much. I would suggest either making it a dedicated methods comparison paper or choosing a phytoplankton community composition characterization method (this could be the author’s MLR model) and focusing on the ecology of the system.
  If the authors made it a methods comparison paper, I would envision it comparing the taxonomic output from the Hirata chlorophyll algorithm, HPLC CHEMTAX and DPA methods, Gaussian decomposition of absorption spectra, and the MLR that uses HPLC and Gaussian decomposition data. The results could focus on the comparisons between different methods and why those relationships may exist. Based on the results, the authors could suggest the best method for the region, which could be applied in subsequent papers on phytoplankton ecology, etc. I would suggest leaving out the other environmental variables (nutrients, salinity, SST, DMS, DMSP) unless the authors wanted to do methods comparisons for offshore vs. inshore, high vs. low nutrient environments, etc. to see if certain methods were better fits for different environments.
  If the authors made it a phytoplankton ecology paper:
  The introduction could highlight WHY there is a need to develop a new taxonomic classification method – for example, state (1) the shortcomings of the chl-based Zeng algorithm, discretely-collected HPLC, and the Gaussian decomposition of absorption spectra, and (2) why the new algorithm is necessary and superior to these methods. The introduction would also need to dive deeper into why understanding phytoplankton community composition in this region is important – for example, the authors could highlight a gap in understanding that the manuscript analysis would fill.
  
  The methods could focus solely on the MLR model, with a section on HPLC methods, a section on Gaussian decomposition, and a section on building the MLR from the two.
  
  In the results, the authors wouldn’t need to discuss methods comparisons because the advantages and motivations for developing the new model were already discussed in the introduction. Instead, the results could focus on applying the community composition outputs from the MLR model to study the ecology – namely to fill the gap in understanding that was introduced in the introduction.
  
  Thank you for these useful suggestions. Based on these comments, we have decided to focus our work on the application of our method to better understanding phytoplankton ecology and the distribution of key biogeochemical variables.
  
  Specific comments:
  The paragraph on lines 27-35 implies that the current study will help overcome the limitations of discrete sampling, but we don’t learn how this will be done until two paragraphs later. I would suggest reformatting this so that the link (that hyperspectral data can be collected continuously, increasing spatial and temporal resolution) is made more explicit either by combining these two paragraphs or at least putting them back-to-back.
  
  We have reformatted the introduction, and now provide a clearer justification for the utility of high-resolution, automated measurements.
  Along these lines, it’s not clear to me what the shortcomings of the Gaussian decomposition of absorption spectra are – it would be helpful to elaborate on this somewhere in your introduction to further explain the motivation for creating a new method.
  Gaussian decomposition of the spectra yields only Gaussian amplitudes, which are related to pigment concentrations rather than to phytoplankton community composition. Additional analysis, which we explore here, is needed to derive community composition from the Gaussian amplitudes. This is now clarified in the revised text, which also de-emphasizes comparisons across methods.
  Lines 36-44: This paragraph is confusing to me. It appears to me that this study is using different methods to characterize phytoplankton community composition. I’m not sure that classifying qualitative vs. quantitative methods clarifies any forthcoming methodologies or adds anything meaningful to the paper. I would suggest deleting it.
  
  This paragraph helps clarify terminology used later in the paper and outlines the different methods used to characterise phytoplankton taxonomic composition. It can be deleted if necessary.
  Lines 69-70: This paragraph seems linked to the paragraph on lines 27-35 – both are describing limitations in current methodology and how hyperspectral data can overcome them. I would suggest rearranging the introduction to better highlight this point.
  
  Will do
  Line 71: I would suggest changing “phytoplankton communities and composition” to “phytoplankton community composition.”
  
  Will do
  Lines 123-134: Maybe I’m missing something, but it is unclear to me what the purpose of the PCA is. Why not just use the output groups from CHEMTAX? (see my comment from lines 337-338 below)
  
  Using PCA, we can summarise the multivariate information derived from CHEMTAX and represent the taxonomic information with a smaller number of variables (i.e. reduce the dimensionality of the data set). Moreover, given uncertainties in the pigment-ratio inputs to CHEMTAX, such a statistical approach avoids the need for assumptions about pigment-taxon relationships. Finally, CHEMTAX reports the biomass of individual phytoplankton taxa, which is not independent of the biomass of other phytoplankton groups; this reduces the number of truly independent variables in the dataset.
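  A minimal sketch of the dimensionality-reduction idea described above, assuming a samples-by-groups table of CHEMTAX biomass. This is not the authors' implementation; the function names are hypothetical. It uses only the Python standard library: a sample covariance matrix, then power iteration with deflation to extract the leading principal components and the fraction of total variance each explains (the kind of percentage requested for the Figure 4 axes).

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix of a samples-by-variables table (list of equal-length rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpairs(mat, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration followed by deflation."""
    rng = random.Random(seed)
    p = len(mat)
    m = [row[:] for row in mat]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def explained_variance_ratio(rows, k):
    """Fraction of total variance (trace of the covariance) carried by each of the first k PCs."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam, _ in leading_eigenpairs(cov, k)]


# Toy example: two uncorrelated variables whose variances are in a 4:1 ratio.
rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print([round(r, 3) for r in explained_variance_ratio(rows, 2)])  # [0.8, 0.2]
```

  With real pigment or CHEMTAX tables one would typically standardise the columns first and use a library eigensolver; power iteration is used here only to keep the sketch dependency-free, and it converges slowly when two eigenvalues are nearly equal.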
  Lines 172-182: I would suggest only including cell size in this analysis if the paper is going to focus on ecology. If the paper is just focusing on the methodological differences for deriving phytoplankton community composition, then it would only be valuable if you were going to compare methods for different size ranges.
  The inclusion of cell size provides additional validation that the hyperspectral-derived community composition agrees with our understanding of the phytoplankton groups. Additionally, since pigment data do not yield direct information on phytoplankton size, we thought it would be a good idea to examine that as well.
  Lines 204-214: Again, it is unclear to me why a PCA needs to be included. Why not just use the output groups from the Gaussian decomposition?
  
  The output from the Gaussian decomposition only provides information on the concentrations of different pigment groups. The PCA is needed to translate pigment concentrations into estimates of phytoplankton community clusters.
  Table 2: It would be interesting to see the MLR models developed for the other taxonomic groups derived from CHEMTAX in addition to diatoms and haptophytes. If the statistical relationships for diatoms and haptophytes are the strongest, it would provide a concrete reason for why the authors focus on those groups throughout the remainder of the paper.
  
  In our study regions, only diatoms and haptophytes had enough biomass variability to allow estimation via MLR or other techniques.
  Lines 258-261: This size information is pretty basic and reflects known cell size differences between taxonomic groups. It would be more interesting to explore how cell size of the community as a whole or of a particular group (e.g. diatoms) varies spatially or temporally, especially if the authors decided to take a more ecological focus within the paper.
  
  We could try to include this information, but without additional data such as species composition, it could be difficult to validate and interpret the results.
  Figure 7 & 8: These comparisons are a little misleading because DPA and CHEMTAX outputs are used in the MLR, so therefore it would be expected that they would be correlated.
  
  Those figures serve to highlight how our model performs, and how its scatter around the 1:1 line compares with that of the Chla model of Zeng et al. (2018).
  Lines 322-327: Discussing the merits of assessing multiple phytoplankton groups rather than just one seems unnecessary. I would suggest cutting this paragraph.
  
  We will do this.
  Lines 337-338: The fact that the resulting pigment clusters and the CHEMTAX-derived composition are correlated suggests that CHEMTAX is an accurate reflection of the community in this region. I would suggest moving the PCA and correlation between the two to an appendix and just using CHEMTAX data in the methods and analysis (with reference to the appendix for more information).
  
  Previous reviewers have taken strong issue with the use of CHEMTAX, and we feel that the PCA helps address these concerns.
  Technical corrections:
  In line 63 the authors specify that they will reference “Chlorophyll-a” as “Chla,” but they have already referenced “chlorophyll” and “chlorophyll-a” previously, e.g. lines 31, 41, 47, 48, 50, 57. Please define the use of “Chla” at the first mention of “chlorophyll” to make terminology consistent.
  
  Line 105: DPA was previously defined in line 42, don’t need to define again, just use “DPA” here.
  
  Line 284: Missing an “s” in “diatoms”
  
  We will make these changes.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2851-AC2

Status: closed

RC1:
'Comment on egusphere-2023-2851', Anonymous Referee #1, 19 Jan 2024

The manuscript by Pillai et al. investigated phytoplankton community structures derived from HPLC pigment data and optical hyperspectral data based on underway spectrophotometric measurements in the Eastern Subarctic Pacific. The data shown could be potentially interesting, but there are several concerns that the authors need to clarify. Major and specific comments are provided below:

Major comments
1. The authors claimed that the MLR approach using a spectral decomposition outperformed the Chla-based approach (Lines 276 – 277 and 316 – 319). However, it is difficult to understand the logic. Because the MLR approach for estimating diatom and haptophyte groups was developed using CHEMTAX and DPA estimates as a reference for phytoplankton community composition (Table 2), I think that, inevitably, the MLR approach has a higher R² value than the Chla-based approach (Table 5). I am not convinced that the central claim of the manuscript is correct. More appropriate analysis and explanations are required.

2. I think that the authors classified the data into three categories, i.e., the coastal, offshore, and all datasets, based on 200 m depth (Line 100). However, the authors show the best-fit MLR equations for diatoms using the coastal and all datasets; in contrast, they show the best-fit MLR equations for haptophytes using the offshore and all datasets (sometimes offshore only). For example, Table 2 shows the best-fit MLR equations for haptophytes based on the offshore dataset only, but Table 5 shows haptophyte biomass derived from MLR using the offshore and all datasets, respectively. It is difficult to understand how the authors estimate haptophyte biomass using the MLR approach based on all datasets. Why were the three equations using each dataset (coastal, offshore, and all) not shown? These points should be clearly stated.

3. Diatom and haptophyte abundances estimated from different algorithms (CHEMTAX, DPA, and optical-based) were well documented, but no details for the other phytoplankton functional types were provided. Focusing on and estimating these two groups from the optical hyperspectral data is fine. Still, it would be a good idea if the authors could first show an overall description and figures of the CHEMTAX and DPA outputs, e.g., like Taylor et al. (2011).

Taylor et al. (2011) Biogeosciences

https://doi.org/10.5194/bg-8-3609-2011

4. In relation to the No.3 major comment, looking at Figure 10, the proportions of diatom in some offshore stations were zero, in other words, the proportions of haptophyte were 100% predominant. Were these proportions estimated from the MLR algorithms? Additionally, the authors mentioned these proportions were in agreement with previous work (Lines 297 – 298). However, the previous work which the authors cited did not show the information on haptophyte. It is impossible to judge the results.

5. To evaluate the model performance, other statistical metrics (e.g., root mean squared error, RMSE) should be calculated. I think that this would be a great addition to the paper. For your reference, please refer to the following papers:

Brewin et al. (2016) Remote Sensing of Environment

http://dx.doi.org/10.1016/j.rse.2016.05.005

Tilstone et al. (2021) Remote Sensing of Environment

https://doi.org/10.1016/j.rse.2021.112444

6. Although the authors made a considerable effort to collect the data and I do believe that this is a valuable data set, a major shortcoming is the lack of real focus. For example, three-component models for phytoplankton size classes (micro-, nano-, and picoplankton) are common. However, this paper shows two size classes. In addition, I think that the data on DMS and DMSP is important, but it seems to me that the story is off topic. The spatial changes in DMS and DMSP were not fully discussed in the current form. To better organize the introduction, objectives, and R&D, the paper needs to be reorganized.

Specific comments
L90: Pigment measurement

The HPLC pigment data are quality controlled against the DHI phytoplankton pigment standard?

L97: 2.3 CHEMTAX and DPA analysis

Which method, the successive runs by Latasa (2007) or the multiple starting points by Wright et al. (2009), was employed to obtain the optimized pigment:Chla ratios? This information will help readers understand the results more easily, even if it is described in Pena et al. (2019a).

Lines 125-127: Following the methods of Kramer and Siegel (2019)…

Kramer and Siegel (2019) defined total chlorophyll-a (TChla) as the sum of monovinyl chlorophyll a, divinyl chlorophyll a, chlorophyllide, and chlorophyll a allomers and epimers, and ultimately excluded chlorophyllide from their statistical analysis. Please confirm that your dataset is consistent with Kramer and Siegel (2019).

Line 162: Chla concentrations from the Absorption Line Height method

Were Chla concentrations derived from AC-S validated with Chla concentrations derived from HPLC?

Table 1.

Detailed contents of measurements conducted for each cruise and the total number of samples (coastal or offshore) should be provided.

Table 2.

The letter “l” is missing from the word All in diatom derived from DPA.

Figure 1.

It would be better to add the 200 m depth contour. This will help readers distinguish coastal from offshore stations. Additionally, please check the legend for the solid line: it shows 2016 Feb Line P, but no such cruise is listed in Table 1.

Figure 4.

Please specify % for the first and second axes.

Figure 9.

Please label as a, b, c, and d.

Figure 10.

Figures are aligned vertically. They are not left and right as mentioned in the figure description.

Citation: https://doi.org/10.5194/egusphere-2023-2851-RC1
- AC1: 'Reply on RC1', Sacchidanandan Pillai, 15 Feb 2024
  
  Based on the reviewer’s comments, we propose the following changes to the paper.
  Abstract: Change lines 2-5 to state that we used Principal Components Analysis and multiple linear regression of hyperspectral absorption data to derive phytoplankton community composition, validated with pigment-based data.
  Key aims of the paper: Describe the spatial distribution of phytoplankton community composition in the coastal and offshore eastern Subarctic Pacific with higher spatiotemporal resolution than previously possible.
  Use the hyperspectral-derived community composition to explain the distribution and oceanographic controls of phytoplankton groups, as well as DMS/DMSP concentrations.
  Introduction: Lines 36-44 to be deleted.
  Lines 62-70: Provide justification for the need for phytoplankton community composition data in the region, and a better description of existing DMS/DMSP data and distributions.
  Methods: Table 1: Add the number of HPLC samples/measurements taken.
                  Figure 2: Move to the supplementary information.
                    Section 2.13: Move to the supplementary information.
  Results: Sections 3.1 and 3.2 will describe approaches to derive inputs for the analysis of seasonality, distribution, and controls of phytoplankton groups, as well as of DMS/DMSP concentrations. Additional validation from the size information will also be presented. The comparison with the Chla algorithm will be moved to the supplementary information.
                  Section 3.4 will include a description of the spatial distribution of DMS/DMSP in the region.
                  Figures 7c and 8c will be combined into a single figure, with the aim of showing how we can capture high-resolution information on diatom and haptophyte biomass. Figures 7a, 7b, 8a and 8b will be moved to the supplementary information.
  Discussion: Lines 321 to 325 to be moved to the supplementary information.
                       Lines 339 to 352 to be combined into one paragraph focussing on how to improve algorithm-based estimates of phytoplankton community composition.
                     Lines 353 to 362 will be expanded to highlight how the hyperspectrally derived phytoplankton community composition expands our ability to understand key biogeochemical processes, as well as potential flaws in the analysis.
  Our responses to the comments are given below. The manuscript by Pillai et al. investigated phytoplankton community structures derived from HPLC pigment data and optical hyperspectral data based on underway spectrophotometric measurements in the Eastern Subarctic Pacific. The data shown could be potentially interesting, but there are several concerns that the authors need to clarify. Major and specific comments are provided below:
  Thank you for your review of the submitted paper. Based on both reviews submitted, we have decided to restructure the paper to focus less on the methodological aspects (including inter-comparisons), and more on demonstrating how hyperspectral data can be used to derive phytoplankton community composition, and help understand the oceanographic controls on phytoplankton groups and associated biogeochemical variables.
  Major comments
  1. The authors claimed that the MLR approach using a spectral decomposition outperformed the Chla-based approach (Lines 276 – 277 and 316 – 319). However, it is difficult to understand the logic. Because the MLR approach for estimating diatom and haptophyte groups was developed using CHEMTAX and DPA estimates as a reference for phytoplankton community composition (Table 2), I think that, inevitably, the MLR approach has a higher R² value than the Chla-based approach (Table 5). I am not convinced that the central claim of the manuscript is correct. More appropriate analysis and explanations are required.
  Both our MLR method and the Chla-based approach of Zeng et al. (2018) were tuned using DPA estimates as a reference. As such, we believe it is reasonable to compare the accuracy of these two approaches. We see no reason why the MLR method should inherently achieve better results, as suggested by the reviewer.
  
  2. I think that the authors classified the data into three categories, i.e., the coastal, offshore, and all datasets, based on 200 m depth (Line 100). However, the authors show the best-fit MLR equations for diatoms using the coastal and all datasets; in contrast, they show the best-fit MLR equations for haptophytes using the offshore and all datasets (sometimes offshore only). For example, Table 2 shows the best-fit MLR equations for haptophytes based on the offshore dataset only, but Table 5 shows haptophyte biomass derived from MLR using the offshore and all datasets, respectively. It is difficult to understand how the authors estimate haptophyte biomass using the MLR approach based on all datasets. Why were the three equations using each dataset (coastal, offshore, and all) not shown? These points should be clearly stated.
  Most of the diatom variability occurs in the coastal dataset, which explains why the offshore data subset was not used to fit the diatom signal. Conversely, haptophytes dominate in the offshore regions, and this subset was thus used to fit haptophyte abundance. This is now further explained / justified in the revised methods section.
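  A sketch of the subset-specific fitting described in this reply: diatom biomass regressed on the coastal samples and haptophyte biomass on the offshore samples. It assumes that "coastal" means a bottom depth shallower than 200 m, that the predictors are Gaussian-decomposition amplitudes, and that the responses are DPA- or CHEMTAX-derived biomass. The field names, the direction of the depth threshold and the ordinary-least-squares solver are illustrative assumptions, not the authors' code.

```python
COASTAL_MAX_DEPTH_M = 200.0  # assumed: bottom depth < 200 m is "coastal"


def select_region(samples, region):
    """Return the samples in 'coastal', 'offshore' or 'all' (hypothetical field names)."""
    if region == "all":
        return list(samples)
    want_coastal = region == "coastal"
    return [s for s in samples
            if (s["bottom_depth_m"] < COASTAL_MAX_DEPTH_M) == want_coastal]


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(X, y):
    """Ordinary least-squares MLR with intercept via the normal equations.
    Returns [intercept, coef_1, ..., coef_p]."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve_linear(ata, aty)


def fit_group(samples, region, group):
    """Fit one group's biomass on the Gaussian amplitudes of one regional subset."""
    sub = select_region(samples, region)
    return fit_mlr([s["gauss_amps"] for s in sub], [s[group] for s in sub])
```

  Under these assumptions, the two models would be obtained as `fit_group(samples, "coastal", "diatom")` and `fit_group(samples, "offshore", "haptophyte")`, which makes explicit that each equation is tied to the subset where that group varies most.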
  
  3. Diatom and haptophyte abundances estimated from different algorithms (CHEMTAX, DPA, and optical-based) were well documented, but no details for the other phytoplankton functional types were provided. Focusing on and estimating these two groups from the optical hyperspectral data is fine. Still, it would be a good idea if the authors could first show an overall description and figures of the CHEMTAX and DPA outputs, e.g., like Taylor et al. (2011).
  Diatoms and haptophytes are the dominant groups in our study area (Lines 264-266). However, to address the reviewer’s comment, we have included figures showing additional details of the CHEMTAX and DPA outputs in the supporting information.
  4. In relation to the No.3 major comment, looking at Figure 10, the proportions of diatom in some offshore stations were zero, in other words, the proportions of haptophyte were 100% predominant. Were these proportions estimated from the MLR algorithms? Additionally, the authors mentioned these proportions were in agreement with previous work (Lines 297 – 298). However, the previous work which the authors cited did not show the information on haptophyte. It is impossible to judge the results.
  The proportions shown in Figure 10 are based on the prevalence of the different communities as determined from the Linear Discriminant Analysis (Methods Section 2.11, Results Section 3.1.2). While Pena and Varela (2007) may not have considered haptophytes explicitly, their results indicated that phytoplankton biomass was dominated by cells <5 µm, which would include haptophytes. We now also cite Pena et al. (2018), who showed that haptophytes are dominant offshore, with minimal seasonal change. To make this clearer, we will also change the y-axis label on Fig. 10 to “proportion of optically derived communities”.
  5. To evaluate the model performance, other statistical metrics (e.g., root mean squared error, RMSE) should be calculated. I think that this would be a great addition to the paper. For your reference, please refer to the following papers:
  As suggested, we have included calculations of RMSE in the revised paper.
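  As a sketch of the performance metrics now reported, assuming paired lists of model estimates and reference (DPA or CHEMTAX) biomass: RMSE, mean bias, and R² computed here as the squared Pearson correlation. The reply does not state whether metrics are computed in linear or log10 space, or which R² definition is used; the `log10` switch is an assumption, reflecting a transform often applied to biomass-like quantities in ocean-colour validation.

```python
import math


def fit_metrics(predicted, observed, log10=False):
    """RMSE, mean bias (predicted - observed) and squared Pearson r for paired values."""
    if len(predicted) != len(observed) or len(observed) < 2:
        raise ValueError("need at least two paired values of equal length")
    if log10:  # values must be strictly positive
        predicted = [math.log10(v) for v in predicted]
        observed = [math.log10(v) for v in observed]
    n = len(observed)
    diffs = [p - o for p, o in zip(predicted, observed)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n
    mp, mo = sum(predicted) / n, sum(observed) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    syy = sum((o - mo) ** 2 for o in observed)
    # R² is undefined when either series has zero variance.
    r2 = sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 else float("nan")
    return {"rmse": rmse, "bias": bias, "r2": r2}
```

  Applying the same function to the MLR and the Chla-based estimates against the same DPA reference puts both approaches on a common footing; unlike R², RMSE and bias also expose systematic offsets from the 1:1 line.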
  6. Although the authors made a considerable effort to collect the data and I do believe that this is a valuable data set, a major shortcoming is the lack of real focus. For example, three-component models for phytoplankton size classes (micro-, nano-, and picoplankton) are common. However, this paper shows two size classes. In addition, I think that the data on DMS and DMSP is important, but it seems to me that the story is off-topic. The spatial changes in DMS and DMSP were not fully discussed in the current form. To better organize the introduction, objectives, and R&D, the paper needs to be reorganized.
  Due to technical difficulties in obtaining size-fractionated optical properties for three phytoplankton size classes, we were only able to distinguish between two size classes. However, based on DPA, CHEMTAX and previous observations in the region, we found that the variability of pico- and nanoplankton was much smaller than that of microplankton. Two size classes, while perhaps not optimal, still allowed us to achieve our goal of validating our optical method for phytoplankton assemblage characterization.
  Regarding the DMS and DMSP data, we have re-structured the paper (as noted above) to provide more focus on the oceanographic drivers of phytoplankton taxonomic shifts, and their impact on other key biogeochemical variables. In this respect, DMS/P is both interesting and important. Both these compounds play a significant role in microbial carbon and sulfur cycling, and (in the case of DMS) can also impact regional climate. It is well known that DMS concentrations often show poor correlations with bulk chlorophyll concentrations, due to strong taxonomic differences in DMS/P production across different phytoplankton groups. Our results are thus important and novel in showing how high-resolution taxonomic data (derived from optical sensors) can help explain variability in DMS/P. The high-resolution taxonomic estimates are particularly important given the increasing availability of underway DMS measurement systems that provide high-frequency measurements along a ship track. To address the reviewer’s concern, we have provided more data and discussion of the spatial trends in DMS/P across our survey region.
  
  Specific comments
  L90: Pigment measurement
  
  The HPLC pigment data are quality controlled against the DHI phytoplankton pigment standard?
  Yes, we have clarified this, and added a citation to Nemcek and Pena (2014), where the standards and more specific procedures are described in detail.
  L97: 2.3 CHEMTAX and DPA analysis
  
  Which method, the successive runs by Latasa (2007) or the multiple starting points by Wright et al. (2009), was employed to obtain the optimized pigment:Chla ratios? This information will help readers understand the results more easily, even if it is described in Pena et al. (2019a).
  We used the multiple starting points method of Wright et al. (2009), and have now clarified this in the text.
  Lines 125-127: Following the methods of Kramer and Siegel (2019)…
  
  Kramer and Siegel (2019) defined total chlorophyll-a (TChla) as the sum of monovinyl chlorophyll a, divinyl chlorophyll a, chlorophyllide, and chlorophyll a allomers and epimers, and ultimately excluded chlorophyllide from their statistical analysis. Please confirm that your dataset is consistent with Kramer and Siegel (2019).
  We were specifically referring to the removal of degradation pigments and redundant pigments from the analysis. While we may not have measured exactly the same allomers and epimers as Kramer and Siegel (2019), our calculation of TChla should include most of the same pigments.
  Line 162: Chla concentrations from the Absorption Line Height method
  
  Were Chla concentrations derived from AC-S validated with Chla concentrations derived from HPLC?
  Poor wording on our part. It will be rephrased to:
  Following Boss et al. (2007), the baseline-corrected particulate absorption at 676 nm (aph(676)) was used to estimate Chla concentration. A linear baseline between 650 and 715 nm was subtracted from the particulate absorption at 676 nm, $a_{ph}(676) = a_p(676) - \frac{39}{65}\,a_p(650) - \frac{26}{65}\,a_p(715)$. We calibrated aph(676) against HPLC-measured Chla (chlorophyll-a + divinyl chlorophyll-a + chlorophyllide-a).
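  The baseline correction above is a linear interpolation of a_p between 650 and 715 nm, evaluated at 676 nm: the weights 39/65 and 26/65 are (715 - 676)/65 and (676 - 650)/65. A minimal sketch, assuming a_p values in m^-1 at the three wavelengths and a simple linear calibration against HPLC Chla (the calibration form and the function names are assumptions, not the authors' fitted relationship):

```python
def line_height_676(ap650, ap676, ap715):
    """Absorption line height at 676 nm: a_p(676) minus the straight-line
    baseline through a_p(650) and a_p(715), evaluated at 676 nm."""
    w650 = (715.0 - 676.0) / (715.0 - 650.0)  # = 39/65
    w715 = (676.0 - 650.0) / (715.0 - 650.0)  # = 26/65
    return ap676 - (w650 * ap650 + w715 * ap715)


def linear_calibration(aph676, chla_hplc):
    """Least-squares slope and intercept for chla ~ slope * aph676 + intercept
    (a linear calibration form is assumed here)."""
    n = len(aph676)
    mx = sum(aph676) / n
    my = sum(chla_hplc) / n
    sxx = sum((x - mx) ** 2 for x in aph676)
    sxy = sum((x - mx) * (y - my) for x, y in zip(aph676, chla_hplc))
    slope = sxy / sxx
    return slope, my - slope * mx
```

  A spectrum that is linear in wavelength across 650-715 nm has zero line height, so only the chlorophyll-a red absorption peak contributes; this is why the line height is less sensitive to non-algal particles and detrital absorption than a_p(676) alone.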
  Table 1.
  
  Detailed contents of measurements conducted for each cruise and the total number of samples (coastal or offshore) should be provided.
  This information was mostly provided in the text part of our original submission, but has now been included in the revised table, as requested.
  
  Table 2.
  
  The letter “l” is missing from the word All in diatom derived from DPA.
  Corrected – thanks for catching that.
  Figure 1.
  
  It would be better to add the 200 m depth contour. This will help readers distinguish coastal from offshore stations. Additionally, please check the legend for the solid line: it shows 2016 Feb Line P, but no such cruise is listed in Table 1.
  That was a mistake; the figure shows the February 2017 cruise. As requested, we will add the 200 m isobath.
  Figure 4.
  
  Please specify % for the first and second axes.
  We have now included this (27% and 24%, respectively).
  Figure 9.
  
  Please label as a, b, c, and d.
  Done
  Figure 10.
  
  Figures are aligned vertically. They are not left and right as mentioned in the figure description.
  Changed, as requested.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2851-AC1
RC2:
'Comment on egusphere-2023-2851', Anonymous Referee #2, 25 Jan 2024
Major comments:
This paper characterizes phytoplankton community composition in the eastern Subarctic Pacific Ocean using continuously-collected hyperspectral absorption data in comparison to phytoplankton pigment data. The dataset appears to have a lot of value, but there are some major concerns that need to be addressed before publication.
My biggest area of concern is the lack of focus within the manuscript - the authors try to cover too much. I would suggest either making it a dedicated methods comparison paper or choosing a phytoplankton community composition characterization method (this could be the author’s MLR model) and focusing on the ecology of the system.
If the authors made it a methods comparison paper, I would envision it comparing the taxonomic output from the Hirata chlorophyll algorithm, HPLC CHEMTAX and DPA methods, Gaussian decomposition of absorption spectra, and the MLR that uses HPLC and Gaussian decomposition data. The results could focus on the comparisons between different methods and why those relationships may exist. Based on the results, the authors could suggest a best method for the region, which could be applied to subsequent papers on phytoplankton ecology, etc. I would suggest leaving out the other environmental variables (nutrients, salinity, SST, DMS, DMSP) unless the authors wanted to do methods comparisons for offshore vs. inshore, high vs. low nutrient environments, etc. to see if certain methods were better fits for different environments.
If the authors made it a phytoplankton ecology paper:
The introduction could highlight WHY there is a need to develop a new taxonomic classification method – for example, state (1) the shortcomings of the chl-based Zeng algorithm, discretely-collected HPLC, and the Gaussian decomposition of absorption spectra, and (2) why the new algorithm is necessary and superior to these methods. The introduction would also need to dive deeper into why understanding phytoplankton community composition in this region is important – for example, the authors could highlight a gap in understanding that the manuscript analysis would fill.

The methods could focus solely on the MLR model, with a section on HPLC methods, a section on Gaussian decomposition, and a section on building the MLR from the two.

In the results, the authors wouldn’t need to discuss methods comparisons because the advantages and motivations for developing the new model were already discussed in the introduction. Instead, the results could focus on applying the community composition outputs from the MLR model to study the ecology – namely to fill the gap in understanding that was introduced in the introduction.

Specific comments:
The paragraph on lines 27-35 implies that the current study will help overcome the limitations of discrete sampling, but we don’t learn how this will be done until two paragraphs later. I would suggest reformatting this so that the link (that hyperspectral data can be collected continuously, increasing spatial and temporal resolution) is made more explicit either by combining these two paragraphs or at least putting them back-to-back.
Along these lines, it’s not clear to me what the shortcomings of the Gaussian decomposition of absorption spectra are – it would be helpful to elaborate on this somewhere in your introduction to further explain the motivation for creating a new method.

Lines 36-44: This paragraph is confusing to me. It appears to me that this study is using different methods to characterize phytoplankton community composition. I’m not sure that classifying qualitative vs. quantitative methods clarifies any forthcoming methodologies or adds anything meaningful to the paper. I would suggest deleting it.

Lines 69-70: This paragraph seems linked to the paragraph on lines 27-35 – both are describing limitations in current methodology and how hyperspectral data can overcome them. I would suggest rearranging the introduction to better highlight this point.

Line 71: I would suggest changing “phytoplankton communities and composition” to “phytoplankton community composition.”

Lines 123-134: Maybe I’m missing something, but it is unclear to me what the purpose of the PCA is. Why not just use the output groups from CHEMTAX? (see my comment from lines 337-338 below)

Lines 172-182: I would suggest only including cell size in this analysis if the paper is going to focus on ecology. If the paper is just focusing on the methodological differences for deriving phytoplankton community composition, then it would only be valuable if you were going to compare methods for different size ranges.

Lines 204-214: Again, it is unclear to me why a PCA needs to be included. Why not just use the output groups from the Gaussian decomposition?

Table 2: It would be interesting to see the MLR models developed for the other taxonomic groups derived from CHEMTAX in addition to diatoms and haptophytes. If the statistical relationships for diatoms and haptophytes are the strongest, it would provide a concrete reason for why the authors focus on those groups throughout the remainder of the paper.

Lines 258-261: This size information is pretty basic and reflects known cell size differences between taxonomic groups. It would be more interesting to explore how cell size of the community as a whole or of a particular group (e.g. diatoms) varies spatially or temporally, especially if the authors decided to take a more ecological focus within the paper.

Figures 7 & 8: These comparisons are a little misleading because DPA and CHEMTAX outputs are used in the MLR, so they would be expected to be correlated.

Lines 322-327: Discussing the merits of assessing multiple phytoplankton groups rather than just one seems unnecessary. I would suggest cutting this paragraph.

Lines 337-338: The fact that the resulting pigment clusters and the CHEMTAX-derived composition are correlated suggests that CHEMTAX is an accurate reflection of the community in this region. I would suggest moving the PCA and correlation between the two to an appendix and just using CHEMTAX data in the methods and analysis (with reference to the appendix for more information).

Technical corrections:
In line 63 the authors specify that they will reference “Chlorophyll-a” as “Chla,” but they have already referenced “chlorophyll” and “chlorophyll-a” previously, e.g. lines 31, 41, 47, 48, 50, 57. Please define the use of “Chla” at the first mention of “chlorophyll” to make terminology consistent.

Line 105: DPA was previously defined in line 42; there is no need to define it again, just use “DPA” here.

Line 284: Missing an “s” in “diatoms”
Citation: https://doi.org/10.5194/egusphere-2023-2851-RC2
- AC2:
  'Reply on RC2', Sacchidanandan Pillai, 15 Feb 2024
  Based on the reviewer’s comments, we propose the following changes to the paper.
  Abstract: Change lines 2-5 to state that we used Principal Components Analysis and multiple linear regression of hyperspectral absorption data to derive phytoplankton community composition, validated with pigment-based data.
  Key aims of the paper:
  - Describe the spatial distribution of phytoplankton community composition in the coastal and offshore eastern Subarctic Pacific with higher spatiotemporal resolution than previously possible.
  - Use the hyperspectral-derived community composition to explain the distribution and oceanographic controls of phytoplankton groups, as well as DMS/DMSP concentrations.
  Introduction: Lines 36-44: To be deleted.
  Lines 62-70: Provide justification for the need for phytoplankton community composition data in the region, and a better description of existing DMS/DMSP data and distributions.
  Methods: Table 1: Add the number of HPLC samples/measurements taken.
  Figure 2: Move to the supplementary information.
  Section 2.13: Move to the supplementary information.
  Results: Sections 3.1 and 3.2 will describe approaches to derive inputs for the analysis of seasonality, distribution, and controls of phytoplankton groups, as well as of DMS/DMSP concentrations. Additional validation from the size information will also be presented. The comparison with the Chla algorithm will be moved to the supplementary information.
  Section 3.4 will include a description of the spatial distribution of DMS/DMSP in the region.
  Figures 7c and 8c will be combined into a single figure, with the aim of showing how we can capture high-resolution information about the biomass of diatoms and haptophytes. Figures 7a, 7b, 8a and 8b will be moved into the supplementary information.
  Discussion: Lines 321 to 325 will be moved to the supplementary information.
  Lines 339 to 352 will be combined into one paragraph focussing on how to improve the estimates from phytoplankton community composition algorithms.
  Lines 353 to 362 will be expanded to highlight how the hyperspectrally-derived phytoplankton community composition expands our ability to understand key biogeochemical processes, as well as potential flaws in the analysis.
  Major comments:
  This paper characterizes phytoplankton community composition in the eastern Subarctic Pacific Ocean using continuously-collected hyperspectral absorption data in comparison to phytoplankton pigment data. The dataset appears to have a lot of value, but there are some major concerns that need to be addressed before publication.
  My biggest area of concern is the lack of focus within the manuscript - the authors try to cover too much. I would suggest either making it a dedicated methods comparison paper or choosing a phytoplankton community composition characterization method (this could be the authors’ MLR model) and focusing on the ecology of the system.
  If the authors made it a methods comparison paper, I would envision it comparing the taxonomic output from the Hirata chlorophyll algorithm, HPLC CHEMTAX and DPA methods, gaussian decomposition of absorption spectra, and the MLR that uses HPLC and gaussian decomposition data. The results could focus on the comparisons between different methods and why those relationships may exist. Based on the results, the authors could suggest a best method for the region, which could be applied to subsequent papers on phytoplankton ecology, etc. I would suggest leaving out the other environmental variables (nutrients, salinity, SST, DMS, DMSP) unless the authors wanted to do methods comparisons for offshore vs. inshore, high vs. low nutrient environments, etc. to see if certain methods were better fits for different environments.
  If the authors made it a phytoplankton ecology paper:
  The introduction could highlight WHY there is a need to develop a new taxonomic classification method – for example, state (1) the shortcomings of the chl-based Zeng algorithm, discretely-collected HPLC, and the Gaussian decomposition of absorption spectra, and (2) why the new algorithm is necessary and superior to these methods. The introduction would also need to dive deeper into why understanding phytoplankton community composition in this region is important – for example, the authors could highlight a gap in understanding that the manuscript analysis would fill.
  
  The methods could solely focus on the MLR model with a section on HPLC methods, a section on gaussian decomposition, and a section on building of MLR from the two.
  
  In the results, the authors wouldn’t need to discuss methods comparisons because the advantages and motivations for developing the new model were already discussed in the introduction. Instead, the results could focus on applying the community composition outputs from the MLR model to study the ecology – namely to fill the gap in understanding that was introduced in the introduction.
  
  Thank you for these useful suggestions. Based on these comments, we have decided to focus our work on the application of our method to better understanding phytoplankton ecology and the distribution of key biogeochemical variables.
  
  Specific comments:
  The paragraph on lines 27-35 implies that the current study will help overcome the limitations of discrete sampling, but we don’t learn how this will be done until two paragraphs later. I would suggest reformatting this so that the link (that hyperspectral data can be collected continuously, increasing spatial and temporal resolution) is made more explicit either by combining these two paragraphs or at least putting them back-to-back.
  
  We have reformatted the introduction, and now provide a clearer justification for the utility of high-resolution, automated measurements.
  Along these lines, it’s not clear to me what the shortcomings of the Gaussian decomposition of absorption spectra are – it would be helpful to elaborate on this somewhere in your introduction to further explain the motivation for creating a new method.
  The gaussian decomposition of the spectra only yields Gaussian amplitudes which are related to the pigment concentration, rather than to the phytoplankton community composition. Additional analysis is needed to determine the community composition from the gaussian amplitudes, which we explore here. This has now been clarified in the revised text, which now de-emphasizes comparisons across methods.
  Lines 36-44: This paragraph is confusing to me. It appears to me that this study is using different methods to characterize phytoplankton community composition. I’m not sure that classifying qualitative vs. quantitative methods clarifies any forthcoming methodologies or adds anything meaningful to the paper. I would suggest deleting it.
  
  This paragraph helps clarify terminology used later in the paper and outlines the different methods used to characterise phytoplankton taxonomic composition. It can be deleted if necessary.
  Lines 69-70: This paragraph seems linked to the paragraph on lines 27-35 – both are describing limitations in current methodology and how hyperspectral data can overcome them. I would suggest rearranging the introduction to better highlight this point.
  
  Will do
  Line 71: I would suggest changing “phytoplankton communities and composition” to “phytoplankton community composition.”
  
  Will do
  Lines 123-134: Maybe I’m missing something, but it is unclear to me what the purpose of the PCA is. Why not just use the output groups from CHEMTAX? (see my comment from lines 337-338 below)
  
  Using PCA, we can summarise the multivariate information derived from CHEMTAX and thus represent the taxonomic information with a smaller number of variables (i.e. reduce the dimensionality of the data set). Moreover, because of uncertainties in the pigment-ratio inputs to CHEMTAX, such a statistical clustering approach eliminates the need for any assumptions about the pigment-taxa relationship. Finally, CHEMTAX outputs the biomass of individual phytoplankton taxa, and these estimates are not independent of the biomass of other phytoplankton groups, which reduces the number of truly independent variables in the dataset.
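  The dimensionality-reduction argument above can be illustrated with a minimal sketch. This is not the authors' code: it assumes the CHEMTAX output is arranged as a samples × groups matrix of biomass expressed as fractions of total Chla, and the group names and random data are hypothetical. It computes PCA by singular value decomposition of the centred matrix using NumPy, and shows that when the group fractions sum to one, the last principal component carries essentially no variance, i.e. the group estimates are not independent.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via singular value decomposition of the column-centred data matrix.

    X is an (n_samples, n_groups) array. Returns the sample scores, the
    group loadings and the fraction of total variance explained by each
    retained component (in descending order).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # centre each group's column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # variance fraction per component
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    loadings = Vt[:n_components].T                   # contribution of each group to each PC
    return scores, loadings, explained[:n_components]


# Hypothetical data: 20 samples x 4 CHEMTAX-style groups (e.g. diatoms,
# haptophytes, cryptophytes, chlorophytes), expressed as fractions of total Chla.
rng = np.random.default_rng(0)
raw = rng.random((20, 4))
fractions = raw / raw.sum(axis=1, keepdims=True)     # every row sums to 1

scores, loadings, explained = pca(fractions, n_components=4)
print("variance explained per component:", np.round(explained, 3))
# Because the four fractions always sum to 1, the centred columns are linearly
# dependent, so the last component carries (numerically) zero variance.
```

  How many components to retain as summary variables is a separate judgment call that this sketch does not address.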
  Lines 172-182: I would suggest only including cell size in this analysis if the paper is going to focus on ecology. If the paper is just focusing on the methodological differences for deriving phytoplankton community composition, then it would only be valuable if you were going to compare methods for different size ranges.
  The inclusion of cell size provides an additional validation that the results of the hyperspectral-derived community composition agree with our understanding of the phytoplankton groups. Additionally, since pigment data do not yield direct information on phytoplankton size, we thought it would be a good idea to examine that as well.
  Lines 204-214: Again, it is unclear to me why a PCA needs to be included. Why not just use the output groups from the gaussian decomposition?
  
  The output from the gaussian decomposition only provides information on the concentration of different pigment groups. The PCA is needed to translate pigment-concentrations into estimates of phytoplankton community clusters.
  Table 2: It would be interesting to see the MLR models developed for the other taxonomic groups derived from CHEMTAX in addition to diatoms and haptophytes. If the statistical relationships for diatoms and haptophytes are the strongest, it would provide a concrete reason for why the authors focus on those groups throughout the remainder of the paper.
  
  In our study regions, only diatoms and haptophytes had enough biomass variability to allow estimation via MLR or other techniques.
  Lines 258-261: This size information is pretty basic and reflects known cell size differences between taxonomic groups. It would be more interesting to explore how cell size of the community as a whole or of a particular group (e.g. diatoms) varies spatially or temporally, especially if the authors decided to take a more ecological focus within the paper.
  
  We could try to include this information, but without additional data such as species composition, it could be difficult to validate and interpret the results.
  Figures 7 & 8: These comparisons are a little misleading because DPA and CHEMTAX outputs are used in the MLR, so they would be expected to be correlated.
  
  Those figures serve to highlight how our model performs, and how the scatter around the 1:1 line compares with the Chla model of Zeng et al. (2018).
  Lines 322-327: Discussing the merits of assessing multiple phytoplankton groups rather than just one seems unnecessary. I would suggest cutting this paragraph.
  
  We will do this.
  Lines 337-338: The fact that the resulting pigment clusters and the CHEMTAX-derived composition are correlated suggests that CHEMTAX is an accurate reflection of the community in this region. I would suggest moving the PCA and correlation between the two to an appendix and just using CHEMTAX data in the methods and analysis (with reference to the appendix for more information).
  
  Previous reviewers have taken strong issue with the use of CHEMTAX, and we feel that the PCA helps address these concerns.
  Technical corrections:
  In line 63 the authors specify that they will reference “Chlorophyll-a” as “Chla,” but they have already referenced “chlorophyll” and “chlorophyll-a” previously, e.g. lines 31, 41, 47, 48, 50, 57. Please define the use of “Chla” at the first mention of “chlorophyll” to make terminology consistent.
  
  Line 105: DPA was previously defined in line 42; there is no need to define it again, just use “DPA” here.
  
  Line 284: Missing an “s” in “diatoms”
  
  We will make these changes.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2851-AC2

Sacchidanandan Viruthasalam Pillai, M. Angelica Peña, Brandon J. McNabb, William J. Burt, and Philippe D. Tortell

Short summary

We investigated how hyperspectral optical data collected in the North Pacific can be used to determine phytoplankton community composition. We used the optically derived information on the phytoplankton community to examine phytoplankton size, oceanographic controls and links to other biogeochemical variables. This work was motivated by the upcoming launch of NASA's PACE satellite and the increased availability of hyperspectral optical measurements in oceanographic studies.
