Net primary productivity in the Gulf of California: spatiotemporal patterns and drivers in a warming climate (2003&ndash;2024)

González-Rodríguez, Eduardo; Acevedo-Acosta, Juan David; Cervantes-Duarte, Rafael; Gómez-Gutiérrez, Jaime

doi:10.5194/egusphere-2026-1040

Preprints

https://doi.org/10.5194/egusphere-2026-1040

Preprints

10 Mar 2026

| 10 Mar 2026

Net primary productivity in the Gulf of California: spatiotemporal patterns and drivers in a warming climate (2003–2024)

Eduardo González-Rodríguez, Juan David Acevedo-Acosta, Rafael Cervantes-Duarte, and Jaime Gómez-Gutiérrez

Abstract. The Gulf of California is a highly productive and diverse ecosystem that supports nearly 70 % of Mexico's fisheries. Despite its ecological and socio-economic importance, basin-scale analysis of seasonal and interannual variability in net primary productivity (NPP). This study analyses 22 years (2003–2024) of satellite-derived NPP data to identify its main spatiotemporal patterns and environmental drivers. The monthly climatology reveals that the cold season (December–May) accounts for 68 % of the annual NPP, while the warm season (June–November) contributes only 32 %. To characterize regional variability, we applied the k-means++ clustering algorithm, and validated the resultant classification using discriminant analysis. This procedure robustly identified three productivity zones: high (HPZ; > 1700 mg C m^-2 d^-1), mid (MPZ; 900–1700 mg C m⁻² d⁻¹), and low (LPZ; < 900 mg C m^-2 d^-1). Interannually, the HPZ contributes 61.4 % of total NPP despite occupying only 30 % of the gulf’s area, whereas the LPZ covers 36 % of the area but produces just 16.9 %. A significant negative anomaly between 2013 and 2019 led to a 9 % decline in NPP relative to the long-term mean (124 Tg C y^-1), a period coincided with intense warming episodes, including multiple marine heatwaves and the strong 2015–2016 El Niño. A partial recovery occurred between 2020 and 2024. The dominant environmental drivers of NPP differ among zones: sea surface temperature and mixed-layer depth control NPP in the HPZ, temperature and euphotic zone depth in the MPZ, and euphotic depth and phosphate availability in the LPZ. Overall, our analysis demonstrates that anomalously warm years compress the spatial extent of productive zones. Continued warming and stratification are therefore likely to expand oligotrophic conditions, increasing the area of low-productivity waters with profound ecological consequences for the Gulf of California.

Received: 23 Feb 2026 – Discussion started: 10 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Eduardo González-Rodríguez, Juan David Acevedo-Acosta, Rafael Cervantes-Duarte, and Jaime Gómez-Gutiérrez

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-1040', Anonymous Referee #1, 02 Jun 2026
This is potentially an interesting paper, but it requires a major revision for the following reasons:
While K-means clustering is an accepted method of analysis the key question is how the number K of clusters is determined. There is no explanation of this in the paper. It is simply stated that NPP was grouped into 3 clusters using the k-means++ algorithm. How was the number of clusters (3) determined? Did the authors use, for example, the elbow method? Or some other approach? Why is 3 the optimum number of clusters for NPP?

It is unclear how discriminant analysis was applied as no details are given other than to state that it was applied to all variable except NPP – why? Again, the authors need to explain the details of their analysis for the results to be convincing.

The clustering was done on NPP only. Why not on NPP together with other variables, such as SST and/or EZD? Was this attempted? If not, why not?

Was any comparison made between reanalysis products used and observations for this region? How confident can the authors be in their results based on reanalyses? Would the use of different reanalyses lead to different results (one might a priori think so)? Reanalyses are known to differ significantly.

What assumptions are implicit in the use of a generalised linear mixed model with Gaussian errors? Are these assumptions consistent with the behaviour of the variables being fitted? Also given that there is an effect due to the Colorado River (freshwater?) why wasn’t salinity included in the analysis?

The discussion section seems very long (and quite involved) compared to the overall length of the paper. This could do with being more succinct and easier to follow.

Minor comments:
With regard to reanalysis variables (2.2.2) why weren’t these obtained from CMEMS to be consistent with the use of CMEMS for currents? Why use different reanalyses?

In section 3.2. how did the authors determine which observations were misclassified? What is their definition of misclassification?

In section 3.3 there is mention of the effect of the Colorado River outflow, but its position doesn’t seem to be marked on any of the figures.

Figure 5 is quite small (even smaller when printed in the journal) so it is difficult to see the details. Also, in (b) there seems to be some anti-correlation between LPZ and HPZ – is this significant?

Line 263 states that the regions determined by the authors align with previous classifications, so what is new / different here?

Line 337 mentions a “tipping point” but what do the authors mean by this term? There is confusion in the literature about use of the term “tipping point” and here the authors are referring to a period 2014-2016 rather than a point in time.
Citation: https://doi.org/10.5194/egusphere-2026-1040-RC1
RC2:
'Comment on egusphere-2026-1040', Anonymous Referee #2, 16 Jun 2026
This study uses two decades of satellite and reanalysis data to map and track three productivity zones, high, medium, and low, in the Gulf of California and to examine their seasonal and interannual variability. The descriptive contribution is a genuine strength. The manuscript presents a long-term, basin-scale account of where and when productivity is high or low, including a well-documented 2014 to 2016 decline and subsequent recovery. The use of machine learning methods is appropriate for this kind of question, and the manuscript has the potential to be an excellent contribution.
However, I have serious concerns about the methods as applied, which currently prevent the central conclusions from being supported. The most important is a lack of independence among the variables. Causal control over NPP is attributed to predictors, notably SST and euphotic zone depth, that are themselves inputs to the VGPM algorithm used to compute NPP. As a result, the driver analysis is in part recovering the structure of the production model rather than independent environmental control. This is not acknowledged or accounted for, and it must be addressed before the conclusions can stand. A second concern is that the number of clusters is not justified by any cluster validity analysis, and there are a few indications that the clustering method is not fully understood, which I detail below. I recommend major revisions. With the predictor set properly decorrelated, the classification justified or reframed, and model performance cross validated, this could become a sound paper. My specific comments follow.

Major Revisions

1. The manuscript uses a Generalized Linear Mixed Model to identify significant drivers of NPP (Section 2.3.4). A mixed model requires a random effect, which models structure in the data such as the grouping of repeated observations within zones or within months. The random effect is not defined anywhere in the methods, and the stated model structure (line 160) contains only fixed effects. If a random effect was included, please specify it. If one was not, the model is a fixed effects GLM rather than a GLMM and should be described as such. This distinction matters because a properly specified random structure is one of the standard tools for handling non-independence in the data, which is itself a central concern with the current analysis (see comments below). Additionally, the model mixes standardized predictors (SST, EZD, MLD, per line 153) with apparently unstandardized ones (nutrients, currents, winds, SI), evident in the order-of-magnitude differences in standard errors in Table 1 (e.g. UAR ~1.8–2.2 vs SST ~0.08). Because coefficient magnitudes are only comparable across predictors on a common scale, the conclusion that "SST was the dominant predictor" cannot be read from coefficient size as currently presented. Consider standardizing (z-scored) all continuous predictors before fitting.

2. On lines 155 to 156 the authors describe the model as a GLMM with Gaussian errors. Choosing Gaussian errors assumes the data are symmetric around the average, with roughly even spread above and below it, like a normal bell curve. NPP does not behave this way. Because it is a productivity measurement it cannot go below zero, and values like these are usually bunched up at the low end with a long tail of high values, rather than spread evenly. This is a common and well known feature of productivity and concentration data, and it is the reason researchers often work with the logarithm of these values instead. The manuscript does not show any check that the Gaussian assumption is reasonable here, such as a plot of the model residuals or a test for skew.
This matters because the flexibility to handle data that are not bell shaped is the main advantage of this family of models. Choosing the Gaussian option sets that flexibility aside and treats the skewed NPP data as if it were symmetric, which can distort the model and make the significance values for each driver unreliable. I recommend the authors either fit the model using an error type suited to positive, skewed data, such as a Gamma or log normal, or take the logarithm of NPP before fitting. If the authors prefer to keep the Gaussian model, they should show that the residuals are well behaved and justify that choice.

3. A central problem with the feature importance analysis here is the circularity of the features going into the GLM. NPP in this study is not measured, it is calculated by a model (VGPM) based on measured satellite products. Two of the input variables are SST and euphotic zone depth. When the analysis then asks what controls NPP, SST and euphotic-zone depth emerge as the strongest predictors because they are the same quantities used to compute NPP in the first place. The coupling is especially tight for euphotic-zone depth, which the authors derive directly from chlorophyll-a (lines 91 to 92), and chlorophyll is itself the primary driver of the VGPM calculation. EZD and NPP are therefore linked mathematically before any ocean process is considered. A further consequence is that the analysis dismisses genuinely independent variables that likely exert real control, such as the wind components, nitrate, and stratification, all of which were non-significant in every zone (Table 1).
This circularity is visible in the euphotic-depth result itself, which carries the wrong sign. The physically expected relationship is negative, because more productive water is greener and attenuates light more strongly, so high NPP should correspond to a shallower euphotic zone. The authors describe exactly this negative relationship in their discussion (lines 310 to 311), stating that enhanced mixing "reduces euphotic zone depth anomalies, reflecting increased phytoplankton biomass and reduced light penetration associated with productive conditions." Yet the GLMM reports a significant positive coefficient for EZD in the HPZ (β = 0.775, p < 0.01; Table 1, lines 234 to 235). The model thus contradicts both the established physical relationship and the authors' own text, which is most consistent with the coefficient reflecting collinearity among the entangled predictors rather than a physical effect.

4. The GLM(M) treats each observation as independent through time. However, measured oceanographic data are inherently autocorrelated. The temperature and chlorophyll of a given month are closely related to those of the previous month, and the pattern repeats every 12 months with the seasonal cycle. The consequence is that significance values can be artificially low, because the test assumes far more independent observations than the data actually contain. The p-value for the HPZ SST coefficient in Table 1 may be an indicator of this. It is on the order of 10^-48, and it would be reasonable to expect a 22-year monthly series to produce a much larger p-value than that, even if the relationship ultimately remains statistically significant. This issue would be present anywhere significance is claimed, for example Table 1, the Spearman correlations (Methods, line 122), the TREND least-squares regression (Methods, lines 123 to 124), and the ENSO correlation rs(264) = -0.21, p < 0.05 (lines 318 to 319). The latter is also a weak relationship in absolute terms, explaining only about 4% of the variance, so even setting the significance aside, ENSO accounts for very little of the variability in HPZ coverage.
The autocorrelation issue affects the ability to determine whether trends or slopes are significant. It does not necessarily mean the relationships are wrong. Autocorrelation is straightforward to account for, and I propose the following changes, though other equally valid methods exist.
For the GLMM, include an autoregressive error term or a properly specified random effect, and report the effective degrees of freedom.

When analyzing trends, use a statistical test designed for serially dependent data, such as Mann-Kendall with Theil-Sen, which is also available in the same Climate Data Toolbox.

For the ENSO correlation, correct the effective sample size for autocorrelation before claiming significance, and report the effect size.

5. For the k-means clustering, it is not clear where k=3 comes from. The authors do not provide any elbow, silhouette, gap statistic, or stability analysis to support this choice. The authors further suggest that k=3 is consistent with other analyses that divide the region but those values range from 12-22 and general cardinal directions (lines 253-254). An easy fix is to report a cluster-validity criterion or reframe the three zones as threshold-based opposed to data-determined clusters.
6. K-means does not produce misclassified observations, it assigns each point to its nearest centroid. However, the authors report "8.7% misclassified" and per-group "accuracies" of 92.4/96.1/80.1% (lines 173–174) which can only come from the discriminant analysis (how well the non-NPP variables reproduce the cluster labels), not from the clustering itself. Presenting these as if they validate the clustering conflates two different things and suggests the method was not fully understood.
Additionally, discriminant reclassification accuracy does not validate the number of clusters. Any number of k would yield high accuracy because separating bands of a gradient by correlated variables is trivial. As evidence, the weakest group is the middle one (MPZ, 80.1%), exactly the signature of a boundary-straddling band in a continuum rather than a discrete regime. Reclassification accuracy is a valid separability check when the classes are defined independently and assessed with cross-validation, but here the classes were produced by k-means on NPP itself, so high accuracy is largely circular and cannot demonstrate that three distinct regimes exist.
The GLMM can remain a valid tool if the issues above are addressed. Another machine learning algorithm worth considering is Random Forest with feature permutation and cross-validation. Random Forest makes no assumptions about the data distribution, so right-skewed data are not a problem. The GLMM also imposes a linear relationship for each predictor, but phytoplankton growth is well established as a non-linear process, and the temperature response in particular rises to an optimum and then declines. Random Forest accounts for non-linearity automatically, whereas the GLMM may dismiss a real driver simply because its effect is not linear. As discussed above, the GLMM p-values are also unreliable due to autocorrelation and the non-Gaussian response. Feature permutation in Random Forest is a more direct measure of how much information a predictor carries about NPP, because it measures how much the model's accuracy degrades when that variable is shuffled, rather than relying on a coefficient and standard error that depend on assumptions the GLMM violates.
It is important to reiterate that Random Forest does not resolve every issue. Autocorrelation would still need to be handled through blocked (temporal) cross-validation so that adjacent, correlated months are not split across training and testing. The circularity among SST, euphotic depth, and NPP would also persist with the current predictor set, regardless of model, and must be fixed by removing the VGPM-input variables. Finally, because the predictors here are collinear, standard permutation importance can distribute importance arbitrarily among correlated variables, so a correlation-aware variant would be needed for the rankings to be trustworthy.

7. The discussion and conclusions repeatedly attribute NPP variability to the PDO, and in one case the NPGO. The recovery after 2019 is tied to "the negative phase of the PDO that began in late 2019" (line 377), the 2014 to 2016 event to "a positive PDO and the transition to a negative NPGO" (line 339), and the conclusions state broadly that positive phases of MEI v2 and PDO suppress NPP (lines 379 to 380). However, the only climate index actually included in the methods and results is MEI v2. There is no PDO or NPGO time series, correlation, or any other analysis anywhere in the manuscript. These are causal claims that the analysis never tested.
The authors should either add PDO and NPGO to the methods and results, analyzing them against the zone metrics the same way MEI v2 was handled, or demote them to context and present them as plausible co-occurring modes rather than demonstrated drivers. If they are added, two cautions apply. PDO is partly an integration of ENSO forcing, so it is correlated with MEI v2 and should not be treated as an independent driver without addressing that collinearity. The ENSO and PDO teleconnections to this region are also concentrated in the cold season, so any test should be season-resolved rather than run on annual aggregates.

Minor Revisions:
Overall: Please fix the labels on the color bars of the NPP zones. They should read <900, 900-1700, >1700.
Line 9: “mid (MPZ; 900-1700…)” is repeated
Line 44: consider changing “mid and upper trophic levels” to “mid- and upper-trophic levels”
Line 49: When introducing the Gulf of CA, consider including longitude coordinates as well.
Line 50: It would be helpful to reference Figure 1 in the first sentence.
Line 73: Bahía de Banderas is not labeled on the map and unfamiliar readers may not know where this is. Consider rewording or adding the location to the map.
Line 85: Consider adding the longitudes to Cabo San Lucas and Cabo Corrientes
Line 86: Consider adding a quick overall summary of the data collection methods above the first sub-section.
Line 94: “Vertially” should be corrected to “Vertically”
Line 94: If applicable, this website would be more informative than the current ERDDAP listed one. https://orca.science.oregonstate.edu/1080.by.2160.monthly.hdf.vgpm.m.chl.m.sst.php
Line 107: Broken link. Please update. Is it possible you meant this one? https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description
Line 108: Consider changing the subsection title to “Data Regridding” or similar. Homogenization is not wrong but it is a bit vague.
Line 115: “(Hobday et al. 2016)” should be Hobday et al. (2016)
Line 117: “Bimonthly” can be ambiguous and mean either twice a month or every two months. Please consider clarifying or using a different term.
Line 138: “(Moore and McCabe, 1999)” should be “Moore and McCabe (1999)”. Please also consider reporting exactly how outliers were defined (1.5*IQR?) and how many/what percent of data points this excluded.
Line 139: k-means++ is an initialization method for the k-means algorithm. Consider changing to “...grouped into three clusters using the k-means algorithm with k-means++ initialization..”
Line 141: “Spatial distribution and seasonal patterns (cold season: December–May; warm season: June–November) were visualized” would be helpful to point to a figure here.
Line 151: URT and UAR seem to be undefined above.
Line 154: Please provide thresholds used. For example, Pearsons correlation >0.7.
Line 160: Is Zone a predictor? If so, that also becomes circular because NPP defines the zone. So the model would say “this pixel predicts high NPP because it is in the High NPP zone”
Line 176: The last sentence is a little ambiguous. Consider rewriting to something like “By dividing the Gulf into three productivity zones, we can evaluate how abnormal warming alters their physical size, their total carbon contribution, and the underlying physical forces that drive them.”
Line 179: This would be helpful to put a figure reference.
Line 179: Does “climatology” refer to the average annual seasonal cycle max and min? Or just the max and min across this specific time range?
Line 180: I believe “longitudinal” should be “latitudinal”.
Line 180: Do these numbers represent the climatological mean? or do these also include the cold and warm seasons? could be clarified by including the figure number/panel associated with the paragraph.
Line 182: Is the latitude value only for the archipelago? If so, why are the latitudes of the coasts previously listed left out?
Line 190: Is this the per-month average over 22 years? Why is this important? It might make more sense to compare high values to high-only climatology, etc. Also including a figure reference here would be helpful.
Line 195: at first read “zone-integrated” is a little confusing but upon subsequent reads it is clear what you are writing here. Consider rewriting to “NPP per year per productivity zone” or similar and refer to Figure 4, panel b.
Line 205: In Section 3.4, please ensure that spatial percentages (e.g., '% of gulf area') are clearly delineated from the productivity contribution percentages (e.g., '% of total NPP') used in Section 3.3 to avoid reader confusion.
Line 206: I believe it should be Fig. 5, not 4.
Line 207: Much of the paper discusses latitude, it would be helpful to include latitude on at least one of the maps in Figure 5.
Line 307: This is one of the situations where “winter” and “cold season” get a little unclear. Intuitively, 6 months in a near-tropical region is hard to classify as winter. Additionally, in 3.3 "winter HPZ was approx 54%” and the "cold-period" in 3.4 gave PHZ 44.2%. Please explicitly define winter versus cold period and remain consistent.
Line 311: Consider starting a new paragraph at “During winter..” and combining with the sentence below starting “Conversely, ..”
Line 326: “this physical mechanism” is a bit ambiguous.
Line 337: “tipping point” implies a point of no return. Consider changing to regime change or similar.
Line 354: The listed values are wind speeds, not wind stress.
Line 367: Does “Long-term trends” refer to your data set? Or others?
Line 367: Are the 50-100 mg C m^-2 d^-1 also per decade?
Line 371: (Maishal, 2024a) should not be in parenthesis.
Citation: https://doi.org/10.5194/egusphere-2026-1040-RC2

Eduardo González-Rodríguez, Juan David Acevedo-Acosta, Rafael Cervantes-Duarte, and Jaime Gómez-Gutiérrez

Viewed

Total article views: 1,313 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
597	644	72	1,313	60	98

HTML: 597
PDF: 644
XML: 72
Total: 1,313
BibTeX: 60
EndNote: 98

Views and downloads (calculated since 10 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	436	373	66	875
Apr 2026	94	100	0	194
May 2026	43	58	3	104
Jun 2026	20	12	3	35
Jul 2026	4	101	0	105

Cumulative views and downloads (calculated since 10 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	436	373	66	875
Apr 2026	94	100	0	194
May 2026	43	58	3	104
Jun 2026	20	12	3	35
Jul 2026	4	101	0	105

Viewed (geographical distribution)

Total article views: 1,220 (including HTML, PDF, and XML) Thereof 1,220 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 13 Jul 2026

Short summary

Using twenty years of satellite observations, this study explores what controls marine primary productivity in the Gulf of California. We found that sea surface temperature (SST) plays the crucial role, while mixed layer, euphotic zone, and nutrient availability also affect the most productive zone. From 2013 to 2019, productivity declined during periods of SST anomaly, marine heatwaves, and El Niño events. If these warm conditions continue, low-productivity waters could expand across the gulf.


Total:	0
HTML:	0
PDF:	0
XML:	0