Ocean Model Analysis and Prediction System version 4.1i (OceanMAPSv4p1i)
Abstract. The Ocean Model Analysis and Prediction System (OceanMAPS) is a short-range, near-global, eddy-resolving ocean forecasting system developed at the Bureau of Meteorology. OceanMAPS runs daily, producing 7-day forecasts of 3D prognostic fields of ocean currents, temperature, salinity and sea level anomalies (SLA’s). OceanMAPS is based on MOM5 ocean general circulation model and uses EnKF-C software for data assimilation. Consistent with the previous version of OceanMAPS, version v4p1i (OceanMAPSv4p1i), is based on a hybrid Ensemble Kalman Filter with 48 dynamic and 144 static members. However, OceanMAPSv4p1i employs a 1-day analysis cycle in place of the 3-day cycle in OceanMAPSv4p0i. OceanMAPSv4p1i utilises an asynchronous data assimilation of observations, including Sea Surface Temperature (SST; 2-hourly), SLA (12-hourly), and temperature and salinity profiles (daily). OceanMAPSv4p1i produces better performance in forecast skill and mean absolute error scores in Sea Level Anomaly, Sea Surface Temperature and subsurface Temperature. Improvements gained are greater in surface fields, such as sea level anomaly and sea surface temperature, which have less persistence and a greater tendency. A reduction of ~10 % in SST errors and a ~7–8 % reduction in SLA errors is demonstrated in forecast stats. OceanMAPSv4p1i forecasts also better represent mesoscale ocean eddies.
Overall assessment
This manuscript presents an update to the Ocean Model Analysis and Prediction System (OceanMAPS v4.1i), describing changes to the data assimilation cycle and evaluating forecast performance relative to the previous version (v4.0i). The transition to a higher-frequency assimilation cycle is a relevant development, and the system itself is of clear importance to the operational oceanography community.
The manuscript contains a substantial amount of detail and presents a broad evaluation of forecast performance. However, several aspects of the methodological description, justification of configuration choices, statistical evaluation, and interpretation require clarification or strengthening. Addressing these issues would significantly improve the clarity, rigour, and interpretability of the study.
A further concern relates to the level of novelty and its justification for publication. The primary change between v4.0i and v4.1i appears to be the adoption of a one-day analysis cycle (with BRT and NRT steps) in place of the previous three-day cycle. While this is an operationally relevant modification, it is not yet clear that this change alone constitutes a sufficiently substantial methodological advance to warrant publication in its current form, particularly given that the reported improvements are mixed. The results show clear gains for SST and SLA, but more limited improvements for temperature and a deterioration (or at best marginal change) in salinity. In its current form, the manuscript reads primarily as a system upgrade report rather than a clear demonstration of a scientific advance. To strengthen the contribution, the authors should more clearly articulate what new understanding is provided, or provide deeper analysis that links the changes in configuration to the observed performance differences.
Recommendation: Major revisions are required. The manuscript describes an important operational system, but improvements are needed in the clarity of the system description, justification of methodological choices, consistency of assumptions, statistical rigour of the evaluation, and balance of interpretation. The novelty of the changes and the strength of the results should also be more clearly justified. Addressing these issues would substantially strengthen the manuscript and clarify its scientific contribution.
Major comments
The description of the data assimilation cycle and forecast system (particularly Section 2.5 and Figure 1) is difficult to follow in its current form. The manuscript combines descriptions of v4.1i and v4.0i in a way that assumes familiarity with the earlier system, which makes it challenging to reconstruct how the new system actually operates.
It would improve clarity to present a clean, self-contained description of v4.1i, followed by a separate comparison with v4.0i.
Several aspects remain unclear throughout Section 2.5, including:
The terminology used (e.g., “best-estimate”, “catch-up runs”, “batching”) is also ambiguous and sometimes colloquial, making the workflow difficult to follow.
A schematic diagram with a clearly explained caption of the analysis steps, along with a step-by-step description of each stage of the system, would greatly improve clarity.
Figure 1 helps explain the elements of the forecast, but its very brief caption does not help the reader understand what is shown.
The configuration includes a large number of parameters and empirical factors (e.g., observation errors, R-factors, K-factors, inflation, alpha parameter, flux perturbations), but there is little explanation of how these values were chosen.
For example:
It is not clear whether these values are:
Without justification, it is difficult to assess whether these choices are appropriate or transferable to other systems. The manuscript would benefit from either supporting evidence (e.g., sensitivity studies) or a discussion of how these parameters were determined.
The treatment of observation errors appears inconsistent across variables.
At L226, the assumed errors for subsurface temperature and salinity are very large (0.5 °C and 0.075 psu) compared to instrumental uncertainties (e.g., Argo errors are typically ~0.002 °C and ~0.01 psu). This suggests the inclusion of representation error, but this is not explicitly stated or justified.
By contrast, at L227, SST and SLA errors are taken directly from observational data files. These values likely reflect measurement uncertainty and do not include representation error associated with the model resolution. These values are not explicitly reported in the manuscript.
This implies that:
This inconsistency requires clarification. It may also help explain why the system shows improvement for SST and SLA but weaker performance for subsurface temperature and salinity.
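To make the scale of the discrepancy concrete, the sketch below backs out the representation error implied by the values quoted at L226. It assumes that instrument and representation errors are independent and add in quadrature; this is a common convention, not something stated in the manuscript, and the function name is illustrative only.

```python
import math

def implied_representation_error(total_error, instrument_error):
    """Representation error implied by a total observation error,
    ASSUMING independent components that add in quadrature:
    total^2 = instrument^2 + representation^2."""
    if total_error < instrument_error:
        raise ValueError("total error cannot be smaller than instrument error")
    return math.sqrt(total_error**2 - instrument_error**2)

# Values quoted in this review: assumed errors at manuscript L226
# versus typical Argo instrument accuracy.
temp_repr = implied_representation_error(0.5, 0.002)    # degrees C
salt_repr = implied_representation_error(0.075, 0.01)   # psu

print(f"implied representation error, temperature: {temp_repr:.4f} degC")
print(f"implied representation error, salinity:    {salt_repr:.4f} psu")
```

Under this assumption nearly all of the assumed subsurface error would have to be representation error, which is why an explicit statement and justification are requested.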
The static ensemble is constructed from a ~1° resolution model (L207), while the localisation radius is set to 150–175 km (L224). What 1° resolution model was used? No details are provided in the paper.
At this coarse resolution:
Given that the Gaspari and Cohn taper goes to zero at the specified radius, this implies that:
This raises the possibility that the assimilation of SST and SLA is effectively reduced to a largely vertical projection at each location, rather than a fully spatially distributed adjustment.
The manuscript refers to quasi-dynamical consistency (L77), but it is not clear how this is achieved under these constraints. The relationship between the static ensemble resolution, localisation scale, and resulting covariance structure should be explained more clearly.
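The concern about localisation scale can be illustrated with a short calculation of the Gaspari and Cohn (1999) fifth-order taper at multiples of a 1° grid spacing. This is a sketch under stated assumptions: the quoted radius is taken to be the distance at which the taper reaches zero, one degree is taken as roughly 111.2 km (meridional spacing), and the static anomalies are assumed to be used on their native grid. None of these is confirmed by the manuscript.

```python
import numpy as np

def gaspari_cohn(distance_km, support_radius_km):
    """Gaspari and Cohn (1999) fifth-order piecewise rational taper.

    ASSUMPTION: support_radius_km is the distance at which the taper
    reaches zero, i.e. twice the usual half-width parameter c."""
    c = support_radius_km / 2.0
    z = np.abs(np.atleast_1d(np.asarray(distance_km, dtype=float))) / c
    taper = np.zeros_like(z)

    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)

    zi = z[inner]
    taper[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                    - (5.0 / 3.0) * zi**2 + 1.0)

    zo = z[outer]
    taper[outer] = ((1.0 / 12.0) * zo**5 - 0.5 * zo**4 + 0.625 * zo**3
                    + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0
                    - (2.0 / 3.0) / zo)
    return taper

KM_PER_DEGREE = 111.2  # approximate length of 1 degree of latitude

# Taper weight at 0, 1 and 2 grid spacings of a 1-degree ensemble grid.
spacings_km = np.array([0.0, 1.0, 2.0]) * KM_PER_DEGREE
weights_by_radius = {r: gaspari_cohn(spacings_km, r) for r in (150.0, 175.0)}

for radius_km, weights in weights_by_radius.items():
    print(f"radius {radius_km:.0f} km:", np.round(weights, 3))
```

With these assumptions the weight at one grid spacing is below 0.1 for both radii and is exactly zero at two grid spacings. If the static anomalies are instead interpolated to the 0.1° grid before use, the picture would differ, which is a further reason for the manuscript to describe this step explicitly.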
The evaluation period spans only 5.5 months (L296). Within this period:
This suggests that the experiment period is too short to draw robust conclusions.
A longer evaluation period (at least one full annual cycle) would allow:
If longer experiments are not feasible, the limitations of the short period should be clearly acknowledged, and conclusions framed more cautiously.
The assessment relies on globally averaged statistics and visual comparisons, but no formal statistical tests are applied.
The manuscript frequently refers to “significant” improvements, but:
This is particularly important given that some reported differences are small (e.g., L324, L446).
The manuscript would benefit from:
Without this, it is difficult to determine whether reported improvements are robust or within sampling variability. Other studies comparing forecast statistics have applied such tests and were able to distinguish more rigorously between differences in performance that are statistically significant and those that are not. See, for example, Oke and Rykova 2025; https://doi.org/10.3389/fmars.2025.1729116
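As one possible approach, the sketch below applies a moving-block bootstrap to a paired series of daily error differences between two systems and reports a confidence interval on the mean difference. It is an illustration only: the data are synthetic, the block length of 10 days is an arbitrary assumption that would need to be matched to the decorrelation time of the real series, and this is not necessarily the test used in the study cited above.

```python
import numpy as np

def block_bootstrap_ci(diff, block_length=10, n_boot=2000, alpha=0.05, seed=0):
    """Confidence interval for the mean of a paired difference series
    using a moving-block bootstrap, which retains some of the serial
    correlation present in daily forecast errors."""
    diff = np.asarray(diff, dtype=float)
    n = diff.size
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_length))
    n_starts = n - block_length + 1  # number of valid block start indices
    offsets = np.arange(block_length)
    means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        # Concatenate the resampled blocks and trim to the original length.
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        means[b] = diff[idx].mean()
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lower, upper

# Synthetic stand-in for daily MAE from two systems (NOT real data).
rng = np.random.default_rng(42)
n_days = 165  # roughly 5.5 months
mae_old = 0.50 + 0.05 * rng.standard_normal(n_days)
mae_new = mae_old - 0.03 + 0.02 * rng.standard_normal(n_days)

mean_diff, ci_low, ci_high = block_bootstrap_ci(mae_new - mae_old)
significant = ci_high < 0.0 or ci_low > 0.0

print(f"mean difference = {mean_diff:.3f}, "
      f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
print("difference distinguishable from zero:", significant)
```

Working with the paired difference series, rather than comparing two separate averages, removes variability that is common to both systems. The same interval could be computed per region or per depth bin to mark bins where the difference is not distinguishable from zero.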
The reported improvements are mixed:
However, the interpretation tends to emphasise improvements without fully addressing the weaker performance in salinity.
In addition, the manuscript focuses on describing differences in performance but provides limited explanation of why these differences occur.
For example:
A more process-based interpretation would strengthen the scientific contribution.
Most results focus on 1-day forecasts. Over such a short lead time, forecast skill may be difficult to distinguish from persistence (i.e., using the last analysis as a forecast). Figures 11 and 12 are the only figures that consider forecast lead times longer than a day.
It would be useful to include:
This would better demonstrate the value of the dynamical system.
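A persistence baseline of the kind suggested here could be summarised with a mean-square-error skill score, SS = 1 - MSE(forecast) / MSE(persistence), evaluated at each lead time. The sketch below is a minimal illustration with toy numbers; the function name and data are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def persistence_skill(forecast, persistence, verification):
    """Mean-square-error skill score relative to persistence:
    SS = 1 - MSE(forecast) / MSE(persistence).
    SS > 0 means the forecast beats persistence; SS <= 0 means it does not."""
    forecast = np.asarray(forecast, dtype=float)
    persistence = np.asarray(persistence, dtype=float)
    verification = np.asarray(verification, dtype=float)
    mse_forecast = np.mean((forecast - verification) ** 2)
    mse_persistence = np.mean((persistence - verification) ** 2)
    if mse_persistence == 0.0:
        raise ValueError("persistence is perfect; skill score is undefined")
    return 1.0 - mse_forecast / mse_persistence

# Toy values only (e.g. SST in degrees C at four locations).
verification = np.array([20.0, 21.0, 19.5, 22.0])  # verifying observations
persistence = np.array([19.0, 20.0, 20.5, 21.0])   # last analysis held fixed
forecast = np.array([19.5, 20.5, 20.0, 21.5])      # model forecast

skill = persistence_skill(forecast, persistence, verification)
print(f"skill relative to persistence: {skill:.2f}")
```

Reporting this score for lead times of 1 to 7 days, for both v4.0i and v4.1i, would show at which lead times the dynamical forecast adds value over the last analysis.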
Detailed comments
SST and SLA forecasts improve, but salinity forecasts deteriorate and temperature differences are modest. The overall evidence for improvement is therefore mixed.
L19: Clarify whether sea-level anomaly is treated as a surface field or an integrated quantity of the water column.
L54: Since the Bluelink ReANalysis (BRAN) was used as a testbed for OceanMAPS for an extended period, this should be stated explicitly.
L81: Clarify whether the new version demonstrates improved skill associated with state-dependent, time-evolving error covariance. Maybe some examples of state-dependent covariance fields would help.
L113: SST is defined multiple times.
L179: Provide justification for the multipliers used to perturb atmospheric forcing. Were these based on estimates of forcing uncertainty? Or trial and error?
L207: Describe what model was used to construct the static ensemble. Given that the operational model is eddy-resolving (0.1°), it is not clear why a 1° model is used for this purpose.
L207–220: The description of the assimilation cycle is difficult to follow. A clearer step-by-step description or schematic would help. The description is concise, but unclear.
L224: Localisation radius of 150–175 km is small relative to the 1° ensemble grid. The implications should be discussed explicitly.
L226: Assumed observation errors for subsurface temperature and salinity are large compared to instrument accuracy. If intended to represent representation error, this should be clearly stated and justified.
L227: Clarify SST and SLA observation error values and why representation error is not included.
L229: Clarify whether “grid cell” refers to the 0.1° model grid or the 1° ensemble grid.
L237: An R-factor of 6 is applied to temperature and salinity. Clarify how this modifies the effective observation errors.
L240: Combined use of observation error, R-factor, and K-factor suggests potentially very large effective errors. It would help to report typical values actually used, perhaps as a histogram.
L264: Specify which climatology is used for salinity restoring. The reference is from 2003, but many more recent climatologies are available.
L269–292 and Figure 1: The forecast system description is unclear. Clarify how many forecasts are produced and how ensemble members contribute.
L271–276: Clarify the roles of each task (BRT analysis, hindcast, NRT analysis, catch-up runs).
Figure 1: The number of ensemble members and forecasts is unclear. It looks like 48 dynamical members (but only used for the assimilation) and just 4 forecasts.
Section 2.5: Would benefit from reorganisation for clarity rather than conciseness.
L296: A longer experiment period would allow evaluation across seasons.
L299: Clarify whether surface flux perturbations are identical between systems (v4.0i and v4.1i).
L305: Specify depth range for temperature and salinity comparisons.
Section 3.1: Claims of “significant” differences are not supported by statistical testing.
Figures 2–5: Inclusion of uncertainty estimates (e.g., error bars) would help interpretation.
L333: Use consistent terminology for SLA.
L339: Clarify meaning of “consistent”.
L340: Clarify interpretation of “minimal corrections”. Are the increments smaller in v4.1i compared to v4.0i?
L345: Seasonal effects suggest need for longer evaluation period.
L345: Averaging over the full water column obscures vertical structure; depth-resolved analysis would be more informative (e.g., contour plots of MAE on time vs depth plots).
L349: Clarify whether improvements reflect seasonal variation or changes in the system itself.
Section 3.2 and Figures 6–9: Many differences are small; it is unclear which are statistically meaningful.
Figure 10: Consider including a third category for each case, reporting the number of bins with no statistically significant difference. There is a lot of white in the bottom panels of Figures 2–5, implying many bins with no significant difference.
Figures 11–12: Consider testing whether error growth differences are statistically significant (test the statistical difference of the trend in each plot).
L442: In some cases error growth appears faster in v4.1i; this should be acknowledged.
L483: Statement of overall superiority is not consistent with salinity results.
Figure 13: Provides limited additional insight and could be reconsidered.
Figure 14: The standard deviation of salinity (in Figure 14d) is large (~0.4 psu for v4.1i, ~0.32 psu for v4.0i, ~0.25 psu for the observations). Perhaps salinity is unconstrained. This would be consistent with the large assumed observation errors and degradation of salinity in v4.1i. This should be discussed.
L511: Avoid colloquial phrasing.
L539: The manuscript would benefit from the inclusion of example oceanographic fields, rather than relying exclusively on statistical comparisons. At present, the evaluation is entirely based on summary metrics, which provide limited insight into the actual structure and realism of the model output. In particular, the reader is not given a clear sense of the level of detail achieved in the “mesoscale ocean representation,” which is a stated objective of the system.
Including even a single illustrative case study would improve the paper. For example, the authors could present SST and SLA fields over a 7-day forecast period for a dynamically active region (e.g., the Tasman Sea), comparing forecasts from the new and previous versions against a suitable verifying analysis (such as OceanCurrent or another independent product). Selecting a specific event - such as an eddy interaction, eddy merger, or variability in the separation of a western boundary current - would provide a concrete demonstration of system performance.
Such examples would complement the statistical analysis and allow readers to visually assess how differences between system configurations translate into changes in the representation of ocean features. At present, the absence of any oceanographic fields limits the interpretability of the results and makes it difficult to assess the practical impact of the reported improvements.
L551–558: Consider whether this material is necessary for the paper.