Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets

Nellikkattil, Arjun Babu; O’Brien, Travis Allen; Lemmon, Danielle; Lee, June-Yi; Chu, Jung-Eun

doi:https://doi.org/10.5194/egusphere-2023-592

Preprints

09 May 2023

Abstract. This study describes a generalized framework, Scalable Feature Extraction and Tracking (SCAFET), to extract and track features from large climate datasets. SCAFET utilizes novel shape-based metrics that can efficiently identify and compare features from different mean states, datasets, and between distinct regions. Features of interest are extracted by segmenting the data based on a scale-independent bounded variable called the shape index (SI). SI gives a quantitative measurement of the local geometric shape of the field with respect to its surroundings. To demonstrate the capabilities of the method, we illustrate the detection of atmospheric rivers, tropical and extratropical cyclones, sea surface temperature fronts, and jet streams. Cyclones and atmospheric rivers are extracted from the ERA5 reanalysis dataset to show how the algorithm extracts both locations and areas from climate datasets. The extraction of sea surface temperature fronts exemplifies how SCAFET effectively handles curvilinear grids. Lastly, jet streams are extracted to demonstrate how the algorithm can also detect 3D features. SCAFET can be implemented to extract and track most weather and climate features.
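
As an illustration of the shape index mentioned in the abstract, the following is a minimal sketch, assuming the textbook Koenderink and van Doorn definition computed from the eigenvalues of the Hessian on a regular 2D grid. It is not taken from the SCAFET code; the function name shape_index, the finite-difference derivatives, and the test fields are assumptions made for this example, and the published method may differ in smoothing, grid handling, and sign conventions.

```python
import numpy as np


def shape_index(field, spacing=1.0):
    """Shape index of a 2D field on a regular grid, bounded in [-1, 1].

    Under the convention assumed here: +1 dome/cap, +0.5 ridge,
    0 saddle, -0.5 valley, -1 cup. Flat points (zero Hessian) give NaN.
    """
    # First derivatives along axis 0 (y) and axis 1 (x).
    fy, fx = np.gradient(field, spacing)
    # Second derivatives (entries of the Hessian).
    fyy, _ = np.gradient(fy, spacing)
    fxy, fxx = np.gradient(fx, spacing)

    # Eigenvalues of the symmetric Hessian are mean +/- half_diff.
    mean = 0.5 * (fxx + fyy)
    half_diff = np.sqrt((0.5 * (fxx - fyy)) ** 2 + fxy ** 2)

    # SI = (2/pi) * arctan((k2 + k1) / (k2 - k1)) with k1 >= k2,
    # which simplifies to (2/pi) * arctan(-mean / half_diff).
    with np.errstate(divide="ignore", invalid="ignore"):
        return (2.0 / np.pi) * np.arctan(-mean / half_diff)


# Hypothetical test fields: an isolated maximum and an elongated ridge.
coords = np.arange(-30, 31) * 0.1
x, y = np.meshgrid(coords, coords)
dome = np.exp(-(x ** 2 + y ** 2) / 2.0)
ridge = np.exp(-x ** 2 / 2.0)

si_dome = shape_index(dome, spacing=0.1)
si_ridge = shape_index(ridge, spacing=0.1)
```

Because the index depends only on the ratio of the two curvatures, multiplying the field by a constant leaves it unchanged, which is the sense in which the abstract calls SI scale-independent. In this sketch the centre of the dome evaluates to about +1 and the crest of the ridge to about +0.5.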

Received: 28 Mar 2023 – Discussion started: 09 May 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Preprint (PDF, 26304 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Supplement (16809 KB)

Journal article(s) based on this preprint

15 Jan 2024

Scalable Feature Extraction and Tracking (SCAFET): a general framework for feature extraction from large climate data sets

Arjun Babu Nellikkattil, Danielle Lemmon, Travis Allen O'Brien, June-Yi Lee, and Jung-Eun Chu

Geosci. Model Dev., 17, 301–320, https://doi.org/10.5194/gmd-17-301-2024, 2024

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-592', Anonymous Referee #1, 06 Jul 2023
The authors present SCAFET as a new framework to extract weather features from large climate datasets. SCAFET follows the standard paradigm of segment, filter, and track. The traditional approach to segmentation utilizes absolute thresholding of appropriate climate variables, which is well known to be sensitive to the particular model, climate state, and even spatial location. This makes it difficult to have a standardized detection algorithm that works uniformly across models and warming scenarios etc. SCAFET is instead shape-based, giving a relative and directional thresholding, making it a much more robust approach for weather feature identification. The authors demonstrate its utility by identifying and tracking atmospheric rivers, cyclones, fronts, and jet streams. SCAFET is a significant advance over the traditional absolute thresholding methods currently used by climate practitioners. With some minor revisions, see below, I recommend the manuscript for publication.
My main comment or question is related to how sensitive feature identification is to SCAFET parameters. You have shown that it is possible to identify weather features with SCAFET, which is great, but there is no discussion on how sensitive the results are. For example, how sensitive is the detection of ARs in Figure 4 to the parameters used in Table 1? On the one hand, it is intuitive to identify ARs as long, narrow shapes with (relatively) high IVT and precipitation. But concrete numbers must be used to implement that intuition. If you slightly change the SI threshold for Ridges, or the minimum length, or angle coherence, etc. does this totally change the kind of objects identified so that they no longer resemble ARs (I wouldn't think so, but perhaps), or does it slightly change the details of ARs detected? If it is the latter case, how did you decide on the exact values used in Table 1 for the best identification of ARs? I see there is one sentence, "The quantitative values for the properties are obtained from a consensus of previous studies referenced within each section." but I think this requires more elaboration.

My second question is, what are we supposed to take away from Section 4.1 on Jet Streams? It shows some proof-of-concept that the method can be applied, in principle, to 3D data. To my eye, I don't see a clear jet stream identified by SCAFET in (b), (d), and (f) of Figure 7. So while the method can be applied to 3D data, it is not clear that it is successful in identifying features in 3D data.

Third, while I think SCAFET is indeed a significant advance, I believe there are some statements made in the paper which are not justified, or I have misunderstood what you are trying to say.
Around line 40 there is discussion of dataset pre-processing, such as computing IVT fields for AR detection, and how this becomes infeasible for high resolutions and large ensembles. At first I read this as implying pre-processing as a downside of traditional methods, but something that SCAFET would bypass. However, SCAFET itself uses these pre-processed fields in the identification of ARs and cyclones.

Starting at the end of line 327, there is the sentence "Due to its design, SCAFET does not require a priori climate information to identify features." I am not sure what is meant by this sentence. In the work presented, the shape-based component is only one piece of the full pipeline to identify weather features. Most obviously, the shapes are extracted from the pre-processed fields IVT and RV, which are created from "climate information". Even knowing which generic shapes are appropriate for particular weather features is, in my view, climate information.

Finally, the writing and grammar etc. of the paper need to be cleaned up. Below are some instances I found during my reading:

line 15: "… and value (5Vs) of climate data (REFs) of climate data."

starting in line 162: "In the current study, a simple radius is defined and the closest object within the given radius to each object at time n is clustered and identified from time n+1 as the same object in motion." I get the general idea of what you are saying here, but I found this sentence hard to parse.

line 186-187: "…derive this threshold from dataset directly, …"

line 204-205: "… each object is used as to filter …"

line 274: "…, SSTFs are not tracked as ocean fronts are stationary rather than…"

in the middle of the Figure 6 caption, "In the next step, ridges, caps, and domes are extracted from (b) and weak and small…", do you mean for this to be (a) instead of (b)?

line 293-294: "Since the scope of this section is limited to the validation of the detection method, we have only shown jet detection in three selected time steps." I'm not fully sure what you are trying to say here. Do you mean that filtering and tracking steps have not been performed here?

Figure 7 caption: "The 3D jet streams extracted for the corresponding time period is show in …"

line 362-363: "change of direction of a along the curve."

In Figure A2, please adjust the legend so it can be read more clearly.
Citation: https://doi.org/10.5194/egusphere-2023-592-RC1
- AC1: 'Reply on RC1', Arjun Nellikkattil, 20 Aug 2023
  
  We thank anonymous reviewer #1 for their comments and suggestions. Please refer to the attached pdf documents for a detailed response to the comments.
  
  Citation: https://doi.org/10.5194/egusphere-2023-592-AC1
RC2: 'Comment on egusphere-2023-592', Anonymous Referee #2, 10 Jul 2023

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-592/egusphere-2023-592-RC2-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2023-592-RC2
- AC2: 'Reply on RC2', Arjun Nellikkattil, 20 Aug 2023
  
  We thank anonymous reviewer #2 for their comments. Please refer to the attached pdf document to see our detailed response.
  
  Citation: https://doi.org/10.5194/egusphere-2023-592-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Arjun Nellikkattil on behalf of the Authors (18 Oct 2023): Author's response, Author's tracked changes, Manuscript

ED: Reconsider after major revisions (30 Oct 2023) by Simone Marras

ED: Publish as is (02 Nov 2023) by Simone Marras

AR by Arjun Nellikkattil on behalf of the Authors (05 Nov 2023): Author's response, Manuscript

Supplement

https://doi.org/10.5194/egusphere-2023-592-supplement

Data sets

Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets, Arjun Babu Nellikkattil, https://doi.org/10.5281/zenodo.7767301

Model code and software

Scalable Feature Extraction and Tracking (SCAFET): A general framework for feature extraction from large climate datasets, Arjun Babu Nellikkattil, https://doi.org/10.5281/zenodo.7767301

Viewed

Total article views: 602 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
407	173	22	602	35	12	14

Views and downloads (calculated since 09 May 2023)

Month	HTML	PDF	XML	Total
May 2023	109	48	3	160
Jun 2023	28	10	1	39
Jul 2023	66	13	5	84
Aug 2023	70	14	5	89
Sep 2023	61	19	1	81
Oct 2023	36	27	3	66
Nov 2023	11	21	2	34
Dec 2023	21	18	2	41
Jan 2024	5	3	0	8
Feb 2024	0
Mar 2024	0
Apr 2024	0
May 2024	0
Jun 2024	0
Jul 2024	0
Aug 2024	0
Sep 2024	0

Viewed (geographical distribution)

Total article views: 598 (including HTML, PDF, and XML). Of these, 598 have a defined geography and 0 are of unknown origin.

Latest update: 03 Sep 2024

Short summary

The exponential increase in climate and weather data demands computationally efficient and mathematically sound feature extraction algorithms to identify phenomena such as atmospheric rivers, cyclones, sea surface temperature fronts, and jet streams. In this study, we present an innovative generalized framework for extracting two- and three-dimensional features from gridded datasets using the local geometric shape of the input fields.


Cited

1 citation as recorded by Crossref.