the Creative Commons Attribution 4.0 License.
Applicability of physics-based and machine-learning-based algorithms of geostationary satellite in retrieving the diurnal cycle of cloud base height
Abstract. Four distinct retrieval algorithms, comprising two physics-based and two machine-learning (ML) approaches, have been developed to retrieve cloud base height (CBH) and its diurnal cycle from Himawari-8 geostationary satellite observations. Validations have been conducted using the joint CloudSat/CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) CBH products in 2017, ensuring independent assessments. Results show that the two ML-based algorithms exhibit markedly superior performance (with a correlation coefficient of R > 0.91 and an absolute bias of approximately 0.8 km) compared to the two physics-based algorithms. However, validations based on CBH data from the ground-based lidar at the Lijiang station in Yunnan province and the cloud radar at the Nanjiao station in Beijing, China, explicitly present contradictory outcomes (R < 0.60). An identifiable issue arises with significant underestimations in the retrieved CBH by both ML-based algorithms, leading to an inability to capture the diurnal cycle characteristics of CBH. The strong consistency observed between CBH derived from ML-based algorithms and the spaceborne active sensor may be attributed to utilizing the same dataset for training and validation, sourced from the CloudSat/CALIOP products. In contrast, the CBH derived from the optimal physics-based algorithm demonstrates good agreement in diurnal variations of CBH with ground-based lidar/cloud radar observations during the daytime (with an R value of approximately 0.7). Therefore, the findings of this investigation from ground-based observations support the more reliable and adaptable nature of physics-based algorithms in retrieving CBH from geostationary satellite measurements. Nevertheless, under ideal conditions, with an ample dataset of spaceborne cloud profiling radar observations encompassing the entire day for training purposes, the ML-based algorithms may still hold promise in delivering accurate CBH outputs.
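The validation statistics quoted in the abstract (correlation coefficient R and absolute bias) can be illustrated with a minimal sketch. The function name `cbh_validation_stats` and the sample values in the usage note are hypothetical, and the definitions used here (Pearson R, and the mean absolute difference as the "absolute bias") are assumptions, since the abstract does not spell out the formulas.

```python
import math

def cbh_validation_stats(retrieved_km, reference_km):
    """Return (Pearson R, mean bias, mean absolute bias) for paired CBH values in km.

    Assumed definitions, not taken from the manuscript:
    bias = mean(retrieved - reference); absolute bias = mean(|retrieved - reference|).
    """
    # Keep only pairs in which both values are finite (drops NaN/inf samples)
    pairs = [(a, b) for a, b in zip(retrieved_km, reference_km)
             if math.isfinite(a) and math.isfinite(b)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")

    r = cov / math.sqrt(var_a * var_b)
    mean_bias = mean_a - mean_b
    mean_abs_bias = sum(abs(a - b) for a, b in pairs) / n
    return r, mean_bias, mean_abs_bias
```

For example, `cbh_validation_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])` describes a retrieval that tracks the reference perfectly in shape but sits 0.5 km too low, which is why R alone does not reveal an underestimation.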
Status: closed
-
RC1: 'Comment on egusphere-2023-2843', Anonymous Referee #1, 10 Jan 2024
- The authors emphasized that the algorithms can capture the diurnal variation of cloud base height well, but it seems that only a case study based on several days of retrievals was presented. More experiments and discussions are needed to support this conclusion.
- Can you discuss the differences between the CPR/CALIOP and the ground-based lidar and cloud radar measurements? These will be known sources of bias in your comparisons.
- To better understand the diurnal variation of CBH, it is suggested to convert the observation time from UTC time to local time.
- Three of the four algorithms cannot retrieve nighttime CBHs, and the RF IR-only algorithm appears to perform worse than the other methods. Is it possible to obtain nighttime CBHs using physics-based algorithms? Please add some discussion in the manuscript.
Citation: https://doi.org/10.5194/egusphere-2023-2843-RC1
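Referee #1's suggestion to convert observation times from UTC to local time can be sketched as follows. This is a minimal illustration that assumes local mean solar time (15 degrees of longitude per hour) is the intended "local time"; the function name and the longitude argument are hypothetical, and a fixed civil offset such as UTC+8 could be used instead.

```python
from datetime import datetime, timedelta, timezone

def utc_to_local_solar_time(utc_time, longitude_deg_east):
    """Approximate local mean solar time for an observation time given in UTC.

    Assumption: 15 degrees of longitude correspond to one hour;
    the equation of time is ignored.
    """
    # Treat naive datetimes as UTC, as satellite products usually report UTC
    if utc_time.tzinfo is None:
        utc_time = utc_time.replace(tzinfo=timezone.utc)
    offset = timedelta(hours=longitude_deg_east / 15.0)
    # Return a naive datetime so the result is not mistaken for a civil time zone
    return (utc_time.astimezone(timezone.utc) + offset).replace(tzinfo=None)
```

With this convention, an observation at 04:00 UTC at 120°E maps to 12:00 local solar time, so a diurnal cycle plotted against the converted time peaks near the expected hours of the day.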
AC1: 'Reply on RC1', Min Min, 11 Jan 2024
Thanks for your comments and suggestions. We will make a full response after all the reviewers' comments come out. Here we would like to make a brief explanation for your comment about "the lack of multiple cases" (your first comment). In addition to the case in the main body, one-year results (in 2017) from Beijing Nanjiao station are provided as Figure S1 within the supplementary documentation.
Citation: https://doi.org/10.5194/egusphere-2023-2843-AC1
-
RC2: 'Comment on egusphere-2023-2843', Anonymous Referee #2, 18 Jan 2024
Comments on “Applicability of physics-based and machine-learning-based algorithms of geostationary satellite in retrieving the diurnal cycle of cloud base height” submitted by Mengyuan Wang et al.
CBH is an important cloud parameter, and its determination is meaningful and important work in atmospheric science. This submission compares the performance of four algorithms (two physics-based and two machine-learning-based) using lidar and radar data from two stations in China.
My major comments are given below:
The datasets used for training and validation are not sufficiently large. It seems that the training and validation data both come from the same year, 2017. In Section 2 (Data) and Section 4 (Result), there is no clear description of how long a period of data is used for training and validation.
The conclusion seems to deliver contradictory and confusing information. Line 564: “However, in stark contrast, the results from the physics-based algorithms are superior to those from the ML-based algorithms”. Line 590: “Note that the ML-based algorithms still demonstrate better CBH retrievals using the spaceborne joint CloudSat/CALIOP detection method”
Lines 245-264, Section 3.1: this section introduces an algorithm, but it is too brief. It is suggested to describe the algorithm in more detail, though the complete algorithm is not necessary. For instance, what is the performance of the algorithm? How popular is it in the community?
The same comments as for Section 3.1 apply to Section 3.2.
Lines 184-190: Why are the Level-2 cloud products from the FY satellite used for the operational H8/AHI Level-1B data? Can you explain? Is there any reason for not using other satellites’ cloud products? What are the quality and properties of the FY satellite cloud products compared to other satellites’ cloud products? Do the operational H8/AHI Level-1B data have cloud products?
Line 195: “This validation is carried out by using analogous MODIS Level-2 cloud products as a reference” MODIS is a polar-orbiting satellite instrument while H8 is a geostationary satellite. The spatial and temporal overlaps of these two satellites are very limited. Is it suitable to use MODIS cloud data for H8 data?
Lines 450-451: “The ground-based lidar data at Lijiang station on December 6, 2018, and January 8, 2019, are selected for validation.” Only two days of data are used for validation. This is too few. Normally a large dataset is needed in order to produce statistically meaningful validation results. For instance, one year of data is needed in order to see the seasonal effects.
Lines 480-481: “CBHs from another ground-based cloud radar dataset covering the entire year of 2017 are also collected and used in this study.” It appears that the RF model training was also based on data from 2017, the same year as the radar data. It would be more statistically meaningful if temporally independent radar data (e.g. 2018) were used for the validation.
In Fig. 8, the results in panels (c) and (d) are obviously much worse than those in (a) and (b). Why? Is it possible that the altitude information has not been trained/considered in the two ML algorithms? The number of data points in (a) and (b) is much smaller than in (c) and (d). Why? It is suggested to display the number of data points in each plot of Fig. 8. Add the number of data points to all the figures in the paper.
Overall, the quality of the figures is not good. They are somewhat blurred. The plotting format is not consistent among different figures.
Other minor comments are given below:
Line 71: CBH should be defined at first use.
Line 113: “A recent study by (Yang et al., 2021) utilized” should be “A recent study by Yang et al. (2021) utilized”
Similarly for line 118. Check all the other similar cases in the paper.
Lines 303-304: “air mass 1 (air mass 1=1/cos(view zenith angle)), and air mass 2 (air mass 2=1/cos(solar zenith angle)).” Do you need units for air mass 1 and air mass 2? Are “air mass 1” and “air mass 2” here air density or air mass?
Line 321: “581,783 matching points are selected from H8/AHI and CloudSat data for 2017.” What are the start and end dates of the H8/AHI and CloudSat observations?
Lines 335-343: it is a little awkward to see the result placed at the end of the data section. It should be moved to Section 4: Result and Discussions.
Line 373: “the better” should be “better”
Line 393: “Unit = dBZ” should be “unit: dBZ”
Line 526: “It not surprised”: rephrase this.
Line 529: “Theerfore” should be “Therefore”
Line 572, “near-perfect CBH results” remove the “near-perfect”
Line 587 “that machine learning (ML)-based algorithms are constrained by the size of their datasets.” I suggest the authors use more data to train the ML model in the revision.
Line 1073, “Same as Fig. 6” Maybe you meant “Same as Fig. 5”.
In Fig. 7, results for 4 algorithms (except the ML IR-single) are missing for the period 9 UTC to 22 UTC. Is this because of the unavailability of the data during the evening time? If yes, please add such a statement in the figure caption to explain.
Citation: https://doi.org/10.5194/egusphere-2023-2843-RC2
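Referee #2's request for temporally independent validation (for example, training on 2017 and validating on 2018) amounts to splitting the matched samples by observation year rather than at random. The sketch below is a minimal illustration only: the record layout `(timestamp, features, cbh_km)` and the function name `split_by_year` are hypothetical and are not taken from the manuscript.

```python
from datetime import datetime

def split_by_year(samples, validation_year):
    """Split matched samples into training and validation sets by calendar year.

    samples: iterable of (timestamp, features, cbh_km) tuples (hypothetical layout).
    All samples observed in `validation_year` are held out; the rest are used for training.
    """
    train, validation = [], []
    for sample in samples:
        timestamp = sample[0]
        if timestamp.year == validation_year:
            validation.append(sample)
        else:
            train.append(sample)
    return train, validation
```

Holding out a whole year in this way keeps the validation samples from sharing a season or synoptic situation with the training samples, which a random split of a single year cannot guarantee.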
AC2: 'Reply on RC2', Min Min, 19 Jan 2024
Thanks for your comments and suggestions. We will make a full response after all the reviewers' comments come out. We will make detailed changes in response to your comments.
Citation: https://doi.org/10.5194/egusphere-2023-2843-AC2
-
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
414 | 144 | 34 | 592 | 49 | 20 | 26 |