Using machine learning algorithms to analyze remote sensing and ground-truth Lake Chad’s level data

Djimadoumngar, Kim-Ndor

doi:https://doi.org/10.5194/egusphere-2022-427

Preprints

05 Jul 2022

Abstract. Lake Chad has faced critical environmental conditions since the 1960s due to the effects of climate change and anthropogenic activities on its ecosystems. Statistical analyses of remote sensing climate variables (i.e., evapotranspiration, specific humidity, soil temperature, air temperature, precipitation, soil moisture) and of remote sensing and ground-truth lake level over the period 1993–2012 reveal that remote sensing lake level data have a skewed distribution and a significant positive association with soil moisture only, whereas ground-truth lake level has a symmetrical distribution and significant negative associations with all the climate variables. The regression of remote sensing and ground-truth lake level onto climate variables using Linear Regression (LR), Support Vector Regression (SVR), Regression Tree (RT), Random Forest Regression (RF), and Deep Learning (DL) methods shows that (i) RF outperforms the other models with the highest coefficient of determination (R²) and explained variance score (EVS) values and (ii) SVR has the lowest Mean Absolute Error (MAE), Mean Squared Error (MSE), and k-fold cross-validation (k-fold CV) values. The RF feature ranking function shows that soil temperature is the major driver of remote sensing lake level fluctuations, whereas precipitation is the leading factor for ground-truth lake level. This study provides more in-depth knowledge of the factors influencing Lake Chad’s level and perspectives for an integrated and forward-looking water management system for connecting climate change, vulnerability, human activities, and water balance research in the Lake Chad human-environment system. We cannot obtain the necessary ground-truth data at this time because of the challenging security situation in the region. However, the development of the data analysis methodology reported here is of fundamental importance in understanding the water cycle dynamics in this important basin, even under challenging field conditions. Verification studies can be performed when more ground-truth data eventually become available.
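The model comparison summarized in the abstract can be illustrated with a minimal sketch. This is not the author's code: the data below are synthetic, the 240-sample size (monthly values over 1993–2012) and the linear signal are assumptions made only for illustration, and the DL model is omitted. It shows one way to compare LR, SVR, RT and RF on the same k-fold splits using scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the six climate predictors (names follow the abstract;
# values are random and do NOT come from the study's data set).
feature_names = [
    "evapotranspiration", "specific_humidity", "soil_temperature",
    "air_temperature", "precipitation", "soil_moisture",
]
rng = np.random.default_rng(0)
n_samples = 240  # assumption: monthly samples for 1993-2012
X = rng.normal(size=(n_samples, len(feature_names)))
# Hypothetical "lake level": a linear signal plus noise, for illustration only.
y = 0.8 * X[:, 4] - 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n_samples)

models = {
    "LR": LinearRegression(),
    "SVR": make_pipeline(StandardScaler(), SVR()),  # SVR is scale-sensitive
    "RT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

# The same folds are used for every model so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in cv_r2.items():
    print(f"{name}: mean 5-fold R^2 = {score:.3f}")
```

Because the scores are computed on held-out folds, they reflect generalization rather than fit to the training data.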

Received: 02 Jun 2022 – Discussion started: 05 Jul 2022

Status: closed

RC1:
'Comment on egusphere-2022-427', Anonymous Referee #1, 28 Jul 2022
In summary, this article runs two parallel investigations into the relationship between a selection of atmospheric and land quantities and two methods of measuring the height of Lake Chad: one data set is measured in situ, while the other is measured from a remote sensing platform. The collection and processing of the data were transparent and thoroughly documented. The investigation then applied a series of out-of-the-box approaches at their default values to regress the two lake heights onto the climate variables; a lengthy comparison was made, and some light scientific conclusions were drawn about the relative importance of the contributions of different quantities to the lake heights. Some patchy analysis and rough conclusions were offered to suggest appropriate algorithms for the regression.

I believe the paper was trying to fit within the following scopes of GMD: (i) "new methods for assessment of models, including work on developing new metrics for assessing model performance and novel ways of comparing model results with observational data" and (ii) papers describing new standard experiments for assessing model performance or novel ways of comparing model results with observational data. Unfortunately I do not believe this paper fits within these categories, without major rewriting, and I describe this along with further broad reasons below, along with some suggested changes:

Motivation: The paper is (partially) motivated by the claim that physical models are not used due to data scarcity, and that this is why data-driven approaches may be a way forward. Yet regimes of scarce data are precisely where physical models excel, as they can use physics to generalize beyond the data, while data-driven models require far more good-quality data. A further goal of the investigation was to "contribute to the general understanding of hydrological processes in the Lake Chad basin", though no scientific conclusions were made in this paper; it was primarily focused on comparing machine learning tools and on data exploration.

Novelty of methods and assessment: The assessment came from a series of standard statistical measures such as $R^2$ or MSE. Likewise, the methods all came from the standard scikit-learn library. The methods were used with their default values and were not tuned for performance on this problem. The DL method was given a more in-depth overview of its construction, but it performed very poorly in all categories without explanation (perhaps because of a lack of data, or because the DL network was relatively modest in size and number of layers).
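For reference, the standard measures mentioned here can be computed as in the following sketch. The numbers are made up for illustration and are unrelated to the manuscript's tables. The second prediction set shows that $R^2$ and the explained variance score are not bounded below by zero: a model that predicts worse than the mean of the observations yields negative values.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Toy observations and two hypothetical sets of predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_good = np.array([1.1, 1.9, 3.2, 3.8])  # close to the observations
y_bad = np.array([4.0, 3.0, 2.0, 1.0])   # anti-correlated with the observations


def summarize(y_obs, y_pred):
    """Return the four standard regression scores as a dictionary."""
    return {
        "R2": r2_score(y_obs, y_pred),
        "EVS": explained_variance_score(y_obs, y_pred),
        "MAE": mean_absolute_error(y_obs, y_pred),
        "MSE": mean_squared_error(y_obs, y_pred),
    }


scores_good = summarize(y_true, y_good)
scores_bad = summarize(y_true, y_bad)

print("good predictions:", scores_good)
print("bad predictions: ", scores_bad)  # R2 and EVS are negative here
```

For the anti-correlated predictions, the residual sum of squares (20) is four times the total sum of squares (5), so $R^2 = 1 - 20/5 = -3$.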

Training: For many of these methods, performance is heavily dependent on tuning parameters. Where this parameter space is not explored, it is difficult to know whether statements of performance apply to the methods themselves or to the quality of the package's default options. For example, could the clear overfitting of RT to the training data, and the likely overfitting of RF, possibly be improved with parameter choices?
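A minimal sketch of the kind of parameter exploration meant here is given below. The data are synthetic and the parameter grid is an arbitrary assumption, not a recommendation for the Lake Chad data set. It contrasts a regression tree at scikit-learn's default settings, which grows until it fits the training data almost perfectly, with a tree whose depth and leaf size are selected by cross-validated grid search.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only (six predictors, noisy linear target).
rng = np.random.default_rng(42)
X = rng.normal(size=(240, 6))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=240)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Default tree: unlimited depth, so it memorizes the training set.
default_tree = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
default_train_r2 = default_tree.score(X_train, y_train)
default_test_r2 = default_tree.score(X_test, y_test)

# Hypothetical grid: restrict tree complexity and pick the best by 5-fold CV.
param_grid = {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42), param_grid, cv=5, scoring="r2"
)
search.fit(X_train, y_train)
tuned_test_r2 = search.score(X_test, y_test)

print(f"default tree: train R^2 = {default_train_r2:.3f}, test R^2 = {default_test_r2:.3f}")
print(f"tuned tree:   test R^2 = {tuned_test_r2:.3f}, params = {search.best_params_}")
```

A large gap between training and test scores for the default tree is the signature of overfitting; whether tuning closes that gap on real data has to be checked case by case.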

Results: Some results were repeated, e.g. Table 7 summarizes the performance, but Figures 5, 6 and 7 repeat these data with no additional insight gained. Throughout, the fit to training data was given as evidence for performance. In some cases, e.g. Figure 9, this even changes the conclusions: RF is described several times as a better model for the data than LR, despite LR giving consistently lower test errors in both cases. It was not made clear that Table 9 is only available for RF; furthermore, other forms of sensitivity analysis or attribution analysis were not carried out for SVR or LR to see whether these results were consistent or depended on the ML tool chosen.
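One model-agnostic way to obtain attributions for LR and SVR as well as RF is permutation importance, sketched below. This is an illustration on synthetic data, not the analysis behind Table 9; the three feature names and the coefficients are assumptions chosen so that one predictor clearly dominates. Importances are computed on held-out data with scikit-learn's `permutation_importance`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data: the first feature is constructed to dominate the target.
feature_names = ["precipitation", "soil_moisture", "soil_temperature"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

models = {
    "LR": LinearRegression(),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "RF": RandomForestRegressor(n_estimators=100, random_state=1),
}

rankings = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Shuffle one feature at a time on the test set and record the score drop.
    result = permutation_importance(
        model, X_test, y_test, n_repeats=20, random_state=1
    )
    order = np.argsort(result.importances_mean)[::-1]
    rankings[name] = [feature_names[i] for i in order]

for name, ranked in rankings.items():
    print(f"{name}: {ranked}")
```

If the rankings agree across models, the attribution is less likely to be an artifact of a single ML tool; if they disagree, that disagreement is itself a result worth reporting.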

Conclusions: I did not understand many of the conclusions. (1) I believe LR was shown to be more performant than RF on test data, though the authors state that RF is the preferred technique. (2) I did not understand what we should conclude from the use of remote sensing and ground-truth data; I feel the authors merely indicated that the data are useful for validation. This persisted throughout, as I could not understand what we were supposed to draw from the parallel investigations, nor what the results helped to explain in this regard. (3) I did not see an explanation for Table 9, arguably an important conclusion, of why we find different attributions of importance for the different data sources, and whether anything can be learnt from this.

For consideration for publication I would suggest

Clearer referenced motivations for why it is a good idea to consider data driven approaches even in areas where data quality is poor.

Making the data exploration more concise, e.g. is showing the calculations of interquartile ranges necessary?

Critical presentation of results: I do not believe Figures 4(c) and 4(d) are necessary. Enhancing readability of Table 5 by splitting targets into a new table and using boldface to highlight useful comparisons. I do not believe Figures 5, 6 or 7 are necessary, nor their analysis beyond what is described in Table 7. Need for log scales in Figure 8, and the error in $R^2$ values (negative?). Removal of the confusing Figure 9: in all other plots training and test data are separate, whereas here they are combined.

New results for robustness: (1) Ensuring that a reasonable exploration of the scikit-learn parameter space is carried out and reported for each tool, to ensure robust method performance. (2) Sensitivity analysis for other methods such as LR and SVR to compare with Table 9.

Rewritten, clear conclusions evaluating the success of the author's own goals laid out in the introduction: These should include an explanation of why these parallel investigations were run and what the consequences of the results are in this respect; an explanation, backed up by robust ML results, of which methods are best suited to the data set; and a scientific explanation or discussion of why different data lead to different attributions, backed up by robust evidence from multiple methods such as LR, SVR, and RF attribution analysis.

Discussion: Outlook on the steps and challenges that are required to make predictions and projections with such models, what scientific steps could be taken on the back of this investigation regarding climate variable attribution, or the use of remote/in-situ data.

General improvements to the formatting of tables and figures to enhance readability, with more detailed captions. Use of footnotes for URLs rather than keeping them inline in the text.
Citation: https://doi.org/10.5194/egusphere-2022-427-RC1
- AC1: 'Reply on RC1', Kim-Ndor Djimadoumngar, 23 Aug 2022
  
  Dear Referee,
  Thank you very much for taking your time to review my manuscript. Your comments and suggestions are highly appreciated. I have taken into consideration your suggested methods and re-analyzed the dataset. I will soon submit a revised manuscript with some responses to your comments and questions.
  Respectfully,
  Kim-Ndor
  
  Citation: https://doi.org/10.5194/egusphere-2022-427-AC1
CEC1:
'Comment on egusphere-2022-427', Juan Antonio Añel, 16 Aug 2022

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our Code and Data Policy.

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your data and code in Dryad; however, the archived assets present several problems.
First, for the code, I have not found a license listed. If you do not include a license, the code continues to be your property and cannot be used by others, despite any statement that it is free to use. Therefore, when uploading the code to the repository, you may want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. Also, you can choose other options that Zenodo provides: GPLv2, Apache License, MIT License, etc.
Second, it is unclear that all the code used for your work is included in the repository. Please, make it clear, for example, by adding a Readme file that explains step by step all the computations or implementations described in the manuscript (e.g. MLR, SVR, RF...) and where (in what files) they can be found in the repository.
Also, you have uploaded your Python notebooks as .docx files. This is a weird way of sharing them. Indeed, the advantage of a Python notebook is that anyone can download and use it directly. Moreover, .docx is a proprietary format, and compatibility of this format is not fully assured, so if you were to share the code as a document, it would be better to use plain text (.txt) or OpenDocument Format (.odt). Notwithstanding, in this case, I encourage you to take advantage of the fact that you are using a notebook and share your .ipynb files.
The same applies to the spreadsheet containing data. XLS and XLSX are not standard formats. Please save the data in a format that assures future accessibility to the files (e.g. .ods, .csv, .dat).
Therefore, please, publish your code following the above-mentioned instructions, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available for the Discussions stage.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2022-427-CEC1
- AC2:
  'Reply on CEC1', Kim-Ndor Djimadoumngar, 26 Aug 2022
  
  Dear Executive Editor,
  I sincerely apologize for such a delay in replying to your comment. I live in a technically and technologically very challenged area, with frequent power blackouts and difficult or poor internet connectivity. I also apologize for the file formats.
  I am sorry for my misunderstanding about the data policies. I thought the data are only shared with Copernicus/GMD reviewers at this stage of the peer review process. They will be public if the manuscript is accepted. They are currently set for private access prior to publication. Dryad will share them with you for the review process. If the manuscript is accepted, I will uncheck the box for private for peer review. That is my understanding.
  I have been working on your suggestions above despite power and internet challenges. I hope it will not be too late.
  Respectfully,
  Kim-Ndor
  
  Citation: https://doi.org/10.5194/egusphere-2022-427-AC2
  - CEC2: 'Reply on AC2', Juan Antonio Añel, 26 Aug 2022
    
    Dear Kim-Ndor,
    Thanks for your reply. Unfortunately, you are wrong. In our Discussions process, anyone can be a reviewer, not only those invited by editors. Therefore, the code must be available to anyone. We will reject your manuscript if you do not provide the code fully open at this stage. Indeed, it should never have been published in Discussions with such shortcomings. This was an oversight by the topical editor, and we apologize for it.
    Regards,
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2022-427-CEC2
    
    AC3: 'Reply on CEC2', Kim-Ndor Djimadoumngar, 26 Aug 2022
    
    Dear Executive Editor,
    Thank you very much for this clarification. I am working on that.
    However, I have an intellectual property concern and a question. If anyone can be a reviewer, do you guarantee that my idea and dataset will not be taken by someone else, twisted, and published before me in case you reject my manuscript at the end of the peer review process? I had to go to the shores of Lake Chad to collect the ground-truth coordinates and request ground-truth lake level data from local institutions for this manuscript. I mention this only to let you know the work I have put in to reach this stage, and that I do not accept becoming a victim of idea or intellectual property theft.
    Respectfully,
    Kim-Ndor
    
    Citation: https://doi.org/10.5194/egusphere-2022-427-AC3
RC2:
'Comment on egusphere-2022-427', Anonymous Referee #2, 31 Aug 2022

General comment:

The stated goals of this paper are to determine how accurately remote sensing data can help study ground-truth Lake Chad hydrologic data and what machine learning models may be of best use to analyze both remote sensing and ground-truth Lake Chad hydrologic data. While the reviewer agrees with the motivations of the study (water availability in vulnerable regions and data-driven models where observations are scarce), in view of the multiplicity of the study aims it is difficult to identify the new contribution of this work to the literature. This lack of clarity in the aims of this work is evident throughout the paper. Consequently, I recommend rejection of the paper to allow the author sufficient time to revise the paper and focus it on clear objectives that demonstrate awareness of previous work and that are reflected in the methods of analysis and in the interpretation and discussion of the findings. Specific comments follow below.

Specific comments:

– The contextual background in the introduction is inadequate. This is evident also in the sparse references. This problem is related to the lack of clarity in the aims of the study. The author does not articulate the state-of-the-art of remote sensing of this lake, or of other lakes in general, in order to put this work in the context of previous work. What do we know about remote sensing of Lake Chad water level, or water level in other lakes? What do we need to know to advance knowledge of the hydrology of this lake? How exactly does this work advance the state-of-the-art?

The author has not provided the physical background on the main drivers of lake water level changes, i.e., are these changes dominated by heat flux changes or mass changes? The author states (lines 52–54) that “we assume that Lake Chad’s level is a function of precipitation, soil moisture, air temperature, soil temperature, evapotranspiration, and specific humidity factors:” what is the physical basis for these assumptions? Can the author explain, for example, how soil temperature or soil moisture controls Lake Chad water level? Such explanations can be useful for evaluating the validity of statements such as (line 54) “precipitation is the only and most important climate variable on which all other climate variable variations depend.”

– The author has not provided enough information about the quality of the input data sets (GPCC, GLDAS, and remote sensing lake levels) used in this study. Are these data sets validated for other lakes? What is the spatial resolution of the ground-truth lake level (i.e., spatial sampling distance) relative to the spatial resolution of the satellite-derived lake levels and relative to the lake surface area? It is difficult to evaluate the presented statistical analysis without a detailed description of the input data set characteristics.

– The discussion and interpretation of the results of the statistical analysis are poor. For example, there is no discussion of the physical relationships underlying one of the main findings presented in Table 9; there is no discussion of the limitations of the input data sets; there is no discussion of the findings in the context of previous findings in this lake or other lakes; and there is no discussion of why Random Forest Regression and Support Vector Regression outperform other algorithms. Without these discussions, it is difficult to determine the limits of applicability and usefulness of this work.

– There are too many acronyms, and this burdens the reader to retain them all while reading the paper. Please remove acronyms from the abstract; use acronyms only when the phrase occurs more than three times; or else, define the acronyms in the section where they are used.

– The figures and tables are of poor quality and overly verbose. Please use consistent font size and shade in the tables and figures (e.g., Figures 6 and 7); reduce words in the tables (e.g., Table 2 and 3); use consistent labeling in all the panel figures (top-left, top-right, bottom-left or bottom-right).

Other comments:

– Line 123: “Remote sensing lake level data is processed at latitude 13.02 and longitude 14.38:” please specify the meaning of this sentence.

– Line 68: please put the direction (N, S, E, W) after every latitude or longitude, i.e., 6°N and 20°N.

– Please remove the word “we” throughout the manuscript.

References:

Kuhwald, Katja & Oppelt, Natascha. (2016). Remote sensing for lake research and monitoring – Recent advances. Ecological Indicators. 64. 105-122. 10.1016/j.ecolind.2015.12.009.

Policelli, Frederick & Hubbard, Alfred & Jung, Hahn Chul & Zaitchik, Ben & Ichoku, Charles. (2018). A predictive model for Lake Chad total surface water area using remotely sensed and modeled hydrological and meteorological parameters and multivariate regression analysis. Journal of Hydrology. 568. 10.1016/j.jhydrol.2018.11.037.

Wenbin, Zhu & Jia, Shaofeng & Lall, Upmanu & Cao, Qing & Mahmood, Rashid. (2018). Relative contribution of climate variability and human activities on the water loss of the Chari/Logone River discharge into Lake Chad: A conceptual and statistical approach. Journal of Hydrology. 569. 10.1016/j.jhydrol.2018.12.015.

Citation: https://doi.org/10.5194/egusphere-2022-427-RC2
- AC4: 'Reply on RC2', Kim-Ndor Djimadoumngar, 06 Sep 2022
  
  Dear Referee,
  Thank you very much for taking the time to review my manuscript and make comments and suggestions.
  I have taken into consideration the suggested methods from Referee #1 to re-analyze the study. I will add your comments and suggestions for the revised manuscript.
  Respectfully,
  Kim-Ndor
  
  Citation: https://doi.org/10.5194/egusphere-2022-427-AC4

Status: closed

RC1:
'Comment on egusphere-2022-427', Anonymous Referee #1, 28 Jul 2022
In summary this article runs two parallel investigations into the relationship between a selection of atmospheric and land quantities, and two methods of measuring the height of Lake Chad, one form of data is measured in-situ while the other is measured from a remote sensing platform. The collection of data, and the data processing was transparent and thoroughly documented. The investigation then applied a series of out-of-the-box approaches at their default values to regress two lake heights onto the climate variables; a lengthy study of the comparison was made, and some light scientific conclusions from the point of view of relative importance of contributions from different quantities to the lake heights. Some patchy analysis and rough conclusions were drawn to suggest appropriate algorithms for the regression.

I believe the paper was trying to fit within the following scopes of GMD: (i) "new methods for assessment of models, including work on developing new metrics for assessing model performance and novel ways of comparing model results with observational data" and (ii) papers describing new standard experiments for assessing model performance or novel ways of comparing model results with observational data. Unfortunately I do not believe this paper fits within these categories, without major rewriting, and I describe this along with further broad reasons below, along with some suggested changes:

Motivation: The paper is (partially) motivated by saying physical models are not used due to data scarcity, and this is why data driven approaches may be a way forward. And yet in regimes of scarce data, this is precisely where physical models excel, as they can use physics to generalize off-data, while data driven models require far more good quality data. Goals of the investigation was also to "contribute to the general understanding of hydrological processes in the Lake Chad basin", though no scientific conclusions were made in this paper, it was primarily focused on comparing machine learning tools and data exploration.

Novelty of methods and assessment: The assessment came from a series of standard statistical measures such as $R^2$ or MSE. Likewise the methods all came from the standard libraries of Sci-kit learn. The methods were taken with their default values and were not tuned to problem performance. The DL method had a more in-depth overview of the construction, but performed very poorly in all categories without explanation, (perhaps from a lack of data or lack of size/layers of the relatively modest size of DL).

Training: For many of these methods, performance is heavily dependent on tuning parameters. In a case where this parameter space is not explored, it is difficult to know if statements of performance apply to the methods themselves, or the quality of the packages default options. For example the clear overfitting to training data of RT, and likely overfitting of RF could possily be improved with parameter choices?

Results: Some results were repeated, e.g. Table 7 summarizes the performance, but Figure 5,6,7 repeat this data with no additional insights gained. Throughout, the fit to training data was given as evidence for performance. In some cases, e.g. Figure 9, this even changes the conclusions - that RF is quoted several times as being considered a better model for the data than LR, despite LR giving consistently lower test errors in both cases. It was not clear that Table 9 is only available in RF; furthermore, other forms of sensitivity anaylsis or attribution analyses were not carried out for SVR or LR to see if these results were consistent or based on the ML tool chosen.

Conclusions: I did not understand many of the conclusions. (1) I believe the LR was shown to be more performant than RF on test data, though the authors state RF is a preferred technique (2) I did not understand as to what we should conclude from the use remote sensing and ground-truth data, I feel the authors merely indicated that data is useful for validation, this was consistent through as I could not undertand what we were supposed to draw from the parallel investigations, nor what the results helped to explain in this regard (3) I did not see an explanations for Table 9, arguably an important conclusion of why we find different attributions of importance to the different data sources, and whether anything can be learnt from this.

For consideration for publication I would suggest

Clearer referenced motivations for why it is a good idea to consider data driven approaches even in areas where data quality is poor.

Making the data exploration more concise, e.g. is showing the calculations of interquartile ranges necessary?

Critical presentation of results: I do not believe Figure 4(c),4(d) are necessary. Enhancing readability of Table 5 by splitting targets into new table and use of boldface to enhance useful comparisons. I do not believe Figure 5, 6 or 7 are necessary, nor their analysis beyond what is described in table 7. Need for log scales in Figure 8, and the error in $R^2$ values (negative?). Removal of the confusing Figure 9, as in all other plots training and test data are separate, here they are combined.

New results for robustness: (1) Ensuring that reasonable exploration of Scikit-learn parameter spaces are reported on for each tool were made to ensure robust method performance. (2) Sensitivity analysis for other methods suchLR and SVR to compare with Table 9.

Rewritten clear conclusions evaluating the success of the author's own goals laid out in the introduciton: This should include explanation of why these parallel investigations were run, what are the consequences of the results in this respect. Explanation backed up by the robust ML results of which methods are best suited to the data set. Scientific explanation or discussion of why different data lead to different attributions backed up by robust evidence from the multiple methods such as LR, SVR, and RF attribution analysis.

Discussion: Outlook on the steps and challenges that are required to make predictions and projections with such models, what scientific steps could be taken on the back of this investigation regarding climate variable attribution, or the use of remote/in-situ data.

General improvements to the formatting of tables, figures to enhance readibility, with more detailed captions. Use of footnotes for URLs. rather than keeping them inline in text
Citation: https://doi.org/10.5194/egusphere-2022-427-RC1
- AC1: 'Reply on RC1', Kim-Ndor Djimadoumngar, 23 Aug 2022
  
  Dear Referee,
  Thank you very much for taking your time to review my manuscript. Your comments and suggestions are highly appreciated. I have taken into consideration your suggested methods and re-analyzed the dataset. I will soon submit a revised manuscript with some responses to your comments and questions.
  Respectfully,
  Kim-Ndor
  
  Citation: https://doi.org/10.5194/egusphere-2022-427-AC1
CEC1:
'Comment on egusphere-2022-427', Juan Antonio Añel, 16 Aug 2022

Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our Code and Data Policy.

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your data and code in Dryad; however, the archived assets present several problems.
First, for the code, I have not found a license listed. If you do not include a license, the code continues to be your property and can not be used by others, despite any statement on being free to use. Therefore, when uploading the code to the repository, you could want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licensesk/gpl-3.0.txt' as LICENSE.txt with your code. Also, you can choose other options that Zenodo provides: GPLv2, Apache License, MIT License, etc.
Second, it is unclear that all the code used for your work is included in the repository. Please, make it clear, for example, by adding a Readme file that explains step by step all the computations or implementations described in the manuscript (e.g. MLR, SVR, RF...) and where (in what files) they can be found in the repository.
Also, you have uploaded your Python notebooks as .docx files. This is a weird way of sharing them. Indeed, the advantage of a python notebook is that anyone can download and use it directly. Moreover, .docx is a privative format, and compatibility of this format is not fully assured, so if you were to use a plain text format to share the code, it would be better to use .txt or OpenDocument Format (.odt). Notwithstanding, in this case, I encourage you to take advantage of the fact of using a notebook and share your ipynb files.
The same applies to the spreadsheet containing data. XLS and XLSX are not standard formats. Please, save the data in a format that assures future accessibility to the files (e.g. .ods, .csv, .dat)
Therefore, please, publish your code following the above-mentioned instructions, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available for the Discussions stage.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2022-427-CEC1
- AC2:
  'Reply on CEC1', Kim-Ndor Djimadoumngar, 26 Aug 2022
  
  Dear Executive Editor,
  I sincerely apologize for a such delay in relying to your comment. I live in a technically and technologically very challenged area, with frequent power blackouts and difficult or poor internet connectivity. I also apologize for the file formats.
  I am sorry for my misunderstanding about the data policies. I thought the data are only shared with Copernicus/GMD reviewers at this stage of peer review process. They will be public if the manuscript is accepted. They are currently set for private access prior to publication. Dryad will share with you for the review process. If the mansucript is accepted, I will uncheck the box for private for peer review. That is my understanding.
  I have been working on your suggestions above despite power and internet challenges. I hope it will not be too late.
  Respectfully,
  Kim-Ndor
  
  Citation: https://doi.org/10.5194/egusphere-2022-427-AC2
  - CEC2: 'Reply on AC2', Juan Antonio Añel, 26 Aug 2022
    
    Dear Kim-Ndor,
    Thanks for your reply. Unfortunately, you are wrong. In our Discussions process, anyone can be a reviewer, not only those invited by editors. Therefore, the code must be available to anyone. We will reject your manuscript if you do not provide the code fully open at this stage. Indeed, it should have never been published in Discussions with such shortcomings; This was an oversight by the topical editor, and we apologize for it.
    Regards,
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2022-427-CEC2
    
    AC3: 'Reply on CEC2', Kim-Ndor Djimadoumngar, 26 Aug 2022
    
    Dear Executive Editor,
    Thank you very much for this clarification. I am working on that.
    However, I have an intellectual propriety concern and a question. If anyone can be a reviewer, do you guarantee that my idea and dataset will not be taken by someone else, twisted, and published before me in case you reject my manuscript at the end of the peer review process? I had to go to the shores of Lake Chad to collect the ground-truth coordinates and request ground-truth lake level data from local institutions for this manuscript. It is just to let you know the work I have accomplished to this level to not accept to be a victim of idea/intellectual propriety theft.
    Respectfully,
    Kim-Ndor
    
    Citation: https://doi.org/10.5194/egusphere-2022-427-AC3
RC2: 'Comment on egusphere-2022-427', Anonymous Referee #2, 31 Aug 2022

General comment:

The stated goals of this paper are to determine how accurately remote sensing data can help study ground-truth Lake Chad hydrologic data and which machine learning models may be of best use for analyzing both remote sensing and ground-truth Lake Chad hydrologic data. While the reviewer agrees with the motivations of the study (water availability in vulnerable regions and data-driven models where observations are scarce), the multiplicity of the study aims makes it difficult to identify the new contribution of this work to the literature. This lack of clarity in the aims is evident throughout the paper. Consequently, I recommend rejection of the paper to allow the author sufficient time to revise it and focus it on clear objectives that demonstrate awareness of previous work and that are reflected in the methods of analysis and in the interpretation and discussion of the findings. Specific comments follow below.

Specific comments:

– The contextual background in the introduction is inadequate. This is evident also in the sparse references. This problem is related to the lack of clarity in the aims of the study. The author does not articulate the state-of-the-art of remote sensing of this lake, or of other lakes in general, in order to put this work in the context of previous work. What do we know about remote sensing of Lake Chad water level, or water level in other lakes? What do we need to know to advance knowledge of the hydrology of this lake? How exactly does this work advance the state-of-the-art?

The author has not provided the physical background on the main drivers of lake water level changes, i.e., are these changes dominated by heat flux changes or mass changes? The author states (lines 52–54) that “we assume that Lake Chad’s level is a function of precipitation, soil moisture, air temperature, soil temperature, evapotranspiration, and specific humidity factors.” What is the physical basis for these assumptions? Can the author explain, for example, how soil temperature or soil moisture controls Lake Chad water level? Such explanations would be useful for evaluating the validity of statements such as (line 54) “precipitation is the only and most important climate variable on which all other climate variable variations depend.”

– The author has not provided enough information about the quality of the input data sets (GPCC, GLDAS, and remote sensing lake levels) used in this study. Are these data sets validated for other lakes? What is the spatial resolution of the ground-truth lake level (i.e., spatial sampling distance) relative to the spatial resolution of the satellite-derived lake levels and relative to the lake surface area? It is difficult to evaluate the presented statistical analysis without a detailed description of the input data set characteristics.

– The discussion and interpretation of the results of the statistical analysis are poor. For example, there is no discussion of the physical relationships underlying one of the main findings presented in Table 9; there is no discussion of the limitations of the input data sets; there is no discussion of the findings in the context of previous findings in this lake or other lakes; and there is no discussion of why Random Forest Regression and Support Vector Regression outperform other algorithms. Without these discussions, it is difficult to determine the limits of applicability and usefulness of this work.

– There are too many acronyms, and this burdens the reader to retain them all while reading the paper. Please remove acronyms from the abstract; use acronyms only when the phrase occurs more than three times; or else, define the acronyms in the section where they are used.

– The figures and tables are of poor quality and overly verbose. Please use consistent font size and shade in the tables and figures (e.g., Figures 6 and 7); reduce words in the tables (e.g., Table 2 and 3); use consistent labeling in all the panel figures (top-left, top-right, bottom-left or bottom-right).

Other comments:

– Line 123: “Remote sensing lake level data is processed at latitude 13.02 and longitude 14.38:” please specify the meaning of this sentence.

– Line 68: please put the direction (N, S, E, W) after every latitude or longitude, i.e., 6°N and 20°N.

– Please remove the word “we” throughout the manuscript.

References:

Kuhwald, Katja & Oppelt, Natascha. (2016). Remote sensing for lake research and monitoring – Recent advances. Ecological Indicators. 64. 105-122. 10.1016/j.ecolind.2015.12.009.

Policelli, Frederick & Hubbard, Alfred & Jung, Hahn Chul & Zaitchik, Ben & Ichoku, Charles. (2018). A predictive model for Lake Chad total surface water area using remotely sensed and modeled hydrological and meteorological parameters and multivariate regression analysis. Journal of Hydrology. 568. 10.1016/j.jhydrol.2018.11.037.

Wenbin, Zhu & Jia, Shaofeng & Lall, Upmanu & Cao, Qing & Mahmood, Rashid. (2018). Relative contribution of climate variability and human activities on the water loss of the Chari/Logone River discharge into Lake Chad: A conceptual and statistical approach. Journal of Hydrology. 569. 10.1016/j.jhydrol.2018.12.015.

Citation: https://doi.org/10.5194/egusphere-2022-427-RC2
- AC4: 'Reply on RC2', Kim-Ndor Djimadoumngar, 06 Sep 2022
  
  Dear Referee,
  Thank you very much for taking the time to review my manuscript and make comments and suggestions.
  I have taken into consideration the suggested methods from Referee #1 to re-analyze the study. I will add your comments and suggestions for the revised manuscript.
  Respectfully,
  Kim-Ndor
  
  Citation: https://doi.org/10.5194/egusphere-2022-427-AC4

Viewed

Total article views: 728 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
484	209	35	728	8	8

Views and downloads (calculated since 05 Jul 2022)

Month	HTML	PDF	XML	Total
Jul 2022	123	28	10	161
Aug 2022	102	41	12	155
Sep 2022	49	23	3	75
Oct 2022	23	7	1	31
Nov 2022	16	4	1	21
Dec 2022	14	8	0	22
Jan 2023	11	3	0	14
Feb 2023	19	15	0	34
Mar 2023	7	3	0	10
Apr 2023	8	6	0	14
May 2023	2	1	0	3
Jun 2023	5	3	1	9
Jul 2023	6	15	2	23
Aug 2023	8	4	0	12
Sep 2023	12	4	0	16
Oct 2023	15	4	0	19
Nov 2023	6	1	0	7
Dec 2023	9	7	0	16
Jan 2024	15	3	1	19
Feb 2024	10	14	2	26
Mar 2024	12	13	1	26
Apr 2024	12	2	1	15

Viewed (geographical distribution)

Total article views: 653 (including HTML, PDF, and XML), of which 653 have a defined geography and 0 are of unknown origin.

Latest update: 24 Apr 2024

Short summary

This study aims to identify the best methods to analyze Lake Chad's level and which of the remote sensing and ground-truth data give higher accuracy. Random Forest is the best model. Soil temperature is the major driver of remote sensing lake level fluctuations. Precipitation is the first factor for ground-truth lake level. This study gives perspectives on a water management system connecting climate change and vulnerability in the Lake Chad region.

