UVBoost (v0.5): a hybrid radiative transfer and machine learning model for estimating ultraviolet radiation

de Paula Corrêa, Marcelo

doi:10.5194/egusphere-2022-465

Preprints

https://doi.org/10.5194/egusphere-2022-465

05 Jul 2022

Marcelo de Paula Corrêa

Abstract. This article presents UVBoost, a hybrid radiative transfer estimator based on a Supervised Machine Learning (SML) regression model powered by high precision ultraviolet radiation (UVR) calculations provided by a conventional Radiative Transference Model (RTM). The proposed regression model takes UVR as a dependent variable, and the Solar Zenith Angle (SZA), Total Ozone Content (TOC), and Aerosol Optical Depth (AOD), as the independent predictive variables. UVBoost was developed to increase computational speed for conducting calculations with large databases, without sacrificing result accuracy. Furthermore, this method employs a user-friendly code, which can be used by laymen or researchers in other areas. UVBoost can be used to disseminate UVR data online anywhere in different spatiotemporal scales, or for climatological projection studies on a global scale. The model was developed by comparing seven regression SML tools via cross validation. These results were validated using non-parametric statistical tests. Of all the tested tools, the Categorical Boosting (CatBoost) method showed the best accuracy at the lowest computational cost. Two additional studies were carried out, one at the global scale, and another at the local scale, to compare the traditional RTM vs. the UVBoost results. The first study simulated a global UVR field (1°x1°), with 64800 grid points, with input data from CMIP6, available at https://pcmdi.llnl.gov/CMIP6/. The differences between the RTM and the UVBoost were less than ±5 % for approximately 95 % of all points, except for points with high SZA. The computational speed of UVBoost surpassed that of the RTM by more than three orders of magnitude. The second study simulated the daily UVR at eight different locations on Earth. The results showed that the UVBoost was very efficient in simulating accumulated UVR doses during the day, with negligible differences (< ±3 %), which means it can be used in studies on UVR and human health. 
In the future, UVBoost will include other geophysical parameters and be extended to other bands in the electromagnetic spectrum.

Received: 08 Jun 2022 – Discussion started: 05 Jul 2022

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

CEC1:
'Comment on egusphere-2022-465', Juan Antonio Añel, 23 Aug 2022

Dear author,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
Your manuscript currently contains several violations of our policy. First, you have archived the SML material on GitHub. However, GitHub is not a suitable repository. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, you must move all the information in the GitHub repository to one of the suitable ones.
Secondly, you do not provide the input and output data for your work. In your reply to the Topical Editor, you pointed to the CMIP6 repository; however, this information is too generic. For a paper using SML, having the exact input and output files is critical to validate scientific replicability and reproducibility. Therefore, it is mandatory that you upload these files to the repository. Beyond this, your manuscript is mostly about developing the database and applying the boosting method, which makes publishing these data even more critical.
Also, you cite the UCAR repository for the TUV data. Again, the UCAR repositories are not suitable for long-term permanent archival, and we cannot trust them for scientific publication. You must therefore ask the maintainers of the TUV data to upload the file to one of the suitable repositories, or you must do it yourself.
Therefore, please publish your code and data in one of the appropriate repositories, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available for the Discussions stage, which closes in one week.
In addition, you must include the modified 'Code and Data Availability' section in any revised version of your manuscript, with the DOI of the code (and another DOI for the dataset if necessary).
Be aware that failing to comply with this request will result in the rejection of your manuscript for publication.
Juan A. Añel
Geosci. Model Dev. Exec. Editor

Citation: https://doi.org/10.5194/egusphere-2022-465-CEC1
- AC1: 'Reply on CEC1', Marcelo de Paula Correa, 27 Aug 2022
  
  Dear Dr. Juan A. Añel, Geosci. Model Dev. Exec. Editor
  Thank you for your recommendations. I will provide a new version of the manuscript with the SML material and input data in one of the appropriate repositories.
  I will send this new version along with the answers for the reviewers as soon as possible.
  Best regards,
  
  Marcelo!
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC1
CC1:
'Comment on egusphere-2022-465', Juan Antonio Añel, 23 Aug 2022

Dear author and topical editor,
Reading this manuscript, it has come to my attention that its main contribution seems to be the construction of a dataset and then using it to test several different approaches. For this, third-party libraries (CatBoost and Sklearn) are used, and the regressor used is included in CatBoost. The implementation is a few lines of code in Python.
For this reason, I doubt whether Geosci. Model Dev. is the right journal for this work, and I would be grateful if you could double-check it. It could be that this paper is more of a data paper, suitable for example for the EGU journal ESSD, as the work developed by the author is simply the application of third-party code.
Probably, other reviewers can provide more insight on this issue.
Dr Juan A. Añel

Citation: https://doi.org/10.5194/egusphere-2022-465-CC1
- AC2: 'Reply on CC1', Marcelo de Paula Correa, 31 Aug 2022
  
  Dear Dr. Juan A. Añel, GMD Exec. Editor
  I hereby submit my revised manuscript for consideration for publication in GMD or, if you deem it more suitable, for recommendation to the EGU journal ESSD.
  You and the reviewers asked pertinent questions and made suggestions that have been addressed in this new version of the manuscript.
  A point-by-point explanation of how I responded to the reviewers' formal comments is given in the following replies.
  Thank you for your consideration and I hope to hear from you soon.
  
  Sincerely,
  Marcelo de Paula Corrêa
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC2
RC1:
'Comment on egusphere-2022-465', Anonymous Referee #1, 24 Aug 2022
Review of GMD manuscript egusphere-2022-465

Title: UVBoost (v0.5): a hybrid radiative transfer and machine learning model for estimating ultraviolet radiation

Author(s): Marcelo de Paula Corrêa

MS No.: egusphere-2022-465

MS type: Model description paper

General Comment: The article describes in detail the development of a UV radiation model (UVBoost v0.5) based on SML (Supervised Machine Learning) tools. Seven different methods were chosen and, after a process of cross-validation, the Categorical Boosting (CatBoost) technique gave the best results, showing the best accuracy at the lowest computational cost. The accuracy of the model also rests on the chosen physical radiative transfer model, TUV, which is well recognized in the radiative transfer community working in the UV radiation field. The model UVBoost v0.5 certainly has a clear application, with well-defined limits: clear skies, and predicting or estimating a large number of data points (either spatial or temporal) rather than very specific and accurate situations.

The paper is well structured and well written, but a revision is needed before publication.

The first point relates to the methodology employed, the SML methods. These methods are classified as part of Artificial Intelligence (AI), but different classifications appear in the literature, for instance the Neural Network method. I am not an expert on them, and hence a short classification (table or scheme) in section 2 would be welcome.

Also, I think that the author needs to check whether other models of this type have been published in the literature for solar radiation or UV prediction/estimation, or on related topics. There are no references to this in the article. For instance, I have seen the paper “Review of photovoltaic power forecasting” by J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F.J. Martinez-de-Pison, F. Antonanzas-Torres, Solar Energy 136 (2016) 78-111, which in my opinion is related to this discussion.

The range of values selected for the AOD in Table I is very wide, from 0 to 15. My question is therefore whether these values are spectral (given for a single wavelength) or broadband values integrated over the whole spectral UV range of the model. Values of AOD greater than 1-2 (for a given wavelength, i.e. spectral AOD) are already very high (although such values have been measured, e.g. by AERONET in China) and fall under multiple-scattering conditions, where the Beer-Lambert law is not applicable. Values greater than 1-2 are very rare, as can be seen from the values used in this article for the comparison. The proposed wide range of AOD values may disturb the physical model and create a large number of unnecessary simulations, which may also disturb the applied Machine Learning methods. Therefore, please explain this issue regarding the AOD values.

I recommend a subsection briefly describing the model (input, output, main core) so that users can run it, since this information is currently scattered throughout the text. The model calculates the UV index and “Vitamin D weighted irradiance”, but it is not clear to me whether the model gives the UVER (W/m2) as output. As mentioned by one of the referees, this type of model based on SML is difficult to replicate.

For all of these reasons, I consider that the paper may be accepted for publication after the recommended revision.
Citation: https://doi.org/10.5194/egusphere-2022-465-RC1
- AC3:
  'Reply on RC1', Marcelo de Paula Correa, 31 Aug 2022
  Dear anonymous reviewer #1,
  I thank you for the careful review of my paper. You proposed a series of revisions, most of which have been implemented. Some suggestions were only partially addressed; in those cases I have provided a justification for not fully completing the request. I am open to answering any further questions that may arise regarding this manuscript.
  The first point relates to the methodology employed, the SML methods. These methods are classified as part of Artificial Intelligence (AI), but different classifications appear in the literature, for instance the Neural Network method. I am not an expert on them, and hence a short classification (table or scheme) in section 2 would be welcome.
  
  Answer: Machine learning (ML) is a subfield of artificial intelligence. Neural networks, in turn, are one family of ML algorithms. In my work, I do not use the term artificial intelligence, but rather refer to a supervised ML approach that improves on traditional mathematical methods. In my view, a discussion of this nomenclature is beyond the scope of the study.
  In any case, if the editor agrees with the reviewer's suggestion, I propose the following text in the introduction section:
  In general, terms such as artificial intelligence (AI), machine learning (ML), and neural networks (NN) are often used interchangeably. However, each is essentially a component of the preceding term. Roughly, AI refers to the simulation of human intelligence by systems, machines, or computers. AI enhances the speed, precision, and effectiveness of several everyday tasks, such as cybersecurity, web search, online shopping, and more. In scientific research, AI has been widely used in big data and advanced statistical analysis. Predicting protein structures from genomic data and understanding the effects of climate change are among the main contributions of AI (The Alan Turing Institute, 2022).
  ML, in turn, is defined as the use of a set of algorithms that find data patterns without the need for explicit instruction. ML algorithms may or may not be supervised. In case of supervised ML (SML), the code compares its outputs with the correct outputs during training. On the other hand, if the learning is not supervised (UML), the algorithm merely looks for patterns in the dataset.
  Finally, a Neural Network (NN) is a simplified model of the human brain that emulates a set of consecutive algorithms (layers). Similar to linear regression (LR), the final layer represents the answer. The main difference between a usual LR model and an NN lies in the effect of a change in the weights. In an NN, the output of one function is the input of the subsequent function. Thus, any change in the weights affects the inputs of later functions, producing a cascade effect on the other artificial neurons in the network (Appenzeller, 2017; Kavlakoglu, 2020).
  Appenzeller, T.: The AI revolution in science, Science, News from Science, doi:10.1126/science.aan7064, 2017.
  Kavlakoglu, E.: AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference? IBM Cloud Blog. https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks, 2020.
  The Alan Turing Institute - The Royal Society. The AI revolution in scientific research. Available at https://royalsociety.org/-/media/policy/projects/ai-and-society/AI-revolution-in-science.pdf.
  
  Also, I think that the author needs to check whether other models of this type have been published in the literature for solar radiation or UV prediction/estimation, or on related topics. There are no references to this in the article. For instance, I have seen the paper “Review of photovoltaic power forecasting” by J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F.J. Martinez-de-Pison, F. Antonanzas-Torres, Solar Energy 136 (2016) 78-111, which in my opinion is related to this discussion.
  
  Answer: The author thanks the reviewer for pointing out this deficiency, which has been corrected in the new version. Please check the new paragraphs included in Section 1 – Introduction:
  ML methods have also been used for solar radiation prediction, and this use has increased significantly in recent years. A review published about 5 years ago showed that some ML techniques, such as boosting, regression trees, or random forests, were still rarely used for solar radiation prediction (Voyant et al., 2017). Currently, ML techniques are applied in different areas, such as solar panel power prediction (Zazoun, 2022), satellite radiation estimates (Cornejo-Bueno et al., 2019), and as a complement to surface radiation measurements in different spectral bands (Feng et al., 2020; Narvaez et al., 2020). After all, most weather stations cannot reliably provide global solar radiation observations, so finding an accurate way to predict this quantity is very important.
  A recent and in-depth review study with more than 230 scientific publications in 20 years showed that: a) data quality control before model prediction is essential; b) methods that use training models are more accurate than methods based on statistical filters; c) novel and combined ML techniques tend to be future hot topics (Zhou et al., 2021). These elements are all addressed in this work. As there are still very few studies on UVR prediction using ML methods (e.g. Wu et al., 2022), this study offers a new contribution on the topic.
  
  The range of values selected for the AOD in Table I is very wide, from 0 to 15. My question is therefore whether these values are spectral (given for a single wavelength) or broadband values integrated over the whole spectral UV range of the model. Values of AOD greater than 1-2 (for a given wavelength, i.e. spectral AOD) are already very high (although such values have been measured, e.g. by AERONET in China) and fall under multiple-scattering conditions, where the Beer-Lambert law is not applicable. Values greater than 1-2 are very rare, as can be seen from the values used in this article for the comparison. The proposed wide range of AOD values may disturb the physical model and create a large number of unnecessary simulations, which may also disturb the applied Machine Learning methods. Therefore, please explain this issue regarding the AOD values.
  
  Answer: The input data for AOD in the TUV is for the wavelength of 550 nm. The spectral dependence is automatically determined using the Ångström exponent (α). This exponent is also a qualitative indicator of aerosol particle size: values of α ≤ 1 indicate size distributions dominated by coarse-mode aerosols, typically associated with dust and sea salt, and values of α ≥ 2 indicate size distributions dominated by fine-mode aerosols, usually associated with urban pollution and biomass burning. I used α = 1.5 in the simulations, as this is a climatologically accepted average value for an atmosphere containing both coarse and fine particles (Schuster et al., 2006). I have included this information in the new version of the article.
  I agree that values of AOD > 2 are uncommon. However, fire episodes, desert dust uplift, and thermal inversions in polluted urban centers can result in very high AODs. Values of this nature are even found in well-known climatological databases (e.g., CMIP6). For that reason, I used this range of values (0 - 15). However, note that the step used for interpolation for AOD > 2 was larger, and this did not affect the quality of the results.
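
The spectral scaling described in this answer can be illustrated with the standard Ångström power law, τ(λ) = τ(λ₀)·(λ/λ₀)^(−α). The following is a minimal sketch only, not the TUV implementation: the function name and its defaults are hypothetical, chosen to match the 550 nm reference wavelength and α = 1.5 quoted in the reply.

```python
def aod_at_wavelength(aod_ref, wavelength_nm, alpha=1.5, ref_wavelength_nm=550.0):
    """Scale a reference AOD to another wavelength with the Angstrom power law.

    aod_ref           -- AOD at the reference wavelength (550 nm in the reply)
    wavelength_nm     -- target wavelength in nanometres
    alpha             -- Angstrom exponent (1.5 is the value quoted in the reply)
    ref_wavelength_nm -- reference wavelength in nanometres
    """
    if aod_ref < 0:
        raise ValueError("AOD must be non-negative")
    if wavelength_nm <= 0 or ref_wavelength_nm <= 0:
        raise ValueError("wavelengths must be positive")
    # tau(lambda) = tau(lambda_0) * (lambda / lambda_0) ** (-alpha)
    return aod_ref * (wavelength_nm / ref_wavelength_nm) ** (-alpha)


# Example: with a positive exponent, the AOD in the UV-B is larger than at 550 nm.
aod_uvb = aod_at_wavelength(0.2, 310.0)
```

Because α is positive, a given AOD at 550 nm corresponds to a larger optical depth at UV wavelengths, which is one reason the upper end of the 0 - 15 range represents very strong attenuation.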
  
  I recommend a subsection briefly describing the model (input, output, main core) so that users can run it, since this information is currently scattered throughout the text. The model calculates the UV index and “Vitamin D weighted irradiance”, but it is not clear to me whether the model gives the UVER (W/m2) as output. As mentioned by one of the referees, this type of model based on SML is difficult to replicate.
  
  Answer: I don't think an explanatory subsection is necessary. UVBoost is an easy-to-use code, and the user manual is available with the code at https://doi.org/10.5281/zenodo.6783409. All the necessary information is displayed on screen when running UVBoost. The user may choose to calculate the Ultraviolet Index (erythemal irradiances) or the Vitamin D weighted irradiances (W m⁻²). The user is also asked about the input data format (on screen or in a datasheet).
  Please let me know if you still have any questions. I am open to answering any further questions that may arise regarding this manuscript.
  
  For all of these reasons, I consider that the paper may be accepted for publication after the recommended revision.
  
  Answer: Again, I thank you for the careful review, and positive comments.
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC3
RC2:
'Comment on egusphere-2022-465', Anonymous Referee #2, 27 Aug 2022

Summary: The author presents a radiative transfer model for the ultraviolet (UV) regime. The distinguishing feature of the model is that it is based on a supervised machine learning approach.

The model uses diverse predictors, such as total ozone content, to infer UV radiation.

Major Comments:

I believe that the discussions of UV health effects and machine learning methods in the manuscript are sound and that the author demonstrates expertise in both areas.

However, the basis of this machine learning approach is a database from a radiative transfer model.

In this area the author consistently uses wrong terminology and applies the wrong methods to the problem at hand, which suggests insufficient expertise in this field.

As the computation of the regression data is the fundamental basis of the manuscript, I cannot recommend a publication.

A further major and related issue is a completely inadequate description of the atmospheric input data for the radiative transfer model.

This is compounded by a variety of further issues:

- The title of the manuscript mentions a hybrid radiative transfer and machine learning model. This would imply that the model somehow combines traditional radiative transfer techniques and machine learning, which is not the case. This misleading title should be corrected.

- While the introduction section is overall well written and comprehensive, it is missing a paragraph on existing machine learning approaches to radiative transfer, as this is a quite mature field with lots of active development.

- line 85: Please provide adequate proof or a reference for the claim that UVR fluxes cannot be properly predicted with cross-validation techniques.

- line 170: TUV is a radiative transfer model, not a database in itself. What you are probably referring to here is the TUV climatology that is available for download.

- line 173: Which UV irradiances were calculated here, surface, TOA, or something else?

- line 179: How can you vary the total ozone content, but keep the atmospheric structure (i.e. the ozone profile) constant?

- line 180ff: If scattering clouds were not considered in this study, then the application of DISORT with 8 streams makes very little sense. A pure emission solver would be sufficient to accurately account for the ozone absorption of the UV radiation. The only phenomenon that would necessitate a scattering radiative transfer solver is Rayleigh scattering, which is not the dominating factor here, compared to the ozone absorption.

- A description of the crucial ozone spectroscopy used in the radiative transfer model of this study is completely missing.

- line 181: How can large variability and spatio-temporal complexity for planet cloud cover mean less accurate RTM results?

- line 182: Does your model use the Cloud Modification Factor you mention to correct for clouds?

- A thorough description of the atmospheric profile input data for the TUV radiative transfer model is completely missing.

- Subsection 2.3, line 212: The dataset does not include a validation partition to check its generalization capability.

Minor Comments:

Some minor issues related to spelling and expressions should be corrected:

- line 44: Radiative Transfer Models. The word Transference is wrong in this context.

- line 47: , and TUV (Madronich and Flocke, 1997), (the and is missing)

- line 48: Despite showing a good balance between performance...

- line 49: computational cost (singular)

- line 49: under certain conditions / in certain situations (but not under certain situations)

- line 50: Please add the website for Quick TUV in this line to the references instead.

- line 55: allows researchers to predict / allows researchers the prediction of extreme situations

- line 59: See the first major comment. If the UVBoost model is a pure regression model, the description as a hybrid model is misleading. The approach does not combine a traditional radiative transfer method and a regression approach in one single model, instead the radiative transfer model only provides the training data for the final regression model.

- line 63: the CatBoost tool was a suitable powerful and fast algorithm

- line 76: assuming that all other variables X_k for k != j remain the same.

- line 82: Then, the next 1/k-th part of the data

- line 88: Please provide a reference for support vector machines.

- line 92: Please provide a justification, such as a concrete example, for the statement that SVMs have problems when graphically visualizing and theoretically interpreting results.

- line 95: Please provide a reference for decision tree models.

- line 109: Please provide a reference for the Gini impurity coefficient and the entropy coefficient.

- line 123: Please provide a reference for the clustering technique and the aggregating bootstrap technique.

- line 140: Please provide a reference for Boosting.

- line 171: Please put the TUV link in the references.

- line 172: radiative transfer equation. The expression "radiative transference" does not exist.

- line 172: I am assuming that you are talking about the two-stream method here, not two-flow method.

- line 172: Likewise, I am assuming that you are talking about DISORT with n streams, instead of n fluctuations?

- line 205: Angström exponent, not coefficient.

- line 222: Please define the acronym ANOVA.

- line 267: UVBoost as described in the manuscript is not a hybrid model. This statement is misleading.

Citation: https://doi.org/10.5194/egusphere-2022-465-RC2
- AC4:
  'Reply on RC2', Marcelo de Paula Correa, 31 Aug 2022
  Dear anonymous reviewer #2,
  I thank you for the careful review of my paper. You proposed a series of revisions, most of which have been implemented. Some suggestions were only partially addressed; in those cases I have provided a justification for not fully completing the request. I am open to answering any further questions that may arise regarding this manuscript.
  
  Major Comments:
  
  I believe that the discussions of UV health effects and machine learning methods in the manuscript are sound and that the author demonstrates expertise in both areas.
  
  However, the basis of this machine learning approach is a database from a radiative transfer model.
  
  In this area the author consistently uses wrong terminology and applies the wrong methods to the problem at hand, which suggests insufficient expertise in this field.
  
  As the computation of the regression data is the fundamental basis of the manuscript, I cannot recommend a publication.
  
  A further major and related issue is a completely inadequate description of the atmospheric input data for the radiative transfer model.
  
  Answer: Again, I would like to thank you for the clear review of my manuscript. I feel that these recommendations, once addressed, strengthen the manuscript. I hope the revised version is now suitable for publication and look forward to hearing from you in due course.
  Firstly, it is important to clarify that I used the terminology "hybrid model" for UVBoost with the best of intentions. After all, it is an input (SZA, TOC, AOD)/output (irradiances) code with a core based on an SML treatment (CatBoost). Despite its simplicity, it is fast, very efficient and, above all, it provides very accurate results.
  Therefore, I strongly disagree that the methods are wrong and that the study is a mere regression calculation. The article was carefully written, and my objective was to help fill an important scientific gap, namely the use of SML techniques in the prediction of erythemally and vitamin D weighted UV irradiance.
  If the reviewer considers this work a simple application of third-party codes, I agree with the executive editor's recommendation to forward this article to the EGU journal ESSD (please check the interactive discussions).
  This is compounded by a variety of further issues:
  
  - The title of the manuscript mentions a hybrid radiative transfer and machine learning model. This would imply that the model somehow combines traditional radiative transfer techniques and machine learning, which is not the case. This misleading title should be corrected.
  
  Answer: Again, I used the terminology "hybrid model" for UVBoost with the best of intentions. According to the Merriam-Webster Dictionary, the term 'hybrid' is defined as 'something heterogeneous in origin or composition'. The proposed model combines two heterogeneous methods: a traditional radiative transfer code and an SML regression technique.
  However, if this title gives a wrong impression, I suggest a new title such as "UVBoost (0.5): Erythemal and Vitamin D weighted UV radiation estimator based on a Machine Learning gradient boosting algorithm".
  
  - While the introduction section is overall well written and comprehensive, it is missing a paragraph on existing machine learning approaches to radiative transfer, as this is a quite mature field with lots of active development.
  
  Answer: The author thanks the reviewer for pointing out this deficiency, which has been corrected in the new version. Please check the new paragraphs included in Section 1 – Introduction:
  “ML methods have also been used for solar radiation prediction, and this use has increased significantly in recent years. A review published about 5 years ago showed that some ML techniques, such as boosting, regression trees, or random forests, were still rarely used for solar radiation prediction (Voyant et al., 2017). Currently, ML techniques are applied in different areas, such as solar panel power prediction (Zazoun, 2022), satellite radiation estimates (Cornejo-Bueno et al., 2019), and as a complement to surface radiation measurements in different spectral bands (Feng et al., 2020; Narvaez et al., 2020). After all, most weather stations cannot reliably provide global solar radiation observations, so finding an accurate way to predict this quantity is very important.
  A recent and in-depth review study with more than 230 scientific publications in 20 years showed that: a) data quality control before model prediction is essential; b) methods that use training models are more accurate than methods based on statistical filters; c) novel and combined ML techniques tend to be future hot topics (Zhou et al., 2021). These elements are all addressed in this work. As there are still very few studies on UVR prediction using ML methods (e.g. Wu et al., 2022), this study offers a new contribution on the topic.”
  
  - line 85: Please provide adequate proof or a reference for the claim that UVR fluxes cannot be properly predicted with cross-validation techniques.
  
  Answer: I appreciate the reviewer's concern with this detail. But, in line 85, I refer to the cross-validation techniques applied to the MLR (However, even when using cross-validation techniques one cannot properly fit an MLR model to UVR predictions). Table 3 shows these cross-validation statistics. So, I think that this is an adequate proof.
  For clarity, I complemented the sentence with: "as we will see later in this article".
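
For readers unfamiliar with the procedure discussed in this answer, k-fold cross-validation of a linear fit can be sketched as below. This is an illustrative sketch only and not the manuscript's code: the helper names, the single-predictor least-squares fit, and the toy cos² signal are all assumptions made for the example, and the toy signal is not a UVR calculation.

```python
import math
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for fold in range(k):
        test = indices[fold::k]  # every k-th shuffled index, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(xs, ys, k=5):
    """Return the held-out RMSE of the linear fit for each of the k folds."""
    scores = []
    for train, test in kfold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        scores.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return scores


# Toy, purely hypothetical data: a nonlinear function of the solar zenith angle.
sza = [float(d) for d in range(0, 90, 2)]
signal = [math.cos(math.radians(d)) ** 2 for d in sza]
scores = cross_validated_rmse(sza, signal, k=5)
```

Each entry of `scores` is the error on data the fit never saw, so a linear model that cannot follow a nonlinear dependence shows a non-zero error in every fold regardless of how the folds are drawn.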
  
  - line 170: TUV is a radiative transfer model, not a database in itself. What you are probably referring to here is the TUV climatology that is available for download.
  
  Answer: I apologize for my lack of attention, and I thank the reviewer for pointing out this error. The sentence was fixed: "The database for testing and training was built from calculations performed by the RTM TUV v5.3.2 (Madronich and Flocke, 1997; NCAR, 2022)." PS: I also put the TUV link in the references, in line with one of the minor comments.
  
  - line 173: Which UV irradiances were calculated here, surface, TOA, or something else?
  
  Answer: I recognize that this was not properly explained in the text. UVBoost estimates downward UVR at the surface. I clarified this information in the text.
  
  - line 179: How can you vary the total ozone content, but keep the atmospheric structure (i.e. the ozone profile) constant?
  
  Answer: I thank the Reviewer for pointing out this lack of information. The atmospheric profiles were changed according to the geographic position. For this reason, I included the following information in the paragraph: "The TOC atmospheric vertical profile was adjusted by the geographic position according to the AFGL Reference Model Atmospheric Profiles (Anderson et al., 1986; Gordon et al., 2022). The following vertical distributions were used: tropical atmosphere profile for the gridpoints between the equator and 30° latitude, mid-latitude profile between 30 and 60° latitude, and subarctic profile above 60° latitude."
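
The latitude-band rule quoted above can be sketched as a small lookup. This is only an illustration, not code from UVBoost or TUV: the function name is hypothetical, the rule is assumed to apply symmetrically in both hemispheres, and the handling of the exact 30° and 60° boundaries (assigned here to the lower-latitude band) is an assumption, since the reply does not specify it.

```python
def afgl_profile_for_latitude(lat_deg: float) -> str:
    """Pick an AFGL reference profile name from latitude (hypothetical helper)."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude must be within [-90, 90] degrees")
    abs_lat = abs(lat_deg)  # assumption: same bands in both hemispheres
    if abs_lat <= 30.0:     # assumption: 30 deg belongs to the tropical band
        return "tropical"
    if abs_lat <= 60.0:     # assumption: 60 deg belongs to the mid-latitude band
        return "mid-latitude"
    return "subarctic"
```

For a 1°x1° global grid, such a function would be called once per grid point before running the RTM for that point.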
  
  - line 180ff: If scattering clouds were not considered in this study, then the application of DISORT with 8 streams makes very little sense. A pure emission solver would be sufficient to accurately account for the ozone absorption of the UV radiation. The only phenomenon that would necessitate a scattering radiative transfer solver is Rayleigh scattering, which is not the dominating factor here, compared to the ozone absorption.
  
  Answer: The reviewer's question is pertinent, but I would like to clarify my choice of more complex calculations. I used DISORT with 8 streams to be as accurate as possible in view of the presence of aerosols and molecular scattering. Results using simpler methods (e.g. two-stream delta-Eddington) may be reasonable for most aerosol-free clear-sky calculations. However, my goal was to build a robust and very accurate database for training.
  
  - A description of the crucial ozone spectroscopy used in the radiative transfer model of this study is completely missing.
  
  Answer: I disagree with the need to incorporate a discussion on the ozone spectroscopy. While I agree that ozone spectroscopy is an interesting aspect in and of itself, my paper focuses on the SML method for UVR estimation based on atmospheric input parameters, such as TOC and AOD. The ozone spectroscopy used in the TUV has already been well discussed in the references of the code itself.
  Therefore, I believe that adding this dimension would not contribute to the scope of my paper. A focus on ozone (or other gas) spectroscopy could be a good topic for a follow-up study, e.g., on the use of SML for infrared radiation inference.
  
  - line 181: How can large variability and spatio-temporal complexity for planet cloud cover mean less accurate RTM results?
  
  Answer: I thank the reviewer for the concern on this matter. It is well known that clear-sky RTM calculations are more precise than calculations for cloudy conditions. In general, the results of the clear-sky calculations from most RTMs are nearly identical (Aumann et al., 2018). Using cloudy observations in forecast models is difficult. Firstly, clouds show large spatio-temporal variability. In addition, there are uncertainties in cloud physics, large variability in the vertical distribution of ice and liquid water in clouds, challenges in representing cloud structure, 3D effects, and cloud overlap assumptions. I included this comment and the reference in the paper.
  
  - line 182: Does your model use the Cloud Modification Factor you mention to correct for clouds?
  
  Answer: No, UVBoost estimates only clear-sky irradiances. The Cloud Modification Factor (CMF) is defined as the ratio between the measured UV radiation under a cloudy sky and the simulated radiation under cloud-free conditions (Foyo-Moreno et al., 2001). The CMF is used to estimate, in an approximate way, the radiation attenuation caused by cloud cover. In this case, the effect of clouds is given by the product of the clear-sky irradiance, previously calculated by an RTM (or, in this case, by UVBoost), and the CMF.
  The CMF may be estimated by using clear-sky RTM simulations and ground-based observations, or by using look-up tables for different cloud types (e.g.: http://i115srv2.vu-wien.ac.at/UV/booklet/par_4.htm). In fact, UVBoost can even be used for this type of study.
  Foyo-Moreno, I., Alados, I., Olmo, F. et al. On the use of a cloud modification factor for solar UV (290–385 nm) spectral range. Theor Appl Climatol 68, 41–50 (2001). https://doi.org/10.1007/s007040170052.
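
As a minimal sketch of the CMF definition given in this answer (ratio of measured cloudy-sky UV to simulated clear-sky UV, then applied as a multiplier to a clear-sky estimate), the arithmetic could look as follows. The function names and the sample numbers are hypothetical and are not part of UVBoost.

```python
def cloud_modification_factor(measured_uv: float, clear_sky_uv: float) -> float:
    """CMF = measured UV under cloud / simulated clear-sky UV (same units)."""
    if clear_sky_uv <= 0.0:
        raise ValueError("clear-sky irradiance must be positive")
    return measured_uv / clear_sky_uv


def all_sky_uv(clear_sky_uv: float, cmf: float) -> float:
    """Approximate cloudy-sky UV as clear-sky UV (e.g. from an RTM) times the CMF."""
    return clear_sky_uv * cmf


# Illustrative values only: a measurement of 0.12 W/m2 against a clear-sky
# simulation of 0.20 W/m2 gives a CMF of about 0.6.
cmf = cloud_modification_factor(0.12, 0.20)
estimate = all_sky_uv(0.25, cmf)
```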
  
  - A thorough description of the atmospheric profile input data for the TUV radiative transfer model is completely missing.
  
  Answer: It was fixed. Please check the answer for line 179.
  
  - Subsection 2.3, line 212: The dataset does not include a validation partition to check its generalization capability.
  
  Answer: Dear reviewer, the SML datasets used in the UVBoost development, including training and test files, are available at https://doi.org/10.5281/zenodo.7027724.
  
  Minor Comments:
  
  Answer: I thank you for this careful review. You proposed a series of minor revisions, all of which were made.
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC4

Status: closed

CEC1:
'Comment on egusphere-2022-465', Juan Antonio Añel, 23 Aug 2022

Dear author,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
Your manuscript currently contains several violations of our policy. First, you have archived the SML material on GitHub. However, GitHub is not a suitable repository. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, you must move all the information in the GitHub repository to one of the suitable ones.
Secondly, you do not provide the input and output data for your work. In your reply to the Topical Editor, you pointed out the CMIP6 repository; however, this information is too generic. For a paper using SML, having the exact input and output files is critical to validate scientific replicability and reproducibility. Therefore, it is mandatory that you upload such files to the repository. Beyond this, your manuscript is mostly about developing the database and applying the boost method, making it more critical to publish the mentioned data.
Also, you cite the UCAR repository for the TUV data, and again, the UCAR repositories are not suitable for long-term permanent archival, and we can not trust them for scientific publication. In this way, you must ask the maintainers of the TUV data to upload the file to one of the suitable repositories, or you must do it yourself.
Therefore, please, publish your code and data in one of the appropriate repositories, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available for the Discussions stage which closes in one week.
Also, you must include the modified 'Code and Data Availability' section in a potential revised version of your manuscript, with the DOI of the code (and another DOI for the dataset if necessary).
Be aware that failing to comply with this request will result in the rejection of your manuscript for publication.
Juan A. Añel
Geosci. Model Dev. Exec. Editor

Citation: https://doi.org/10.5194/egusphere-2022-465-CEC1
- AC1: 'Reply on CEC1', Marcelo de Paula Correa, 27 Aug 2022
  
  Dear Dr. Juan A. Añel, Geosci. Model Dev. Exec. Editor
  Thank you for your recommendations. I will provide a new version of the manuscript with the SML material and input data in one of the appropriate repositories.
  I will send this new version along with the answers for the reviewers as soon as possible.
  Best regards,
  
  Marcelo!
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC1
CC1:
'Comment on egusphere-2022-465', Juan Antonio Añel, 23 Aug 2022

Dear author and topical editor,
Reading this manuscript, it has come to my attention that its main contribution seems to be the construction of a dataset and then using it to test several different approaches. For it, third-party libraries (CatBoost and Sklearn) are used, and the regressor used is included in CatBoost. The implementation is a few lines of code in Python.
In this way, I doubt if Geosci. Model Dev. is the right journal for this work, and I would thank you if you double-check it. It could be that this paper is more a data paper, for example, suitable for the EGU journal ESSD, as the work developed by the author is simply the application of third-party code.
Probably, other reviewers can provide more insight on this issue.
Dr Juan A. Añel

Citation: https://doi.org/10.5194/egusphere-2022-465-CC1
- AC2: 'Reply on CC1', Marcelo de Paula Correa, 31 Aug 2022
  
  Dear Dr. Juan A. Añel, GMD Exec. Editor
  Hereby I submit my revised manuscript for consideration for publication in the GMD or, if you deem it more suitable, for recommendation to the journal EGU ESSD.
  You and reviewers asked pertinent questions and made suggestions that have been addressed in this new version of the manuscript.
  A point-by-point explanation of how I responded to the formal comments of reviewers will be on the following replies.
  Thank you for your consideration and I hope to hear from you soon.
  
  Sincerely,
  Marcelo de Paula Corrêa
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC2
RC1:
'Comment on egusphere-2022-465', Anonymous Referee #1, 24 Aug 2022
Review of GMD manuscript egusphere-2022-465

Title: UVBoost (v0.5): a hybrid radiative transfer and machine learning model for estimating ultraviolet radiation

Author(s): Marcelo de Paula Corrêa

MS No.: egusphere-2022-465

MS type: Model description paper

General Comment: The article describes in detail the development of a UV radiation model (UVBoost v0.5) based on SML (Supervised Machine Learning) tools. Seven different methods were chosen and, after a process of cross-validation, the Categorical Boosting (CatBoost) technique gave the best results, showing the best accuracy at the lowest computational cost. The accuracy of the model is also based on the chosen physical radiative transfer model TUV, well recognized in the radiative transfer community that works in the UV radiation field. Certainly the model UVBoost v0.5 has a clear application, with well-defined limits: clear skies, and predicting or estimating a great number of data (either spatial or temporal) rather than very specific and accurate situations.

The paper is well structured and well written but a revision is needed for next publication.

The first thing is related to the employed methodology, the SML methods. These methods are classified as part of the Artificial Intelligence (AI) but different classifications appear in the literature, as for instance Neural Network method. I’m not expert on them and hence a short classification (table or scheme) in section 2 would be welcome.

Also, I think that the author needs to check if other models of this type have been published in the literature for solar radiation or UV prediction/estimation or related to this topic. There are no references to this in the article. For instance, I have seen the paper “Review of photovoltaic power forecasting” by J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F.J. Martinez-de-Pison, F. Antonanzas-Torres, Solar Energy 136 (2016) 78-111, which in my opinion is related to this discussion.

The range of values selected for the AOD in Table I is very high, from 0 to 15. Therefore my question is whether these values are spectral (given for a wavelength) or are integrated or broadband values corresponding to the integration over the whole spectral UV range of the model. Values of AOD greater than 1-2 (for a given wavelength or spectral AOD) are already very high (although such values are used and measured, e.g. AERONET in China) and fall within the conditions of multiple scattering, where the application of the Beer-Lambert law is not correct. Values greater than 1-2 are very rare (not frequent), as can be seen in the values used in this article for the comparison. The use of the proposed high range of AOD values may disturb the physical model and create a high number of simulations that are not needed, which may also disturb the applied Machine Learning methods. Therefore, please explain this problem about the values of AOD.

I recommend a subsection briefly describing the model (input, output, main core) so that it can be run by users, since this information is scattered throughout the text. The model calculates the UV index and “Vitamin D weighted irradiance”, but it is not clear to me whether the model gives the UVER (W/m2) as output. As mentioned by one of the referees, this type of model based on SML is difficult to replicate.

For all of this, I consider that the paper may be accepted for publication after the recommended revision.
Citation: https://doi.org/10.5194/egusphere-2022-465-RC1
- AC3:
  'Reply on RC1', Marcelo de Paula Correa, 31 Aug 2022
  Dear anonymous reviewer #1,
  I thank you for the careful review of my paper. You proposed a series of revisions, most of which were made. Some suggestions were only partially addressed; in those cases, I have provided an argument for not completing the request. I am open to answer and try to solve any further questions that may arise with regards to this manuscript.
  The first thing is related to the employed methodology, the SML methods. These methods are classified as part of the Artificial Intelligence (AI) but different classifications appear in the literature, as for instance Neural Network method. I’m not expert on them and hence a short classification (table or scheme) in section 2 would be welcome.
  
  Answer: Machine learning (ML) is a subfield of artificial intelligence. Neural networks, in turn, make up the structure of some ML algorithms. In my work, I do not mention the term artificial intelligence, but rather the use of a supervised ML method that improves on traditional mathematical methods. From my point of view, a discussion of these nomenclatures is beyond the scope of the study.
  In any case, if the editor agrees with the reviewer's suggestion, I propose the following text in the introduction section:
  In general, terms such as artificial intelligence (AI), machine learning (ML), and neural networks (NN) are often used interchangeably. However, each is essentially a component of the prior term. Roughly, AI refers to the simulation of human intelligence by systems, machines or computers. The use of AI enhances the speed, precision and effectiveness of several daily tasks, such as cybersecurity, web search, online shopping, and more. In scientific research, AI has been widely used in big data and advanced statistical analysis. Using genomic data to predict protein structures and understanding the effects of climate change are some of the main contributions of the use of AI (The Alan Turing Institute, 2022).
  ML, in turn, is defined as the use of a set of algorithms that find data patterns without the need for explicit instruction. ML algorithms may or may not be supervised. In case of supervised ML (SML), the code compares its outputs with the correct outputs during training. On the other hand, if the learning is not supervised (UML), the algorithm merely looks for patterns in the dataset.
  Finally, a Neural Network (NN) is a simplified model of the human brain that emulates a set of consecutive algorithms (layers). Similar to linear regression (LR), the final layer represents the answer. The main difference between a usual LR model and an NN is the effect of a change in the weights of the function. In an NN, the output of a function is the input of the subsequent function. Thus, any change in the weights will affect other inputs in a function, that is, it has a cascade effect on the other artificial neurons in the network (Appenzeller, 2017; Kavlakoglu, 2020).
  Appenzeller, T.: The AI revolution in science. Science. In: News from Science. doi: 10.1126/science.aan7064, 2017.
  Kavlakoglu, E.: AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference? IBM Cloud Blog. https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks, 2020.
  The Alan Turing Institute - The Royal Society. The AI revolution in scientific research. Available at https://royalsociety.org/-/media/policy/projects/ai-and-society/AI-revolution-in-science.pdf.
  
  Also, I think that the author needs to check if other models of this type have been published in the literature for solar radiation or UV prediction/estimation or related to this topic. There are no references to this in the article. For instance, I have seen the paper “Review of photovoltaic power forecasting” by J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F.J. Martinez-de-Pison, F. Antonanzas-Torres, Solar Energy 136 (2016) 78-111, which in my opinion is related to this discussion.
  
  Answer: The author thanks the reviewer for pointing out this deficiency, which was corrected in this new version. Please check the new paragraphs included in Section 1 – Introduction:
  ML methods have also been used for solar radiation prediction, and this use has increased significantly in recent years. A review published about 5 years ago showed that some ML techniques, such as boosting, regression trees or random forests, were still rarely used for solar radiation prediction (Voyant et al., 2017). Currently, ML techniques have been applied in different areas, such as solar panel power prediction (Zazoun, 2022), satellite radiation estimates (Cornejo-Bueno et al., 2019), and as a complement to surface radiation measurements in different spectral bands (Feng et al., 2020; Narvaez et al., 2020). After all, most weather stations cannot reliably provide global solar radiation observation data, and finding an accurate way to predict it is very important.
  A recent and in-depth review study with more than 230 scientific publications in 20 years showed that: a) data quality control before model prediction is essential; b) methods that use training models are more accurate than methods based on statistical filters; c) novel and combined ML techniques tend to be future hot topics (Zhou et al., 2021). These elements are all addressed in this work. As there are still very few studies on UVR prediction using ML methods (e.g. Wu et al., 2022), this study offers a new contribution on the topic.
  
  The range of values selected for the AOD in Table I is very high, from 0 to 15. Therefore my question is whether these values are spectral (given for a wavelength) or are integrated or broadband values corresponding to the integration over the whole spectral UV range of the model. Values of AOD greater than 1-2 (for a given wavelength or spectral AOD) are already very high (although such values are used and measured, e.g. AERONET in China) and fall within the conditions of multiple scattering, where the application of the Beer-Lambert law is not correct. Values greater than 1-2 are very rare (not frequent), as can be seen in the values used in this article for the comparison. The use of the proposed high range of AOD values may disturb the physical model and create a high number of simulations that are not needed, which may also disturb the applied Machine Learning methods. Therefore, please explain this problem about the values of AOD.
  
  Answer: The input data for AOD in the TUV is for the wavelength of 550 nm. The spectral dependence is automatically determined using the Angström exponent. This exponent is also a qualitative indicator of aerosol particle size; values of α ≤ 1 indicate size distributions dominated by coarse mode aerosols that are typically associated with dust and sea salt, and values of α ≥ 2 indicate size distributions dominated by fine mode aerosols that are usually associated with urban pollution and biomass burning. I used α = 1.5 in the simulations, as this is a climatologically accepted average value for the presence of coarse and fine particles in the atmosphere (Schuster et al., 2006). Anyway, I included this information in the new version of the article.
  I agree that AOD > 2 is uncommon. However, fire episodes, desert dust uplift and thermal inversions in polluted urban centers can result in very high AODs. Values of this nature are even observed in known climatological databases (e.g., CMIP6). For that reason, I used this range of values (0 - 15). However, note that the step used for interpolation for AOD > 2 was larger and this did not interfere with the quality of the results.
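
The spectral scaling mentioned in this answer is commonly written as the Ångström power law, AOD(λ) = AOD(λ0) · (λ/λ0)^(-α). The sketch below assumes this standard form with λ0 = 550 nm and the exponent of 1.5 stated in the reply; the function name is hypothetical, and the reply does not say whether TUV applies exactly this expression internally.

```python
import math

REFERENCE_WAVELENGTH_NM = 550.0  # reference wavelength stated in the reply
ANGSTROM_EXPONENT = 1.5          # value stated in the reply


def aod_at_wavelength(aod_550: float, wavelength_nm: float,
                      alpha: float = ANGSTROM_EXPONENT) -> float:
    """Scale AOD from 550 nm to another wavelength with the Angstrom power law."""
    if aod_550 < 0.0 or wavelength_nm <= 0.0:
        raise ValueError("AOD must be non-negative and wavelength positive")
    return aod_550 * (wavelength_nm / REFERENCE_WAVELENGTH_NM) ** (-alpha)


# Shorter wavelengths give a larger optical depth: at 275 nm (half of 550 nm)
# the scaling factor is 2 ** 1.5, roughly 2.83.
uv_aod = aod_at_wavelength(1.0, 275.0)
```

With a positive exponent, an AOD specified at 550 nm always corresponds to a larger optical depth in the UV range, which is relevant when judging how extreme the upper end of the 0 to 15 range is.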
  
  I recommend a subsection briefly describing the model (input, output, main core) so that it can be run by users, since this information is scattered throughout the text. The model calculates the UV index and “Vitamin D weighted irradiance”, but it is not clear to me whether the model gives the UVER (W/m2) as output. As mentioned by one of the referees, this type of model based on SML is difficult to replicate.
  
  Answer: I don't think an explanatory subsection is necessary. UVBoost is an easy-to-use code and the user manual is available with the code at https://doi.org/10.5281/zenodo.6783409. All the necessary information is available on the screen when running UVBoost. The user may choose to calculate the Ultraviolet Index (Erythemal irradiances), or the Vitamin D weighted irradiances (Wm-2). The user is also asked about the input data format (on screen or in a datasheet).
  Please, let me know if you still have any questions. I am open to answer and try to solve any further questions that may arise with regards to this manuscript.
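
On the referee's question about UVER in W/m2: the UV Index is conventionally defined as the erythemally weighted irradiance in W/m2 multiplied by 40 m2/W, so one quantity can be recovered from the other. The following is a minimal sketch of that conversion only; the function names are hypothetical and it does not reproduce UVBoost's actual output routine.

```python
UVI_PER_W_M2 = 40.0  # standard scaling: UVI = 40 m2/W x erythemal irradiance


def uv_index_from_erythemal(erythemal_w_m2: float) -> float:
    """Convert erythemally weighted irradiance (W/m2) to the UV Index."""
    if erythemal_w_m2 < 0.0:
        raise ValueError("irradiance must be non-negative")
    return UVI_PER_W_M2 * erythemal_w_m2


def erythemal_from_uv_index(uvi: float) -> float:
    """Convert a UV Index value back to erythemal irradiance in W/m2."""
    if uvi < 0.0:
        raise ValueError("UV Index must be non-negative")
    return uvi / UVI_PER_W_M2
```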
  
  For all of this, I consider that the paper may be accepted for publication after the recommended revision.
  
  Answer: Again, I thank you for the careful review, and positive comments.
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC3
RC2:
'Comment on egusphere-2022-465', Anonymous Referee #2, 27 Aug 2022

Summary: The author presents a radiative transfer model for the ultraviolet (UV) regime. The distinguishing feature of the model is that it is based on a supervised machine learning approach.

The model uses diverse predictors, such as total ozone content, to infer UV radiation.

Major Comments:

I believe that the discussion of UV health effects and machine learning methods in the manuscript are sound and the author demonstrates expertise in both areas.

However, the basis of this machine learning approach is a database from a radiative transfer model.

In this area the author consistently uses wrong terminology and applies the wrong methods to the problem at hand, which suggests insufficient expertise in this field.

As the computation of the regression data is the fundamental basis of the manuscript, I cannot recommend a publication.

A further major and related issue is a completely inadequate description of the atmospheric input data for the radiative transfer model.

This is compounded by a variety of further issues:

- The title of the manuscript mentions a hybrid radiative transfer and machine learning model. This would imply that the model somehow combines traditional radiative transfer techniques and machine learning, which is not the case. This misleading title should be corrected.

- While the introduction section is overall well written and comprehensive, it is missing a paragraph on existing machine learning approaches to radiative transfer, as this is a quite mature field with lots of active development.

- line 85: Please provide adequate proof or a reference for the claim that UVR fluxes cannot be properly predicted with cross-validation techniques.

- line 170: TUV is a radiative transfer model, not a database in itself. What you are probably referring to here is the TUV climatology that is available for download.

- line 173: Which UV irradiances were calculated here, surface, TOA, or something else?

- line 179: How can you vary the total ozone content, but keep the atmospheric structure (i.e. the ozone profile) constant?

- line 180ff: If scattering clouds were not considered in this study, then the application of DISORT with 8 streams makes very little sense. A pure emission solver would be sufficient to accurately account for the ozone absorption of the UV radiation. The only phenomenon that would necessitate a scattering radiative transfer solver is Rayleigh scattering, which is not the dominating factor here, compared to the ozone absorption.

- A description of the crucial ozone spectroscopy used in the radiative transfer model of this study is completely missing.

- line 181: How can large variability and spatio-temporal complexity for planet cloud cover mean less accurate RTM results?

- line 182: Does your model use the Cloud Modification Factor you mention to correct for clouds?

- A thorough description of the atmospheric profile input data for the TUV radiative transfer model is completely missing.

- Subsection 2.3, line 212: The dataset does not include a validation partition to check its generalization capability.

Minor Comments:

Some minor issues related to spelling and expressions should be corrected:

- line 44: Radiative Transfer Models. The word Transference is wrong in this context.

- line 47: , and TUV (Madronich and Flocke, 1997), (the and is missing)

- line 48: Despite showing a good balance between performance...

- line 49: computational cost (singular)

- line 49: under certain conditions / in certain situations (but not under certain situations)

- line 50: Please add the website for Quick TUV in this line to the references instead.

- line 55: allows researchers to predict / allows researchers the prediction of extreme situations

- line 59: See the first major comment. If the UVBoost model is a pure regression model, the description as a hybrid model is misleading. The approach does not combine a traditional radiative transfer method and a regression approach in one single model, instead the radiative transfer model only provides the training data for the final regression model.

- line 63: the CatBoost tool was a suitable powerful and fast algorithm

- line 76: assuming that all other variables X_k for k != j remain the same.

- line 82: Then, the next 1/k-th part of the data

- line 88: Please provide a reference for support vector machines.

- line 92: Please provide a justification, such as a concrete example, for the statement that SVMs have problems when graphically visualizing and theoretically interpreting results.

- line 95: Please provide a reference for decision tree models.

- line 109: Please provide a reference for the Gini impurity coefficient and the entropy coefficient.

- line 123: Please provide a reference for the clustering technique and the aggregating bootstrap technique.

- line 140: Please provide a reference for Boosting.

- line 171: Please put the TUV link in the references.

- line 172: radiative transfer equation. The expression "radiative transference" does not exist.

- line 172: I am assuming that you are talking about the two-stream method here, not two-flow method.

- line 172: Likewise, I am assuming that you are talking about DISORT with n streams, instead of n fluctuations?

- line 205: Angström exponent, not coefficient.

- line 222: Please define the acronym ANOVA.

- line 267: UVBoost as described in the manuscript is not a hybrid model. This statement is misleading.

Citation: https://doi.org/10.5194/egusphere-2022-465-RC2
- AC4:
  'Reply on RC2', Marcelo de Paula Correa, 31 Aug 2022
  Dear anonymous reviewer #2,
  I thank you for the careful review of my paper. You proposed a series of revisions, most of which were made. Some suggestions were only partially addressed; in those cases, I have provided an argument for not completing the request. I am open to answer and try to solve any further questions that may arise with regards to this manuscript.
  
  Major Comments:
  
  I believe that the discussion of UV health effects and machine learning methods in the manuscript are sound and the author demonstrates expertise in both areas.
  
  However, the basis of this machine learning approach is a database from a radiative transfer model.
  
  In this area the author consistently uses wrong terminology and applies the wrong methods to the problem at hand, which suggests insufficient expertise in this field.
  
  As the computation of the regression data is the fundamental basis of the manuscript, I cannot recommend a publication.
  
  A further major and related issue is a completely inadequate description of the atmospheric input data for the radiative transfer model.
  
  Answer: Again, I would like to thank you for the clear review of my manuscript. I feel that these recommendations, once addressed, strengthen the manuscript. I hope the revised version is now suitable for publication and look forward to hearing from you in due course.
  Firstly, it is important to clarify that I used the terminology "hybrid model" for UVBoost with the best of intentions. After all, it is an input (SZA, TOC, AOD)/output (irradiances) code with a core based on an SML treatment (CatBoost). Despite its simplicity, it is fast, very efficient and, above all, it provides very accurate results.
  Therefore, I strongly disagree that the methods are wrong and that the study is a mere regression calculation. The article was carefully written and my objective was to help fill an important scientific gap, namely the use of SML techniques in the prediction of erythemally and vitamin D weighted UV irradiance.
  If the reviewer considers this work a simple application of third-party codes, I agree with the executive editor's recommendation to forward this article to the EGU journal ESSD (please check the interactive discussions).
  This is compounded by a variety of further issues:
  
  - The title of the manuscript mentions a hybrid radiative transfer and machine learning model. This would imply that the model somehow combines traditional radiative transfer techniques and machine learning, which is not the case. This misleading title should be corrected.
  
  Answer: Again, I used the terminology "hybrid model" for UVBoost with the best of intentions. According to the Merriam-Webster Dictionary, the term 'hybrid' is defined as 'something heterogeneous in origin or composition'. The proposed model is composed of a combination of two heterogeneous methods: a traditional radiative transfer code and an SML regression technique.
  However, if this title gives a wrong impression, I suggest a new title such as "UVBoost (0.5): Erythemal and Vitamin D weighted UV radiation estimator based on a Machine Learning gradient boosting algorithm".
  
  - While the introduction section is overall well written and comprehensive, it is missing a paragraph on existing machine learning approaches to radiative transfer, as this is a quite mature field with lots of active development.
  
  Answer: The author thanks the reviewer for pointing out this deficiency, which was corrected in this new version. Please check the new paragraphs included in Section 1 – Introduction:
  “ML methods have also been used for solar radiation prediction, and this use has increased significantly in recent years. A review published about 5 years ago showed that some ML techniques, such as boosting, regression trees or random forests, were still rarely used for solar radiation prediction (Voyant et al., 2017). Currently, ML techniques have been applied in different areas, such as solar panel power prediction (Zazoun, 2022), satellite radiation estimates (Cornejo-Bueno et al., 2019), and as a complement to surface radiation measurements in different spectral bands (Feng et al., 2020; Narvaez et al., 2020). After all, most weather stations cannot reliably provide global solar radiation observation data, and finding an accurate way to predict it is very important.
  A recent and in-depth review study with more than 230 scientific publications in 20 years showed that: a) data quality control before model prediction is essential; b) methods that use training models are more accurate than methods based on statistical filters; c) novel and combined ML techniques tend to be future hot topics (Zhou et al., 2021). These elements are all addressed in this work. As there are still very few studies on UVR prediction using ML methods (e.g. Wu et al., 2022), this study offers a new contribution on the topic.”
  
  - line 85: Please provide adequate proof or a reference for the claim that UVR fluxes cannot be properly predicted with cross-validation techniques.
  
  Answer: I appreciate the reviewer's concern with this detail. However, in line 85 I refer to cross-validation techniques applied to the MLR ("However, even when using cross-validation techniques one cannot properly fit an MLR model to UVR predictions"). Table 3 shows these cross-validation statistics, which I believe provide adequate proof.
  For clarity, I complemented the sentence with: "as we will see later in this article".
  
  - line 170: TUV is a radiative transfer model, not a database in itself. What you are probably referring to here is the TUV climatology that is available for download.
  
  Answer: I apologize for my lack of attention and I thank the reviewer for pointing out this error. The sentence was fixed: "The database for testing and training was built from calculations performed by the RTM TUV v5.3.2 (Madronich and Flocke, 1997; NCAR, 2022)." PS: I also put the TUV link in the references, according to a "minor revision" recommendation.
  
  - line 173: Which UV irradiances were calculated here, surface, TOA, or something else?
  
  Answer: I recognize that this was not properly explained in the text. UVBoost estimates downward UVR at the surface. I clarified this information in the text.
  
  - line 179: How can you vary the total ozone content, but keep the atmospheric structure (i.e. the ozone profile) constant?
  
  Answer: I thank the reviewer for pointing out this lack of information. The atmospheric profiles were changed according to the geographic position. For this reason, I included the following information in the paragraph: "The TOC atmospheric vertical profile was adjusted by the geographic position according to the AFGL Reference Model Atmospheric Profiles (Anderson et al., 1986; Gordon et al., 2022). The following vertical distributions were used: tropical atmosphere profile for the grid points between the equator and 30° latitude, mid-latitude profile between 30 and 60° latitude, and subarctic profile above 60° latitude."
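
The latitude bands quoted above can be written as a small lookup. The following is a minimal Python sketch, not the code used to build the training database. The function name `afgl_profile_for_latitude` is hypothetical, and the handling of the band edges (30° assigned to the tropical band, 60° to the mid-latitude band) is an assumption, since the text does not say how the boundaries are treated.

```python
def afgl_profile_for_latitude(lat_deg: float) -> str:
    """Return the name of the reference profile band for a latitude in degrees.

    Hypothetical helper. Band edges are assumed inclusive on the
    lower-latitude side; the source text does not specify this.
    """
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude must be within [-90, 90] degrees")
    abs_lat = abs(lat_deg)  # bands are symmetric about the equator
    if abs_lat <= 30.0:
        return "tropical"
    if abs_lat <= 60.0:
        return "mid-latitude"
    return "subarctic"
```

For example, `afgl_profile_for_latitude(-23.5)` falls in the tropical band and `afgl_profile_for_latitude(64.1)` in the subarctic band.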
  
  - line 180ff: If scattering clouds were not considered in this study, then the application of DISORT with 8 streams makes very little sense. A pure emission solver would be sufficient to accurately account for the ozone absorption of the UV radiation. The only phenomenon that would necessitate a scattering radiative transfer solver is Rayleigh scattering, which is not the dominating factor here, compared to the ozone absorption.
  
  Answer: The reviewer's question is pertinent, but I would like to explain why I chose the more complex calculations. I used DISORT with 8 streams to be as accurate as possible given the presence of aerosols and molecular scattering. Results from simpler methods (e.g., two-stream delta-Eddington) may be reasonable for most aerosol-free clear-sky calculations. However, my goal was to build a robust and very accurate database for training.
  
  - A description of the crucial ozone spectroscopy used in the radiative transfer model of this study is completely missing.
  
  Answer: I disagree with the need to incorporate a discussion of ozone spectroscopy. While I agree that ozone spectroscopy is an interesting topic in and of itself, my paper focuses on the SML method for estimating UVR from atmospheric input parameters such as TOC and AOD. The ozone spectroscopy used in TUV is already well discussed in the references for the code itself.
  Therefore, I believe that adding this dimension would not contribute to the scope of my paper. A focus on the spectroscopy of ozone (or other gases) could be a good topic for a follow-up study, e.g., on the use of SML for infrared radiation inference.
  
  - line 181: How can large variability and spatio-temporal complexity for planet cloud cover mean less accurate RTM results?
  
  Answer: I thank the reviewer for raising this matter. It is well known that clear-sky RTM calculations are more precise than cloudy-sky calculations. In general, the results of clear-sky calculations from most RTMs are nearly identical (Aumann et al., 2018). Using cloudy observations in forecast models is difficult. Firstly, clouds show large spatio-temporal variability. Secondly, there are further sources of error: cloud physics uncertainties, the large variability of the vertical distribution of ice and liquid water in clouds, the challenge of representing cloud structure, 3D effects, and cloud overlap assumptions. I included this comment and the reference in the paper.
  
  - line 182: Does your model use the Cloud Modification Factor you mention to correct for clouds?
  
  Answer: No, UVBoost estimates only clear-sky irradiances. The Cloud Modification Factor (CMF) is defined as the ratio between the UV radiation measured under a cloudy sky and the radiation simulated under cloud-free conditions (Foyo-Moreno et al., 2001). The CMF is used to estimate, approximately, the attenuation of radiation caused by cloud cover. In this case, the effect of clouds is given by the product of the clear-sky irradiance, previously calculated by an RTM (or, in this case, by UVBoost), and the CMF.
  The CMF may be estimated using clear-sky RTM simulations and ground-based observations, or using look-up tables for different cloud types (e.g., http://i115srv2.vu-wien.ac.at/UV/booklet/par_4.htm). In fact, UVBoost can even be used for this type of study.
  Foyo-Moreno, I., Alados, I., Olmo, F. et al. On the use of a cloud modification factor for solar UV (290–385 nm) spectral range. Theor Appl Climatol 68, 41–50 (2001). https://doi.org/10.1007/s007040170052.
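
The two relations described in this answer (the CMF as a ratio, and the cloudy-sky irradiance as a product) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input checks are my own additions, and the numbers in the usage note are made up for the example and are not results from the paper.

```python
def cloud_modification_factor(measured_cloudy: float, simulated_clear: float) -> float:
    """CMF = measured UV under a cloudy sky / simulated cloud-free UV."""
    if simulated_clear <= 0.0:
        raise ValueError("simulated clear-sky irradiance must be positive")
    return measured_cloudy / simulated_clear


def cloudy_sky_irradiance(clear_sky: float, cmf: float) -> float:
    """Approximate cloudy-sky irradiance as clear-sky irradiance times the CMF.

    The clear-sky value could come from an RTM or from a clear-sky
    estimator such as UVBoost.
    """
    if cmf < 0.0:
        raise ValueError("CMF must be non-negative")
    return clear_sky * cmf


# Illustrative numbers only (same arbitrary irradiance units for both inputs).
cmf = cloud_modification_factor(measured_cloudy=150.0, simulated_clear=200.0)
attenuated = cloudy_sky_irradiance(clear_sky=200.0, cmf=cmf)
```

With these illustrative inputs, the CMF is 0.75 and applying it to the same clear-sky value recovers the measured 150.0.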
  
  - A thorough description of the atmospheric profile input data for the TUV radiative transfer model is completely missing.
  
  Answer: This was fixed. Please see the answer to the comment on line 179.
  
  - Subsection 2.3, line 212: The dataset does not include a validation partition to check its generalization capability.
  
  Answer: Dear reviewer, the SML datasets used in the UVBoost development, including training and test files, are available at https://doi.org/10.5281/zenodo.7027724.
  
  Minor Comments:
  
  Answer: I thank you for this careful review. You proposed a series of minor revisions, all of which were made.
  
  Citation: https://doi.org/10.5194/egusphere-2022-465-AC4

Marcelo de Paula Corrêa

Supplement

https://doi.org/10.5194/egusphere-2022-465-supplement

Marcelo de Paula Corrêa




Short summary

UVBoost is a UV radiative transfer estimator based on a machine learning regression tool powered by a high-precision database. The model increases computational speed by three orders of magnitude without sacrificing accuracy. It is a user-friendly code, which can be used by laymen or researchers in other areas. UVBoost can be used to disseminate UV index data online anywhere at different spatiotemporal scales, or for climatological projection studies on a global scale.

