the Creative Commons Attribution 4.0 License.
NorSand4AI: A Comprehensive Triaxial Test Simulation Database for NorSand Constitutive Model Materials
Abstract. To learn, humans observe and experience the world, collect data, and establish patterns through repetition. In scientific discovery, these patterns and relationships are expressed as laws and equations, data as properties and variables, and observations as events. Data-driven techniques aim to provide an impartial approach to learning using raw data from actual or simulated observations. In soil science, parametric models known as constitutive models are used to represent the behavior of natural and artificial materials. Creating data-driven constitutive models using deep learning techniques requires large and consistent datasets, which are challenging to acquire through experiments. Synthetic data can be generated using a theoretical function, but there is a lack of literature on high-volume and robust datasets of this kind. Digital soil models can be utilized to conduct numerical simulations that produce synthetic results of triaxial tests, which are regarded as the preferred tests for assessing soil's constitutive behavior. Due to its limitations for modeling real sands, the Modified Cam Clay model has been replaced by the NorSand model in some situations where sand-like materials need to be modelled. Therefore, for a material following the NorSand model, the present paper presents a first-of-its-kind database that addresses the size and complexity issues of creating synthetic datasets for nonlinear constitutive modeling of soils by simulating both drained and undrained triaxial tests of 2000 soil types, each subjected to 40 initial test configurations, resulting in a total of 160000 triaxial test results. Each simulation dataset comprises a 4000 × 10 matrix that can be used for general multivariate forecasting benchmarks, in addition to direct geotechnical and soil science applications.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Interactive discussion
Status: closed

RC1: 'Comment on egusphere20231690', Anonymous Referee #1, 26 Oct 2023
Dear authors,
This work aims to provide a useful synthetic dataset in assessing the liquefaction potential of soils for machine learning and deep learning tasks. I consider that more validation and scrutiny are required for such essential groundwork.
Table 1 shows the sampling range of the NorSand model's input. However, justifications for such ranges are not mentioned in the manuscript. In the conclusion section, the authors noted that the NorSand model is used to assess the liquefaction potential of soils. The utility of this work should be highlighted throughout the article (not just in the conclusion section) to identify the target audience better and improve readership. Then, the ranges of the parameters can be justified.
The output of the NorSand model spreadsheet can be technically sound. However, the authors should test the sensitivity of the sampled dataset. For example, why would such sampling be the best dataset to represent the NorSand model? Can one represent the model better with fewer samples, or are more samples required? Without showing a particular use (e.g., surrogate modeling, machine learning) or arguing the representativeness of the dataset, it is difficult to evaluate the value of such a dataset.
Since there are too many possible ways to improve the manuscript, I leave the authors to decide which aspects they would like to work on. I do not recommend the manuscript for publication at this stage.
Citation: https://doi.org/10.5194/egusphere20231690RC1 
AC1: 'Reply on RC1', Luan Carlos de Sena Monteiro Ozelim, 11 Nov 2023
Dear reviewer,
About Table 1, indeed the original preprint lacked some clarifications on why the ranges adopted were of interest. The ranges adopted come from literature results on the behavior of real granular materials. An initial version of such ranges was first presented by Jefferies and Shuttle (2002) and has been updated ever since. The ranges presented in the paper are based on the latest compilation available, thus Table 1 reflects the information presented in the book by Jefferies and Been (2015). As mentioned, those authors have collected several triaxial tests carried out on a diverse set of granular soils, which eventually led to the creation of Table 1. In the updated version of the manuscript, we will include such information, describing why the values presented are of interest.
Regarding the application to liquefaction modelling, indeed we had not presented this aspect throughout the article. In the updated version we will insert a few paragraphs highlighting how NorSand is used in that context, to make the reader aware of the benefits of our approach.
About the sensitivity of the sampled dataset, we are glad the reviewer pointed that out. At first, we performed a number of empirical simulations to check how big the dataset would need to be in order to represent the true behavior of the NorSand model. For simplicity, we ended up not including these studies in the manuscript.
On the other hand, after reading the comments, we had to devise a proper methodology (and not present just a series of empirical tests) to demonstrate in a robust and reproducible way that the dataset presented suffices to represent the NorSand model. To that end, a completely new methodology has been proposed and applied to assess the adequacy of the sample size.
As suggested by the reviewer, the best way to show that the sample size is sufficient is to study how a model calibrated (or trained) on such a dataset performs. So, we chose the most direct (and actually most important) learning task one could face while working with the dataset generated: back-calculation of the constitutive parameters of the model based solely on the triaxial test results. In short, from the triaxial tests we will learn the values of the parameters which govern the behavior of the material.
Recall that a total of 14 parameters (10 constitutive and 4 related to test conditions) are used to generate the triaxial test results (a 4000 × 10 array, where 4000 denotes the number of time steps of the loading process and 10 is the number of quantities monitored during the test). Let cp_i be the i-th vector of parameters (1 × 14: the 10 constitutive parameters plus the 4 test conditions) and let ttu_i and ttd_i be the results of the triaxial test under undrained and drained conditions, respectively (4000 × 10 arrays, each).
We will consider the following learning problem: from a sample TS_{n,m}, which considers n different types of soil and m different test configurations (therefore of total size nm), we will use the ttu_i (or ttd_i), for i = 1, …, nm, to learn the vectors of parameters cp_i, for i = 1, …, nm. We wish to show that n=2000 and m=40 suffice to produce accurate results.
In order to do so, following standard learning tasks in a Machine Learning context, we need training, validation and testing data. It is worth noticing that our methodology needs to be robust, so we really need the validation dataset because we will be performing hyperparameter tuning on the models chosen.
The dataset presented in the preprint considered n=2000 and m=40. It was generated by a Latin Hypercube Sampling (LHS) algorithm, which is known to provide low-discrepancy sequences of values (i.e., the samples are spread in the domain of the sampled variables). Despite being a really powerful technique, LHS lacks one useful property: sequences obtained by LHS are not extensible. To put it simply, being extensible means that a sample of size j contains the values of the sample of size k, for j > k. Therefore, it would not be possible to subsample from our original sample in order to build smaller datasets without losing the space-filling capability of the dataset, and we needed to consider another sampling scheme to perform our investigation.
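The extensibility property described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' sampling code: the helper names (`van_der_corput`, `halton`, `latin_hypercube`, `is_latin`), the unscrambled Halton bases and the sample sizes are assumptions chosen for illustration.

```python
import random


def van_der_corput(index, base):
    """Radical inverse of a non-negative integer in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton(n_points, bases=(2, 3, 5, 7)):
    """First n_points of an unscrambled Halton sequence, one prime base per dimension."""
    return [tuple(van_der_corput(i, b) for b in bases) for i in range(1, n_points + 1)]


def latin_hypercube(n_points, n_dims, rng):
    """One Latin hypercube sample: every stratum is hit exactly once per dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def is_latin(points):
    """True if each dimension has exactly one point in each of len(points) equal strata."""
    n = len(points)
    return all(
        sorted(int(p[d] * n) for p in points) == list(range(n))
        for d in range(len(points[0]))
    )


# Extensible: the 42-point Halton sample starts with the 6-point one.
small, large = halton(6), halton(42)
print(large[:6] == small)

# LHS: the full sample is stratified, but a subset is generally no longer
# a Latin hypercube, so it cannot simply be truncated to a smaller design.
lhs = latin_hypercube(40, 4, random.Random(0))
print(is_latin(lhs))
print(is_latin(lhs[:20]))
```

The same prefix argument applies to Sobol sequences, which is why a larger quasi-Monte Carlo sample can be cut down to smaller ones while a Latin hypercube has to be regenerated for each size.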
We chose to combine two quasi-Monte Carlo low-discrepancy sequence generation techniques (Sobol and Halton), which are also extensible, to perform our tests. In that case, we generated a whole new dataset with n=2048 and m=42 using Sobol sampling for the constitutive parameters (10 parameters) and Halton sampling for the experimental test condition variables (4 variables). By using these parameters, we ran the NorSand triaxial test simulation and obtained the corresponding triaxial test results for both drained and undrained cases. Let us call this new dataset qTS_{2048,42}.
By using the extensibility property of the sequences considered, 49 subsamples were built: qTS_{n,m} for n in [32,64,128,256,512,1024,2048] and m in [6,16,18,24,30,36,42]. It is worth noticing that none of the entries of TS_{2000,40} were in qTS_{2048,42}, which indicates that using qTS_{n,m} for training and validation and TS_{2000,40} for testing does not allow for any data “leakage”. Besides, there is a clear benefit in using TS_{2000,40} as a test set: all the models will be tested on the same dataset.
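As a rough illustration of how the 49 nested subsamples relate to the full quasi-Monte Carlo dataset, the sketch below enumerates the (n, m) grid and builds the flat row indices of each subsample. The soil-major storage layout and the helper name `prefix_subset` are assumptions made for this example, not details given in the reply.

```python
from itertools import product

N_SOILS = [32, 64, 128, 256, 512, 1024, 2048]
M_TESTS = [6, 16, 18, 24, 30, 36, 42]


def prefix_subset(n, m, m_max=42):
    """Flat indices of qTS_{n,m} inside qTS_{2048,42}.

    Assumes (hypothetically) that simulations are stored soil by soil,
    with m_max test configurations per soil.
    """
    return [soil * m_max + test for soil in range(n) for test in range(m)]


# One entry per subsample: (n, m) -> number of simulated triaxial tests.
subset_sizes = {(n, m): n * m for n, m in product(N_SOILS, M_TESTS)}
print(len(subset_sizes))
print(subset_sizes[(2048, 42)])

# Smaller subsamples are contained in larger ones (nested designs).
print(set(prefix_subset(32, 6)) <= set(prefix_subset(64, 16)))
```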
For the learning task considered, we used the sklearn Python package and chose 4 algorithms: Ridge Regressor, KNeighbors Regressor and two variants of the Ridge Regressor which incorporate nonlinear mappings of the input and output values. The first two algorithms mentioned belong to two different classes: linear and neighbors-based regressors. They were chosen to illustrate how different types of algorithms learn our chosen task. The variants of the Ridge Regressor were chosen to account for nonlinearities by using the kernel trick. Considering the high dimensionality of the input datasets, using traditional kernels is not computationally feasible, so we used Nystroem kernels, which approximate a kernel map using a subset of the training data. By combining Nystroem kernels and Ridge Regressors, we can map the inputs to a nonlinear feature space and then consider a linear regression on these features. This approach is similar to the one used to build Support Vector Machine Regressors, but with a slightly different regularization for the decision boundary. We also considered mapping the output values (14 constitutive parameters, in our case) to the [0,1] range by using a TransformedTargetRegressor combined with a QuantileTransformer, which transforms the components to follow a uniform distribution. Therefore, for a given component, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers.
The 4 algorithms considered were Ridge, KNeighbors, RidgeK (with a nonlinear kernel on the inputs) and RidgeKT (with a nonlinear kernel on the inputs and also a QuantileTransformer on the outputs). For all the algorithms considered, we also used a QuantileTransformer to preprocess the input values (prior to the nonlinear kernels, when applicable).
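A minimal sketch of how these four regressors could be assembled with scikit-learn is given below. It is not the authors' code: the RBF kernel, the number of Nystroem components, the default Ridge and KNeighbors settings, the flattening of each 40x3 curve into 120 features and the random placeholder data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer


def uniform_qt(n_samples):
    """QuantileTransformer mapping each column to a uniform [0, 1] distribution."""
    # n_quantiles should not exceed the number of training samples.
    return QuantileTransformer(n_quantiles=min(100, n_samples),
                               output_distribution="uniform")


def nystroem(n_samples, n_components=50, seed=0):
    """Approximate kernel feature map (RBF kernel assumed for this sketch)."""
    return Nystroem(kernel="rbf", n_components=min(n_components, n_samples),
                    random_state=seed)


def build_models(n_samples):
    """The four regressors named in the reply, with hypothetical default settings."""
    return {
        "Ridge": make_pipeline(uniform_qt(n_samples), Ridge()),
        "KNeighbors": make_pipeline(uniform_qt(n_samples), KNeighborsRegressor()),
        "RidgeK": make_pipeline(uniform_qt(n_samples), nystroem(n_samples), Ridge()),
        "RidgeKT": TransformedTargetRegressor(
            regressor=make_pipeline(uniform_qt(n_samples), nystroem(n_samples), Ridge()),
            transformer=uniform_qt(n_samples),  # quantile transform on the 14 outputs
            check_inverse=False,
        ),
    }


# Random placeholders standing in for real data: 200 flattened 40x3 curves
# as inputs and 14 parameters per curve as targets.
rng = np.random.default_rng(0)
X = rng.random((200, 120))
y = rng.random((200, 14))

models = build_models(n_samples=len(X))
predictions = {name: model.fit(X, y).predict(X) for name, model in models.items()}
for name, pred in predictions.items():
    print(name, pred.shape)
```

In this arrangement the input transform and the kernel map live inside the pipeline, so they are refitted on each training fold during cross-validation rather than on the full dataset.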
We then considered the following sequence:
For n in [32,64,128,256,512,1024,2048]:
For m in [6,16,18,24,30,36,42]:
 From the dataset qTS_{n,m}, select only the columns corresponding to p, q and e, which are the variables commonly measured and reported in triaxial tests. The other 7 columns are manipulations of these three. This reduced dataset is of shape 4000x3. We then downsample the 4000 time steps to 40, by using a logarithmic range of values (more values at the beginning of the time steps, where more changes are observed). This reduces each test in the input dataset qTS_{n,m} from 4000x10 to 40x3, leading to the creation of dataset qTSN_{n,m};
 Train and validate an algorithm A using qTSN_{n,m};
 Test the trained algorithm A_t on TS_{2000,40}, after performing the same downsampling described (from 4000x10 to 40x3);
 Obtain the mean absolute percentage error (MAPE) in the predictions of each of the 14 input parameters corresponding to TS_{2000,40};
 Get the overall mean error, corresponding to all the input parameters.
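The downsampling and error steps listed above can be sketched as follows. The exact logarithmic grid used by the authors is not given, so `log_spaced_indices` is one possible choice; the column positions of p, q and e and the toy curve are likewise assumptions for illustration.

```python
def log_spaced_indices(n_steps=4000, n_keep=40):
    """Strictly increasing 0-based indices, dense at the start and sparse at the end."""
    indices = []
    for i in range(n_keep):
        # Geometric grid from 1 to n_steps, shifted to 0-based positions.
        target = round(n_steps ** (i / (n_keep - 1))) - 1
        previous = indices[-1] if indices else -1
        # Rounding can repeat small values, so force each index past the last one.
        indices.append(min(max(target, previous + 1), n_steps - 1))
    return indices


def downsample(curve, indices, columns=(0, 1, 2)):
    """Keep the selected time steps and columns of one test (a list of rows).

    The column positions of p, q and e are hypothetical here.
    """
    return [[curve[i][c] for c in columns] for i in indices]


def mape(true_values, predicted_values):
    """Mean absolute percentage error, in percent."""
    errors = [abs((t - p) / t) for t, p in zip(true_values, predicted_values)]
    return 100.0 * sum(errors) / len(errors)


indices = log_spaced_indices()
toy_curve = [[float(step * 10 + col) for col in range(10)] for step in range(4000)]
reduced = downsample(toy_curve, indices)
print(len(reduced), len(reduced[0]))   # 40 3

# Per-parameter MAPE, then the overall mean across parameters.
per_parameter = [mape([2.0, 4.0], [1.0, 5.0]), mape([10.0, 20.0], [10.0, 20.0])]
overall = sum(per_parameter) / len(per_parameter)
print(per_parameter, overall)          # [37.5, 0.0] 18.75
```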
In the training and validation step, we considered a GroupKFold cross-validation technique, which is a K-fold iterator variant with non-overlapping groups. This approach makes sure no material (group) is present in both the training and validation sets, which would lead to data leakage.
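The grouping idea can be checked with a short scikit-learn sketch. The number of folds, the sample sizes and the placeholder features below are assumptions; only the use of `GroupKFold` with one group per material comes from the reply.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

n_soils, n_tests = 32, 6
# One group label per simulated test: the id of the soil it belongs to.
groups = np.repeat(np.arange(n_soils), n_tests)
X = np.zeros((n_soils * n_tests, 120))  # placeholder features (flattened 40x3 curves)

splitter = GroupKFold(n_splits=4)  # fold count chosen arbitrarily for this sketch
folds = list(splitter.split(X, groups=groups))

for train_idx, val_idx in folds:
    # Soils appearing on both sides of the split would indicate leakage.
    shared = set(groups[train_idx]) & set(groups[val_idx])
    print(len(train_idx), len(val_idx), len(shared))
```

A plain K-fold split over the individual tests would, by contrast, scatter the tests of a single soil across training and validation sets.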
A Bayesian optimization was performed to look for the best hyperparameters using the cross-validation folds generated, making the algorithm A the “best” possible algorithm with respect to hyperparameter tuning.
Finally, after the best hyperparameters are found, they are fixed and the algorithm A is retrained with the full dataset qTSN_{n,m}. This calibrated version is then used to test the quality of the model on the dataset TS_{2000,40}.
By plotting the results of the 49 models (each trained and validated with samples of different sizes) we obtained the figures attached as FigMAPE.zip. The figures, drawn with contours at 0.5% gains in MAPE, show that, for both drained and undrained conditions, the sample size of 2000x40 is actually more than enough for the learning task considered. This can be seen by noticing that the contours with lower error encompass samples with an exponential range of sizes (the x-axis is in log scale). This indicates a really small error gradient in the n x m space, implying a good sample size. This happens for all 4 algorithms, indicating that not only the linear and neighbors-based regressors have reached their maximum ability to learn, but also the nonlinear variants considered. It can be seen that the two nonlinear transformations applied (to inputs and to both inputs and outputs) present a similar behavior, although with considerably smaller MAPEs.
We sincerely thank the reviewer for pointing out such interesting and important issues. We have now added 49 extra datasets to the paper (which will be hosted on Zenodo as well), making a total of 50 datasets which can be used for several learning tasks. We hope the paper is now suitable for publication.
References:
Jefferies, M. G. and Shuttle, D. A. (2002) Dilatancy in general Cambridge-type models. Géotechnique 52(9), 625-638.
Jefferies, M. and Been, K. (2015) Soil Liquefaction: A Critical State Approach, 2nd edn., CRC Press, https://doi.org/10.1201/b19114.

RC2: 'Comment on egusphere20231690', Anonymous Referee #2, 27 Oct 2023
General comments
This paper tries to establish a comprehensive triaxial test simulation database for soil science. It is attractive to develop this kind of model, but I have some questions about this approach. In particular, the advantage over previous approaches needs to be clearly introduced and explained in this study. In addition, for better presentation quality, I strongly suggest reorganizing the manuscript because the current manuscript contains so many paragraphs. Please address the following questions.
Specific comments
 Many paragraphs throughout the manuscript: Repeatedly, there are lots of paragraphs, and some paragraphs seem to be merged. Please reorganize for better readability.
 L26-34: These two paragraphs start with “Montans et al. (2019) emphasize…” and end with “(Montans et al., 2019)”. It is unclear which statements are the authors’ own and which are cited. These two paragraphs may be merged.
 L57: From here, the introduction suddenly changes to soil science. To fill the gap in the general introduction, some background information on soil science will be needed.
 L57: In addition to the above comment, it is hard to follow these previous studies. One idea is to prepare a brief summary for a clear introduction. Please reconsider.
 Table 1: A brief description of these parameters will be helpful for readers. The abbreviation “OCR” should be explicitly defined within this manuscript. This might be covered in the previous studies, but why can the sampling ranges be set as listed in Table 1? Even though NorSandTXL has already been described in previous studies, a more reader-friendly introduction to Table 1 is required because this manuscript should be standalone.
 L156-159 and L170-171: It is still unclear why this Python coding is needed and why the Excel spreadsheet is not acceptable. This point seems to be argued in L102-107, but for practical use, how do the calculation times of the spreadsheet and Python compare, and what are the operational advantages?
 L160 (Section 5): I had the impression that this section has been moved to the Appendix.
Technical comments
 L51: Use “DNN”.
 L89: No need to repeat “(Jefferies, 1993)” here.
 L105: Please insert after Table 1, and there may be no need to start a new paragraph here.
 Tables 1, 2, and 3: The caption should be placed at the top of the table.
Citation: https://doi.org/10.5194/egusphere20231690RC2 
AC2: 'Reply on RC2', Luan Carlos de Sena Monteiro Ozelim, 11 Nov 2023
Dear reviewer,
We recognize the structure of the paper needs enhancement. In the updated version we will certainly incorporate the suggestions presented. Overall, there are two main advantages of using the results and datasets of our paper. The first one is that there are no known implementations of the NorSand model in Python. So, we built a bridge which connects a well-known VBA implementation to the Python environment. This allows other researchers to use our code as a step in their pipelines, giving them access to the full power of Python packages (such as sklearn, TensorFlow, PyTorch, etc.) during their analyses. The second advantage is that each evaluation of the NorSand model in the VBA code takes some time and effort to complete. So, by providing massive simulation results, we save a considerable number of hours (even days) for other researchers who need such datasets.
Specific comments
 We will take these comments about paragraphs into account, for sure. The structure of the paper can benefit from such reorganization.
 Indeed, the paragraphs with “Montans et al. (2019)” can be merged and the reference can be better placed.
 This issue about soil science background has been also pointed out by the other reviewer. Indeed, the paper lacked a more comprehensive review on soil sciences. We will add such information in the updated version.
 We will structure the introduction to account for summaries on previous studies, following this nice suggestion by the reviewer.
 About Table 1, indeed the original preprint lacked some clarifications on why the ranges adopted were of interest and also what each parameter means. For example, OCR stands for overconsolidation ratio. The ranges adopted come from literature results on the behavior of real granular materials. An initial version of such ranges was first presented by Jefferies and Shuttle (2002) and has been updated ever since. The ranges presented in the paper are based on the latest compilation available, thus Table 1 reflects the information presented in the book by Jefferies and Been (2015). As mentioned, those authors have collected several triaxial tests carried out on a diverse set of granular soils, which eventually led to the creation of Table 1. In the updated version of the manuscript, we will include such information, describing why the values presented are of interest as well as what each parameter controls in the soil’s behavior.
 About the Python coding, it is not that the Excel spreadsheet is not acceptable. A first thing to notice is that there are no known implementations of the NorSand model in Python. So, we built a bridge which connects a well-known VBA implementation to the Python environment. In the end, we still rely on the VBA code as the “processing kernel” of our Python implementation. This new Python code, on the other hand, allows other researchers to use the full power of Python packages (such as sklearn, TensorFlow, PyTorch, etc.) during their analyses involving NorSand. The second advantage is that each evaluation of the NorSand model in the VBA code takes some time and effort to complete (setting parameters, choosing the simulation type, running the VBA macros and collecting results). So, by providing massive simulation results, we save a considerable number of hours (even days) for other researchers who need such datasets.
 We chose to move the coding part to the Appendix because we wanted to focus on the datasets in the main “body” of the manuscript. But we will bring the codes back from the Appendix and insert them in Section 5, as suggested. Besides, we will include additional codes in the manuscript and create a GitHub repository to make them easier to share. We will incorporate a new simplified code which simply outputs the simulation values instead of directly saving them to a .h5 file. This will make it easier to incorporate the code into existing pipelines.
Technical comments
We will incorporate all the issues above in the final version of the manuscript, which will be submitted after the discussion period is over. We sincerely thank the reviewer for such careful analysis on the paper, especially the code parts.
Citation: https://doi.org/10.5194/egusphere20231690AC2
Interactive discussion
Status: closed

RC1: 'Comment on egusphere20231690', Anonymous Referee #1, 26 Oct 2023
Dear authors,
This work aims to provide a useful synthetic dataset in assessing the liquefaction potential of soils for machine learning and deep learning tasks. I consider that more validation and scrutiny are required for such essential groundwork.
Table 1 shows the sampling range of the NorSand model's input. However, justifications for such ranges are not mentioned in the manuscript. In the conclusion section, the authors noted that the Norsand model is used to assess the liquefaction potential of soils. The utility of this work should be highlighted throughout the article (not just in the conclusion session) to identify the target audience better and improve readership. Then, the ranges of the parameters can be justified.
The output of the NorSand model spreadsheet can be technically sound. However, the authors should test the sensitivity of the sampled dataset. For example, why would such sampling be the best dataset to represent the NorSand model? Can one represent the model better with fewer samples, or more samples are required? Without showing a particular use (e.g., surrogate modeling, machine learning) or arguing the representativeness of the dataset, it is difficult to evaluate the value of such a dataset.
Since there are too many possible ways to improve the manuscript, I leave the authors to decide which aspects they would like to work on. I do not recommend the manuscript for publication at this stage.
Citation: https://doi.org/10.5194/egusphere20231690RC1 
AC1: 'Reply on RC1', Luan Carlos de Sena Monteiro Ozelim, 11 Nov 2023
Dear reviewer,
About Table 1, indeed the original preprint lacked some clarifications on why the ranges adopted were of interest. The ranges adopted come from literature results on the behavior of real granular materials. An initial version of such ranges was first presented by Jefferies and Shuttle (2002) and has been updated ever since. The ranges presented in the paper are based on the latest compilation available, thus Table 1 reflects the information presented in the book by Jefferies and Been (2015). As mentioned, those authors have collected several triaxial tests carried out on a diverse set of granular soils, which eventually led to the creation of Table 1. In the updated version of the manuscript, we will include such information, describing why the values presented are of interest.
Regarding the application to liquefaction modelling, indeed we had not presented this aspect throughout the article. In the updated version we will insert a few paragraphs highlighting how the NorSand is used in that context to make the reader aware of the benefits of our approach.
About the sensitivity of the sampled dataset, we are glad the reviewer pointed that out. At first, we performed a number of empirical simulations to check how big would the dataset be in order to represent the true behavior of the NorSand model. For simplicity, we ended up no including these studies in the manuscript.
On the other hand, after reading the comments, we had to devise a proper methodology (and not present just a series of empirical tests) to demonstrate in a robust and reproducible way that the dataset presented suffices to represent the NorSand model. This way, a completely new methodology has been proposed and applied to assess the quality of the sample size.
As suggested by the reviewer, the best way to show that the sample size is sufficient is to study how a model calibrated (or trained) on such dataset performs. So, we chose the most direct (and actually most important) learning task one could face while working with the dataset generated: backcalculation of the constitutive parameters of the model based solely on the triaxial test results. In short, from the triaxial tests we will learn the values of the parameters which govern the behavior of the material.
This way, it is possible to recall that a total of 14 parameters (10 constitutive and 4 related to test conditions) are used to generate the triaxial test results (4000 × 10 array where 4000 denotes the number of time steps of the loading process and 10 is the number of quantities monitored during the test). Let cp_i be the ith vector which contains the constitutive parameters (1x14) and let ttu_i and ttd_i be the results of the triaxial test under undrained and drained conditions, respectively (4000x10 arrays, each).
We will consider the following learning problem: From a sample TS_{n,m}, which considers n different types of soil and m different test configuration (therefore of total size nm), we will use the ttu_i (or ttd_i), for i = 1, …, nm, to learn the vectors of parameters cp_i, for i = 1, …, nm. We wish to show that n=2000 and m=40 suffices to produce accurate results.
In order to do so, following standard learning tasks in a Machine Learning context, we need training, validation and testing data. It is worth noticing that our methodology needs to be robust, so we really need the validation dataset because we will be performing hyperparameter tuning on the models chosen.
The dataset presented in the preprint considered n=2000 and m=40. It was generated by a Latin Hypercube Sampling (LHS) algorithm, which is known to provide lowdiscrepancy sequences of values (i.e., the samples are spread in the domain of the sampled variables). Despite being a really powerful technique, LHS does not have a really interesting property: sequences obtained by LHS are not extensible. To put it simply, being extensible means that a sample of size j contains the values of the sample of size k, j>k. This way, it would not be possible to subsample from our original sample in order to build smaller datasets without loosing the spacefilling capability of the dataset. This way, we needed to consider another sampling scheme to perform our investigation.
We chose to combine two quasiMonte Carlo low discrepancy sequency generation techniques (Sobol and Halton), which are also extensible, to perform our tests. In that case, we generated a whole new dataset with n=2048 and m=42 using Sobol sampling for the constitutive parameters (10 parameters) and Halton sampling for the experimental test condition variables (4 variables). By using these parameters, we ran the NorSand triaxial test simulation and obtained the corresponding triaxial test results for both drained and undrained cases. Let us call this new dataset and qTS_{2048,42}.
By using the extensibility property of the sequences considered, 49 subsamples were built: qTS{n,m} for n in [32,64,128,256,512,1024,2048] and m in [6,16,18,24,30,36,42]. It is worth noticing that none of the entries of TS_{2000,40} were in qTS_{2048,42}, which indicates that using qTS_{n,m} for training and validation and TS_{2000,40} does not allow for any data “leakage”. Besides, there is a clear benefit in using TS_{2000,40} as a test set: all the models will be tested on the same dataset.
For the learning task considered, we used the sklearn Python package and chose 4 algorithms: Ridge Regressor, KNeighbors Regressor and two variants of the Ridge Regressor which incorporate nonlinear mappings of the input and output values. The first two algorithms mentioned belong to two different classes: linear and neighborsbased regressors. They were chosen to illustrate how different types of algorithms learn our chosen task. The variants of the Ridge Regressor were chosen to account for nonlinearities by using the kernel trick. Considering the high dimensionality of the input datasets, using traditional kernels is not computationally feasible, so we used Nystroem kernels, which approximate a kernel map using a subset of the training data. By combining Nystroem kernels and Ridge Regressors, we can map the inputs to a nonlinear feature space and then consider a linear regression on these features. This is a similar approach as the one considered to build Support Vector Machine Regressors, but with a slightly different regularization for the decision boundary. We also considered mapping the output values (14 constitutive parameters, in our case) to the [0,1] range by using a TransformedTargetRegressor combined with a QuantileTransformer, which transforms the components to follow a uniform distribution. Therefore, for a given component, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers.
The 4 algorithms considered were Ridge, KNeighbors, RidgeK (with nonlinear kernel on inputs) and RidgeKT (with nonlinear kernel on inputs and also QuantileTransformer on the outputs). For all the algorithms considered, we also used a QuantileTransformer to preprocess the input values (prior to nonlinear kernels, when applicable).
This way, we considered the following sequence:
For n in [32,64,128,256,512,1024,2048]:
For m in [6,16,18,24,30,36,42]:
 From the dataset qTS_{n,m}, select only the columns corresponding to p, q and e, which are the variables commonly measured and reported in triaxial tests. The other 7 columns are manipulations of these three. This reduced dataset is of shape 4000x3. We then downsample the 4000 timesteps to 40, by using a logarithmic range of values (more values in the beginning of the time steps, where more changes are observed). This reduces the input dataset qTS_{n,m} from 4000x10 to 40x3, leading to the creation of dataset qTSN_{n,m}.
 Train and validate an algorithm A using qTSN_{n,m};
 Test the trained algorithm A_t on TS_{2000,40}, after performing the same subsampling described (from 4000x10 to 40x3);
 Obtain the mean absolute percentage error in the predictions of all the 14 input parameters corresponding to TS_{2000,40};
 Get the overall mean error, corresponding to all the input parameters.
In the first step, which accounts for training and validation, we considered a GroupKFold cross validation technique, which is a Kfold iterator variant with nonoverlapping groups. This approach makes sure no material (group) is present both in train and validation set, which would lead to data leakage.
A Bayesian optimization was performed to look for the best hyperparameters using the crossvalidation folds generated, making the algorithm A the “best” possible algorithm with respect to hyperparameter tuning.
Finally, after the best hyperparameters are found, they are fixed and the algorithm A is retrained with the full dataset qTSN_{n,m}. This calibrated version is then used to test the quality of the model on the dataset TS_{2000,40}.
Plotting the results of the 49 models (each trained and validated with samples of a different size) gave the figures attached as FigMAPE.zip. For both drained and undrained conditions, and with contours drawn at 0.5% gains in MAPE, the figures show that the sample size of 2000x40 is more than enough for the learning task considered. The contours with lower error encompass samples whose sizes span an exponential range (the x-axis is in log scale), which indicates a very small error gradient in the n x m space and therefore an adequate sample size. This holds for all 4 algorithms, indicating that not only the linear and neighbors-based regressors have reached their maximum ability to learn, but also the nonlinear variants considered. The two nonlinear transformations applied (to the inputs, and to both inputs and outputs) show a similar behavior, although with considerably smaller MAPEs.
We sincerely thank the reviewer for pointing out such interesting and important issues. We have now added 49 extra datasets to the paper (which will also be hosted on Zenodo), for a total of 50 datasets that can be used for several learning tasks. We hope the paper is now suitable for publication.
References:
Jefferies, M. G. and Shuttle, D. A. (2002) Dilatancy in general Cambridge-type models. Géotechnique 52(9), 625-638.
Jefferies, M. and Been, K. (2015) Soil Liquefaction: A Critical State Approach, Second Edition, CRC Press, 2nd edn., https://doi.org/10.1201/b19114.

AC1: 'Reply on RC1', Luan Carlos de Sena Monteiro Ozelim, 11 Nov 2023

RC2: 'Comment on egusphere-2023-1690', Anonymous Referee #2, 27 Oct 2023
General comments
This paper attempts to establish a comprehensive triaxial test simulation database for soil science. Developing this kind of model is attractive, but I have some questions about the approach. In particular, the advantages over previous approaches need to be clearly introduced and explained. In addition, for better presentation quality, I strongly suggest reorganizing the manuscript, because the current version contains a very large number of paragraphs. Please address the following questions.
Specific comments
 Many paragraphs throughout the manuscript: Repeatedly, there are lots of paragraphs, and some paragraphs seem to be merged. Please reorganize for better readability.
 L26-34: These two paragraphs start with “Montans et al. (2019) emphasize…” and end with “(Montans et al., 2019)”. It is unclear which statements are the authors’ own and which are taken from the reference. These two paragraphs may be merged.
 L57: From here, the introduction is suddenly changed to soil science. To fill the gap in the general introduction, some background information for soil science will be needed.
 L57: In addition to the above comment, it is hard to follow these previous studies. One idea is to prepare a brief summary for a clear introduction. Please reconsider.
 Table 1: A brief description of these parameters would be helpful for readers. The abbreviation “OCR” should be explicitly defined within this manuscript. This might be covered in the previous studies, but why can the sampling ranges be set as listed in Table 1? Even though NorSandTXL has already been described in previous studies, a fuller introduction to Table 1 is required because this manuscript should be standalone.
 L156-159 and L170-171: It is still unclear why this Python coding is needed and why the Excel spreadsheet is not acceptable. This point seems to be argued in L102-107, but for practical use, how do the calculation times of the spreadsheet and Python compare, and what are the operational advantages?
 L160 (Section 5): I was impressed that this section seems to be moved to the Appendix part.
Technical comments
 L51: Use “DNN”.
 L89: No need to repeat “(Jefferies, 1993)” here.
 L105: Please insert after Table 1, and there may be no need to change the paragraph here.
 Tables 1, 2, and 3: The caption should be placed at the top of the table.
Citation: https://doi.org/10.5194/egusphere-2023-1690-RC2
AC2: 'Reply on RC2', Luan Carlos de Sena Monteiro Ozelim, 11 Nov 2023
Dear reviewer,
We recognize that the structure of the paper needs enhancement, and in the updated version we will incorporate the suggestions presented. Overall, there are two main advantages of using the results and datasets of our paper. The first is that there are no known implementations of the NorSand model in Python, so we built a bridge that connects a well-known VBA implementation to the Python environment. This allows other researchers to use our code as a step in their pipelines and to exploit the full power of Python packages (such as sklearn, TensorFlow, PyTorch etc.) in their analyses. The second advantage is that each evaluation of the NorSand model in the VBA code takes some time and effort to complete. By providing massive simulation results, we save a considerable number of hours (even days) for other researchers who need such datasets.
Specific comments
 We will take these comments about paragraphs into account, for sure. The structure of the paper can benefit from such reorganization.
 Indeed, the paragraphs with “Montans et al. (2019)” can be merged and the reference can be better placed.
 This issue about soil science background has been also pointed out by the other reviewer. Indeed, the paper lacked a more comprehensive review on soil sciences. We will add such information in the updated version.
 We will structure the introduction to account for summaries on previous studies, following this nice suggestion by the reviewer.
 About Table 1, the original preprint indeed lacked some clarification on why the adopted ranges were of interest and on what each parameter means. For example, OCR stands for overconsolidation ratio. The ranges adopted come from literature results on the behavior of real granular materials. An initial version of these ranges was first presented by Jefferies and Shuttle (2002) and has been updated ever since. The ranges presented in the paper are based on the latest compilation available, so Table 1 reflects the information presented in the book by Jefferies and Been (2015). As mentioned, those authors collected several triaxial tests carried out on a diverse set of granular soils, which eventually led to the creation of Table 1. In the updated version of the manuscript, we will include this information, describing why the values presented are of interest as well as what each parameter controls in the soil’s behavior.
 About the Python coding, it is not that the Excel spreadsheet is not acceptable. The first thing to notice is that there are no known implementations of the NorSand model in Python, so we built a bridge that connects a well-known VBA implementation to the Python environment. In the end, we still rely on the VBA code as the “processing kernel” of our Python implementation. The new Python code, on the other hand, allows other researchers to use the full power of Python packages (such as sklearn, TensorFlow, PyTorch etc.) in their analyses involving NorSand. The second advantage is that each evaluation of the NorSand model in the VBA code takes some time and effort to complete (setting parameters, choosing the simulation type, running the VBA macros and collecting results). By providing massive simulation results, we save a considerable number of hours (even days) for other researchers who need such datasets.
 We chose to move the coding part to the Appendix because we wanted to focus on the datasets in the main “body” of the manuscript. However, we will bring the codes back from the Appendix and insert them in Section 5, as suggested. In addition, we will include additional codes in the manuscript and create a GitHub repository to make them easier to share. We will also incorporate a new simplified code that simply outputs the simulation values instead of saving them directly to a .h5 file, which will make it easier to incorporate the code into existing pipelines.
Technical comments
We will incorporate all the issues above in the final version of the manuscript, which will be submitted after the discussion period is over. We sincerely thank the reviewer for such careful analysis on the paper, especially the code parts.
Citation: https://doi.org/10.5194/egusphere-2023-1690-AC2
Data sets
Dataset  NorSand4AI: A Comprehensive Triaxial Test Simulation Database for NorSand Constitutive Model Materials Luan Carlos de Sena Monteiro Ozelim, Michéle Dal Toé Casagrande, and André Luís Brasil Cavalcante https://doi.org/10.5281/zenodo.8170537
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.