Optimal Classification of Minerals by Microscopic Image Analysis Based on Seven-State “Deep Learning” Combined with Optimizers
1. Introduction
The cooling and crystallization of molten rock, often called magma, is the origin of igneous rocks [1] [2]. Because the planet was primarily molten at its origin, magma can be taken as marking the start of the rock cycle, and the origins of igneous rocks are recorded in their composition [3]. Analyzing, examining and interpreting the information that these rocks carry allows us to deduce the processes occurring inside the Earth. The same information helps us understand volcanic activity at the Earth’s surface, as well as its socio-economic importance for mineral and/or water resources [4]. Doing so, however, requires identifying what these rocks are made of. They are composed of minerals, each with a specific crystal structure produced naturally in pure and ordered form [5] [6], and these structures together define the texture of a rock. Physically identifying igneous rocks by texture, grain size, color, defects and patterns is a difficult process; it requires an informed geologist with a background in rocks and their constituent minerals. The precise characterization of the minerals present in rocks is a fundamental step in the in-depth understanding of geological processes (identification, geological prospecting, materials science and engineering) and of the formation conditions of the Earth’s crust [7] [8]. Identifying and classifying minerals and rocks can be arduous and time-consuming for people from other fields. To overcome this difficulty, geology uses the optical properties of these minerals, which are observed when a thin section of rock is viewed under a polarizing microscope [9] [10]. [11] used these properties to automate the identification and classification of minerals by efficiently integrating color variations under plane-polarized (PPL) and cross-polarized (XPL) illumination modes into the CIELAB color space; this technique has the advantage of not ignoring the pixel trajectory of mineral images rotated in this space. In recent decades, the use of machine learning methods on microscopic images has pushed back the limits of older practices for identifying, naming and classifying rocks [12] [13] [14]. This new approach, which draws on computer science and pattern recognition techniques [15], has made the classification of rock types a booming subject in the geosciences. The most characteristic visual properties for identifying minerals in rocks are color, followed by texture. Since computer technologies provided consistent definitions of colors, colors have been used for many identification and classification purposes [16]. In the case of minerals, color, as an expression of birefringence, is of capital importance during the formation of their images. The work of [17] indicates that minerals can be identified with high accuracy provided their colors and birefringence colors are not altered by slide thickness or lighting conditions; the authors used an artificial neural network (ANN) for mineral recognition (quartz, muscovite, biotite, chlorite and opaque minerals) via image processing. Computers can thus perform operations such as recognition [8] [18] and measurement of target objects in images using color and/or pixels. These operations replace the eyes and provide a form of object detection better suited than human observation [19]. Image classification, finally, is a means of distinguishing different types of images based on the characteristics extracted from them [20] [21].
[22] proposed a CNN that selects and extracts features from image samples to recognize the granularity of a rock. The accuracy of the model reached 98.5%, but deviations were revealed and are linked to the use of single-polarization images. Redundant information and poorly differentiated textural features extracted from a rock image slow down the training of a CNN and degrade its classification accuracy; [23] proposed a hybrid DCT-CNN method to overcome this problem, achieving a short training time with more accurate classification results. The study we are carrying out aims to verify the quality of the acquired mineral images with a view to exploiting their content through deep learning. A coherent image of rocks and minerals can be fed into intelligent systems that use image processing and machine learning techniques for recognition and classification. Specifically, we must: 1) update the network weights using the sgdm, Adam and RMSprop optimizers to minimize the loss function, 2) study their convergence with the transfer learning method using the pre-trained parameters of the AlexNet model, and 3) evaluate these optimizers through the recorded performance results (scores). To do this, variations in the learning rate and in the number of epochs were used as parameters.
2. State of the Art
The classification and identification of igneous rocks, and of rocks in general, remains an essential subject for geologists and for geology, as it contributes to understanding the relative mineral wealth of a rock [15]. The classification of minerals in rocks has evolved significantly over the decades, moving from classical methods to more advanced approaches based on emerging technologies [24]. In this section, we explore previous work, highlighting classical methods, the first applications of machine learning and, finally, recent advances through the use of deep learning.
Classical methods of classifying minerals in rocks relied largely on manual analyses through optical microscopy, beginning with the production of a thin section in the laboratory [25]. Geologists, with the naked eye or under a microscope, observed the samples to identify minerals on the basis of their optical properties and/or their morphology [26] [27]. With a hand lens (approximately ×10), texture is estimated [28]. Although these methods were invaluable for their time, they sometimes led to errors because of the complexity of the rock samples. They also required time, and difficult cases sometimes had to be referred to an expert in order to reach a more reliable decision [29].
The emergence of machine learning (ML) has made it possible to introduce automated approaches. The classification of minerals with image processing and computer vision has improved on the visual observation of thin sections and the description of rocks from their images. [30] successfully classified minerals using supervised learning by combining data from three spectroscopic methods: vibrational Raman scattering, reflective visible-near infrared (VNIR), and laser-induced breakdown spectroscopy (LIBS). The results demonstrate that multi-scale spectroscopy associated with ML leads to rapid and precise characterization of rocks and minerals, although the authors note that differences in the spectral data sets may affect some results. [31] conducted a comparative classification study between various combinations of manual features such as first- and second-order statistics, which respectively describe the distribution of pixel intensities in the image and the information relating to the positions of the different intensities. Unsupervised learning based on the K-Means algorithm was subsequently used in order to understand how the gray levels are distributed across the pixels. They claim that, although the model comes with no guarantees, the method outperformed any manual feature setup; they also learned the feature representation through self-learning based on unsupervised learning from a dataset of unlabeled rocks. Under polarized reflected light microscopy, [32] carried out the recognition of hematite grains; good results were obtained with a better adjustment of the parameter controlling the sensitivity of the Euclidean distance between pixels in RGB space. [33] developed a correspondence relationship between the characteristics of an image and the type of rock, with the goal of automatically classifying rocks on the basis of their images. This was done in different color spaces using 1000 rock images from the Ordos Basin in Shaanxi province, China. The accuracy of the method is estimated at 95.0%; however, the authors plan a study of the influence of the types and number of images, which would make it possible to train the network better and improve this score. [34], following the results of their work, stated: “this is the first time that computer vision based on machine learning algorithms has succeeded in the automated recognition of mineral grains from digital images acquired with a simple optical microscope”. Their model uses simple linear iterative clustering (SLIC) segmentation to generate super-pixels that isolate sand grains, something impossible with traditional segmentation techniques. The limits of the model highlighted by the authors lie in the origin (regions) of the mineral grains and in the light sources (plane, circularly polarized, infrared, etc.) used during acquisition. For simplicity, some researchers often consider rock images as samples with two phases: porosity and minerals [35] [36] [37]. This simplification is not universally applicable and significantly influences subsequent calculations of physical rock parameters, in particular the velocities of P waves (compression or primary waves) and S waves (shear or secondary waves) [38]. Other studies have used the discrete cosine transform (DCT) and local binary patterns (LBP) to index and compute the signature of minerals at the macroscopic scale, as well as support vector machines (SVM) [39] [40].
However, these machine learning pipelines often depend on manually extracted features. This limits their adaptability, given the diversity of minerals, the complexity of their formation, the fineness of their modal structure, their similarities and even their life cycle within rocks. It is therefore necessary to innovate in how the information is processed for complex and highly varied cases, or when the characteristics extracted from the image are independent of the classifier.
Recent advances in artificial intelligence, in particular deep learning, have considerably transformed classification tasks in fields such as medicine, security and agriculture [41] [42] [43]. Based on deep neural networks, these algorithms can learn complex features directly from microscopic images, thus eliminating the need for manual extraction. A convolutional neural network (CNN) has the advantage of extracting features from an image without requiring manual human intervention [44]. CNNs remain the most efficient neural networks for pattern recognition [45].
Regarding geology, [21] were the first to implement a transfer learning technique to automate lithology identification and the classification of rock images; they were able to effectively distinguish graphite, phyllite and breccia (various minerals from several fragments). [46] used the transfer learning method to identify minerals in the field from smartphones, with a basic application modeled on the architecture of the ShuffleNet CNN. [8], thanks to an intelligent lithology identification method based on the Faster R-CNN architecture, managed to predict lithological information and detect rock targets. [47] propose multi-class classification beyond universal CNN models, combining deep learning and transfer learning with VGGNet, InceptionNet and ResNet architectures for online multi-coal and multi-class sorting. Image size influences recognition during the learning process of some models; the authors recommend diversifying the dataset in terms of rock types in order to better evaluate the scalability of the classifier. Overcoming the problem of estimating background and sample points with transfer learning remains another challenge.
3. Material and Methods
With the improvement in the performance of computing units, CNNs, whose convolutional filter layers are linked to an artificial neural network, make it possible to identify the content of images through feature extraction. These operations are carried out at the same time as the learning stage of the algorithm, which requires a large number of parameters to be defined by the user as well as a large amount of data. Fortunately, pre-defined CNN architectures exist, such as AlexNet, GoogleNet, VGGNet and others [48]. Assessing the quality of the acquired mineral images using the AlexNet architecture and transfer learning is the major objective of this study.
In this section, we detail how this approach is put into practice.
3.1. Diagram of the Experimental Approach
The structure in Figure 1 outlines the main steps of our experimental approach. First, we build a database of various mineral images. These images are preprocessed with a number of operations (cropping, resizing, etc.), then divided into training and validation sets and used to train a convolutional neural network based on AlexNet. Finally, the class membership of each mineral image is computed and the accuracy rate is recorded as the performance measure of the chosen optimizer.
3.2. Data Set and Protocol
For the needs of the project, and in order to give it a particular meaning, the creation of the database began with the collection of 15 samples of magmatic rocks in the field, from which 15 thin sections were prepared (one per sample). Each thin section was observed under homogeneous polarized light with an MD500 microscope (driven by AmScope 3.7 software) fitted with a camera connected to a computer (Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz, 16.0 GB RAM (15.8 GB usable)) used to capture the slide images.
3.2.1. Acquisition of the Image Database
For our research, we needed a fairly large number of images. For each capture, images of the microscope scene are acquired while rotating the stage from 0 to 315 degrees in increments of 45 degrees; this increment is used because different minerals have different extinction properties. Images are observed at ×40 magnification. The flowchart for this step is illustrated in Figure 1. All images (RGB) were stored at a size of 2592 × 1944 pixels with a resolution of 120 dpi and saved in JPG format.
3.2.2. Preprocessing and Organization of Data
The acquired images each contain one or more minerals with different hues. Preprocessing began with the isolation of each mineral from its matrix, which also resized the images. The operation was performed by screenshot with the XnView software.
Figure 1. Structure of our work plan.
The images occupy on average between 3.5 and 8 KB of storage, each with a resolution of 144 ppi (pixels per inch). The mineral dataset used in this experiment is a new dataset. A total of 700 images (100 images per class) of single-species minerals with different hues constitute our database (Figure 2). The classes were chosen because of their abundance in igneous rocks and because their proportions are used to name rocks in the Streckeisen diagram. They are: amphibole, biotite, alkali feldspar, muscovite, plagioclase, pyroxene and quartz. Feldspathoids were not taken into account in this study because they are very rare in rock formations in Ivory Coast. To reduce the number of model parameters, the original images were compressed to 224 × 224 pixels. Each image was then labeled according to its class.
The exact distribution of the data by class and by subset is given in Table 1. The number of images in the test folder has no impact on the training data, which is divided into two subsets (training and validation images) in the respective proportions of 70% and 30% [49]. Only a few additional images are needed to test the prediction once training is complete.
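As an illustration, a minimal sketch of this data organization is given below, assuming PyTorch/torchvision (the study's own toolchain is not specified here); the folder name is a hypothetical placeholder, images are resized to the 224 × 224 AlexNet input size, and the 70%/30% split mirrors Table 1.

```python
# Minimal sketch (PyTorch/torchvision assumed): load a folder-per-class image set,
# resize to 224x224, and split it 70%/30% into training and validation subsets.
# "minerals/train_val" is a hypothetical placeholder path.
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # compress images to the AlexNet input size
    transforms.ToTensor(),
])

full_set = datasets.ImageFolder("minerals/train_val", transform=preprocess)

n_train = int(0.7 * len(full_set))   # 70% for training
n_val = len(full_set) - n_train      # 30% for validation
train_set, val_set = torch.utils.data.random_split(
    full_set, [n_train, n_val], generator=torch.Generator().manual_seed(0))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
```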
3.2.3. Training and Validation
After acquiring and organizing the data, training can begin; it consists of several steps.
Figure 2. Images of different shades of the same mineral under cross-polarized light (XPL).
Table 1. Distribution of mineral data by class.
| Class | Train | Validation | Test |
| --- | --- | --- | --- |
| Amphibole | 70 | 30 | 25 |
| Biotite | 70 | 30 | 25 |
| Alkali feldspar | 70 | 30 | 25 |
| Muscovite | 70 | 30 | 25 |
| Plagioclase | 70 | 30 | 25 |
| Pyroxene | 70 | 30 | 25 |
| Quartz | 70 | 30 | 25 |
1) Choice of model
In this work, mineral classification is implemented using a convolutional neural network based on the AlexNet model. The choice of AlexNet reflects both the size of our image set (700 images in total, i.e., 100 per class for 7 classes) and its convolutional and fully connected layers, which are adjusted to maximize the extraction of intermediate-level feature descriptors. The network was developed in 2012 by Alex Krizhevsky, hence its name, with his collaborators Ilya Sutskever and Geoffrey Hinton [50]. Winner of the ImageNet ILSVRC 2012 challenge, AlexNet is a CNN model whose depth and width allow it to exploit graphics processing units (GPUs) and their great potential for parallel computation. The reasons for our choice of AlexNet are as follows:
To our knowledge, no literature mentions the application of transfer learning based on AlexNet for the recognition of rock mineral images.
The application of AlexNet in various deep learning problems shows promising results [51] [52].
AlexNet is considered the first deep CNN architecture which has shown satisfactory results in image recognition and classification tasks [53].
Unlike fully connected layers, its convolutional layers preserve the spatial coherence of the information. AlexNet is the work that propelled convolutional neural networks (CNNs) forward. It takes as input a 224 × 224 pixel image with 3 color channels. Its architecture comprises 8 layers: 5 convolutional and 3 fully connected (Figure 3).
The first convolutional layer filters the input image (224 × 224 × 3) with 96 kernels of size 11 × 11 × 3 and a stride of 4 pixels. The second layer, which takes the (normalized and pooled) output of the first as input, filters it with 256 kernels of size 5 × 5 × 48. The next 3 layers are connected to each other without any intermediate normalization or pooling layers: the third layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized and pooled) outputs of the second, the fourth has 384 kernels of size 3 × 3 × 192, and the fifth has 256 kernels of size 3 × 3 × 192. The fully connected layers have 4096 neurons each and perform the final classification [55].
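For reference, a hedged sketch of how this layer stack can be inspected with torchvision (assumed here for illustration; its AlexNet is a close variant of the original architecture) is shown below.

```python
# Sketch: inspect torchvision's AlexNet, a close variant of the architecture
# described above (5 convolutional layers followed by 3 fully connected layers).
from torchvision import models

alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
print(alexnet.features)     # conv stack: 11x11 stride-4 conv, 5x5 conv, three 3x3 convs, with max pooling
print(alexnet.classifier)   # fully connected stack: 4096 -> 4096 -> 1000 output scores
```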
AlexNet also promoted a fast GPU implementation of CNNs for image recognition, with the softmax (or sigmoid) function as the output activation [56]. This function acts on the output layer and computes the class probabilities.
2) Optimization algorithms
Typically, in machine learning, the goal is to build a prediction function from a data set D, also known as the training set. In supervised learning, D consists of example pairs (x, y), where x is an input vector to the model and y is a target vector indicating what we are trying to predict. Regardless of how the data were acquired, they are fed to a learning algorithm that aims to model the relationship between the inputs and the targets. The inputs of a network, generally denoted x, act as the explanatory variables of the model. The weights associated with these inputs are represented by the parameters α and β, which must be estimated during the learning procedure. The output of the model, y, represents the variable to be explained (the target of the model), formulated as:

$y = f(x; \alpha, \beta)$ (1)

Learning therefore consists of estimating the parameters (α, β) by minimizing the prediction error. The prediction error is measured by a loss (cost) function, and gradient descent is a technique for minimizing it: the parameters are updated from data examples x and their labels y, which makes it an iterative algorithm. The lower the value of the loss, the more robust the model; if the model has learned all the data perfectly, the value of the cost function becomes zero.
a) Stochastic gradient descent with momentum (sgdm) optimizer
To improve the efficiency of stochastic gradient descent (SGD) [57], the concept of momentum [58] (a combination of the current gradient direction and the previous direction) was introduced; in optimization it gives a certain “inertia” to the updating of the model parameters. Rather than moving only in the direction of the instantaneous gradient at each iteration, momentum retains a kind of “memory” of the previous directions. The algorithm thus keeps track of the direction in which the model parameters are moving, which helps accelerate convergence by attenuating unwanted oscillations or noisy variations in the gradient descent (Figure 4).
For a loss function J(θ), the network escapes from traps thanks to the momentum relation [58] defined by:

$v_t = \gamma\, v_{t-1} + \eta\, \nabla_\theta J(\theta_t)$ (2)

where:
γ represents the momentum coefficient,
$v_t$, the momentum vector at iteration t,
η, the learning rate,
$\nabla_\theta J(\theta_t)$, the gradient of the cost function with respect to the parameters at iteration t.
The complete formula for updating the parameters with sgdm (stochastic gradient descent with momentum) is then:

$\theta_{t+1} = \theta_t - v_t$ (3)

where θ represents the model parameters. Using the momentum coefficient can help speed up the convergence of the optimization algorithm and smooth out oscillations when training the model (a small numerical sketch of this update is given after Figure 4).
Figure 3. Example of the CNN called AlexNet [54].
Figure 4. Acceleration and reduction of SGD oscillations by the momentum method.
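Below is a small numerical sketch of the sgdm update of Equations (2) and (3), applied to a toy quadratic loss; the values of η and γ are illustrative defaults, not the settings used in our experiments.

```python
# Sketch of the sgdm update: v_t = gamma*v_{t-1} + eta*grad (Eq. 2),
# theta_{t+1} = theta_t - v_t (Eq. 3), on the toy loss J(theta) = ||theta||^2.
import numpy as np

def sgdm_step(theta, v, grad, eta=0.01, gamma=0.9):
    v = gamma * v + eta * grad   # Eq. (2): momentum keeps a memory of past directions
    theta = theta - v            # Eq. (3): parameter update
    return theta, v

theta, v = np.array([2.0, -3.0]), np.zeros(2)
for _ in range(100):
    grad = 2 * theta             # gradient of the toy loss
    theta, v = sgdm_step(theta, v, grad)
print(theta)                     # moves toward the minimum at the origin
```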
b) Root Mean Square Propagation (RMSprop) optimizer
This optimizer was designed to address some issues related to the learning rate in gradient descent. Instead of accumulating all the squares of the previous gradients, the window of accumulated gradients is restricted to a fixed size: an exponential moving average of the squared gradients is applied to automatically adapt the learning rate of each model parameter, rather than storing all previous squared gradients. The parameter update (θ) with RMSprop is computed as follows at each iteration t:
$E[g^2]_t = \beta\, E[g^2]_{t-1} + (1 - \beta)\, g_t^2$ (4)

$\theta_{t+1} = \theta_t - \dfrac{\alpha}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$ (5)

where:
$g_t$, the gradient with respect to the parameters at iteration t;
$E[g^2]_t$, the exponential moving average of the squared gradients at iteration t;
α, the learning rate;
β, attenuation (decay) coefficient, generally close to 1 (for example, 0.9);
ε, small constant added to avoid division by zero.
[59] proposes setting β to 0.9, with 0.001 as a good default value for the learning rate α.
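A minimal sketch of this update rule, following Equations (4) and (5) with the default values just mentioned, is given below (plain NumPy, for illustration only).

```python
# Sketch of the RMSprop update: an exponential moving average of squared
# gradients (Eq. 4) rescales the step applied to each parameter (Eq. 5).
import numpy as np

def rmsprop_step(theta, avg_sq, grad, alpha=0.001, beta=0.9, eps=1e-8):
    avg_sq = beta * avg_sq + (1 - beta) * grad**2           # Eq. (4)
    theta = theta - alpha * grad / np.sqrt(avg_sq + eps)    # Eq. (5)
    return theta, avg_sq
```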
c) Adaptive moment estimation (Adam) optimizer
Adam is a method that computes an adaptive learning step for each parameter [60]. During training, in addition to storing an exponentially decaying average of the squares of the previous gradients, vt, as RMSprop does, it also keeps an exponentially decaying average of the previous gradients, mt, following the equations:
$m_t = \beta_1\, m_{t-1} + (1 - \beta_1)\, g_t$ (6)

$v_t = \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t^2$ (7)
Note that mt updates the exponential moving average of the gradient and is called the first-order moment estimate, while vt updates the exponential moving average of the squared gradient and is called the second-order moment estimate. According to the authors, mt and vt are initialized as vectors of zeros and are therefore biased towards zero during the first steps, especially when the decay rates are small (i.e., β1 and β2 close to 1). They propose a bias correction for the estimates with the relations:
$\hat{m}_t = \dfrac{m_t}{1 - \beta_1^t}$ (8)

$\hat{v}_t = \dfrac{v_t}{1 - \beta_2^t}$ (9)
The parameter update (θ) then follows:

$\theta_{t+1} = \theta_t - \dfrac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t$ (10)
where:
η, the learning rate;
β1 and β2, attenuation (decay) coefficients (typically close to 1, for example, 0.9 and 0.999);
ε, small constant added to avoid division by zero.
For the parameters β1, β2 and ε, the authors propose default values respectively 0.9, 0.999 and 10−8.
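A minimal sketch of the full Adam update, Equations (6) to (10) with these default values, is given below (plain NumPy, for illustration only; t starts at 1 so the bias correction is well defined).

```python
# Sketch of the Adam update: first- and second-moment estimates (Eqs. 6-7),
# bias correction (Eqs. 8-9) and the parameter update (Eq. 10).
import numpy as np

def adam_step(theta, m, v, grad, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad                     # Eq. (6): first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2                  # Eq. (7): second-moment estimate
    m_hat = m / (1 - beta1**t)                             # Eq. (8): bias-corrected first moment
    v_hat = v / (1 - beta2**t)                             # Eq. (9): bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # Eq. (10): parameter update
    return theta, m, v
```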
3) AlexNet and Transfer Learning
The lack of open-source reference image collections of minerals makes it difficult to gather a sufficient quantity of images, which prevents us from training the AlexNet network from scratch. Our solution to this deficit is to work through transfer learning [61] [62] [63]. This method refers to the ability of a system to recognize and reuse knowledge and skills acquired in previous tasks and to apply them to new tasks, often in different domains. The influence of transfer learning thus follows the principle of Figure 5.
The adaptation consists of replacing the classification over the 1000 ImageNet classes with a classification over seven (7) classes, whose outputs correspond to the presence of images of quartz, biotite, amphibole, plagioclase, alkali feldspar, muscovite and pyroxene. The new network is obtained through the following steps:
First step: Initialize the network with the pre-trained AlexNet parameters.
Second step: Reuse the layers already trained for image recognition and classification, removing the last fully connected layers of AlexNet that were used to classify the 1000 ImageNet classes.
Figure 5. Principle of the transfer learning approach.
Third step: Freeze all the weights of the pre-trained layers in order to adapt the network to the new classification problem.
Fourth step: Extend the network by adding a new classifier as the output layer of the new model. The added layers of the new model are then trained on the target data, and the softmax function gives the probability that the data belongs to each class (see the sketch below).
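A minimal sketch of these four steps is given below, assuming PyTorch/torchvision for illustration (the study's own toolchain is not specified here); only the replaced classification layer is trained on the mineral images, the pre-trained feature extractor being frozen.

```python
# Transfer-learning sketch (PyTorch/torchvision assumed): reuse AlexNet's
# pre-trained layers, freeze the feature extractor, and replace the last
# fully connected layer so that it outputs the 7 mineral classes.
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)  # step 1: pre-trained AlexNet

for param in model.features.parameters():   # step 3: freeze the pre-trained convolutional layers
    param.requires_grad = False

model.classifier[6] = nn.Linear(4096, 7)    # steps 2 and 4: new output layer for 7 classes
# during training, softmax (e.g. inside nn.CrossEntropyLoss) converts the 7 scores into probabilities
```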
In each configuration, the hyper-parameters of the new model are tuned using the grid search method, which here involves two hyper-parameters: the number of epochs and the learning rate. The grid method consists of defining a certain number of values (a grid of values) for each hyper-parameter [64]. The metric chosen to evaluate the performance of the model is the accuracy; the “loss” curve can also be inspected (Figure 6, curve in orange). The combination that provides the best performance is then selected to certify the quality of the images.
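The grid explored in this study can be summarized by the enumeration below; the learning-rate and epoch values are those of Tables 3-5, and each combination would be trained and scored on the validation set (the training call itself is omitted here).

```python
# Enumeration of the hyper-parameter grid explored in the experiments
# (learning rates and epoch counts taken from Tables 3-5).
from itertools import product

learning_rates = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
epoch_grid = [50, 100, 150, 200, 250, 300, 350, 400]

for lr, n_epochs in product(learning_rates, epoch_grid):
    # here: train the transferred model with (lr, n_epochs) and record the validation accuracy
    print(f"learning rate = {lr:g}, epochs = {n_epochs}")
```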
4. Experimentation and Analysis of Results
4.1. Identification of Rock Minerals Using AlexNet
Since the pre-trained AlexNet network was trained to classify images into 1000 ImageNet categories, its output layer must be adjusted to match the number of classes in our work. This means replacing the last dense layer with a new dense layer containing the appropriate number of neurons for our case. Through transfer learning and data augmentation operations, the pre-trained AlexNet network then optimizes its parameters to classify the rock mineral images in the database. In this work, to prevent image size from affecting the classification results [65], each image is compressed to 224 × 224 pixels so that accuracy can be assessed fairly. The ReLU activation function is used to introduce non-linearity into the model, facilitating the learning of complex relationships between rock minerals. At the output of the network, softmax is used to convert the scores into class probabilities.
4.2. Model Evaluation
Deep learning relies on networks with complex architectures and large numbers of training parameters, some of whose layers are frozen when transfer learning is used; the resulting model therefore requires evaluation. The evaluation phase determines what the model gets right, i.e., a way of quantifying which predictions are correct. The performance of a model is derived from its confusion matrix (Table 2) and includes the metrics: precision, recall, accuracy and the trade-off between precision and recall (F1-score).
From the confusion matrix, these metrics are computed as follows:

$\text{Precision} = \dfrac{TP}{TP + FP}$ (11)

$\text{Recall} = \dfrac{TP}{TP + FN}$ (12)

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (13)

$\text{F1-score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (14)
With TP, TN, FP, FN respectively true positive, true negative, false positive and false negative.
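For illustration, the sketch below computes these four metrics from binary confusion-matrix counts; the numbers are arbitrary examples, not results from this study.

```python
# Metrics of Equations (11)-(14) computed from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                           # Eq. (11)
    recall = tp / (tp + fn)                              # Eq. (12)
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (13)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (14)
    return precision, recall, accuracy, f1

print(metrics(tp=90, tn=85, fp=10, fn=15))  # arbitrary illustrative counts
```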
4.3. Training and Results
This section details the results obtained after training our AlexNet model on the rock mineral images. Three optimizers were selected in order to understand their behavior, by varying the learning rate over a series of experiments and then doing the same for different numbers of epochs on our dataset. For any method, the learning rate remains a very sensitive parameter for convergence [66]. Figure 6 shows an example of the curve obtained after training. At this stage, the learning algorithm runs for N iterations, updating the parameters N times using N examples from the dataset; an epoch corresponds to T updates, i.e., one full pass over the portion of the dataset intended for this phase.
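As a concrete illustration of this relationship, with 490 training images (70% of 700, Table 1) and an assumed mini-batch size of 128 (a value chosen here only so that the figures are consistent with the roughly 400 iterations per 100 epochs mentioned in Section 4.3.1), one epoch corresponds to about 4 parameter updates:

```python
# Illustrative arithmetic: number of parameter updates (iterations) per epoch.
import math

n_train_images = 490      # 70 training images x 7 classes (Table 1)
batch_size = 128          # assumed value, for illustration only
updates_per_epoch = math.ceil(n_train_images / batch_size)
print(updates_per_epoch)  # -> 4 updates per epoch; 100 epochs is then about 400 iterations
```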
The results of each variation for each optimizer are recorded in Table 3(a), Table 3(b), Table 4(a), Table 4(b), Table 5(a) and Table 5(b). The proportions of data in the training and validation sets are 70% and 30% respectively and remain the same in each case.
Figure 6. Training to test the network on the database.
Table 2. Confusion matrix.

| | Predicted Class 0 | Predicted Class 1 |
| --- | --- | --- |
| Actual Class 0 | TP | FN |
| Actual Class 1 | FP | TN |
Table 3. (a) Results of variations in the Learning rate (LR), with Epoch set to 350; (b) results of Epoch variations where the Learning rate (LR) is set at 10−3.
(a)

| Learning rate (LR) | 10−6 | 10−5 | 10−4 | 10−3 | 10−2 | 10−1 |
| --- | --- | --- | --- | --- | --- | --- |
| Score (%) | 93.3 | 94.4 | 95.2 | 96.2 | 92.9 | 82.4 |

(b)

| Epoch | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Score (%) | 84.8 | 90 | 91.4 | 92.9 | 96.2 | 96.7 | 96.2 | 91.9 |
Table 4. (a) Results of Learning rate (LR) variations, with Epoch set to 350; (b) results of Epoch variations where the Learning rate (LR) is set at 10−1.
(a)

| Learning rate (LR) | 10−6 | 10−5 | 10−4 | 10−3 | 10−2 | 10−1 |
| --- | --- | --- | --- | --- | --- | --- |
| Score (%) | 86.7 | 92.4 | 89.5 | 93.8 | 94.4 | 95.2 |

(b)

| Epoch | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Score (%) | 76.2 | 82.9 | 89.0 | 93.8 | 89.0 | 92.4 | 82.4 | 81.4 |
Table 5. (a) Results of Learning rate (LR) variations, with Epoch set to 350; (b) results of Epoch variations where the Learning rate (LR) is set at 10−4.
(a)

| Learning rate (LR) | 10−6 | 10−5 | 10−4 | 10−3 | 10−2 | 10−1 |
| --- | --- | --- | --- | --- | --- | --- |
| Score (%) | 78.1 | 79.0 | 89.5 | 74.3 | 76.7 | 75.2 |

(b)

| Epoch | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Score (%) | 41.9 | 55.7 | 66.2 | 70.5 | 71.9 | 75.2 | 84.3 | 74.3 |
4.3.1. Results of Variations of Learning Rate and Epoch with Sgdm
Table 3(a) and Table 3(b) describe the results obtained with the “sgdm” optimizer and the AlexNet architecture, trained on rock mineral images through transfer learning. Table 3(a) gives the accuracy scores for different learning rate values; the highest score is 96.2%, obtained with a learning rate of 10−3. In Table 3(b), the scores show two phases. In the first phase, the score grows with the number of epochs in the grid (the learning rate being fixed at 10−3 during these experiments), reaching a highest value of 96.7%. The second phase then shows a drop in accuracy.
The stochastic gradient descent with momentum (sgdm) optimizer recorded its lowest score (82.4%) for a learning rate of 10−1, and 84.8% when the algorithm went through the entire training set 50 times. Across the different training runs carried out to evaluate sgdm with the different learning rate values, the loss curve (shown in orange) starts to fully converge around 100 epochs, i.e., around 400 iterations.
4.3.2. Results of Learning Rate and Epoch Variations with Adam
Table 4(a) and Table 4(b) show the results obtained with the “Adam” optimizer, again using transfer learning. Table 4(a) gives the accuracy scores for different learning rate values; the highest score is 95.2%, obtained with a learning rate of 10−1.
Table 4(b) gives the scores corresponding to different numbers of epochs, with the learning rate this time maintained at 10−1 during the experimental phase. The highest score is 93.8%, obtained after 200 epochs.
The Adam optimizer recorded its lowest scores, namely 86.7% with the lowest learning rate value (10−6) and 76.2% when the algorithm performed 50 epochs over the entire training data.
4.3.3. Results of “Learning Rate” and “Epoch” Variations with RMSprop
Table 5(a) and Table 5(b) show the results obtained with the “RMSprop” optimizer. Table 5(a) gives the model accuracy scores for different learning rate values; the highest score with RMSprop is 89.5%, for an optimal learning rate of 10−4. In Table 5(b), the learning rate was maintained at 10−4 and the scores correspond to the epoch values proposed for the experiment; the highest score is 84.3%, at 350 epochs.
For the higher learning rate values (10−3 to 10−1), this optimizer’s scores remain the lowest, below 77%. For the epochs, in contrast, the expected trend is respected: a low score of 41.9% at the start of training, which increases as the number of epochs increases, before dropping again after 350 epochs.
5. Discussion
From the point of view of accuracy, the performance of the three optimizers is above roughly 80% with respect to the “Learning rate” hyper-parameter. This shows that optimization algorithms such as gradient descent use the learning rate as a scalar that determines the step size (the modification applied to the model parameters) at each iteration, so as to move towards the minimum of the loss function and achieve a good score [67]. It is an adjustable hyper-parameter that influences the result of a learning model; metaphorically, it expresses how quickly a pre-trained model assimilates new information [68]. The average accuracy achieved is thus considered a significant performance for image classification tasks, and this observation suggests that the acquired image data have low noise levels [69]. However, the best accuracy is obtained with the momentum algorithm (sgdm): the learning rate (10−3) that produced its highest score also yielded the best performance overall, ahead of the best values of the other two optimizers.
However, setting a learning rate is sometimes problematic: for a high value, learning overshoots and settles above the minima (unstable convergence). This is verified in our case with the learning rates 10−2 and 10−1, with respective scores of 92.9% and 82.4% for the “sgdm” optimizer beyond its optimum at 10−3. Conversely, for a low learning rate, learning takes too long to converge and the risk of being stuck in a local minimum is significant [70]. [71] points out that in these cases there is a limit beyond which the error stops decreasing and starts to increase while the score drops. Concerning the “Adam” optimizer, which combines RMSprop and momentum (sgdm), its highest score coincides with the highest learning rate value. Since a higher learning rate can allow a model to move more quickly through the parameter space and explore different regions of the search space for better generalization, this indicates that the images contain features usable by CNNs.
For the results related to the number of “Epochs”, the “sgdm” and “Adam” optimizers also recorded accuracies averaging above 80%. These scores confirm the performance of CNNs for image classification tasks and give credit a second time to the quality of our images. Again, “sgdm” achieves the best accuracy score (96.7%) after each image in the database has been seen by the model 300 times (300 “Epochs”). A growth in scores is observed for each optimizer, which is explained by the progressive refinement of the training weights initialized from the pre-trained model.
Unlike the other two optimizers, “RMSprop” starts its accuracy around 40%, about half the starting value of the other two. This weakness reflects the fact that “RMSprop” does not accumulate all the squares of the previous gradients: the algorithm applies an exponential moving average of the squared gradients, so the running average at step t depends only on the previous average and the current gradient. As a consequence, the initialization of the network weights leaves the model ineffective at first, and it takes time to incorporate the new data; some authors have worked on improving this initialization aspect [72] [73]. Nevertheless, the algorithm eventually reached the accuracy level expected of CNNs for image classification at 350 “Epochs” [74]. This result reassures us about the quality of the images.
6. Conclusions
Image quality is of great importance in developing an intelligent identification and classification system. The consistency of mineral images allows better features to be extracted to analyze, examine and interpret the information that rocks carry, information that could be useful to industry, engineering, academia and many other actors in the geosciences.
In this article, we compared the accuracy-related performance of three optimizers on rock mineral images in order to assess their quality. The approach applies transfer learning based on the architecture of the AlexNet CNN model. The idea is to show that, within the limits of our data, the relatively simple AlexNet architecture can perform better when implemented with transfer learning and optimized model parameters.
The results of this study show that transfer learning can improve the robustness of the model on new data, and that it is particularly beneficial when the number of training images is small. In our case, this was possible by using the feature extractors of the pre-trained model and restricting training to the classification layers (fully connected layers) only; this approach proved more beneficial than retraining all the parameters of the pre-trained model.
The proposed method can therefore assess the quality of rock mineral images satisfactorily. The best accuracy is recorded with the “sgdm” optimizer for both hyper-parameters, with scores of 96.7% and 96.2% for the number of “Epochs” and the “Learning rate” respectively.