Different types of AI
In the first two pills, Introduction to AI and How AI Learns, we gained a basic understanding of which kinds of problems can be solved with AI and of the algorithms used to train an AI system.
What has been described so far is a branch of AI called Machine Learning: a discipline, inspired by statistics and probability theory, in which a machine provides solutions to complex problems without being explicitly programmed, learning only from data.
We have analyzed Supervised Learning algorithms such as regression and classification (even though many others exist), which work on labeled data, and mentioned Unsupervised Learning algorithms, which work on unlabeled data (other variations of these two main categories exist as well).
But the great popularity of AI comes from Deep Learning (a neural network with many layers is called a Deep Neural Network; we will explain layers shortly in this article) and its exciting innovations in computer vision, language translation, self-driving cars, and, more recently, generative AI.
In Figure 1, the relationship between AI, Machine Learning, and Deep Learning is represented.
Figure 1
Deep learning systems are "data-hungry" (they need a tremendous amount of data) and require specialized hardware for parallel computing (Graphics Processing Units - GPUs) to perform the training process, which is usually quite expensive and lengthy (potentially months).
Also, Deep Learning algorithms are usually considered "black boxes", while some Machine Learning models are interpretable (more on this concept in the last article).
However, the benefit of using deep learning is that it outperforms traditional machine learning: it discovers complex, non-linear relations between the features and the target and provides superior results for non-tabular (unstructured) data such as speech, images, and text.
Deep Learning takes inspiration from how the animal brain is structured: a network of densely interconnected cells called neurons. In a deep network, a cell is an element that receives inputs from connected cells, processes them, and finally passes a new value on to other cells.
In a real deep neural network, we can find billions of parameters representing the weights of these interconnections between the cells. These weights are initially set to random numbers and are then updated during the learning process (which will be explained in the last article) until the network produces the desired output.
A deep network is organized into multiple layers: the first layer is the input layer, which receives the data (the features' values) coming from a training dataset (during the learning phase). The number of cells sitting at the input layer (its dimension) is equal to the number of features in the training dataset.
The output layer is the last layer of a deep neural network and generates the final result (the target); the number of cells sitting at the output layer is determined by the problem we are solving (regression or classification) and, for classification, by the number of classes.
Between the first and the last layer sit one or more hidden layers, whose goal is to create a complex model by introducing non-linearity (I will explain this important concept shortly). The number of layers and the dimension of each layer (its number of cells) are determined by the complexity of the problem and are something that only expert AI scientists can evaluate.
Figure 2 shows a sample deep learning network where the input layer has 3 cells, there are two hidden layers (the first with 4 cells and the second with 2), and the output layer has only one cell (because it is meant to be a regression model, so the target is a real number).
Each cell in a layer is connected to all the cells in the next layer, and a parameter (a weight) is associated with each connection (for the sake of simplicity we are omitting bias values, additional parameters associated with each cell).
Figure 2
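To make this structure concrete, here is a minimal sketch in Python (assuming NumPy is available) that represents a network shaped like the one in Figure 2 purely as a list of weight matrices; only the layer sizes (3, 4, 2, 1) come from the figure, while the weight values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Layer sizes taken from Figure 2: input (3), two hidden layers (4 and 2), output (1).
layer_sizes = [3, 4, 2, 1]

# One weight matrix per pair of consecutive layers; values are random placeholders
# (in a real network they would be adjusted by the training process).
weights = [
    rng.normal(size=(n_out, n_in))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

for w in weights:
    print(w.shape)  # (4, 3), (2, 4), (1, 2)
```

Counting the entries of these matrices (12 + 8 + 2 = 22) gives the number of weights in this toy network; real deep networks have billions of such parameters.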
We will now understand how a deep learning network can learn non-linear relations and emulate the activation process of a human brain, where a neuron can be activated by other connected neurons.
To explain this idea, let's focus on a single cell of the deep learning network, as in Figure 3; in particular, let's see what happens on the way from the input to the output (left to right, called Forward Propagation), where the network transforms inputs into outputs.
As an example, let's focus on a particular cell, the top one of the first hidden layer: it has three input connections (each with its own weight), one from each input cell, and its output will feed the two cells sitting at the second hidden layer.
Even if we are not familiar with matrix multiplication, we can still describe this simple process as follows: each input value is multiplied by its respective weight and the results are summed up; in our example, this is 1*0.3 + 0*0.3 - 2*0.2 = -0.1.
The output value (-0.1) becomes an input value for the two cells sitting at the second hidden layer; -0.1 will be multiplied by the weights (between this cell and the two cells in the second layer) and summed with the outputs (multiplied by their weights) coming from the other 3 cells sitting at the first hidden layer.
Figure 3
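As a quick check of the arithmetic just described, here is the same single-cell calculation in a few lines of Python; the inputs and weights are the ones quoted above from Figure 3 (with the minus sign placed on the third input here, though it could equally sit on the weight).

```python
# Single-cell forward step: multiply each input by its weight and sum the results.
inputs = [1, 0, -2]             # values coming from the three input cells
cell_weights = [0.3, 0.3, 0.2]  # weights of the three incoming connections

cell_output = sum(x * w for x, w in zip(inputs, cell_weights))
print(cell_output)  # -0.1 (up to floating-point rounding)
```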
This simple process is repeated, moving from left to right, for each cell until the output is produced; but, and this is a big but: however complex the network is (in terms of layers and number of cells), the relation between input and output remains linear, which is not sophisticated enough to represent real problems.
This is why each hidden layer is also equipped with a so-called Activation Function, which receives as input the output of the previous calculation and further processes it to produce the final output of the cell. Figure 4 shows a typical activation function called the Rectified Linear Unit (ReLU): a very simple function returning y = 0 for any x <= 0 and y = x for any x > 0; in this way, each cell can either be activated (when x > 0), taking a non-null y value, or not (when x <= 0).
Figure 4
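In code, ReLU is a one-liner; this small sketch shows it applied to a negative and to a positive value.

```python
# Rectified Linear Unit: 0 for negative inputs, the input itself otherwise.
def relu(x):
    return max(0.0, x)

print(relu(-0.1))  # 0.0 -> the cell stays silent (not activated)
print(relu(0.7))   # 0.7 -> the cell is activated and passes the value on
```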
Using the previous example, we know that the hidden layer cell calculated -0.1; now, this cell will plug this value into the ReLU function as the x value, and since it is negative, the ReLU output will be zero. Hence, whatever the weights of its outgoing connections, zero will be the output of this cell and zero will be the input it provides to the cells sitting at the second hidden layer.
This means that the cell is not activated and it is not contributing to the activation of the second layer.
This is the trick used by hidden layers to create complex and non-linear relations.
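Putting the pieces together, here is a minimal sketch of a full forward pass through a toy network shaped like Figure 2 (3 -> 4 -> 2 -> 1), assuming NumPy; the weights are random placeholders rather than the values in the figures, and the last layer is left without an activation because the example is a regression.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # element-wise ReLU

def forward(x, weight_matrices):
    # Hidden layers: weighted sum followed by the ReLU activation.
    for w in weight_matrices[:-1]:
        x = relu(w @ x)
    # Output layer: plain weighted sum (regression -> a real number, no activation).
    return weight_matrices[-1] @ x

rng = np.random.default_rng(seed=0)
weights = [rng.normal(size=(4, 3)),   # input (3) -> first hidden layer (4)
           rng.normal(size=(2, 4)),   # first hidden (4) -> second hidden (2)
           rng.normal(size=(1, 2))]   # second hidden (2) -> output (1)

print(forward(np.array([1.0, 0.0, -2.0]), weights))  # a single real-valued output
```

Any hidden cell whose weighted sum is negative is zeroed out by relu, exactly as in the -0.1 example above, and it is this switching of cells on and off that lets the network represent non-linear relations.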
We have not finished with deep networks yet and there is still a lot to say; I hope this was inspiring enough that you will be eager to read the fourth and final article, which completes this high-level overview of deep learning and presents my conclusions about the AI learning process.