Deep Neural Networks
I am assuming that we are already familiar with the first three pills (Introduction to AI, How AI Learns, Different Types of AI), so we already have a basic understanding of what a deep neural network is.
In this pill, I will complete the overview of the topic, although in a very simplified form.
Let's start with an example where we have a Training Dataset with 100,000 observations; for each observation, we have 3 features and one target value (let's assume we are dealing with a regression problem).
We also have a Test Dataset with 10,000 observations and, of course, the same number of features and one target.
So, we are dealing with two tables: 100,000 rows x 4 columns and 10,000 rows x 4 columns.
When we build a deep neural network, the input dimension is fixed by the number of features (3 in this case), so we will have three input cells; the dimension of the output is determined by the target, and since here the target is a single value (one column), the output layer will have only one neuron.
The number of hidden layers and the number of cells in each of them are our choice (how to design a proper architecture is beyond the scope of this article); let's assume that two hidden layers are enough, the first with 4 cells and the second with 2 (see Figure 1).
In a 'simple' deep neural network (when we get to generative AI, we will learn about less simple networks), each cell in a layer is connected to every cell in the following layer; as we have seen in Different Types of AI, each connection has a weight and each cell has an additional parameter (bias); weights and biases are the parameters of our deep network.
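Let's calculate how many parameters there are in this simple network. Each connection carries one weight: 3 x 4 = 12 weights between the inputs and the first hidden layer, 4 x 2 = 8 between the two hidden layers, and 2 x 1 = 2 between the second hidden layer and the output, for 22 weights in total.
Finally, we have 10 additional bias parameters (1 per cell), so the total number of parameters equals 22 + 10 = 32 (in a real deep neural network we can have billions of parameters).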
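To double-check the arithmetic, here is a minimal Python sketch that counts the parameters from the layer sizes. The variable names are mine and purely illustrative. It follows the counting used in this pill (one bias per cell, input cells included). As a side note, many libraries attach a bias only to hidden and output cells, which for the same architecture would give 7 biases and 29 parameters.

```python
# Layer sizes from the example: 3 inputs, hidden layers of 4 and 2 cells, 1 output.
layer_sizes = [3, 4, 2, 1]

# One weight per connection between two consecutive, fully connected layers.
weights_per_step = [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
n_weights = sum(weights_per_step)

# Counting used in this pill: one bias per cell, input cells included.
n_biases_per_cell = sum(layer_sizes)

# Alternative counting (common in libraries): no bias on the input cells.
n_biases_no_input = sum(layer_sizes[1:])

total_per_cell = n_weights + n_biases_per_cell
total_no_input = n_weights + n_biases_no_input

print("weights per step:", weights_per_step, "total weights:", n_weights)
print("parameters (one bias per cell):", total_per_cell)
print("parameters (no input biases):", total_no_input)
```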
Now, we need our learning process to find optimal values for these 32 parameters to minimize a loss function (as explained in How AI Learns).
The good news is that what we learned in How AI Learns (the gradient descent algorithm) is valid for deep neural networks as well; there are several algorithms to manage the optimization problem for real deep networks with millions or billions of parameters but they are specialized gradient descent algorithms.
In a real deep neural network, the training dataset is divided into batches to speed up convergence and to work on a manageable number of observations at a time; in our example, we divide the 100,000 observations of our training dataset into 100 batches of 1,000 observations each (see Figure 1).
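As a rough illustration of the batching step, the sketch below splits 100,000 made-up rows into consecutive batches of 1,000. The helper name make_batches and the data values are hypothetical, chosen only for this example.

```python
def make_batches(rows, batch_size):
    """Split a list of rows into consecutive batches of batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

# Placeholder Training Dataset: 100,000 rows of (x1, x2, x3, y); values are made up.
training_rows = [
    (float(i), float(i + 1), float(i + 2), float(3 * i)) for i in range(100_000)
]

batches = make_batches(training_rows, batch_size=1_000)
print("number of batches:", len(batches))
print("observations in the first batch:", len(batches[0]))
```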
We can now feed the neural network, passing the data in one direction: left to right, from the inputs, through all the layers, to the outputs (Forward Propagation).
Each observation is evaluated by the network and an output ŷ is produced; the loss function is then evaluated on the full batch, based on the distance (the error) between each ŷ and the real value (y) in the Training Dataset.
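Forward Propagation on one batch can be sketched in a few lines of plain Python. Several things here are my assumptions and not part of this series: the tanh activation in the hidden cells, the mean squared error as loss function, biases attached only to hidden and output cells (as most libraries do), and the made-up data. Treat it as an illustration, not as a reference implementation.

```python
import math
import random

def init_params(layer_sizes, seed=0):
    """Random weights and zero biases for each fully connected step."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out  # assumption: biases on non-input cells only
        params.append((weights, biases))
    return params

def forward(params, x):
    """Pass one observation left to right and return the prediction y_hat."""
    a = list(x)
    for k, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        is_output = k == len(params) - 1
        a = z if is_output else [math.tanh(v) for v in z]  # tanh is an assumed activation
    return a[0]

def batch_loss(params, batch):
    """Mean squared error between y_hat and y over a whole batch."""
    return sum((forward(params, x) - y) ** 2 for x, y in batch) / len(batch)

# One made-up batch of 1,000 observations: 3 features and 1 target each.
rng = random.Random(1)
batch = []
for _ in range(1_000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    batch.append((x, x[0] + 2 * x[1] - x[2]))  # hypothetical target rule

params = init_params([3, 4, 2, 1])
y_hat = forward(params, batch[0][0])
loss = batch_loss(params, batch)
print(f"first prediction: {y_hat:.4f}, batch loss: {loss:.4f}")
```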
We can now use the ŷ values and the gradient (partial derivatives) of the loss function to update weights and biases right-to-left (Backward Propagation); in How AI Learns we used a single parameter, whereas in this case we have 32 partial derivatives, one for each of the 32 parameters updated in this process.
Of course, we do expect that the errors will be smaller with the new weights and bias values obtained after a Backward Propagation.
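For the curious, one Backward Propagation step on a single batch can be sketched as follows. This is only an illustration under my own assumptions: tanh in the hidden cells, mean squared error as loss, a learning rate of 0.05, made-up data, and biases attached only to hidden and output cells (so this sketch computes 29 partial derivatives rather than the 32 counted in this pill). All function names are hypothetical.

```python
import math
import random

def init_params(layer_sizes, seed=0):
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def forward(params, x):
    """Return the activations of every layer; the last one holds y_hat."""
    acts = [list(x)]
    for k, (W, b) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + bj for row, bj in zip(W, b)]
        last = k == len(params) - 1
        acts.append(z if last else [math.tanh(v) for v in z])
    return acts

def batch_loss(params, batch):
    return sum((forward(params, x)[-1][0] - y) ** 2 for x, y in batch) / len(batch)

def backward(params, batch):
    """Partial derivatives of the batch loss for every weight and bias."""
    grads = [([[0.0] * len(W[0]) for _ in W], [0.0] * len(b)) for W, b in params]
    for x, y in batch:
        acts = forward(params, x)
        # Derivative of the mean squared error with respect to y_hat.
        delta = [2.0 * (acts[-1][0] - y) / len(batch)]
        for k in range(len(params) - 1, -1, -1):  # right to left
            W, _ = params[k]
            gW, gb = grads[k]
            a_prev = acts[k]
            for j, d in enumerate(delta):
                gb[j] += d
                for i, a in enumerate(a_prev):
                    gW[j][i] += d * a
            if k > 0:
                # Chain rule through the tanh of the previous layer.
                delta = [
                    sum(W[j][i] * delta[j] for j in range(len(W))) * (1.0 - a_prev[i] ** 2)
                    for i in range(len(a_prev))
                ]
    return grads

def update(params, grads, lr):
    """Gradient descent step: move every parameter against its derivative."""
    for (W, b), (gW, gb) in zip(params, grads):
        for j in range(len(W)):
            b[j] -= lr * gb[j]
            for i in range(len(W[j])):
                W[j][i] -= lr * gW[j][i]

rng = random.Random(1)
batch = []
for _ in range(1_000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    batch.append((x, x[0] + 2 * x[1] - x[2]))  # hypothetical target rule

params = init_params([3, 4, 2, 1])
loss_before = batch_loss(params, batch)
grads = backward(params, batch)
n_derivatives = sum(len(gW) * len(gW[0]) + len(gb) for gW, gb in grads)
update(params, grads, lr=0.05)
loss_after = batch_loss(params, batch)
print(f"derivatives: {n_derivatives}, loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

With a small enough learning rate, the loss on the same batch should be lower after the update, which is exactly the expectation described above.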
This process is repeated for each batch in the dataset; so, when the full dataset has been used, the parameters have been adjusted 100 times (in our example). A full cycle over the entire dataset is called an Epoch, and a learning process goes through the full dataset many times: many epochs are needed before the loss is considered good enough for our problem.
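The batch and epoch bookkeeping can be sketched with a small training loop. To keep the example short, I use a much simpler stand-in model here (a single linear cell, y_hat = w . x + b) instead of the network of Figure 1; the data, the learning rate and the number of epochs are also made up for illustration.

```python
import random

rng = random.Random(0)

# Made-up Training Dataset: 100,000 observations, 3 features, 1 target.
rows = []
for _ in range(100_000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    rows.append((x, 1.0 * x[0] + 2.0 * x[1] - 1.0 * x[2]))  # hypothetical rule

batch_size = 1_000
w = [0.0, 0.0, 0.0]  # stand-in model, not the 3-4-2-1 network
b = 0.0
lr = 0.1
n_epochs = 3
updates = 0
epoch_losses = []

for epoch in range(n_epochs):            # one epoch = one pass over all batches
    total = 0.0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        gw, gb, loss = [0.0, 0.0, 0.0], 0.0, 0.0
        for x, y in batch:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            loss += err * err / len(batch)
            gb += 2 * err / len(batch)
            for i in range(3):
                gw[i] += 2 * err * x[i] / len(batch)
        # One parameter update per batch.
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
        updates += 1
        total += loss
    epoch_losses.append(total / (len(rows) // batch_size))
    print(f"epoch {epoch + 1}: mean batch loss {epoch_losses[-1]:.6f}, updates so far {updates}")
```

Each epoch performs 100 updates, one per batch, so three epochs adjust the parameters 300 times.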
As explained in Introduction to AI, once the training process is completed we will use the Test Dataset (in our example 10,000 observations) to check the validity of the model on unseen data.
To do that, we feed the deep neural network again, but this time using only Forward Propagation, because we don't want to modify the model parameters learned during the first phase.
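The test phase can be sketched as Forward Propagation only, with a check that no parameter has changed. The "trained" parameters below are random stand-ins, and the activation (tanh), the loss (mean squared error) and the data are my assumptions for the sake of the example.

```python
import copy
import math
import random

def forward(params, x):
    """Forward Propagation for one observation; returns y_hat."""
    a = list(x)
    for k, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        a = z if k == len(params) - 1 else [math.tanh(v) for v in z]
    return a[0]

def evaluate(params, rows):
    """Mean squared error on a dataset; reads the parameters, never writes them."""
    return sum((forward(params, x) - y) ** 2 for x, y in rows) / len(rows)

rng = random.Random(2)

# Stand-in for the parameters learned during training (random, purely hypothetical).
layer_sizes = [3, 4, 2, 1]
trained_params = [
    ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

# Made-up Test Dataset: 10,000 observations, 3 features, 1 target.
test_rows = []
for _ in range(10_000):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    test_rows.append((x, x[0] + 2 * x[1] - x[2]))

before = copy.deepcopy(trained_params)
test_loss = evaluate(trained_params, test_rows)
print(f"test loss: {test_loss:.4f}")
print("parameters unchanged:", trained_params == before)
```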
Well, this was the last intuition I wanted to share about AI and even if techy people may have turned up their noses, I wanted to make the concepts as simple as possible for everyone.
There is a lot more to learn about AI, in particular when it comes to Generative AI, where we have to deal with non-tabular content (speech, video, audio, ...) and where the goals go beyond prediction and classification: text generation, summarization, translation, etc. If you are patient, I will soon prepare a new series, "Generative AI in Simple Terms for non-techy People"!
Until now, everything I wrote was a technical summary of AI, even if oversimplified; I would like to close with two comments of my own, and I am open to discussion:
Very clear and interesting articles, thanks a lot Luigi! P.s. in Figure 2 of "Deep Neural Networks" I guess the top left label should be "Test Dataset", not "Training Dataset". Deformation after years of IPTV PQR testing in ValCannuta 250...