What is Generative AI and how can we build an application using Generative AI?
What is Generative AI? Is it just about Prompt Engineering and ChatGPT? How do we use Generative AI technology to build our own application? This article explains all of these at a conceptual level, without delving into the math or the details of the algorithms. I typically cover Generative AI in depth, hands-on in Python and including the math, in a 16-hour workshop. This is an introductory article that explains at a high level the components of Gen AI and how they are used together to build a GenAI application. The math appears quite intimidating, though in reality it is not once we understand some basic concepts from statistics, calculus and probability.
Let’s start from the basics. What is Generative AI?
Generative AI is a set of AI technologies that allows the creation of content. The content could be text, image, video, audio, or any other augmented data. We ask a Generative AI model to create something by giving it an input (called a “Prompt”), and it gives us an output (called a “Completion”). We can use it for a number of tasks: ask it a question, tell it to summarize an article, or translate a text into a different language. We can give it current time series data on a stock’s performance and ask it to predict the next value. We could use it in almost all ML application areas. It can be used in many areas of business and government, and new applications are still being developed using Gen AI. The AI technologies that Generative AI uses are Machine Learning, Deep Learning, Large Language Models and Diffusion Models. The primary component of Generative AI is the Large Language Model (LLM).
Let’s go through each of them one by one at a conceptual level. What is Machine Learning? ML is a set of algorithms that help us make predictions on data (Supervised Machine Learning), draw insights from data and simplify it by combining multiple columns into a smaller set of columns (Unsupervised ML), and learn the rules of how a system works (Reinforcement Learning).
Supervised ML is used to make predictions on unseen data. To make predictions, we first need to teach the algorithm how the value to be predicted (let us call it y) relates to the other inputs (let us call them x). It needs historical, labelled data, which simply means past values of x along with their corresponding y values. It learns how y relates to x by a variety of methods. ML execution has 2 phases: Training and Prediction (also called inferencing). During the Training phase, the algorithm learns how y and x relate to each other. This can be a set of rules, or mathematical functions relating the input x to y. It then uses this set of “Rules” or mathematical functions to predict y for the unseen data presented to it during the Prediction phase. Each method (rules/mathematical function) forms an algorithm. All ML (and DL, LLM) algorithms need a lot of data to learn this pattern. In traditional ML we do a lot of data gathering, cleaning and processing before we feed the data to the models.
Unsupervised ML comprises a set of algorithms that discover patterns and insights in the data by trying to find commonality in it. It also includes algorithms that reduce the number of columns in the data by identifying the most important independent factors or combinations of columns and using only those (though this results in some information loss).
Deep Learning is a subset of ML and includes those algorithms based on Neural Networks, where we have layers of mathematical functions processing the data.
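To make the Training and Prediction phases of Supervised ML concrete, here is a minimal sketch in plain Python. It assumes the simplest possible case: one input x, one output y, and a straight-line relationship learned by least squares. The data (hours studied versus exam score) and the function names `train` and `predict` are made up for illustration only.

```python
# Hypothetical labelled history: x = hours studied, y = exam score.
x_history = [1.0, 2.0, 3.0, 4.0, 5.0]
y_history = [52.0, 54.0, 56.0, 58.0, 60.0]


def train(xs, ys):
    """Training phase: learn the line y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(params, x_new):
    """Prediction phase: apply the learned function to an unseen x."""
    slope, intercept = params
    return slope * x_new + intercept


params = train(x_history, y_history)   # learned "rule" relating x to y
prediction = predict(params, 6.0)      # x = 6.0 was never seen in training
print(params, prediction)
```

Real ML libraries wrap the same two steps (usually as `fit` and `predict`) around far more flexible functions, but the split between learning from labelled history and applying the result to new data is the same.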
Neural Networks were loosely modelled on the way human neurons work, i.e., the output of one neuron goes as input to the next neuron, and a set of neurons jointly fire to trigger a visible action/thought/feeling in the human. Likewise, in a Neural Network there are multiple layers of units, each processing its input. Each unit weights each of its inputs and generates an output based on the sum of the weighted inputs. This output passes as input to the units of the next layer, where it is again weighted, and the next layer generates its output from the weighted combination of the previous layer’s outputs. The weights used for each input are the key parameters that a NN learns. Once the weights (called parameters) are learned, we can predict the output (completions) by simply applying the series of mathematical functions to the weighted inputs. LLMs are based on such Neural Networks and are the primary component of a Generative AI application.
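The layer-by-layer computation described above can be sketched in a few lines of plain Python. This is only an illustration: the weights are hand-picked rather than learned, and the bias term and the sigmoid activation are my assumptions (the text above only talks about weighted sums).

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1). The choice of sigmoid is an assumption."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each unit computes a weighted sum of all inputs, adds a bias, and applies sigmoid."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs


# Hypothetical, hand-picked parameters: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.2], [0.1, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

inputs = [1.0, 2.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)   # first layer output
output = layer_forward(hidden, output_weights, output_biases)   # fed into the next layer
print(hidden, output)
```

Training a network means adjusting the numbers in `hidden_weights`, `output_weights` and the biases until the output matches the labelled data; prediction is just this forward pass.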
What is an LLM? An LLM is a combination of Neural Network layers based on the Transformer architecture (explained below).
How does it work? How is it able to answer so many questions, functioning almost at a human expert level (although it is known to give wrong answers and make up answers, a behaviour called hallucination, and it has not reached human level in cognitive tasks)? How is it able to learn many new tasks when all we need to do is give it a few examples? It is able to do so for the following reasons:
a) It has been trained on a lot of data, sometimes even trillions of words. You might wonder how labelled data could be fed to the LLM at that scale. The LLM is trained in a self-supervised way (often loosely called unsupervised), i.e., we do not manually label the data with the expected answer or tell it what the next word in a sentence is. If you give it a corpus of well-formed documents, the next word is already present in the text, so the training process hides it, asks the model to predict it, and uses the actual next word as the label.
b) It has learned the language that we speak/write and its constructs very well.
c) It has been trained on a variety of documents. Although the exact training data is often not disclosed, it typically includes legal books, contracts, code, religious documents, books, computer science books and medical articles. Thus, it has some “knowledge” baked into it during training.
d) It has so many parameters that it has learned to “interpret” the prompt, and thus perform the task, using the statistical model it has built from its training data. This statistical model has anywhere from about 110 million parameters (BERT base) to a reported, though unconfirmed, 1.7 trillion (GPT-4). The parameters are the building blocks of an LLM. They are the weights in a neural network that tell each neuron how much importance to give to each input it receives from the neurons of the previous layer. When the model is trained to perform a task, it learns the values of these weights. You can think of a parameter as the fundamental cell of the LLM, something like the “neurons” of the human brain (this is a simplistic explanation, as it is still far from a human neuron).
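Point a) above says the model produces its own labels from raw text. A minimal sketch of that idea follows. It assumes we split on whitespace and use a fixed window of 3 context words; real LLMs work on sub-word tokens and much longer contexts, and the function name `next_word_pairs` is made up for illustration.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, next word) training pairs with no manual labelling."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = words[i - context_size:i]   # the words the model gets to see
        target = words[i]                     # the word it must predict (the "label")
        pairs.append((context, target))
    return pairs


corpus = "it was a cold and chilly evening"
pairs = next_word_pairs(corpus)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Every sentence in the corpus yields several such pairs for free, which is why trillions of words of plain text can be used for training without anyone labelling them.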
This statistical model is called an LLM (Large Language Model). The LLM has been trained to predict the next word of a sentence (or variations of that task). So, if we give it the sentence “It was a cold and chilly evening. So I had to wear a ?” and ask it to predict the next word, it would predict a likely word such as {sweater, jacket,…}. It predicts the next word by picking out the word that has the highest probability of appearing.
The secret to the LLM working so well (though there are still areas where it can improve) is that it is based on a revolutionary architecture called the Transformer. The Transformer is a Neural Network based architecture that “understands” the context of text very well. It works out which words in a sentence are important, mirroring the way we humans understand meaning by picking out the most relevant words in a sentence (this is called “Attention”). The earlier architectures such as RNN, Bidirectional RNN, LSTM and LSTM with peephole connections were also good at identifying the relevant words and understanding the context. But they were not suitable for processing long passages of text, as they were not good at preserving context across large paragraphs. Also, the RNN family of architectures processed text sequentially. The Transformer changed that by a) processing text in parallel (embedding positional information with each word so that the order of the words is retained), and b) learning how much attention to give to each word.
Besides LLMs, Generative AI also uses Reinforcement Learning, primarily to learn human preferences among the different completions generated. Two types of RL are used: RLHF (Reinforcement Learning from Human Feedback) and RLAIF (Reinforcement Learning from AI Feedback, as used in Constitutional AI). In RLHF, humans rate the completions for each prompt during training, and through RL the model learns which completions people prefer.
RLAIF is similar to RLHF, except that an AI model gives the ratings, and the preferred response is chosen based on those ratings. RLAIF is more scalable than RLHF, as humans are not involved in rating the completions during training.
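Going back to the “Attention” idea in the Transformer, here is a small sketch of scaled dot-product attention for a single word, in plain Python. The three words and their 2-dimensional vectors are hand-picked assumptions; a real Transformer learns these vectors (and the query/key/value projections) during training and uses hundreds or thousands of dimensions.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)   # how much attention each word receives
    mixed = [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
    return weights, mixed


# Hypothetical vectors for three words in the sentence.
words = ["cold", "evening", "the"]
keys = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
values = keys
query = [1.0, 0.0]   # vector for the word currently being processed

weights, mixed = attend(query, keys, values)
for word, weight in zip(words, weights):
    print(word, round(weight, 3))
```

Because every word's scores can be computed independently of the others, this calculation can run for all words in parallel, which is what lets the Transformer avoid the word-by-word processing of the RNN family.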
Is Generative AI just about Prompt Engineering, ChatGPT and ChatGPT Plus? No. The ChatGPT family is a very important application of Generative AI. But the ChatGPT family cannot be custom trained, i.e., we cannot modify its parameters for a custom task. There are a lot of other LLMs we can use for Generative AI: BERT, Flan-T5, BLOOM, LLaMA 1 and 2, etc. In fact, depending upon the core task, we can choose an LLM and then customize that model based on our requirement. Sure, we can also use the ChatGPT family as is. We can use it through Prompt Engineering and improve its performance using in-context learning (explained below). The most popular Gen AI applications are ChatGPT and ChatGPT Plus. ChatGPT is based on the Large Language Model GPT-3.5, which has been instruction tuned with a lot of code and also with RLHF. ChatGPT Plus is based on GPT-4, and it can directly take a file as input and give output. There are also models like DALL-E 2 and 3, which generate images from a text prompt using a technology called diffusion.
How can we customize a Generative AI model? There are multiple ways of customizing an LLM.
The first step is to define the problem and, based on that, choose an LLM as the base model. The choice of LLM depends on whether we want it to do only one task, such as summarizing a legal document, or multiple tasks: summarizing a legal document, extracting the key entities from it, and translating it into multiple languages automatically.
Having chosen the base LLM, we can adapt the model to our task by either Prompt Engineering or Fine Tuning. Prompt Engineering allows us to quickly adapt the model to our requirement. It does not, however, modify the parameters of the model. It uses a concept called in-context learning. When using the LLM, we give it an input (called the Prompt) and get an output (called the Completion). If the model performs the task without any example prompts and completions, it is called zero-shot learning, i.e., we just ask the model to do the task in the prompt, and the model understands the task and gives the output. If the prompt includes 1 example input and output, it is called one-shot learning, and if it includes a few examples of inputs and outputs, it is called few-shot learning. Prompt Engineering commonly relies on few-shot in-context learning: if the completion is not correct, we can teach the model by including examples of correct inputs and outputs in the prompt, so that the model learns on the fly. In the case of ChatGPT or ChatGPT Plus this is the only way of “teaching” the model to perform our task, as the underlying parameters are not accessible to us. For many use cases we can programmatically drive ChatGPT to give us the required answers. Each Generative AI service typically has an API (you will need to subscribe to it) through which we can programmatically get completions. Calling an existing Generative AI model via its API is the most common way of building a Generative AI application.
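The difference between zero-shot, one-shot and few-shot prompts is easiest to see by building them. The sketch below only assembles the prompt text; it does not call any model. The sentiment task, the `Review:`/`Sentiment:` layout and the function name `build_prompt` are assumptions chosen for illustration, and the resulting string would be sent to whichever LLM API you subscribe to.

```python
def build_prompt(instruction, examples, new_input):
    """Assemble an in-context learning prompt from an instruction and worked examples."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Review: {example_input}")
        lines.append(f"Sentiment: {example_output}")
        lines.append("")
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")   # left open for the model to complete
    return "\n".join(lines)


instruction = "Classify the sentiment of each review as Positive or Negative."
examples = [
    ("I loved this movie.", "Positive"),
    ("The food was terrible.", "Negative"),
]
new_review = "The service was slow."

zero_shot = build_prompt(instruction, [], new_review)            # no examples
one_shot = build_prompt(instruction, examples[:1], new_review)   # one example
few_shot = build_prompt(instruction, examples, new_review)       # a few examples
print(few_shot)
```

Note that nothing about the model changes between the three prompts. All of the “learning” lives in the text we send, which is why this works even when we cannot touch the parameters.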
Many times, this approach may not give us the desired output. In that case we have to Fine Tune the model. Fine Tuning consumes a lot of storage and compute resources. To fine tune, we first need to create a custom dataset of prompts and their completions. As usual, we partition it into training, validation and test datasets. We then train the model over many iterations using the well-known backpropagation method. During training we modify the parameters and thus teach the model how to perform our task. We can also use a method called Instruction Fine Tuning, where the prompt includes an instruction explaining the task we want to teach the model. This fine-tuning approach takes a lot of time and compute resources. Here we teach the generic base model to specialize in our task. Many times, while learning the specialized task, it forgets the other generic tasks that it was originally capable of. This is called “catastrophic forgetting”. To overcome this and not make it “forget” what was innate to the base model, we need to include examples of the base tasks in the training as well.
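As a small illustration of the dataset preparation step, here is one way to build instruction-style prompt/completion pairs and partition them. The 80/10/10 split, the fixed random seed, the placeholder records and the function name `split_dataset` are all assumptions; the training loop itself is left to whichever fine-tuning library you use.

```python
import random


def split_dataset(pairs, train_fraction=0.8, validation_fraction=0.1, seed=0):
    """Shuffle the prompt/completion pairs and cut them into train, validation and test sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    train_end = int(len(shuffled) * train_fraction)
    validation_end = train_end + int(len(shuffled) * validation_fraction)
    return (shuffled[:train_end],
            shuffled[train_end:validation_end],
            shuffled[validation_end:])


# Hypothetical instruction-style records; real ones would hold actual documents.
dataset = [
    {"prompt": f"Summarize the following clause: <text of clause {i}>",
     "completion": f"<summary of clause {i}>"}
    for i in range(20)
]

train_set, validation_set, test_set = split_dataset(dataset)
print(len(train_set), len(validation_set), len(test_set))
```

The model is trained only on `train_set`, tuned against `validation_set`, and `test_set` is kept aside until the end so that the final quality check uses data the model has never seen.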
As part of adapting the model to our task, instead of fully Fine Tuning the model, we can use a much cheaper (in compute and memory) technique called PEFT (Parameter-Efficient Fine-Tuning). Without going into too much detail in this article, in PEFT we typically freeze the model parameters, add a small set of additional parameters and train just those. The original parameters of the model are not changed. There are many techniques for this, such as LoRA (Low-Rank Adaptation of LLMs), Additive methods (Soft Prompts) and Selective PEFT. This approach can also be used easily for multi-task training. We have one set of low-rank parameters for one task, and another set for another task. If we want to predict for one task, we use the low-rank parameters associated with that task, and swap them out when we want to use the model for another task. Though this approach may not perform quite as well as full Fine Tuning/Instruction Fine Tuning, its performance is comparable, and many times the ease of training and the flexibility of swapping parameters outweigh the benefit of the higher performance obtained from full fine-tuning.
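The LoRA idea, and the swapping of low-rank parameters per task, can be sketched with tiny matrices in plain Python. Everything here is a toy assumption: a 4 x 4 frozen weight matrix, rank-1 adapters with hand-picked values, and made-up task names. In LoRA the effective weight is the frozen matrix plus the product of two small matrices B and A.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Frozen base weights (4 x 4 identity as a stand-in). These are never modified.
base_weights = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# One small (B, A) pair per task: B is 4 x 1 and A is 1 x 4, i.e. rank 1.
adapters = {
    "summarize": ([[0.1], [0.0], [0.0], [0.0]], [[1.0, 0.0, 0.0, 0.0]]),
    "translate": ([[0.0], [0.2], [0.0], [0.0]], [[0.0, 1.0, 0.0, 0.0]]),
}


def effective_weights(task):
    """Weights used at prediction time: frozen base + B @ A for the chosen task."""
    b, a = adapters[task]
    return add(base_weights, matmul(b, a))


def lora_param_count(rows, cols, rank):
    """Number of trainable values in B (rows x rank) plus A (rank x cols)."""
    return rank * (rows + cols)


summarize_weights = effective_weights("summarize")
translate_weights = effective_weights("translate")   # swap adapters, same base
full_count = 512 * 512                        # training a 512 x 512 matrix directly
lora_count = lora_param_count(512, 512, 8)    # training rank-8 adapters instead
print(full_count, lora_count)
```

The last two lines show why this is cheap: for a 512 x 512 matrix, a rank-8 adapter trains 8,192 values instead of 262,144, and each extra task only needs another small (B, A) pair.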
After we customize the model, we can deploy it and integrate it with our application ecosystem. It is possible that our compute budget may not support huge models. A large model also takes longer to generate completions. So many times, it may benefit us to reduce the model size. We can do this by quantizing the parameters (storing them at lower precision), building a distilled version (this works only for some types of LLM, mainly those based on the Encoder architecture), or pruning the model (removing the dead neurons).
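Of the three size-reduction options, quantization is the easiest to show in a few lines. The sketch below assumes the simplest scheme: one shared scale for all weights, mapped to 8-bit integers in the range -127 to 127. The four weight values are made up, and production libraries use more refined schemes (per-channel scales, 4-bit formats and so on).

```python
def quantize(weights):
    """Map float weights to integers in [-127, 127] using a single shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0   # fall back to 1.0 if all weights are 0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.05, 0.4]        # hypothetical 32-bit float parameters
quantized, scale = quantize(weights)      # each value now fits in 8 bits
restored = dequantize(quantized, scale)

for original, approx in zip(weights, restored):
    print(original, round(approx, 4))
```

Storing 8-bit integers instead of 32-bit floats cuts memory to roughly a quarter, at the cost of a small rounding error in each weight (at most half of `scale`).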
For many use cases, the model may need to know the latest news. For example, suppose a model was trained in 2022 on that year’s global news and events, and its training data included articles on the then Prime Minister of New Zealand, Ms Jacinda Ardern (I am just taking an example). If you ask the model who the current PM of New Zealand is, it may wrongly say Jacinda Ardern instead of her successor, Mr. Chris Hipkins. Also, if it has not been given sufficient data on a particular domain, it may make up information (this is called hallucination). To avoid this in such use cases, we should feed the model updated, current information along with the input. The model will use this information to give more accurate completions. We can create such a set of additional inputs using a technique/framework called RAG (Retrieval-Augmented Generation) and pass it to the model. There are a lot of other techniques, such as Chain-of-Thought reasoning, PAL (Program-Aided Language models) and Chain-of-Verification (CoVe, by Meta), as well as frameworks such as LangChain, that can be used to build a Generative AI application.
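Here is a toy sketch of the RAG idea using the same New Zealand example. It assumes a tiny in-memory list of documents and ranks them by simple word overlap with the question; real RAG systems use embeddings and a vector database for retrieval. The function names and the prompt wording are made up for illustration, and the final prompt would be sent to the LLM of your choice.

```python
def tokenize(text):
    """Lower-case the text, strip basic punctuation and return the set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(question, documents, top_k=1):
    """Return the documents sharing the most words with the question (a crude relevance score)."""
    question_words = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: len(question_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_augmented_prompt(question, documents):
    """Put the retrieved, up-to-date text in front of the question."""
    context = "\n".join(retrieve(question, documents))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:")


# Hypothetical, up-to-date knowledge base kept outside the model.
documents = [
    "Chris Hipkins became Prime Minister of New Zealand in January 2023.",
    "Auckland is the largest city in New Zealand.",
    "LoRA trains a small set of additional low-rank parameters.",
]

question = "Who is the Prime Minister of New Zealand?"
prompt = build_augmented_prompt(question, documents)
print(prompt)
```

The model's parameters stay exactly as they were trained in 2022. It answers correctly only because the current fact has been placed in the prompt, so keeping the document store up to date is what keeps the application up to date.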
That’s it. Hopefully by now you have a high-level idea of what Generative AI is and, conceptually, how to build a Generative AI application. All the best with this exciting new technology.