Full fine-tuning, LoRA, and RAG, explained visually👇
𝗙𝘂𝗹𝗹-𝗺𝗼𝗱𝗲𝗹 𝗙𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴 𝘃𝘀. 𝗟𝗼𝗥𝗔 𝘃𝘀. 𝗥𝗔𝗚 explained visually:

All three techniques are used to augment the knowledge of an existing model with additional data. (Minimal code sketches for each of the three are included at the end of this post.)

1) 𝗙𝘂𝗹𝗹 𝗺𝗼𝗱𝗲𝗹 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴

This involves adjusting all the weights of a pre-trained model on task-specific data. While this works pretty well, it is not practically feasible for large models like LLMs, primarily because of:

↳ Their size.
↳ The cost of fine-tuning all the weights.
↳ The cost of maintaining a separate, full-sized fine-tuned model for every task.

2) 𝗟𝗼𝗥𝗔 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴

LoRA fine-tuning addresses the limitations of traditional fine-tuning.

The idea is to decompose the weight updates of some (or all) of the original model's weight matrices into low-rank matrices. We then train only these small LoRA matrices and keep the large model frozen.

In the graphic below:
- the top network represents the model with LoRA layers.
- the bottom network represents the large pre-trained model.

Notice the difference in the number of connections the two networks have. This drastically reduces the number of trainable parameters and, with it, the computational requirements.

3) 𝗥𝗔𝗚

Both full-model and LoRA fine-tuning discussed above involve further training. RAG lets us bring in additional information without fine-tuning the model at all.

There are 7 steps, which are also marked in the above visual:

- 𝗦𝘁𝗲𝗽 𝟭-𝟮: Take the additional data, embed it, and store the embeddings in a vector database. (This is done only once. If the data keeps evolving, simply keep adding new embeddings; there is no need to re-embed the entire dataset.)
- 𝗦𝘁𝗲𝗽 𝟯: Embed the user query with the same embedding model.
- 𝗦𝘁𝗲𝗽 𝟰-𝟱: Find the nearest neighbors of the embedded query in the vector database.
- 𝗦𝘁𝗲𝗽 𝟲-𝟳: Pass the original query along with the retrieved documents (for more context) to the LLM to generate a response.

Of course, RAG has its own problems, such as:

- It relies on similarity matching between the query and the stored vectors. However, questions are structurally very different from answers, so we often retrieve many irrelevant documents.
- Typical RAG systems are well-suited only to lookup-style question answering. For instance, we cannot build a RAG pipeline to summarize the entire additional dataset, because the similarity-matching step only retrieves the top matches; the LLM never sees all the documents in its prompt.

We covered RAG from basics to advanced here (with implementation): https://lnkd.in/gRccJPcZ

It covers these (with implementation):
- RAG fundamentals
- RAG evaluation
- RAG optimization
- Multimodal RAG
- Graph RAG
- Multivector retrieval using ColBERT
- RAG over complex real-world docs ft. ColPali

--

If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get it here: https://lnkd.in/gi6xKmDc

--

👉 Over to you: what are some other problems with RAG?
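
Sketch 1 (full fine-tuning): a minimal illustration in plain PyTorch. The tiny model and random batch are stand-ins, not a real pre-trained LLM; the point is simply that every parameter receives gradients and is updated, so each task ends up needing its own full copy of the model.

```python
# A minimal sketch of full fine-tuning in plain PyTorch.
# The tiny model and random batch below are stand-ins for a pre-trained model and task data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))  # pretend this is pre-trained

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # ALL parameters are trainable
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)        # dummy task-specific inputs
y = torch.randint(0, 2, (32,))  # dummy task-specific labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()  # every weight moves -> a full model copy must be stored per task
```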
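
Sketch 2 (LoRA): a minimal illustration of the LoRA idea, again assuming PyTorch. The pre-trained weight matrix is frozen, and only the two small low-rank matrices A and B are trained; the layer size and rank here are arbitrary example values.

```python
# A minimal sketch of a LoRA layer: frozen base weights + trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + scale * (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))        # zero init -> no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # ~12k trainable vs ~600k total
```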
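
Sketch 3 (RAG): a rough end-to-end walk through the 7 steps above. It assumes the sentence-transformers package for embeddings and uses a plain in-memory array instead of a real vector database; the documents, the model name, and the call_llm stub are illustrative placeholders, not a specific production setup.

```python
# A rough sketch of the 7 RAG steps, assuming the sentence-transformers package.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

# Steps 1-2: embed the additional data once and store the vectors
# (a real system would push these into a vector database).
docs = [
    "Our refund policy allows returns within 30 days.",
    "Standard shipping takes 3-5 business days.",
    "All devices come with a 2-year warranty.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in your actual LLM API call.
    return f"[LLM answer based on a {len(prompt)}-character prompt]"

def answer(query: str, k: int = 2) -> str:
    # Step 3: embed the user query with the SAME embedding model.
    q_vec = embedder.encode([query], normalize_embeddings=True)[0]
    # Steps 4-5: nearest neighbors via cosine similarity (dot product on normalized vectors).
    top_k = np.argsort(doc_vecs @ q_vec)[::-1][:k]
    context = "\n".join(docs[i] for i in top_k)
    # Steps 6-7: pass the retrieved documents plus the original query to the LLM.
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("How long do I have to return a product?"))
```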