Daily Dose of Data Science

Software Development

New Delhi, Haryana · 25,305 followers

A column with extensive insights on data science, relevant for professionals at big tech and startups, as well as students.

About us

Daily Dose of Data Science is a daily newsletter that delivers high-quality insights on Data Science and ML/AI Engineering, along with best practices. It is relevant for professionals at big tech, startups, and engineering students.

Industry
Software Development
Company size
2-10 employees
Headquarters
New Delhi, Haryana
Type
Self-Owned
Founded
2022


Updates

  • Full fine-tuning, LoRA, and RAG, explained visually👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    Full-model fine-tuning vs. LoRA vs. RAG, explained visually:

    All three techniques are used to augment the knowledge of an existing model with additional data.

    1) Full-model fine-tuning

    This involves adjusting all the weights of a pre-trained model on task-specific data. While this works pretty well, it is not practically feasible for large models (LLMs, for instance), primarily because of:
    ↳ Their size.
    ↳ The cost involved in fine-tuning all weights.
    ↳ The cost involved in maintaining many large fine-tuned models.

    2) LoRA fine-tuning

    LoRA fine-tuning addresses the limitations of traditional fine-tuning. The idea is to keep the original weights frozen and represent the weight updates (for some or all layers) as a product of low-rank matrices. Next, we train only the LoRA parameters and freeze the large model. (A minimal code sketch of a LoRA layer follows at the end of this post.)

    In the graphic below:
    - The top network represents the model with LoRA layers.
    - The bottom network represents the large pre-trained model.

    Notice the difference in the number of trainable connections between the two networks. This immensely reduces the computational requirements.

    3) RAG

    Both full-model and LoRA fine-tuning involve further training. RAG instead augments the model with additional information at query time, without fine-tuning the model.

    There are 7 steps, which are also marked in the above visual:
    - Steps 1-2: Take the additional data, embed it, and dump the embeddings into a vector database. (This is only done once. If the data evolves, keep adding new embeddings to the vector database; there is no need to re-embed the entire dataset.)
    - Step 3: Use the same embedding model to embed the user query.
    - Steps 4-5: Find the nearest neighbors to the embedded query in the vector database.
    - Steps 6-7: Provide the original query and the retrieved documents (for more context) to the LLM to get a response.

    Of course, RAG has its own problems, such as:
    - It relies on similarity matching between the query and the stored vectors. However, questions are structurally very different from answers, so we typically retrieve many irrelevant documents.
    - Typical RAG systems are well-suited only for lookup-based question answering. For instance, we cannot build a RAG pipeline to summarize the additional data: the similarity-matching step only retrieves the top matches, so the LLM never receives info about all the documents in its prompt.

    We covered RAG from basics to advanced here (with implementation): https://lnkd.in/gRccJPcZ

    It covers:
    - RAG fundamentals
    - RAG evaluation
    - RAG optimization
    - Multimodal RAG
    - Graph RAG
    - Multivector retrieval using ColBERT
    - RAG over complex real-world docs ft. ColPali

    --

    If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get it here: https://lnkd.in/gi6xKmDc

    --

    👉 Over to you: what are some other problems with RAGs?
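
    A minimal, self-contained sketch of the LoRA idea in PyTorch is below. It is an illustration, not the exact setup from the visual; the layer size, rank, alpha, and initialization values are assumptions.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen pre-trained linear layer plus a trainable low-rank update."""

        def __init__(self, in_features, out_features, rank=8, alpha=16):
            super().__init__()
            self.base = nn.Linear(in_features, out_features)
            self.base.weight.requires_grad_(False)  # freeze the pre-trained weights
            self.base.bias.requires_grad_(False)
            # Low-rank factors: A is (rank x in), B is (out x rank); only these are trained.
            self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x):
            # Frozen output plus the scaled low-rank update: x @ A^T @ B^T
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    layer = LoRALinear(768, 768)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # 12,288 trainable LoRA parameters vs. ~590k frozen base parameters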

  • A visual explanation of the 5 most popular agentic AI design patterns👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    The 5 most popular agentic AI design patterns, clearly explained (with visuals):

    Agentic behaviors allow LLMs to refine their output by incorporating self-evaluation, planning, and collaboration!

    The following visual depicts the 5 most popular design patterns employed in building AI agents.

    1) Reflection pattern
    - The AI reviews its own work to spot mistakes and iterates until it produces the final response.

    2) Tool use pattern
    Tools allow LLMs to gather more information by:
    - Querying a vector database
    - Executing Python scripts
    - Invoking APIs, etc.
    This is helpful since the LLM is not solely reliant on its internal knowledge.

    3) ReAct (Reason and Act) pattern
    ReAct combines the above two patterns:
    - The agent can reflect on the generated outputs.
    - It can interact with the world using tools.
    This makes it one of the most powerful patterns used today. (A minimal sketch of a ReAct loop follows this post.)

    4) Planning pattern
    Instead of solving a request in one go, the AI creates a roadmap by:
    - Subdividing tasks
    - Outlining objectives
    This strategic thinking helps it solve tasks more effectively.

    5) Multi-agent pattern
    - We have several agents.
    - Each agent is assigned a dedicated role and task.
    - Each agent can also access tools.
    All agents work together to deliver the final outcome, delegating tasks to other agents when needed.

    I'll soon dive deep into each of these patterns, showcasing real-world use cases and code implementations.

    👉 Over to you: Which agentic pattern do you find the most useful?

    --

    If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get it here: https://lnkd.in/gi6xKmDc

    --

    Find me → Avi Chawla
    Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
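
    As a rough illustration of the ReAct pattern described above, here is a minimal, library-free Python sketch. The helpers call_llm and parse_action and the two tools are hypothetical placeholders, not a real API.

    def call_llm(prompt: str) -> str:
        ...  # hypothetical: call whichever LLM you use and return its text output

    def parse_action(step: str) -> tuple[str, str]:
        ...  # hypothetical: extract ("tool_name", "tool_input") from the LLM output

    def search_docs(query: str) -> str:
        ...  # hypothetical tool: e.g., query a vector database

    def run_python(code: str) -> str:
        ...  # hypothetical tool: e.g., execute a snippet and return its stdout

    TOOLS = {"search_docs": search_docs, "run_python": run_python}

    def react_agent(user_query: str, max_steps: int = 5) -> str:
        history = [f"Question: {user_query}"]
        for _ in range(max_steps):
            # Reason: ask the LLM for a thought plus either an action or a final answer.
            step = call_llm("\n".join(history) + "\nThought, then Action or Final Answer:")
            if step.startswith("Final Answer:"):
                return step[len("Final Answer:"):].strip()
            # Act: run the chosen tool and feed the observation back into the loop.
            tool_name, tool_input = parse_action(step)
            observation = TOOLS[tool_name](tool_input)
            history += [step, f"Observation: {observation}"]
        return "Step budget exhausted without a final answer."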

  • The latest open-weights models from DeepSeek are on par with the closed OpenAI o1👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    If you don't know about DeepSeek yet, read this👇 (~96% cheaper than OpenAI o1 + open-source)

    DeepSeek AI has released open-weight reasoning models (like o1). What's crazy is that they achieve performance similar to OpenAI o1 at much lower cost (about 96% cheaper).

    For instance, per 1M tokens:
    • OpenAI o1: $60.00
    • DeepSeek R1: $2.19 (~96% cheaper)

    Here's how to access them:
    - Run it locally using Ollama. (A minimal sketch of this option follows this post.)
    - Use the chat interface at chat[.]deepseek[.]com (select "DeepThink").
    - Use the DeepSeek API.

    Here are the performance benchmarks:
    • AIME 2024 ↳ o1 at 79.2% vs. R1 at 79.8%
    • MATH-500 ↳ o1 at 96.4% vs. R1 at 97.3%
    • Codeforces ↳ both o1 and R1 rank in the top 3.7%
    • MMLU ↳ o1 at 91.8% vs. R1 at 90.8% (slightly worse than o1)

    How did they do it? Most models heavily depend on supervised fine-tuning.

    ☑ DeepSeek-R1-Zero (one of the open-source models) used pure reinforcement learning instead.
    ↳ This helped it achieve competitive reasoning scores through self-evolution alone.

    ☑ DeepSeek-R1 (another model) combined RL with a small dataset for a "cold start."
    ↳ This resulted in readable, human-friendly outputs while maintaining strong performance.

    ☑ Another interesting thing they did was distill the reasoning capabilities of R1 into smaller models (Qwen, Llama, etc.).
    ↳ This way, even the 14B model outperforms the SOTA open-source QwQ-32B-Preview.
    ↳ The distilled 32B and 70B models set a new record on the reasoning benchmarks.

    Over to you: Does it make business sense to continue using OpenAI o1 when an open-source model is outperforming it? Let me know in the comments.

    --

    If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get it here: https://lnkd.in/gi6xKmDc
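
    For the "run it locally using Ollama" option, a minimal sketch with Ollama's Python client might look like the following. It assumes Ollama is installed and running, and that a distilled R1 variant (here the 14B tag) has already been pulled with "ollama pull deepseek-r1:14b".

    import ollama  # the official Ollama Python client

    response = ollama.chat(
        model="deepseek-r1:14b",  # assumed tag for a distilled R1 model
        messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    )
    print(response["message"]["content"])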

  • A 10x FASTER alternative to GPUs is here👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    This can make GPUs obsolete in AI workflows ⚡️ (a 10x FASTER alternative to GPUs)

    GPUs are not fully efficient for AI workloads. In fact, GPUs were not even originally designed for AI/ML workloads. That is why engineers are developing hardware that caters directly to AI workloads.

    SambaNova Systems built the world's fastest AI inference using its specialized hardware stack (RDUs), a 10x faster alternative to GPUs. Their specialized SN40L chip can load models as big as trillions of parameters.

    The video below shows a real-time inference demo with Llama 3.1-405B. Check it out here: https://fnf.dev/3ZI4K1j

    With SambaNova:
    - Llama 3.1-8B generates 1k+ tokens/s
    - Llama 3.1-70B generates 700+ tokens/s
    - Llama 3.1-405B generates 200+ tokens/s

    This inference speed matters when:
    - you need real-time inference.
    - you need production readiness.
    - you need cost efficiency at scale.

    Thanks to SambaNova for showing me their inference engine and partnering with us on today's post.

  • A production-ready RAG app in a few mins with Ragie 👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    This 1-min post will save you at least 1 month of building a production-ready RAG app👇

    Imagine you want to build a RAG infra that lets your users connect their data. This requires:
    - Setting up OAuth flows for third-party integrations
    - Creating UIs for user folder selection
    - Maintaining sync infrastructure to keep data updated
    - Indexing and partitioning data between users efficiently

    And all of that only gets the data from your users. After this, there's the actual RAG mechanism: chunking, indexing, retrieval, and generation.

    Ragie Connect solves this by providing the entire infra to handle authentication, authorization, and syncing for your users' data.

    Ragie Connect: https://ragie.ai/connect

    With Ragie Connect, you simply generate a redirect URL for the integration you need, and Ragie takes care of the rest.

    Ragie Connect supports most of the popular integrations:
    ☑ Google Drive
    ☑ Salesforce
    ☑ Notion
    ☑ Jira
    ☑ Many, many more

    This way, your app receives indexed, ready-to-use data without any extra infrastructure.

    👉 Over to you: What are some other challenges in setting up a RAG infra for your users?

  • Build RAG pipelines with any data source in a few minutes 👇

    View profile for Akshay Pachaar

    Co-Founder DailyDoseOfDS | BITS Pilani | 3 Patents | X (187K+)

    Let's build a robust multi-tenant RAG app that easily integrates with any data source!

    Setting up a RAG infra can be an absolute nightmare. Here's everything you need to do:
    - OAuth for third-party integrations
    - Designing UIs for data ingestion
    - Continuous sync for data updates
    - Optimizing user data partitioning

    And at that point you have only collected the data from the user. You still need to do the RAG itself: chunking, indexing, retrieval, and generation.

    What if I told you there's a very easy solution? ✨

    Ragie Connect solves this by providing the complete infrastructure to handle authentication, authorization, and syncing for your users' data.

    In this video, I provide a detailed walkthrough of how to do it. We also build an app using their open-source base-chat project!

    The best part? Ragie Connect supports integration with:
    - Google Drive
    - Salesforce
    - Notion
    - Jira
    - etc.

    And almost every popular data source, you name it!

    If you're looking to integrate RAG into your platform, you should check out Ragie. Get started here: https://ragie.ai/connect

  • Build a 100% local Agentic RAG app with CrewAI👇

    View profile for Akshay Pachaar

    Co-Founder DailyDoseOfDS | BITS Pilani | 3 Patents | X (187K+)

    I just created a 100% local agentic RAG app!

    It's powered by a locally running Llama 3.2 and has the ability to search through your docs, falling back to web search in case it doesn't find the answer there.

    Tech stack:
    - CrewAI for multi-agent orchestration
    - Qdrant to self-host a vector DB
    - Firecrawl for web search

    The app primarily features two agents:

    1️⃣ Retriever Agent
    The retriever agent is responsible for retrieving the right context for the user query and is assigned a task to do so.

    2️⃣ Response Gen Agent
    The response gen agent is responsible for taking the user query and the context provided by the retriever agent and generating a coherent response for the user.

    (A minimal sketch of this two-agent setup follows this post.)

    I have shared a link to all the code in the comments; it's fairly easy to follow along and customise to your needs! I encourage you to try it out, and I'll help you if you're stuck!

    _____

    Find me → Akshay Pachaar ✔️
    For more insights & tutorials on AI and Machine Learning.
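
    A minimal sketch of that two-agent setup with CrewAI and a locally served Llama 3.2 (via Ollama) could look like the following. The roles, prompts, and task wiring are illustrative assumptions, not the exact project code; the actual app also wires in Qdrant and Firecrawl tools.

    from crewai import Agent, Task, Crew, LLM

    # Assumes Ollama is serving llama3.2 locally on the default port.
    local_llm = LLM(model="ollama/llama3.2", base_url="http://localhost:11434")

    retriever = Agent(
        role="Retriever",
        goal="Fetch the most relevant context for the user's query",
        backstory="Searches the local document index and falls back to web search.",
        llm=local_llm,
    )

    responder = Agent(
        role="Response generator",
        goal="Answer the user's query using only the retrieved context",
        backstory="Writes a coherent, grounded answer from the provided context.",
        llm=local_llm,
    )

    retrieve_task = Task(
        description="Retrieve context relevant to: {query}",
        expected_output="A short list of relevant passages.",
        agent=retriever,
    )

    answer_task = Task(
        description="Answer the query '{query}' using the retrieved context.",
        expected_output="A concise, well-grounded answer.",
        agent=responder,
    )

    crew = Crew(agents=[retriever, responder], tasks=[retrieve_task, answer_task])
    result = crew.kickoff(inputs={"query": "What does the report say about Q3 revenue?"})
    print(result)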

  • Fix these default settings in PyTorch dataloader 👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    The PyTorch DataLoader has 2 terrible default settings. Fixing them gave me a ~5x speedup⚡️

    In any standard PyTorch model training with a GPU:
    - .to(device) transfers the data to the GPU.
    - Everything that executes after this happens on the GPU.

    This means that when the GPU is working, the CPU is idle, and when the CPU is working, the GPU is idle.

    Memory pinning can optimize this. Here's what it does:
    - While the model is training on the 1st mini-batch, the CPU can transfer the 2nd mini-batch to the GPU.
    - This ensures that the GPU does not have to wait for the next mini-batch of data as soon as it finishes processing the current one.

    Enabling this is quite simple in PyTorch:
    - Set pin_memory=True in the DataLoader object.
    - During the data transfer step, do this: .to(device, non_blocking=True)

    Along with this, also specify num_workers in the DataLoader object. (A minimal sketch of these settings follows this post.)

    The speedup is evident from the image below.

    ---

    If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get it here: https://lnkd.in/gi6xKmDc
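
    A minimal sketch of those settings is below; train_dataset, the batch size, and the worker count are illustrative placeholders.

    import torch
    from torch.utils.data import DataLoader

    device = torch.device("cuda")

    loader = DataLoader(
        train_dataset,      # placeholder for your Dataset
        batch_size=64,
        shuffle=True,
        num_workers=4,      # default is 0: all data loading runs in the main process
        pin_memory=True,    # default is False: pinned memory enables async host-to-GPU copies
    )

    for x, y in loader:
        # non_blocking=True lets the copy overlap with GPU compute on pinned memory.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        # ... forward pass, loss, backward pass, optimizer step ...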

  • Turn any GitHub repository into LLM-ready text ⚡️

    Simply replace "hub" with "ingest" in any GitHub URL to get a prompt-friendly text digest of the repo for LLMs.

    Gitingest is 100% open-source and provides:
    - The directory structure
    - A brief summary of the project
    - The entire content as LLM-ready text

    --

    Interested in ML/AI Engineering? Sign up for our newsletter for in-depth lessons and get a FREE eBook with 150+ core DS/ML lessons: https://lnkd.in/gB7yHExC

  • A simple technique to optimize model training👇

    View profile for Avi Chawla

    Co-founder DailyDoseofDS | IIT Varanasi | ex-AI Engineer MastercardAI | Newsletter (130k+)

    Not many people know this simple technique to optimize model training👇

    Imagine an image classification task, say MNIST, for simplicity. Normalizing/scaling the pixel values is a common technique to stabilize model training.

    Here's what the implementation usually looks like:
    - First, we load the dataset, transform it, define the model, etc.
    - Next, we have the regular training loop where the data is transferred to the GPU.

    Here's the problem with this approach. If you look at the profiler:
    - Most of the time/resources will be allocated to the kernel (the actual training code).
    - However, a significant amount of time will also be spent on data transfer from CPU to GPU.

    Reducing the data transfer is simple. Recall that the original dataset was composed of pixel values. These were 8-bit integers, and we normalized them to 32-bit floats. We then transferred these 32-bit floating-point tensors to the GPU. In other words, normalizing the data before the transfer meant transferring 4x more data.

    Solution? Move the normalization step to after the data transfer, so that we transfer 8-bit integers instead of 32-bit floats. As a result, you will notice a significant drop in the data transfer step. (A minimal before/after sketch follows this post.)

    --

    I had trained several models before I accidentally discovered this; it had never occurred to me that there could be such a subtle way to optimize model training.

    Of course, this technique doesn't apply to all neural network use cases, like NLP, where we inherently deal with 32-bit float embeddings. However, whenever I have spotted an opportunity to use this trick, I have seen noticeable gains from it.

    --

    If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get it here: https://lnkd.in/gi6xKmDc

    --

    👉 Over to you: What are some lesser-known ways of optimizing model training that you are aware of?
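
    A minimal before/after sketch of the idea, assuming a uint8 image batch (e.g., MNIST pixels in [0, 255]); the function names are illustrative.

    import torch

    device = torch.device("cuda")

    def step_naive(images_uint8: torch.Tensor) -> torch.Tensor:
        # Normalize on the CPU first: uint8 becomes float32, so 4x more bytes cross the bus.
        x = images_uint8.float() / 255.0
        return x.to(device)

    def step_optimized(images_uint8: torch.Tensor) -> torch.Tensor:
        # Transfer the 8-bit integers first, then normalize on the GPU.
        x = images_uint8.to(device)
        return x.float() / 255.0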

