Katrin Selbak’s Post

Katrin Selbak

English-Estonian sworn translator @Pilvekiri // Translation Studies MA @UniTartu (EMT network) // legal/IT/marketing translation/MTPE

The trilemma of AI/LLMs: legal (privacy/confidentiality/copyright), accuracy, and environmental (resource) issues.

To get more accurate results, we need more training data, which could raise more legal issues. Also, to get more accurate results from that huge amount of training data, we need more computational power and thus more resources, which raises environmental issues.

It reminds me how at the Translating Europe Forum #TEF2024 ... we talked a lot about the accuracy part (how to use AI in translation and terminology management to get the best results), ... the legal issues came up here and there (like whether everything available on the Internet is really not copyright-protected and thus OK to feed to AI), ... and the environmental issues were raised several times in questions from the audience and the chat, but not really addressed by any speakers or panelists.

Is this something that will continue to evolve - from accuracy issues to legal and then environmental ones? ... Has it happened in other areas - the first concern being whether it works right and gives the expected results, then whether that is legal, and after that whether it is also environmentally reasonable? ... What if we included all these different concerns right from the beginning?

ANANT VERMA

M.Tech | IIT Patna | Artificial Intelligence and Data Science Engineering

🚀 Optimizing Large Language Models: Diving into Quantization for Efficiency and Performance

Today, I focused on the fascinating realm of quantization, exploring both symmetric and asymmetric techniques. In the ever-evolving world of AI, fine-tuning large language models (LLMs) presents both exciting opportunities and significant challenges, particularly around computational costs and resource requirements. One promising solution is quantization, a technique that makes these massive models more efficient by reducing the numerical precision of their weights.

💡 Real-World Example: The LLaMA3.1–70B model at FP32 precision requires a staggering 336 GB of VRAM, making inference feasible only across multiple high-end GPUs. With 4-bit quantization, the memory footprint shrinks by 8× (about 87.5%) to roughly 42 GB, enabling deployment on a single A100 GPU. This demonstrates quantization's transformative potential in democratizing LLM accessibility.

What is Linear Quantization?
Linear quantization is one of the most widely adopted methods for compressing LLMs: it maps model weights from higher precision (e.g., FP32) to lower precision (e.g., FP16, BF16, INT8) using a linear scaling of the value range.

🔑 Two Main Modes:
1️⃣ Asymmetric Linear Quantization: uses a scale plus a zero-point, making it flexible for tensors whose value ranges are not centered on zero.
2️⃣ Symmetric Linear Quantization: uses only a scale (zero-point fixed at 0), making it simpler and more hardware-friendly.

Types of LLM Quantization
🔸 Post-Training Quantization (PTQ): quick and efficient, applied after training without retraining the model.
🔸 Quantization-Aware Training (QAT): yields higher accuracy by simulating quantization during training.

Quantization isn't just about making models smaller; it's about making them smarter, more scalable, and accessible to everyone. Stay tuned for the next update as we explore advanced quantization techniques and their real-world applications!

#LLM #FineTuning #Quantization #AI #MachineLearning #Optimization
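
To make the two modes above concrete, here is a minimal NumPy sketch of per-tensor symmetric and asymmetric linear quantization (shown at INT8 for readability; 4-bit works the same way with a smaller integer grid). The function names and the toy weight tensor are illustrative assumptions, not code from the post or from any specific quantization library.

# Minimal sketch of symmetric vs. asymmetric linear quantization (per-tensor, INT8).
# Illustrative only: real LLM quantizers add per-channel/group scales, calibration
# data, and packed low-bit storage formats.
import numpy as np

def symmetric_quantize(x: np.ndarray, num_bits: int = 8):
    """Symmetric mode: zero-point is fixed at 0, grid is centered on zero."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g., 127 for INT8
    scale = np.max(np.abs(x)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def asymmetric_quantize(x: np.ndarray, num_bits: int = 8):
    """Asymmetric mode: a zero-point shifts the grid to cover skewed ranges."""
    qmin, qmax = 0, 2 ** num_bits - 1         # e.g., 0..255 for UINT8
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(np.clip(np.round(qmin - x_min / scale), qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Map integer codes back to approximate floating-point values."""
    return (q.astype(np.float32) - zero_point) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "weight" tensor with a slightly off-center distribution.
    w = rng.normal(loc=0.2, scale=0.5, size=10_000).astype(np.float32)

    q_sym, s_sym = symmetric_quantize(w)
    q_asym, s_asym, zp = asymmetric_quantize(w)

    err_sym = np.abs(w - dequantize(q_sym, s_sym)).mean()
    err_asym = np.abs(w - dequantize(q_asym, s_asym, zp)).mean()

    print(f"symmetric : scale={s_sym:.6f}, zero_point=0,  mean abs error={err_sym:.6f}")
    print(f"asymmetric: scale={s_asym:.6f}, zero_point={zp}, mean abs error={err_asym:.6f}")
    # INT8 storage is 4x smaller than FP32; packed 4-bit storage would be 8x smaller.

In general, the asymmetric variant tracks skewed value ranges a bit more tightly at the cost of storing and applying a zero-point, which is one reason symmetric quantization is often preferred for weights on integer-friendly hardware.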
