Small language models shine for domain-specific or specialized use cases, while making it easier for enterprises to balance performance, cost, and security concerns.

Since ChatGPT arrived in late 2022, large language models (LLMs) have continued to raise the bar for what generative AI systems can accomplish. For example, GPT-3.5, which powered ChatGPT, had an accuracy of 85.5% on common sense reasoning data sets, while GPT-4 in 2023 achieved around 95% accuracy on the same data sets. While GPT-3.5 and GPT-4 primarily focused on text processing, GPT-4o, released in May 2024, is multimodal, allowing it to handle text, images, audio, and video.

Despite the impressive advancements by the GPT family of models and by open-source large language models, Gartner, in its 2024 hype cycle for artificial intelligence, notes that “generative AI has passed the peak of inflated expectations, although hype about it continues.” Some reasons for disillusionment include the high costs associated with the GPT family of models, privacy and security concerns regarding data, and issues with model transparency.

Small language models, which have fewer parameters than these LLMs, are one potential solution to these challenges. They are easier and less costly to train. They can also be hosted on-premises, providing better control over the data shared with them. One challenge is that smaller models tend to be less accurate than their larger counterparts. To harness the strengths of smaller models while mitigating their weaknesses, enterprises are looking at domain-specific small models, which must be accurate only in the specialization and use cases they support. This domain specialization can be enabled by taking a pre-trained small language model and fine-tuning it with domain-specific data, or by using prompt engineering for additional performance gains.
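To make the prompt-engineering route a little more concrete, here is a minimal sketch, not a method taken from any particular vendor or model. It assembles a few-shot prompt from domain examples so that a general-purpose small model sees the domain's vocabulary and expected output format before the real input. The function name `build_domain_prompt`, the invoice examples, and the prompt layout are all hypothetical and chosen only for illustration; the resulting string would be passed to whatever model runtime is in use.

```python
from typing import Sequence


def build_domain_prompt(
    task: str,
    examples: Sequence[tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: task description, worked examples, new input."""
    parts = [task.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    # Leave the final "Output:" open so the model completes it.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical domain examples, invented for illustration only.
EXAMPLES = [
    ("Invoice 1001 from Acme, total 250.00 USD", "vendor=Acme; total=250.00"),
    ("Globex Corp invoice, amount due 99.50 USD", "vendor=Globex Corp; total=99.50"),
]

prompt = build_domain_prompt(
    "Extract the vendor and total from each invoice line.",
    EXAMPLES,
    "Invoice 2002 from Initech, total 75.25 USD",
)
print(prompt)
```

Keeping the examples in a plain list makes it easy to swap in a different domain without touching the model itself. Fine-tuning, by contrast, changes the model's weights and is not shown here.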
Let’s look at the top five use cases where organizations consider leveraging small language models, and the leading small language models for each use case.

PII masking

One of the key concerns for organizations is the exposure of personally identifiable information (PII) when their data is used to train an LLM or is included in questions sent to one. Examples of PII include a customer’s Social Security number (SSN) or credit card number. Hence, an extremely important use case is building a solution that can mask PII data. In addition to masking, another key requirement is to maintain the lineage of the data. For example, the same SSN should always be masked by the same identifier so that downstream applications can still use the relationships between records. Phi-3 and GLiNER perform very well in PII masking, but the best-performing model for this use case at the time of this writing is the Llama-3.1-8B model.

Toxicity detection

This use case identifies the presence of undesirable or hateful comments in text. One example of toxic text is the use of swear words. As more companies adopt language models to automate customer service interactions, it is extremely important to ensure that no toxic content finds its way into the models’ responses. The RoBERTa model is well-suited for this task.

Coding assistance

Coding assistance was one of the first use cases for generative AI, and coding assistants have been widely adopted by developers across enterprises. Microsoft claims that 70% of GitHub Copilot users are more productive. Task-specific variants of Llama (Code Llama) and Gemma (CodeGemma) are excellent alternatives to large language models like GPT-4 for this use case.

Medical data summarization

Medical data summarization and understanding is a specialized use case in the healthcare industry, relying on models trained on the medical terminology specific to the domain.
The solution makes a high impact in summarizing conversations between patients and doctors and between doctors and medical sales representatives. Given the uniqueness of these types of conversations, small language models are well suited to the domain and can make a significant impact. The T5 model is a strong contender among the smaller language models for this task.

Vendor invoice processing

Lastly, vendor invoice processing is crucial for enterprise procurement departments dealing with invoices at scale. Automatically scanning these invoices to extract information is a non-trivial task because invoice structures vary in thousands of ways. Phi-3-vision is an excellent choice of model for the invoice processing pipeline.

While large language models are powerful and accurate, they are expensive, and data privacy and security remain significant concerns for enterprises. Small language models make it easier for enterprises to balance performance, cost, and security concerns and help reduce the time needed to get solutions into production. The five use cases we’ve discussed represent just some of the ways enterprises have successfully implemented small language models to address specific needs while mitigating the challenges associated with larger models.

Aravind Chandramouli is head of the AI center of excellence at Tredence.

Generative AI Insights provides a venue for technology leaders, including vendors and other outside contributors, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.