Language Models: Going from Large to Small. An Enterprise push for performance, security, deployment, and customizability
Present-day Enterprise AI Adoption
Organizations integrating Generative AI often take two initial approaches. One is simply offering an enterprise ChatGPT that can answer questions about their own proprietary data, which can provide an intuitive productivity boost. The other is integrating large language models (LLMs) into business software, such as function-specific chatbots, virtual agents, and content generators.
A common and practical way organizations can leverage LLMs in these applications is through a Retrieval Augmented Generation (RAG) architecture. This enables LLM-powered chatbots and agents to retrieve relevant proprietary information from internal databases (e.g. vector database or general enterprise search software) and tailor their responses using prompt engineering techniques.
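The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal, illustrative example: the `embed`, `retrieve`, and `build_prompt` names are made up for this sketch, and the bag-of-words "embedding" stands in for the learned embedding model and vector database a production RAG system would use.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call a
    # learned embedding model and store vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prompt engineering step: ground the model in retrieved context.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The final prompt, with proprietary context inlined, is what gets sent to the LLM; the model never needs direct access to the internal database.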
Building Specialized AI Applications: A Healthcare Case Study
To illustrate the potential of advanced AI integration, let's consider the development of an AI medical scribe agent. This agent would perform charting and note-taking tasks, allowing healthcare providers to focus more on patient care. Building such a specialized AI application involves several key steps: grounding a capable language model in a medical knowledge base, retrieving patient-specific data in real time, and generating notes in the structured formats clinicians expect.
This approach demonstrates how AI can be tailored to specific industry needs, combining vast knowledge bases with real-time data retrieval and customized outputs.
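One way a scribe agent can deliver "customized outputs" is to constrain generation to a structured clinical format rather than free text. The sketch below uses the standard SOAP (Subjective, Objective, Assessment, Plan) note layout; the `SoapNote` class and its fields are illustrative, not a reference to any particular product.

```python
from dataclasses import dataclass

@dataclass
class SoapNote:
    # SOAP is a widely used clinical note format. In a real scribe
    # agent, these fields would be populated by the language model
    # from the visit transcript and retrieved patient records.
    subjective: str = ""
    objective: str = ""
    assessment: str = ""
    plan: str = ""

    def render(self) -> str:
        # Emit the note in the section order clinicians expect.
        return "\n".join(
            f"{name.upper()}: {value}" for name, value in vars(self).items()
        )
```

Validating the model's output against a schema like this, instead of accepting free-form text, makes downstream charting integrations far more reliable.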
On the topic of healthcare, last year we wrote about the potential for LLMs to impact various health and biotech areas such as patient consultation, precision medicine, and protein folding. We've seen much of that progress materialize since that article. Hospitals and primary care providers are adopting AI assistants across practice settings. Startups are aggregating patient prescription history, lab test results, and wearable device data to provide personalized health goals and advice. Many of these solutions have been implemented with a RAG architecture, which in turn raises challenges around model security, deployment, and fine-tuning.
Challenges in Implementing LLM Solutions
While RAG architectures offer significant advantages, they also present several challenges. One major issue is the computational overhead and latency introduced by the retrieval step. In RAG systems, searching through large databases to find relevant information can be time-consuming and resource-intensive, especially for real-time applications. Additionally, prompts may then be directed to an external third-party LLM for processing and response. This can lead to slower response times and increased costs, particularly when dealing with large-scale deployments or high-frequency queries.
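The retrieval overhead on repeated or high-frequency queries can be partially mitigated with caching. The sketch below is a simplified illustration, assuming a hypothetical `cached_retrieve` function where a `time.sleep` stands in for an expensive vector-database search; production systems would also cache embeddings and batch calls to the external LLM.

```python
import functools
import time

@functools.lru_cache(maxsize=1024)
def cached_retrieve(query: str) -> tuple[str, ...]:
    # Stand-in for an expensive vector-database lookup; the sleep
    # simulates retrieval latency. Results are memoized per query.
    time.sleep(0.05)
    return ("context for: " + query,)

t0 = time.perf_counter()
cached_retrieve("refund policy")   # cold call: hits the "database"
cold = time.perf_counter() - t0

t0 = time.perf_counter()
cached_retrieve("refund policy")   # warm call: served from the cache
warm = time.perf_counter() - t0
```

Caching helps most when query distributions are skewed toward a small set of frequent questions, which is common in function-specific chatbots.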
Another challenge is the potential for hallucinations when combining retrieved information with the language model's generated text. LLMs may produce responses that contradict or misinterpret the retrieved data, leading to unreliable outputs. This risk is unacceptable in industries like healthcare, law, or finance, where accuracy is critical.
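One common mitigation is to check generated answers against the retrieved context before returning them. The token-overlap heuristic below is a deliberately crude sketch (the `grounding_score` function and 0.6 threshold are illustrative assumptions); production systems typically use NLI-based fact-checking models or citation verification instead.

```python
def grounding_score(answer: str, context: str) -> float:
    # Fraction of answer tokens that also appear in the retrieved
    # context: a rough proxy for how grounded the answer is.
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def is_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    # Flag answers whose content diverges too far from the context,
    # so they can be blocked or routed to human review.
    return grounding_score(answer, context) >= threshold
```

In regulated settings, answers that fail the check would be withheld or escalated rather than shown to the user.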
Security concerns pose another significant hurdle, especially in highly regulated industries like healthcare. Hospitals and other healthcare providers cannot simply adopt off-the-shelf solutions like ChatGPT due to strict data privacy regulations and the sensitive nature of patient information. These organizations need to implement robust security measures and often develop custom solutions to mitigate potential security issues. We plan to cover these security challenges and solutions in depth in a future post.
Evolution Towards AI Agents and Small Language Models
As the industry grapples with these challenges, we're observing a shift towards more sophisticated AI agent architectures. Nvidia, with its NIM (Nvidia Inference Microservices) architecture, is driving this evolution with industry partnerships with companies like Hippocratic AI, which was presented during GTC 2024. Small language models (SLMs) will be a key component in enabling and improving these solutions.
Small Language Models offer several advantages that directly address the challenges faced by traditional LLM implementations: lower computational cost and latency, making real-time applications more practical; on-premises and edge deployment, keeping sensitive data in-house to satisfy strict privacy regulations; and easier fine-tuning, allowing models to be customized for domain-specific tasks.
The race to develop and release SLMs has intensified, with tech giants like Meta, OpenAI, Apple, and Microsoft rapidly iterating on their offerings—Llama 3.1 8B, GPT-4o mini, DCLM-7B, and Phi-3-mini, respectively. These companies promise improved performance, cost-efficiency, and customizability, which could reshape the AI landscape and make advanced AI capabilities more accessible to a broader range of organizations, applications, and edge devices.
Specialized Expertise and Industry Collaboration
Implementing SLM-based solutions requires technical knowledge and close collaboration with subject matter experts (SMEs). My experience at Palantir underscores this critical point.
At Palantir, working closely with client SMEs was essential to building valuable data-augmented applications. The same principle applies to AI development. AI engineers must engage intensively with domain experts at every level of the application stack, from data preparation to model fine-tuning to application design.
This collaboration ensures that the AI solution is not just technically sound but practically valuable and tailored to specific industry needs. It bridges the gap between technical capabilities and real-world application, much like I did at Palantir with data solutions.
The challenge lies in finding AI professionals who can effectively engage with SMEs and domain experts willing to dive deep into the AI development process. Organizations that foster this collaborative environment will be well-positioned to develop genuinely transformative AI solutions, presenting promising opportunities for investors in the AI landscape.
Our Continued Investment Focus
As investors in AI and data, we see a significant opportunity to focus on the infrastructure and tools that enable organizations to build, deploy, and manage SLMs effectively. This focus, in turn, can accelerate AI adoption in Good AI's investment verticals of healthcare, automation, and enterprise solutions.
At Good AI, we have a front-row seat to how our portfolio companies and industry partners are integrating AI into their workplaces and products. If you're a founder advancing the SLM or AI frontier, particularly in areas that support our focus verticals, please reach out. We'd love to hear from you.