Devil is in the deets

In the past six months, I've found myself immersed in hackathons and AI meetups almost every weekend, connecting with software engineers venturing into the AI realm with little to no machine learning experience. These pioneers have now started "graduating" from these six-month-long hackathon sprints, showcasing a newfound appreciation for the complexities and intricate details that arise when implementing LLMs in real-world scenarios.

One such "graduate" recently caught my attention. His reflection: "I remember your tweet about demos not making it into production. After going through this journey, I've realized that the value of next-generation AI applications lies within the minute, often overlooked, and complex details." This underscores a principle I strongly resonate with – the true value is hidden in the nuances and fine points that are often overlooked amid broader narratives.

And so, as we tread the path of integrating LLMs into real-world applications, we face a series of challenges with varying degrees of complexity. It's worth noting that these challenges extend beyond LLM agents, for which I am yet to witness a production use case.

Two of the most common and prominent hurdles for enterprise use cases remain managing context windows in LLMs and prompt engineering. Consider this: enterprises have a wealth of information at their disposal, but any given use case can draw on only a small segment of it. This mirrors the operational realm of LLMs. The key challenge lies in choosing the most suitable 'context' for the model to consider. While vector databases and similarity queries are excellent tools for retrieving relevant context, they are not ideal for every task. Different tasks necessitate different types of indices: list indices, for instance, may outperform vector retrieval when the goal is to traverse and summarize entire documents, underscoring the importance of flexibility and adaptability in the systems implementing LLMs.
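To make the index-selection point concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical for illustration: toy 2-D "embeddings," a tiny in-memory corpus, and a `retrieve_context` router that is not any particular library's API. It contrasts top-k vector-style retrieval (good for targeted Q&A) with a list-index-style full traversal (better suited to summarization):

```python
import math

# Toy corpus with hypothetical 2-D embedding vectors (illustration only).
DOCS = {
    "doc_a": ("Quarterly revenue grew 12%.", [0.9, 0.1]),
    "doc_b": ("New office opened in Austin.", [0.1, 0.9]),
    "doc_c": ("Revenue guidance raised for Q4.", [0.8, 0.3]),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve_context(task, query_vec=None, k=2):
    """Pick a context-selection strategy based on the task type."""
    if task == "qa":
        # Vector-store style: rank documents by similarity, keep the top k.
        ranked = sorted(DOCS.values(),
                        key=lambda d: cosine(d[1], query_vec),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
    if task == "summarize":
        # List-index style: walk every document in order instead of filtering.
        return [text for text, _ in DOCS.values()]
    raise ValueError(f"unknown task: {task}")

print(retrieve_context("qa", query_vec=[1.0, 0.2]))
print(len(retrieve_context("summarize")))
```

The point of the router is the design choice, not the math: the same corpus feeds both paths, and the task, not the data, decides which index shape the model's context comes from.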

Another frequently discussed challenge is prompt engineering—crafting prompts that guide the model's responses. Predictability is the significant hurdle here: the same prompt, with the same settings, may produce different results at different times, which leads to ambiguities and inconsistencies, particularly when parsing specific information out of a response. Future models may also stop responding well to today's best-practice prompts. In intricate products that chain many prompts together, these inconsistencies compound, leading to hallucinations or incorrect, irrelevant outputs. This unpredictability hampers reproducibility and poses significant challenges to product consistency.
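One common mitigation is to pin the decoding settings, demand a machine-parseable format, and validate-and-retry rather than trust the first response. The sketch below assumes a hypothetical `call_llm` function (stubbed here so the example is self-contained; a real client would hit an actual API and pass `temperature`/`seed` through):

```python
import json

def call_llm(prompt, temperature=0.0, seed=42):
    """Hypothetical stand-in for a real LLM client call.

    A real implementation would send the prompt to a model API; pinning
    temperature (and a seed, where supported) reduces run-to-run variance.
    """
    return '{"sentiment": "positive", "confidence": 0.9}'

def extract_sentiment(text, retries=2):
    prompt = (
        "Return ONLY a JSON object with keys 'sentiment' and 'confidence'.\n"
        f"Text: {text}"
    )
    for _attempt in range(retries + 1):
        raw = call_llm(prompt, temperature=0.0)
        try:
            parsed = json.loads(raw)
            # Validate the schema, not just the syntax.
            if parsed.get("sentiment") in {"positive", "negative", "neutral"}:
                return parsed
        except json.JSONDecodeError:
            pass  # Malformed output: fall through and retry.
    raise RuntimeError("model never returned valid JSON")

print(extract_sentiment("Great quarter!"))
```

This doesn't make the model deterministic, but it turns "the output sometimes breaks my parser" from a silent product bug into an explicit, bounded retry path.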

Another challenge that often goes unnoticed is keeping the source data updated. Fresh, relevant data is the lifeblood of an AI system. Outdated data can severely hinder an LLM, leading to inaccurate or irrelevant outputs. The quest for fresh data extends to downstream dependencies, such as vector stores, docstores, and indices. Maintaining a robust AI system requires a relentless pursuit of data freshness, a seemingly minor detail carrying substantial implications for real-world applications.

Perhaps the most daunting challenge lies in the evaluation stage. Assessing LLMs or LLM-based applications is notoriously complex. Traditional evaluation metrics often fall short in capturing the complexities of real-world enterprise scenarios. Many enterprises have found value in error analysis—categorizing the AI's errors to identify patterns. This process is similar to a data scientist's workflow, requiring constant refinement and reevaluation to optimize the model for superior outcomes.
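The error-analysis loop described above can start very simply: label each observed failure with a category during spot checks, then rank the categories so the biggest bucket gets fixed first. A minimal sketch with a hypothetical (made-up) error log:

```python
from collections import Counter

# Hypothetical error log gathered while spot-checking model outputs.
errors = [
    {"id": 1, "category": "hallucination"},
    {"id": 2, "category": "stale_data"},
    {"id": 3, "category": "hallucination"},
    {"id": 4, "category": "format"},
    {"id": 5, "category": "hallucination"},
]

def error_breakdown(errors):
    """Count failures per category, most frequent first."""
    counts = Counter(e["category"] for e in errors)
    return counts.most_common()

print(error_breakdown(errors))
```

The value is in the workflow, not the code: the ranked breakdown tells you whether to spend the next sprint on retrieval freshness, on prompt format validation, or on grounding, rather than guessing.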

These challenges underscore the intricate reality of deploying AI—particularly LLMs—in real-world applications. The journey might be strenuous, demanding continuous effort, diligent error analysis, and thoughtful solution-seeking. Often, a collaborative "Tiger Team" approach, blending product & software engineering skills with ML and data science expertise, can help navigate these complexities. Companies deploying LLMs beyond mere knowledge retrieval will be the ones to truly reap the benefits of increased valuation.

Navigating this labyrinth underlines that mastering AI involves a blend of technological innovation, persistent problem-solving, and a deep understanding of minute details. Amid all the hype, it's essential to focus on these often-overlooked nuances that invite founders to innovate and ultimately drive the real value in next-generation AI applications.

Vik Chaudhary

VP Product and Alliances at DevZero.io. Mantra: Reach out to help those climbing behind you.

1y

Jaya, in your quest for sifting through AI, this thoughtful report reflects your depth. Question: why do you feel Enterprises can only access a small portion of their data? There are no technical or (insurmountable) privacy barriers. Is it a gap in vision and tech talent?

Cody Collier

a builder ⌁ ml・ai・data

1y

This is a great report based on frontline observations. It overlaps some of the same observations and conclusions I've experienced. Please keep investigating and sharing! nice: "These challenges underscore the intricate reality of deploying AI"
