A dream LLM can turn into a deployment nightmare

This is the first article of Beyond Entropy, a space where the chaos of the future, the speed of emerging technologies and the explosion of opportunities are slowed down into short posts. For longer, more detailed posts, check out my newsletter Turning bits into dreams (link below).



Are Large Language Models (LLMs) changing the Machine Learning workflow? If so, what are the main differences and issues?

Over the past year, many AI specialists, from data scientists to project owners, have surely been pondering these questions. Having personally worked on several LLM projects over the last few months, I am starting to get a feel for the answers, and I will try to summarise my thoughts in this short post.

On the one hand, LLMs make us dream: they open up many possibilities, and it is easy to create something beautiful with them. On the other hand, building production-ready applications with them can become a nightmare, so it is necessary to be aware of their limitations. Below I list the ones I have come across and consider most relevant.

The ambiguity of prompting

In computer science, instructions written in programming languages such as Python, C++ or JavaScript are exact: the same code always means the same thing. With LLMs, by contrast, instructions are written in natural language, and given its inherently ambiguous nature, prompt engineering is a programming paradigm that lacks rigour. This makes it very flexible and easy to use, but it can also cause frustration; combined with the nascent state of the discipline, it makes for a rather poor development experience.
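To make the contrast concrete, here is a minimal sketch (the prompt is purely illustrative): the Python call has exactly one meaning, while its natural-language counterpart leaves several decisions to the model.

```python
# Exact: this line means the same thing on every machine, every time.
prices = sorted([3.5, 1.2, 9.9])  # -> [1.2, 3.5, 9.9]
print(prices)

# Ambiguous: the "same" instruction in natural language leaves the
# model to decide the order, the output format, the number type...
prompt = "Sort these prices: 3.5, 1.2, 9.9"
# Ascending or descending? A list, a sentence, JSON? The prompt
# does not say, so the model has to guess.
print(prompt)
```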

Silent failures

If you take standard software, e.g. written in Python, and you add a random character or accidentally remove a line, it will simply stop working and raise an error. If you slightly modify a prompt, on the other hand, an LLM will still run, but it may give very different results. This is how prompting leads to many silent failures.
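A minimal sketch of the difference, assuming the official `openai` Python client (the model name and prompts are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Corrupted code fails loudly:
#   pritn("hello")  ->  NameError: name 'pritn' is not defined

# A corrupted prompt fails silently: both calls below succeed, but the
# accidental truncation can change the output with no warning at all.
print(ask("Summarise the history of Rome in exactly three bullet points."))
print(ask("Summarise the history of Rome in exactly three bullet"))
```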

Stochasticity

Generative algorithms, such as current LLMs, are stochastic: they produce slightly different results on every run, in contrast to non-generative programs or standard machine learning models, which are deterministic at inference time. Deterministic results are generally safer, especially when an application chains several components together, because the unpredictability of the final output grows enormously if there is no control over the individual outputs. When it comes to LLMs, therefore, we must accept ambiguity. Yet, despite its scientific interest, stochasticity is an unwelcome property in industrial applications.
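A quick way to see this, again assuming the `openai` client (model name and prompt are illustrative), is to send the identical prompt several times and count the distinct answers:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Send the identical prompt five times and collect the distinct answers.
answers = set()
for _ in range(5):
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{"role": "user",
                   "content": "Name one colour. Answer with a single word."}],
    )
    answers.add(response.choices[0].message.content.strip())

# A deterministic function would always yield a set of size one; with
# default sampling settings this set often contains several answers.
print(answers)
```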

Maintenance

In prompt engineering there are some general rules and paradigms, but for the best performance each LLM requires specific variations. For example, if a set of prompts is carefully designed to solve a task with one LLM, there is no guarantee that those prompts will also work with a newer LLM. This can lead to severe headaches and a huge maintenance cost.
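One way to keep that cost visible is to treat prompts like code and pin their behaviour with regression tests. A hedged sketch using pytest and the same assumed `openai` client as above (the model names, prompt and expected answer are all illustrative):

```python
import pytest
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Extract the ISO 8601 date from: 'The invoice is due on 4 March 2024.'"

def ask(prompt: str, model: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Run the same prompt suite against every model you plan to migrate to:
# a green run on one model guarantees nothing about the next one.
@pytest.mark.parametrize("model", ["gpt-4", "gpt-4-turbo"])  # illustrative names
def test_date_extraction(model: str) -> None:
    assert "2024-03-04" in ask(PROMPT, model)
```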

Robustness and dependence

Prompt programming is not yet robust to changes. For example, suppose you have built a series of prompts using the role-playing technique, such as "You are a creative writer and you must help me to...", and a few months later Meta or OpenAI update Llama or GPT-4 with role-playing already integrated: all your prompts then need to be modified to accommodate the change.


A question arises: how can we tackle these problems? To answer it completely, I think I need more time and experience; in a few months everything will perhaps be clearer. For the moment, I can suggest exploring prompt versioning on W&B and this guide from OpenAI, which collects some tricks and best practices. In addition, to mitigate stochasticity you can set the temperature to zero (see the sketch below), even though this does not completely solve the problem, as explained in this discussion. Do you know of other techniques to overcome these problems?
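For the temperature trick, the change is a single parameter. A minimal sketch, assuming the same `openai` client as above (model name and prompt are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    temperature=0,  # near-greedy decoding: far less variance across calls
    messages=[{"role": "user",
               "content": "Name one colour. Answer with a single word."}],
)
# Much more repeatable than default sampling, but still not a hard
# guarantee: server-side changes and numerical non-determinism remain.
print(response.choices[0].message.content)
```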

In this short post, without wanting to dampen enthusiasm for LLMs, I wanted to point out a few aspects that demonstrate some of their problems. If this has discouraged you from using them, don't worry. To regain your interest, I invite you to read this recent post of mine on the fascinating emergent properties of LLMs, which is part of my newsletter Turning bits into dreams. There, I deal more extensively with topics of scientific interest, ranging from AI to Fundamental Physics.


Opportunities, talks, and events

Here are some opportunities you might find interesting (if you are interested, please contact me for more info):

Job & Research opportunities

👗 A startup in Milan, using Machine & Deep Learning in the fashion sector, is looking for a junior Data Scientist or AI developer;

🍷 A startup in Milan, using Deep Learning & NLP in the food and wine sector, is looking for a Data Scientist with 2-4 years of experience;

🇪🇸 A Barcelona-based venture studio, Antai Venture, is looking for a full-time AI Specialist;

⚛️ The quantum company Quantinuum is looking for a Research Software Engineer;

⚛️ Covestro is opening a PhD opportunity in Quantum Computing for computational chemistry;

Talks, Conferences, and Courses

🎙 Tech Talk at Pi School (August 30th): Classic and Explainable AI Methods in Vaccine Development by Francesco Patanè;

🌊 CodingWaves is organizing several AI courses and workshops in Milan and in many European outdoor locations.

If you would like to get in touch with me or view my lectures and courses (technical and non-technical), you can find everything here.

