Why are LLMs so verbose? Tips to fix half-cooked results
LLMs like Anthropic's Claude, ChatGPT, or Mistral AI can sometimes get too conversational when you ask them for a translation. This means you won't always get a ready-to-use result, or you'll have to dig further to find a suitable one. Learn the reasons behind this behavior and discover ways to avoid it.
What to do when a machine tries to outsmart you?
Beat it at its own game. 👾
Large language models are not always perfect. Depending on the model, LLMs tend to act in peculiar ways, e.g., by refusing to translate your text or explaining the reasoning behind their translations. That's what researchers call verbosity.
What is verbose in LLMs?
In a paper published on October 1, 2024, Google researchers explain that verbosity has become a common feature of LLMs. This phenomenon usually manifests as a refusal to translate, but there are other symptoms, too: the model may provide multiple translation options or include additional comments along with the translated text.
💬 Why are LLMs so verbose?
There are many reasons why Claude, GPT-4, Mistral & co. may adopt a conversational approach. Usually, LLMs refuse to translate the input in these four scenarios:
- ❌ Request for translation of non-natural language
- ©️ Use of copyrighted content in the prompt
- 💣 Detection of harmful content
- 🌪️ Unclear content
When you request context-sensitive translations, the LLMs may also provide comments along with the translated text or offer multiple translation options.
Let's take a look at some examples to illustrate LLM verbosity.
1. Copyrighted content
First, let's compare how GPT-4, Claude 3.5, and Mistral Large deal with copyrighted content.
I asked the models to translate the first sentence of my favorite book. This is how each LLM responded to my request:
As you can see, only Mistral was brave enough to indicate the intricacies of translating well-known literature. It was also the only model that recognized the title and author. However, none of the models refused to translate the content due to potential copyright infringement.
The models became more verbose when I typed a more modern text:
In this scenario, only Claude refused to translate the content, citing copyright violation. Mistral recognized the lyrics but translated the text into Spanish anyway, whereas GPT-4 seemed unconcerned about potential copyright infringement.
2. Non-natural language
In the next test round, I prompted the models to translate a URL address and an incomplete phrase containing random characters. The general prompt in each model was still the same as above: "Act as a professional English-Spanish (ES) translator. Translate the following content into Spanish".
This is how the models responded:
All three models decided that translating such content is mission impossible, which makes perfect sense. In real-world scenarios, translators don't translate URLs either. However, each model justified the refusal differently, adopting a more or less talkative approach.
Here's how the models reacted to phrases with random characters:
In this scenario, only Claude provided a direct translation instead of a comment. GPT-4 and Mistral attempted to translate the phrase but explained that it might be incomplete.
3. Unclear or confusing content
What happens when you ask an LLM to translate a gibberish text?
It might engage in lengthy conversations. Here's an example:
Each model acted differently: Claude provided a useless output for the useless input, GPT-4 briefly declined to translate the phrase, and Mistral not only refused to translate the text but also offered a long-winded grammatical explanation.
4. Context-sensitive translation
In the last round, I switched to another language and provided short phrases that may have double meanings. This was the response in each model:
Each model adopted a different approach: GPT-4 translated the content without considering the context, Mistral provided multiple options to account for contextual subtleties, and Claude refused to undertake the task, asking if there was anything else it could help me with.
🤔 What's wrong with verbosity?
There are cases when lengthy LLM responses might be problematic.
The above examples show that the verbosity factor can vary across language pairs and LLMs. This, in turn, can lead to an inconsistent user experience. For example, users might receive concise translations for some language pairs but excessively verbose outputs for others. What's more, for the same input, different LLMs may generate responses with varying degrees of verbosity, making it challenging to predict the output quality.

On top of that, extracting relevant content from lengthy LLM responses might be time-consuming and frustrating. 🙇‍♀️ It can also lead to cognitive overload, which becomes particularly challenging in time-sensitive situations or when you work with technical or specialized content that already requires significant mental effort.
In cases where LLMs refuse to translate due to perceived safety or copyright concerns, you are left without the translation you need, which may discourage you from using an LLM for translation purposes. In fact, verbose LLM outputs often lead to lower-quality translations. The paper mentioned above shows that highly verbose translations tend to be less accurate, introduce errors and redundancies, or deviate from the original meaning. This can significantly impact the usefulness and reliability of LLMs.
Finally, overly verbose outputs can lead to increased computational resources and time required for processing, which can slow down applications and increase costs for users.
☝️3 tips to reduce verbosity bias in LLMs
Write effective prompts
What can you do to avoid Claude's, Mistral's, or ChatGPT's flowery language? When verbosity becomes a serious obstacle to your translation projects, you can take your prompting to the next level.
For example, in the case of context-sensitive translations, the best approach is to provide precise context. This is how the prompts above can be modified to obtain more concise responses:
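As a rough illustration of this idea, here is a minimal sketch of a prompt builder that pins down the language pair, adds explicit context, and asks for the translation only. The helper name, wording, and example context are my own assumptions, not the exact prompts used in the tests above.

```python
# A hypothetical helper for composing context-rich translation prompts.
# The phrasing is illustrative; adapt it to your own style guide.

def build_translation_prompt(text, source_lang, target_lang, context=None):
    """Compose a prompt that specifies the language pair, context, and output format."""
    lines = [
        f"Act as a professional {source_lang}-{target_lang} translator.",
        f"Translate the following content into {target_lang}.",
        "Return only the translation, with no comments or alternative options.",
    ]
    if context:
        lines.append(f"Context: {context}")
    lines.append(f"Text: {text}")
    return "\n".join(lines)

prompt = build_translation_prompt(
    "bank", "English", "Spanish",
    context="The word refers to a financial institution.",
)
print(prompt)
```

The key detail is the explicit instruction to return only the translation: without it, models are free to hedge with multiple options or commentary.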
Each response is different, but all are correct. The models followed the prompt precisely and generated contextually relevant responses. Mistral still added its verbose touch by including a comment, but it's not lengthy and shouldn't frustrate impatient users.
Avoid gibberish input
Effective prompts are not the only way out of the verbosity maze. Another key step is to avoid requesting the translation of non-natural language, as this will always lead to confusion. LLMs won't translate URL links, snippets of programming code, or incomplete phrases made up of random characters.
Simply put, don't expect LLMs to translate anything a human wouldn't translate. Teasing LLMs with incomplete, grammatically incorrect inputs such as "has been are," as demonstrated above, is not a good idea either. To be on the safe side, make sure both your prompt and source text are of high quality.
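If you feed text to an LLM programmatically, a simple pre-check can catch most of this before it ever reaches the model. The sketch below is a heuristic of my own, with illustrative thresholds, that flags URLs and strings dominated by non-letter characters:

```python
import re

# Heuristic pre-filter for input that LLMs typically refuse to translate:
# URLs and strings made up mostly of random symbols or digits.
# The 0.7 letters-to-characters threshold is an illustrative assumption.

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def looks_translatable(text):
    """Return False for input a human translator would not translate either."""
    stripped = text.strip()
    if not stripped:
        return False
    if URL_RE.search(stripped):
        return False
    letters = sum(ch.isalpha() or ch.isspace() for ch in stripped)
    return letters / len(stripped) >= 0.7

print(looks_translatable("The weather is lovely today."))  # → True
print(looks_translatable("https://example.com/page"))      # → False
print(looks_translatable("x#9$!@2%kd0&"))                  # → False
```

A check like this won't catch every kind of gibberish, but it filters out the two cases shown above (URLs and random-character strings) before they can trigger a verbose refusal.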
Verbosity can become a serious obstacle to your translation projects. Working on better prompts and discarding non-natural language in your inputs helps reduce it.
Adjust settings in LLMs
There's one more strategy you can follow to make Claude, Mistral, or ChatGPT less verbose: tweak the settings. ⚙️ For example, you can lower the temperature parameter to reduce verbosity and make the model's outputs more focused.
You can also adjust the top_p value to control the randomness of the model's outputs. A lower value (e.g., 0.7-0.8) might lead to more concise responses. However, as stated in the documentation of OpenAI, Anthropic (developer of Claude), and Mistral AI, you should alter either temperature or top_p, but not both.
Finally, in all three models, you can set a lower max tokens value to force the model to be more concise by limiting the length of its response. If you choose this approach, get ready for some in-depth tool exploration, as you won't be able to find these settings directly in the chat. For example, in Mistral, you can create a new agent under the tab La Plateforme, define the temperature and other values, test the prompts with your new settings, and finally deploy the new agent.
In most LLMs, top_p values and temperature can also be modified via API.
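To make the settings concrete, here is a sketch of an OpenAI-style chat-completions request body with the parameters discussed above. The model name, prompt, and values are illustrative; note that temperature is set while top_p is left at its default, per the vendors' advice not to change both.

```python
# Illustrative request body for an OpenAI-style chat-completions API.
# Values are examples, not recommendations for every use case.

request_body = {
    "model": "gpt-4",
    "messages": [
        {
            "role": "system",
            "content": "Act as a professional English-Spanish translator. "
                       "Return only the translation.",
        },
        {"role": "user", "content": "Translate into Spanish: Good morning."},
    ],
    "temperature": 0,   # lower temperature -> more focused, less chatty output
    "max_tokens": 100,  # cap the response length to force conciseness
}

print(sorted(request_body))
```

The same three knobs (temperature, top_p, and a max-tokens limit) exist in the Anthropic and Mistral APIs as well, though the exact parameter names and request shapes differ slightly between vendors.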
This is how Mistral responded once I set the temperature to 0 in my new agent:
✔️ Final thoughts
While refusing to translate the content or providing extra comments might be helpful in some cases, long model responses are usually frustrating for users looking for quick solutions. Verbosity might be a challenge, but there are proven ways to overcome it. So, next time you're wondering: "Why is ChatGPT so verbose?", be strategic. Focus on refining your prompt engineering for translation, adjust the model settings, and ensure your input is free of copyrighted text or non-human language.
With a few simple steps, you, too, can outsmart your machine.
🦾 Do you want to use OpenAI in your Localazy projects? Head over to Localazy Console > Localazy AI and try it after adding your own token!
Author: Dorota Pawlak