Rakesh Sharma’s Post


Data & AI Architect | Multi-Cloud Engineering, delivering high-impact solutions

#GlitchTokens are like weeds in a crop: they can reduce the yield. Similarly, a glitch token can degrade the performance of an LLM.

Unlike words, which represent clear and discrete units of meaning in human language, tokens in LLMs can represent anything from whole words to fragments of words or even punctuation. This modular approach allows LLMs to process and generate language with remarkable efficiency and subtlety.

But what about glitch tokens? Occasionally, the tokenization process produces unexpected results; these are known as glitch tokens. These anomalies arise from the complex interplay of encoding and training-data irregularities. For instance, a glitch token might represent a piece of text that doesn't conform to typical linguistic patterns due to encoding errors or unusual data inputs. Understanding these glitches is crucial for refining model outputs and enhancing the overall robustness of LLMs.

Proper training of an LLM reduces the chances of glitches, and there are ways to find glitch tokens. I will share more on this in a future post. #AI #tokens #LLM
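One common way people probe for glitch tokens is a round-trip check: decode a token id to its text, re-encode that text, and flag tokens that don't map back to themselves. The sketch below illustrates the idea with a toy, hypothetical vocabulary (not a real LLM tokenizer); with a real model you would swap in the tokenizer's own encode/decode. The string " SolidGoldMagikarp" is a well-known real-world glitch token, used here purely as an example entry.

```python
# Round-trip heuristic for spotting candidate glitch tokens.
# TOY_VOCAB and TOY_ENCODE are hypothetical stand-ins for a real
# tokenizer's decode/encode tables.

TOY_VOCAB = {
    0: "the",
    1: "cat",
    2: " SolidGoldMagikarp",  # example string; a known glitch token in the wild
    3: "!",
}

# Reverse map: text back to token id. Token 2 has no entry, so its
# decoded text never re-encodes to the same id.
TOY_ENCODE = {"the": 0, "cat": 1, "!": 3}

def decode(token_id):
    return TOY_VOCAB[token_id]

def encode(text):
    # Returns None when the text does not tokenize back to a single known id.
    return TOY_ENCODE.get(text)

def candidate_glitch_tokens(vocab):
    """Flag tokens whose decoded text does not re-encode to the same id."""
    flagged = []
    for token_id in vocab:
        if encode(decode(token_id)) != token_id:
            flagged.append(token_id)
    return flagged

print(candidate_glitch_tokens(TOY_VOCAB))  # token 2 fails the round trip
```

A failed round trip is only a candidate signal: in practice researchers also look at embedding distances and prompt the model to repeat the token, since a true glitch token is one the model cannot handle reliably.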


