Slim-Llama is an LLM ASIC processor that can tackle 3-billion-parameter models while sipping only 4.69mW - and we'll find out more about this potential AI game changer very soon

Slim-Llama (Image credit: KAIST)

  • Slim-Llama reduces power needs using binary/ternary quantization
  • Achieves a 4.59x energy efficiency boost, consuming 4.69–82.07mW across its clock range
  • Supports 3B-parameter models with 489ms latency

Traditional large language models (LLMs) often suffer from excessive power demands due to frequent external memory access. Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have now developed Slim-Llama, an ASIC designed to address this issue through clever quantization and data management.

Slim-Llama employs binary/ternary quantization, which reduces the precision of model weights to just 1 or 2 bits, significantly lowering the computational and memory requirements.
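To see what that means in practice, here is a minimal sketch of ternary quantization in Python. This is our illustration, not KAIST's implementation: the threshold and the per-tensor scale are assumptions, chosen only to show the principle.

```python
import numpy as np

def ternary_quantize(weights: np.ndarray, threshold: float = 0.05):
    """Quantize full-precision weights to {-1, 0, +1}.

    A minimal sketch, assuming a fixed magnitude threshold: values whose
    magnitude falls below `threshold` become 0; the rest keep only their
    sign. A per-tensor scale preserves the overall magnitude.
    """
    mask = np.abs(weights) > threshold
    ternary = np.sign(weights) * mask               # values in {-1, 0, +1}
    scale = np.abs(weights[mask]).mean() if mask.any() else 1.0
    return ternary.astype(np.int8), scale

# Each ternary weight needs only 2 bits, so matrix-vector products
# reduce to additions and subtractions instead of multiplications.
w = np.random.randn(4, 4).astype(np.float32)
q, s = ternary_quantize(w)
approx = q * s  # dequantized approximation of w
```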

To further improve efficiency, it integrates a Sparsity-aware Look-up Table, improving sparse data handling and reducing unnecessary computations. The design also incorporates an output reuse scheme and index vector reordering, minimizing redundant operations and improving data flow efficiency.
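The payoff of that sparsity is easiest to see in software. The sketch below is illustrative only - the actual chip uses a hardware look-up table rather than index scans - but the principle is the same: zero weights cost nothing, and nonzero ternary weights need only additions and subtractions.

```python
import numpy as np

def sparse_ternary_matvec(q_weights: np.ndarray, scale: float,
                          x: np.ndarray) -> np.ndarray:
    """Multiply a ternary {-1, 0, +1} weight matrix by a vector, skipping zeros.

    A sketch of sparsity-aware processing: only the indices of the +1 and
    -1 entries are visited, turning the matvec into gather-and-add with
    no multiplications at all.
    """
    out = np.zeros(q_weights.shape[0], dtype=np.float32)
    for i, row in enumerate(q_weights):
        pos = np.flatnonzero(row == 1)    # indices of +1 weights
        neg = np.flatnonzero(row == -1)   # indices of -1 weights
        out[i] = scale * (x[pos].sum() - x[neg].sum())
    return out
```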

Reduced dependency on external memory

According to the team, the technology demonstrates a 4.59x improvement in energy efficiency over previous state-of-the-art solutions on benchmarks.

Slim-Llama achieves system power consumption as low as 4.69mW at 25MHz and scales to 82.07mW at 200MHz, maintaining impressive energy efficiency even at higher frequencies. It is capable of delivering peak performance of up to 4.92 TOPS at 1.31 TOPS/W, further showcasing its efficiency.

The chip features a total die area of 20.25mm², utilizing Samsung’s 28nm CMOS technology. With 500KB of on-chip SRAM, Slim-Llama reduces dependency on external memory, significantly cutting energy costs associated with data movement. The system supports external bandwidth of 1.6GB/s at 200MHz, promising smooth data handling.
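As a quick back-of-envelope check (our arithmetic, not a KAIST figure), that bandwidth works out to 8 bytes per clock cycle:

```python
# 1.6 GB/s of external bandwidth at a 200 MHz clock is 8 bytes per cycle.
# At 1-2 bits per weight, that streams roughly 32-64 quantized weights
# per cycle - which is why keeping most data in the 500KB on-chip SRAM
# matters so much for energy.
bandwidth_bytes_per_s = 1.6e9
clock_hz = 200e6
print(bandwidth_bytes_per_s / clock_hz)  # 8.0 bytes per cycle
```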

Slim-Llama supports models like Llama 1bit and Llama 1.5bit, with up to 3 billion parameters, and KAIST says it delivers benchmark performance that meets the demands of modern AI applications. With a latency of 489ms for the Llama 1bit model, Slim-Llama demonstrates both efficiency and performance, making it the first ASIC to run billion-parameter models at such low power.
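A rough estimate of what that means in energy terms - assuming (our assumption, not a KAIST figure) that the 489ms latency was measured at the 200MHz, 82.07mW operating point:

```python
# Back-of-envelope energy per inference pass. Assumption: the 489 ms
# latency corresponds to the 200 MHz operating point at 82.07 mW.
power_w = 0.08207     # 82.07 mW
latency_s = 0.489     # 489 ms
energy_mj = power_w * latency_s * 1000
print(f"~{energy_mj:.0f} mJ per pass")  # ~40 mJ
```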

Although it's early days, this breakthrough in energy-efficient computing could potentially pave the way for more sustainable and accessible AI hardware solutions, catering to the growing demand for efficient LLM deployment. The KAIST team is set to reveal more about Slim-Llama at the 2025 IEEE International Solid-State Circuits Conference in San Francisco on Wednesday, February 19.


Wayne Williams
Editor

Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
