AI news
This newsletter provides a comprehensive update on the hottest AI advancements from the last few months. We explore groundbreaking LLMs like Meta's Llama-3 and OpenAI's GPT-4 Turbo alongside the cutting-edge AI accelerators unveiled by NVIDIA and Google. We also discuss some general AI news and the future of the field with Andrew Ng's concept of agentic workflows. So, buckle up and get ready for a tour of AI innovation.
Updates on new AI models
The world of AI is evolving at a breakneck pace, with new models and updates being released by tech giants and open-source communities every day. In this section, we'll explore some groundbreaking language models, enhanced visual understanding capabilities, and impressive strides in speech recognition and translation. From Meta's Llama-3 to Databricks' open-source powerhouse DBRX and from Google's Gemini 1.5 Pro to OpenAI's resurgent GPT-4 Turbo, these developments showcase the incredible potential of AI to transform various domains.
Meta makes a splash with Llama-3
Meta releases Llama-3 in two versions, with 8B and 70B parameters, both offering an 8K context window. Trained on a staggering 15 trillion tokens, Llama-3 promises impressive capabilities. Upcoming versions will offer free access on popular platforms, longer context lengths, and the ability to handle different data formats (text, images, etc.). Benchmarks show Llama-3 8B surpassing Mistral 7B, while the 70B version goes neck-and-neck with Gemini Pro 1.5. Notably, Llama-3 8B achieves this with fewer parameters, making it the current efficiency champion.
Databricks DBRX: open-source titan
Databricks unveils DBRX, a groundbreaking open-source LLM ready to shake things up. Built on a fine-grained mixture-of-experts architecture, DBRX already surpasses Llama-2, Grok-1, and Mixtral in various tests.
Grok 1.5 gets its “eyes”
Grok, the AI model from Elon Musk's xAI (which has open-sourced the weights of Grok-1), has taken a significant leap forward with the release of Grok-1.5V. This latest iteration introduces a game-changing feature: visual processing capabilities.
Grok-1.5V can now analyze and interpret a wide range of visual information, including documents, diagrams, charts, and photographs. A new benchmark, RealWorldQA, has been introduced to assess a model's grasp of the physical world, and Grok shines in this test, demonstrating a strong ability to reason about physical objects from visual data, a capability reminiscent of the perception systems in Tesla's cars.
Gemini 1.5 Pro is supercharged for the future
Google introduces Gemini 1.5, which shows dramatic improvements across several dimensions, most notably a context window of up to 1 million tokens. 1.5 Pro achieves quality comparable to 1.0 Ultra while using less compute.
GPT-4 Turbo regains the throne
OpenAI's GPT-4 Turbo reclaims its top spot on the Arena leaderboard, outperforming competitors across various domains, such as coding, long questions, and handling multiple languages. It even excels in English-only prompts and conversations involving code snippets. OpenAI CEO Sam Altman boasts that GPT-4 is now "significantly smarter and more pleasant to use."
Mistral 7B v0.2
Mistral-7B receives an update with version 0.2. This iteration extends the context window from 8K to 32K tokens and replaces sliding-window attention with full attention.
AI accelerators
As AI continues to revolutionize industries worldwide, tech giants are locked in a fierce battle to develop the most powerful and efficient AI accelerators. These specialized chips are designed to handle the complex computations required for LLMs and other AI applications, offering unprecedented speed and performance while reducing costs and energy consumption.
From NVIDIA's game-changing Blackwell platform to Intel's impressive Gaudi 3 chips and from Groq's lightning-fast Language Processing Unit to the cutting-edge offerings from Meta, Microsoft, and Google, the AI accelerator landscape is rapidly evolving. As these companies push the boundaries of what's possible with AI hardware, we're witnessing a new era of innovation that promises to transform the way we interact with technology.
NVIDIA’s Blackwell platform
NVIDIA's new Blackwell platform allows organizations to build and run LLMs with trillions of parameters at a fraction of the cost and energy consumption of previous solutions. At its heart are Blackwell GPUs, each packing 208 billion transistors, linked by fifth-generation NVLink for fast multi-GPU communication.
Intel Gaudi 3 challenges the status quo
Intel is making news with its Gaudi 3 AI accelerator chips. They claim that Gaudi 3 outperforms NVIDIA's H100s in both speed and cost, even competing with NVIDIA's latest Blackwell platform.
Groq demonstrates striking LLM inference speed
Groq, powered by the world's first Language Processing Unit (LPU) Inference Engine, reaches speeds of up to 500 tokens per second. The company pegs this at 100 to 600 times faster than traditional GPUs. Groq's LPU card is also significantly cheaper than NVIDIA's H100, making it an attractive option for cost-conscious users.
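To put those throughput figures in perspective, here is a quick back-of-the-envelope calculation. The 500 tokens/second number comes from the claim above; the 50 tokens/second GPU baseline is an illustrative assumption, not a measured figure:

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a given decode rate."""
    return num_tokens / tokens_per_second

# A 1,000-token answer at the claimed 500 tokens/s:
groq_time = generation_time(1_000, 500)   # 2.0 seconds
# The same answer at an assumed 50 tokens/s GPU baseline:
gpu_time = generation_time(1_000, 50)     # 20.0 seconds

print(f"Groq: {groq_time:.1f}s, baseline: {gpu_time:.1f}s, "
      f"speedup: {gpu_time / groq_time:.0f}x")
```

At these assumed rates, a long answer that would take twenty seconds to stream arrives in two, which is the difference users actually feel in an interactive chat.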
Meta and Microsoft join the fray
Meta unveiled their next-generation Meta Training and Inference Accelerator (MTIA), designed to propel AI capabilities forward.
Microsoft Azure is also in the mix, offering the Azure Maia 100 AI accelerator (codenamed Athena or M100) and the Cobalt 100 CPU, a power-efficient 128-core Arm processor.
Google doubles down on AI hardware
Google isn't letting the competition steal the show. They've introduced their new TPU v5p chip, built to run in massive pods of 8,960 chips. This powerhouse delivers double the performance of the previous generation of TPUs. Additionally, Google unveiled the Axion chip, offering a 30% performance boost over general-purpose Arm chips and a 50% improvement over current Intel x86 chips.
Miscellaneous
In this section, we'll explore some exciting announcements from Google Cloud Next 2024. Google unveiled Vertex AI Agent Builder, a powerful tool for creating conversational AI agents, along with new, even more powerful Gemini models. We'll also discuss other captivating advancements in the world of AI, including Stability AI's Stable Audio 2.0, which generates high-quality songs based on text descriptions. Additionally, researchers introduced SWE-agent, an AI tool that assists software engineers in fixing bugs within GitHub repositories. We'll also touch on gpt-author, a research project that utilizes AI to create fantasy novels in minutes, and Microsoft's VASA-1, which generates lifelike talking faces from a single photo and audio clip, and more.
AI agents take center stage at Google Cloud Next 2024
Google Cloud Next 2024 surprised everyone by making AI agents the star of the show, even though it was a cloud-centric event. Here's a breakdown of the key announcements:
Google unveiled Vertex AI Agent Builder, a powerful tool that empowers you to create your own conversational AI agents, grounded in your organization's data, with little or no code.
Beyond Vertex AI Agent Builder, Google Cloud Next unveiled a plethora of other AI-powered innovations:
Google announced a suite of Gemini assistants specifically designed to enhance various Google Cloud services. These AI-powered assistants will streamline workflows and provide real-time support for cloud users, such as Gemini Code Assist for developers and Gemini in BigQuery for data analysts.
Google introduced their Axion processors, boasting significant performance gains over traditional CPUs and Arm chips.
Google also integrates AI functionalities into Workspace and Google Vids, promising to improve productivity and collaboration.
High-quality songs from text prompts
Stability AI releases Stable Audio 2.0, a new AI music generation model capable of producing high-fidelity, three-minute songs from a simple text description. This is a significant leap from the first version, which debuted in 2023 and could only create short pieces.
Stable Audio 2.0 offers several new features that expand its creative potential. Users can now describe the music they want with text prompts, upload audio samples, and transform them using text descriptions. This allows for more flexibility and control over the music creation process.
The model can also generate a wider range of sounds and sound effects, and it has a new style transfer feature that allows users to customize the overall feel of the generated music. Stability AI, like OpenAI, emphasizes its commitment to responsible AI development. The model is trained on a dataset that has been cleared of copyrighted material, and it uses advanced content recognition technology to prevent users from uploading infringing content.
SWE-agent
Researchers from Princeton University created an AI tool called SWE-agent that can help software engineers fix bugs and problems in real GitHub repositories.
SWE-agent works by interacting with LLMs like GPT-4 through a specially designed agent-computer interface. This interface makes it easier for the LLM to understand the code and perform actions like browsing the repository, viewing files, and editing code.
In tests, SWE-agent achieved state-of-the-art performance, resolving over 12% of issues in the SWE-bench benchmark. The researchers emphasize that this success relies on both the capabilities of the underlying LLM and the well-designed interface.
Key features of SWE-agent's interface include a compact file viewer, concise search commands, and a linter that catches syntax errors before an edit is applied.
The researchers will soon be publishing a detailed paper on WE-agent.
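The interaction pattern described above boils down to an observe-act loop: the model sees an observation, emits a command, and the result of that command becomes the next observation. The sketch below is purely illustrative; `fake_llm` stands in for a real model such as GPT-4, and the canned commands are simplified stand-ins for the interface's actual repository tools:

```python
def fake_llm(observation: str) -> str:
    """Stand-in for a real LLM call; returns the next command to run."""
    if "issue" in observation:
        return "search bug"
    if "search results" in observation:
        return "open utils.py"
    return "edit utils.py: fix off-by-one"

def run_agent(issue: str, max_steps: int = 5) -> list[str]:
    """Minimal observe-act loop: show the model an observation, run its command."""
    observation = f"issue: {issue}"
    trajectory = []
    for _ in range(max_steps):
        command = fake_llm(observation)
        trajectory.append(command)
        if command.startswith("edit"):  # stop once the agent proposes a fix
            break
        # In a real system the command would execute against the repository;
        # here we just echo a canned observation for the next step.
        observation = "search results" if command.startswith("search") else "file contents"
    return trajectory

print(run_agent("list index error in utils.py"))
```

The Princeton team's point is that the quality of the observations (the file viewer, search output, and linter feedback) matters as much as the loop itself.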
gpt-author
gpt-author is a new research project that utilizes AI to create fantasy novels in just minutes. It leverages several powerful AI models: GPT-4 for writing the story, Stable Diffusion for generating cover art, and Anthropic's API for additional functionalities.
Users provide a starting prompt describing their desired story and the number of chapters. gpt-author then uses GPT-4 to brainstorm potential plots, select the most engaging one, and refine it for a captivating story. Following the chosen plot, the AI crafts each chapter individually, ensuring continuity with previous parts.
The project recently incorporated Anthropic's Claude 3 model, resulting in significantly improved writing quality while simplifying the overall process. gpt-author is open-source and welcomes contributions from the research community. Potential areas for development include adapting the tool to work with other AI models, refining prompts, and expanding beyond fantasy to write novels in other genres.
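The chapter-by-chapter flow can be sketched as a simple loop. Here `fake_writer` is a stand-in for the real GPT-4 or Claude 3 calls, and the continuity mechanism (feeding a running summary of earlier chapters into each new prompt) is an assumption about how such a pipeline is typically wired, not a description of gpt-author's exact code:

```python
def fake_writer(prompt: str) -> str:
    """Stand-in for an LLM call; a real pipeline would call GPT-4 or Claude here."""
    return f"[text written for: {prompt[:40]}...]"

def write_novel(premise: str, num_chapters: int) -> list[str]:
    """Draft chapters one at a time, passing a running summary back in for continuity."""
    summary = premise
    chapters = []
    for i in range(1, num_chapters + 1):
        prompt = f"Story so far: {summary}. Write chapter {i}."
        chapters.append(fake_writer(prompt))
        summary += f" (chapter {i} done)"  # crude; real code would summarize the chapter
    return chapters

book = write_novel("A dragon librarian guards a forbidden atlas", 3)
print(len(book))
```

Keeping each chapter's prompt anchored to a summary of what came before is what lets a short-context model stay consistent across a whole book.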
Microsoft unveils VASA-1
Microsoft introduces VASA-1, an AI model capable of generating real-time talking faces from just a single picture and an audio clip. This innovative tech goes beyond existing solutions like EMO (available on GitHub), which can also generate expressive faces, but not in real time.
Essentially, VASA-1 enables you to talk to still portraits and have them come alive, mimicking speech with natural facial movements and emotional nuances. It can open the door for new applications in entertainment, education, and potentially even video conferencing.
NVIDIA excels at speech and translation
NVIDIA asserts its dominance in speech recognition AI. Their Parakeet family of models for automatic speech recognition and Canary model for multilingual speech recognition and translation are currently leading the Hugging Face Open ASR Leaderboard. These models impress with their speed, accuracy, and robustness in challenging audio environments. NVIDIA's technologies secure all five top positions, leaving the closest competitor, OpenAI's Whisper model, trailing behind in the top 10.
Andrew Ng on agentic workflows: the future of AI?
Andrew Ng, founder of DeepLearning.AI and AI Fund, spoke at Sequoia Capital's AI Ascent and discussed agentic workflows, a new paradigm where AI agents act autonomously to achieve specific goals. He believes agentic workflows hold the potential to revolutionize AI, potentially surpassing the impact of the latest generation of foundation models.
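One agentic pattern Ng has highlighted is reflection, where a model critiques and then revises its own draft instead of answering in a single shot. The sketch below is illustrative only: `fake_llm`, the prompt wording, and the fixed critique are stand-ins, not Ng's code or any real model:

```python
def fake_llm(prompt: str) -> str:
    """Stub model: returns a canned critique, or 'improves' a draft by tagging it."""
    if prompt.startswith("Critique:"):
        return "add error handling"
    return prompt.split("Draft: ")[-1] + " [revised]"

def reflect(task: str, rounds: int = 2) -> str:
    """Draft, critique, revise: each round feeds the critique back into the model."""
    draft = fake_llm(f"Draft: {task}")
    for _ in range(rounds):
        critique = fake_llm(f"Critique: {draft}")
        draft = fake_llm(f"Draft: {draft}, fixing: {critique}")
    return draft

result = reflect("write a sort function")
```

The point of the pattern is that even a fixed model produces better output when it is asked to iterate on its own work, which is exactly the shift from one-shot prompting to agentic workflows.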
Summary
That’s it for today. We hope this newsletter sparked your curiosity and provided some valuable insights into the ever-evolving world of AI. Stay tuned for future updates as we continue to explore the cutting edge of AI research and development!
Check out our blog posts:
Avenga,
your competitive advantage 🚀