AI news

This newsletter provides a comprehensive update on the hottest AI advancements from the last few months. We explore groundbreaking LLMs like Meta's Llama-3 and OpenAI's GPT-4 Turbo alongside the cutting-edge AI accelerators unveiled by NVIDIA and Google. We also discuss some general AI news and the future of the field with Andrew Ng's concept of agentic workflows. So, buckle up and get ready for a tour of AI innovation.


Updates on new AI models

The world of AI is evolving at a breakneck pace, with new models and updates being released by tech giants and open-source communities every day. In this section, we'll explore some groundbreaking language models, enhanced visual understanding capabilities, and impressive strides in speech recognition and translation. From Meta's Llama-3 to Databricks' open-source powerhouse DBRX and from Google's Gemini 1.5 Pro to OpenAI's resurgent GPT-4 Turbo, these developments showcase the incredible potential of AI to transform various domains. 

Meta makes a splash with Llama-3

Meta releases Llama-3 in two versions: 8B and 70B parameters, both boasting a hefty 8K context window. Trained on a staggering 15 trillion tokens, Llama-3 promises impressive capabilities. The upcoming versions will offer free access on popular platforms, longer context lengths, and the ability to handle different data formats (text, images, etc.). Benchmarks show Llama-3 8B surpassing Mistral 7B, while the 70B version goes neck-and-neck with Gemini Pro 1.5. Notably, Llama-3 8B achieves this with fewer parameters, making it the current efficiency champion.

Databricks DBRX: open-source titan

Databricks unveils DBRX, a groundbreaking open-source LLM ready to shake things up. DBRX already surpasses Llama-2, Grok, and Mixtral in various tests. Here's what makes it unique:

  • Open for all. DBRX embraces open-source accessibility.
  • Expanded context. It boasts a 32K context length, allowing it to consider more information when processing requests.
  • Transformer architecture. DBRX leverages a robust transformer architecture, focusing on the decoder-only structure for efficient training.
  • Massive parameters. With 132B parameters, it packs a serious punch.
  • MoE. DBRX utilizes a mixture of experts (MoE) with 16 experts, enabling efficient training of large models.
  • GPT-4 tokenizer. DBRX speaks the same language as GPT-4 for seamless communication.
  • Speed. DBRX is twice as fast as comparable models, but it does require some serious muscle – 4 powerful GPUs with 80GB of memory each are needed to run it.

Grok 1.5 gets its “eyes”

Grok, the AI model developed by Elon Musk's xAI, has taken a significant leap forward with the release of version 1.5V. This latest iteration introduces a game-changing feature: visual processing capabilities.

Grok 1.5V can now analyze and interpret a wide range of information, including documents, diagrams, charts, and images. A new benchmark, RealWorldQA, has been introduced to assess a model's grasp of the real world, and Grok shines in this test, demonstrating a strong ability to reason about physical objects based on visual data. This kind of real-world spatial understanding is akin to what Tesla's driver-assistance technology relies on.

Gemini 1.5 Pro is supercharged for the future

Google introduces Gemini 1.5, which shows dramatic improvements across several dimensions. 1.5 Pro achieves comparable quality to 1.0 Ultra while using less compute.

  • Audio understanding. Gemini 1.5 Pro now understands audio input, allowing for natural language interactions that extend beyond text.
  • Large-scale file handling. With a context window of up to 1 million tokens, Gemini 1.5 Pro can process very large inputs, making it ideal for large-scale tasks.
  • Actionable commands. Gemini 1.5 Pro can execute commands based on your instructions, automating workflows and completing tasks.
  • JSON mode. Enhance communication by using the flexible JSON format for data exchange, allowing for structured data manipulation and integration with various applications.
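The value of JSON mode is that the reply can be validated and consumed by code instead of being re-parsed from free text. Here is a minimal sketch of that consuming side; the reply string and its schema are invented for illustration and are not Gemini's actual API:

```python
import json

# Stub standing in for a model reply produced in JSON mode; the schema
# and values are invented purely for illustration.
raw_response = '{"title": "Q1 report", "sentiment": "positive", "topics": ["revenue", "growth"]}'

def parse_structured_reply(raw: str) -> dict:
    """Parse a JSON-mode reply, failing loudly on malformed output."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc

reply = parse_structured_reply(raw_response)
```

Downstream code can then read fields like reply["sentiment"] directly, which is far more robust than scraping values out of prose.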

GPT-4 Turbo regains the throne

OpenAI's GPT-4 Turbo reclaims its top spot on the Arena leaderboard, outperforming competitors across various domains, such as coding, long questions, and handling multiple languages. It also excels in English-only prompts and conversations involving code snippets. OpenAI CEO Sam Altman says GPT-4 is now "significantly smarter and more pleasant to use."

Mistral 7B v0.2

Mistral-7B receives an update with version 0.2. This iteration features:

  • Double the context. The context window is expanded from 8K to 32K, allowing for more comprehensive analysis.
  • Sliding window removal. The sliding window attention mechanism has been removed, potentially leading to efficiency gains.
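A context window is simply a cap on how many tokens the model can attend to at once, so widening it from 8K to 32K means less of a long input has to be thrown away. A toy sketch of that effect (our own illustration, not Mistral code):

```python
def fit_to_context(tokens: list[str], max_tokens: int) -> list[str]:
    """Keep only the most recent tokens that fit the model's context window."""
    return tokens if len(tokens) <= max_tokens else tokens[-max_tokens:]

# A 20,000-token document loses its first 12,000 tokens in an 8K window,
# but fits entirely in a 32K window.
document = [f"tok{i}" for i in range(20_000)]
kept_8k = fit_to_context(document, 8_000)
kept_32k = fit_to_context(document, 32_000)
```

Real serving stacks truncate or chunk more carefully than this, but the budget arithmetic is the same.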


AI accelerators

As AI continues to revolutionize industries worldwide, tech giants are locked in a fierce battle to develop the most powerful and efficient AI accelerators. These specialized chips are designed to handle the complex computations required for LLMs and other AI applications, offering unprecedented speed and performance while reducing costs and energy consumption.

From NVIDIA's game-changing Blackwell platform to Intel's impressive Gaudi 3 chips and from Groq's lightning-fast Language Processing Unit to the cutting-edge offerings from Meta, Microsoft, and Google, the AI accelerator landscape is rapidly evolving. As these companies push the boundaries of what's possible with AI hardware, we're witnessing a new era of innovation that promises to transform the way we interact with technology.

NVIDIA’s Blackwell platform

NVIDIA's new Blackwell platform allows organizations to build and run LLMs with trillions of parameters, all at a fraction of the cost and energy consumption compared to previous solutions. Here's why it's so powerful:

  • Blackwell B200 chip. This powerhouse boasts a whopping 208 billion transistors, making it the world's most powerful chip. It can churn through calculations at a staggering 20 petaFLOPS per GPU.
  • Unleashing trillion-parameter LLMs. Blackwell empowers you to run LLMs with trillions of parameters up to 25 times faster than before.
  • New NVLink. This innovative technology provides a high-speed data highway of 1.8 terabytes per second to each GPU, ensuring smooth information flow.
  • GB200 board. This board packs a punch, combining two B200 chips with a single Grace CPU. It delivers a whopping 30 times the performance for LLM inference while reducing cost and energy consumption by up to 25 times compared to the previous generation H100 chip.

Intel Gaudi 3 challenges the status quo

Intel is making news with its Gaudi 3 AI accelerator chips. They claim that Gaudi 3 outperforms NVIDIA's H100s in both speed and cost, even competing with NVIDIA's latest Blackwell platform.

Groq demonstrates striking LLM inference speed

Groq, powered by the world’s first Language Processing Unit Inference Engine, boasts mind-blowing speeds of up to 500 tokens per second. This translates to a massive performance leap – 100 to 600 times faster than traditional GPUs. Groq's LPU card is also significantly cheaper than NVIDIA's H100, making it an attractive option for cost-conscious users.
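To get a feel for what that throughput means in practice, a quick back-of-the-envelope calculation (the reply length is our own example figure, not a Groq number):

```python
# Back-of-the-envelope: time to stream a reply at Groq's claimed throughput.
tokens_per_second = 500      # Groq's claimed LPU rate
response_tokens = 1_000      # hypothetical reply length (our own figure)
seconds = response_tokens / tokens_per_second
print(f"A {response_tokens}-token reply streams in about {seconds:.0f} s")
```

At that rate, even long answers appear nearly instantly, which is exactly the user experience Groq is selling.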

Meta and Microsoft join the fray

Meta unveiled their next-generation Meta Training and Inference Accelerator (MTIA), designed to propel AI capabilities forward.

Microsoft Azure is also in the mix, offering the Azure Maia 100 GPU (codenamed Athena or M100) and the Cobalt 100 CPU, a power-efficient 128-core Arm processor.

Google doubles down on AI hardware

Google isn't letting the competition steal the show. They've introduced their new TPU v5p chip, built to run in massive pods of 8,960 chips. This powerhouse delivers double the performance of the previous generation TPUs. Additionally, Google unveiled the Axion chip, offering a 30% performance boost over general-purpose Arm chips and a 50% improvement over current Intel x86 chips.


Miscellaneous

In this section, we'll explore some exciting announcements from Google Cloud Next 2024. Google unveiled Vertex AI Agent Builder, a powerful tool for creating conversational AI agents, along with new, even more powerful Gemini models. We'll also discuss other captivating advancements in the world of AI, including Stability AI's Stable Audio 2.0, which generates high-quality songs based on text descriptions. Additionally, researchers introduced SWE-agent, an AI tool that assists software engineers in fixing bugs within GitHub repositories. We'll also touch on gpt-author, a research project that utilizes AI to create fantasy novels in minutes, and Microsoft's VASA-1, which generates lifelike talking faces from a single photo and audio clip, and more.

AI agents take center stage at Google Cloud Next 2024

Google Cloud Next 2024 surprised everyone by making AI agents the star of the show, even though it was a cloud-centric event. Here's a breakdown of the key announcements:

Google unveiled Vertex AI Agent Builder, a powerful tool that empowers you to create your own conversational AI agents. This intuitive platform allows you to:

  • Craft engaging conversations. Design chatbots and virtual assistants for various applications.
  • Leverage Google Search expertise. Ground your agents in the vast knowledge of Google Search to ensure accurate and up-to-date information.
  • Vertex AI Search and RAG integration. Get enhanced information retrieval and response generation with Vertex AI Search and retrieval-augmented generation (RAG).
  • Search component APIs. Utilize Google Search component APIs to build even more sophisticated functionalities into your agents.
  • Vector search integration. Integrate vector search capabilities using embeddings for more efficient and relevant information retrieval.
  • LangChain on Vertex AI. Harness LangChain, a framework for building applications powered by LLMs, within Vertex AI for complex AI agent functionalities.
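To make the vector-search bullet concrete, here is a minimal cosine-similarity retrieval sketch. The toy 3-dimensional "embeddings" and document names are invented; real embeddings come from an embedding model and have hundreds of dimensions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 3-dimensional embeddings; real ones come from an embedding model.
index = {
    "pricing page": [0.9, 0.1, 0.0],
    "support FAQ": [0.1, 0.8, 0.2],
    "release notes": [0.0, 0.2, 0.9],
}

def top_match(query_vec: list[float], index: dict) -> str:
    """Return the document whose embedding is closest to the query."""
    return max(index, key=lambda name: cosine(query_vec, index[name]))
```

A query embedded near the "pricing" direction retrieves the pricing page even if it shares no keywords with it, which is the core advantage of embedding-based search over plain keyword matching.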

Beyond Vertex AI Agent Builder, Google Cloud Next unveiled a plethora of other AI-powered innovations:

  • New AI hypercomputer architecture (TPU v5p). Google showcased their next-generation TPU v5p chip, designed to run in massive pods for unprecedented AI processing power.
  • Gemini 1.5 Pro. The latest iteration of Google's powerful LLM, Gemini, promises even more advanced capabilities.
  • Imagen 2.0. Google's image generation AI, Imagen, received a significant upgrade, pushing the boundaries of image creation.

Google announced a suite of Gemini assistants specifically designed to enhance various Google Cloud services. These AI-powered assistants will streamline workflows and provide real-time support for developers, data analysts, and other cloud users. Here are some examples:

  • Gemini Cloud Assist. Gain instant assistance for navigating Google Cloud services.
  • Gemini in security. Strengthen your cloud security posture with AI-powered threat detection and analysis.
  • Gemini Code Assist. Receive coding guidance and suggestions directly within your development environment.
  • Gemini in BigQuery & Looker. Utilize AI for data exploration and analysis within BigQuery and Looker.
  • Gemini in databases. Get optimized database queries and recommendations with Gemini's assistance.

Google introduced their Axion processors, boasting significant performance gains over traditional CPUs and Arm chips.

Google also integrates AI functionalities into Workspace and Google Vids, promising to improve productivity and collaboration.

High-quality songs from text prompts

Stability AI releases Stable Audio 2.0, a new AI music generation model capable of producing high-fidelity, three-minute songs from a simple text description. This is a significant leap from the first version, which debuted in 2023 and could only create short pieces.

Stable Audio 2.0 offers several new features that expand its creative potential. Users can now describe the music they want with text prompts, upload audio samples, and transform them using text descriptions. This allows for more flexibility and control over the music creation process.

The model can also generate a wider range of sounds and sound effects, and it has a new style transfer feature that allows users to customize the overall feel of the generated music. Stability AI, like OpenAI, emphasizes its commitment to responsible AI development. The model is trained on a dataset that has been cleared of copyrighted material, and it uses advanced content recognition technology to prevent users from uploading infringing content.

SWE-agent

Researchers from Princeton University created an AI tool called SWE-agent that can help software engineers fix bugs and problems in real GitHub repositories.

SWE-agent works by interacting with LLMs like GPT-4 through a specially designed agent-computer interface. This interface makes it easier for the LLM to understand the code and perform actions like browsing the repository, viewing files, and editing code.

In tests, SWE-agent achieved state-of-the-art performance, resolving over 12% of issues on the SWE-bench benchmark. The researchers emphasize that this success relies on both the capabilities of the underlying LLM and the well-designed interface.

Here are some key features of SWE-agent's interface:

  • Code linter. Checks for syntax errors before applying edits.
  • Custom file viewer. Displays code in chunks of 100 lines for better readability.
  • File editor. Allows scrolling, searching, and editing within files.
  • Directory search. Provides concise listings of files containing matches.
  • Informative feedback. Confirms successful commands even if they don't produce output.
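As a rough sketch of the chunked file viewer idea (our own toy code, not the actual implementation), the point is to show the model a small numbered window rather than an entire file:

```python
def view_chunk(lines: list[str], chunk: int, chunk_size: int = 100) -> str:
    """Render one numbered window of a file, 100 lines at a time."""
    start = chunk * chunk_size
    window = lines[start:start + chunk_size]
    header = f"[lines {start + 1}-{start + len(window)} of {len(lines)}]"
    body = "\n".join(f"{start + i + 1}: {text}" for i, text in enumerate(window))
    return header + "\n" + body

# A 250-line file splits into windows of 100, 100, and 50 lines.
source = [f"line {i}" for i in range(250)]
last_chunk = view_chunk(source, 2)
```

Keeping each view short prevents the LLM's context from being swamped by one large file, while the line numbers let it request precise edits.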

The researchers will soon publish a detailed paper on SWE-agent.

gpt-author

gpt-author is a new research project that utilizes AI to create fantasy novels in just minutes. It leverages several powerful AI models: GPT-4 for writing the story, Stable Diffusion for generating cover art, and Anthropic's API for additional functionalities.

Users provide a starting prompt describing their desired story and the number of chapters. gpt-author then uses GPT-4 to brainstorm potential plots, select the most engaging one, and refine it for a captivating story.  Following the chosen plot, the AI crafts each chapter individually, ensuring continuity with previous parts.

The project recently incorporated Anthropic's Claude 3 model, resulting in significantly improved writing quality while simplifying the overall process. gpt-author is open-source and welcomes contributions from the research community. Potential areas for development include adapting the tool to work with other AI models, refining prompts, and expanding beyond fantasy to write novels in other genres.

Microsoft unveils VASA-1

Microsoft introduces VASA-1, an AI model capable of generating real-time talking faces from just a single picture and an audio clip. This innovative tech goes beyond existing solutions like EMO (available on GitHub), which can also generate expressive faces, but not in real time.

Essentially, VASA-1 enables you to talk to still portraits and have them come alive, mimicking speech with natural facial movements and emotional nuances. It can open the door for new applications in entertainment, education, and potentially even video conferencing.

NVIDIA excels at speech and translation

NVIDIA asserts its dominance in speech recognition AI. Their Parakeet family of models for automatic speech recognition and Canary model for multilingual speech recognition and translation currently lead the Hugging Face Open ASR Leaderboard. These models impress with their speed, accuracy, and robustness in challenging audio environments. NVIDIA's technologies secure all five top positions, leaving the closest competitor, OpenAI's Whisper model, trailing behind in the top 10.

Andrew Ng on agentic workflows: the future of AI?

Andrew Ng, founder of DeepLearning.AI and AI Fund, spoke at Sequoia Capital's AI Ascent and discussed agentic workflows – a new paradigm where AI agents act autonomously to achieve specific goals. He believes agentic workflows hold the potential to revolutionize AI, potentially surpassing the impact of the latest generation of foundational models.
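In code, an agentic workflow boils down to a loop in which the model plans an action, acts, observes the result, and repeats until the goal is met. A minimal sketch with a stubbed-out policy and "tool" (everything below is invented for illustration, not from Ng's talk):

```python
# A minimal agentic loop: plan an action, act, observe, repeat until done.
# The goal, policy, and tool below are stubs invented for illustration;
# in a real system, plan() would be an LLM call and act() a tool invocation.

def plan(state: dict) -> str:
    # Stub policy: keep doubling until the value reaches the goal.
    return "double" if state["value"] < state["goal"] else "stop"

def act(action: str, state: dict) -> dict:
    if action == "double":
        state["value"] *= 2
    return state

def run_agent(state: dict, max_steps: int = 10) -> dict:
    for _ in range(max_steps):
        action = plan(state)
        if action == "stop":
            break
        state = act(action, state)
    return state

result = run_agent({"value": 1, "goal": 10})
```

The contrast with a single zero-shot prompt is the iteration: the agent keeps refining its state across steps instead of committing to one answer up front.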


Summary

That’s it for today. We hope this newsletter sparked your curiosity and provided some valuable insights into the ever-evolving world of AI. Stay tuned for future updates as we continue to explore the cutting edge of AI research and development!




Avenga,

your competitive advantage 🚀

avenga.com
