👀Baidu's Tech Improves AI Image Accuracy, China’s AI Glasses Expected in 2025, and Alibaba's Coder Beats GPT-4o
Weekly China AI News from November 11, 2024 to November 17, 2024
Hi, this is Tony! Welcome to this week’s issue of Recode China AI, a newsletter for China’s trending AI news and papers.
Three things to know
Before we start, take a look at this quadruped robot from Deep Robotics, a Hangzhou-based startup. This all-terrain robot, called Lynx, can speed downhill, and perform backflips.
Baidu Tackles AI’s Visual Hallucination Problem with iRAG
What’s New: At its annual tech event Baidu World, Baidu unveiled iRAG (Image-Based Retrieval-Augmented Generation), a technology designed to tackle one of generative AI’s most persistent issues: image hallucinations, where AI generates inaccurate or imaginary visual elements. iRAG-assisted image generation is now live on ERNIE Bot and its mobile app.
Deep Dive: RAG (Retrieval-Augmented Generation) is an AI framework that improves the accuracy of LLM-generated answers by combining LLMs with external databases. In text-based RAG, an LLM retrieves relevant information from a knowledge base using a generated query. This retrieved information is then integrated into the LLM’s input, enabling it to generate more accurate and contextually relevant text.
While RAG has made strides in reducing LLM text hallucinations, visual hallucinations in AI-generated images have persisted, from six fingers to wrong car logos to missing elements. That’s why iRAG is proposed.
iRAG builds on Baidu’s image search database, linking it with text-to-image models for more accurate and contextually realistic visuals. Here’s how it works:
Below are a few sample generations. While still a work in progress, iRAG has already improved accuracy in AI-generated images.
Baidu isn’t alone in exploring RAG’s potential for multimodality. Earlier this year, a team from the University of Missouri and NEC Laboratories America introduced another iRAG, aimed at improving video understanding through efficient interactive querying.
Why It Matters: Hallucinations remain a major challenge to integrating AI into practical, everyday use. Baidu’s approach breaks new ground by extending retrieval beyond text. Baidu expects that with iRAG, AI can improve user experience and AI’s trustworthiness by delivering instant, high-quality visuals. However, producing hyper-realistic images could also increase the risk of misinformation and raise copyright concerns — issues that commonly affect other AI technologies, such as video generation.
One More Thing: Baidu also highlighted the impressive growth of its ERNIE models, now managing 1.5 billion daily API calls — a 30-fold increase over the past year. The Chinese tech giant also introduced Miaoda, a no-code app builder, and ERNIE-powered AI Glasses (details below).
Chinese Firms Eye Domestic AI Glasses as Meta’s Ray-Ban Gains Popularity
What’s New: Meta’s Ray-Ban smart glasses are rapidly gaining popularity, with users enjoying features like photo capture, music streaming, and on-demand AI interactions. This success has spurred Chinese tech companies to explore domestic alternatives.
At Baidu World, Baidu’s AI hardware company, Xiaodu Technology, introduced the Xiaodu AI Glasses. Weighing just 45 grams, these lightweight glasses are designed to blend seamlessly into daily life while offering advanced functionality. The glasses are equipped with a 16-megapixel ultra-wide camera with AI stabilization, a four-microphone array for clear sound capture, and open-ear anti-leakage speakers.
Built on Baidu’s ERNIE LLM and powered by the DuerOS operating system, the Xiaodu AI Glasses can provide real-time “walk-and-ask” capabilities, allowing hands-free interaction with the environment. According to Xiaodu, these glasses act as personal assistants, offering first-person photography, real-time Q&A, calorie recognition, object identification, encyclopedia lookup, audio-visual translation, and integration with Baidu services like Maps and Baike.
With up to 5 hours of active use and a 56-hour standby battery life, the Xiaodu AI Glasses charge fully in just 30 minutes. Set to launch in early 2025, they are expected to be an affordable alternative in the market.
Xiaomi’s Entry: Chinese smartphone and EV maker Xiaomi is also reportedly developing AI-powered smart glasses, aiming for a Q2 2025 release. Expected to offer camera and audio capabilities similar to other smart glasses, Xiaomi is partnering with Apple supplier Goertek to develop the product, with ambitions to sell over 300,000 units upon launch.
Additionally, Chinese media reports that OPPO, Vivo, Huawei, Tencent, and ByteDance are meanwhile exploring their own AI glasses projects.
Why It Matters: If these reports hold, 2025 could be the year of an AI glasses war among China’s tech giants and hardware companies. On the other hand, smaller startups, anticipating fierce competition, are mirroring the Meta-Ray-Ban model by teaming up with established eyewear brands. For example, Chinese AR firms Rokid and Xreal are said to be planning collaborations with Bolon and Gentle Monster, while the new AI glasses brand Shanji has joined forces with Hong Kong’s fast-fashion eyewear brand LOHO.
With so many players entering the space, the Chinese AI glasses market is shaping up to be one of the most dynamic fronts for wearable tech innovation.
Alibaba Claims New Coding Model is Better than GPT-4o
What’s New: Alibaba’s Qwen research team just open-sourced the latest Qwen2.5-Coder models for code generation, code reasoning, and fixing bugs. With this release, Qwen introduces six different model sizes, including the high-performing 32B model, which has now achieved state-of-the-art (SOTA) results in various code generation benchmarks, outperforming GPT-4o.
Other benchmarks such as Paul Gauthier’s reputable Aider benchmarks showed that the 32B model scored in between GPT-4o and 3.5 Haiku.
How It Works: In addition to handling over 40 programming languages, Qwen2.5-Coder is reportedly adept at multi-language code repair, scoring top marks in tasks requiring cross-language support.
For developers who prefer a tailored experience, Qwen2.5-Coder offers two model types for each size: a Base model for custom fine-tuning and an Instruct model pre-aligned to support interactive coding assistance. These models excel in environments like the Cursor code editor.
Additionally, in Artifact creation, the models assist in generating complex visual applications such as simulations, websites, and data charts.
How Good is Qwen2.5-Coder? I asked Qwen2.5-Coder on Hugging Face to create a portfolio website, and it did a pretty decent job. However, whenever I clicked “View More” on the “Project 1” section, there was no button to close the page and return to the home section.
I also used the same prompt with GPT-4o and Claude-3.5-Sonnet for comparison. GPT-4o generated a similar website to Qwen’s but with just 80+ lines of code, whereas Claude wrote over 500 lines. The Claude-generated site allows me to open projects, read more on the Blog, and even send a message through the Contact Me section.
While my test was quite preliminary, the results were consistent with benchmark findings.
Why It Matters: Qwen2.5-Coder’s open-source release is a further step toward democratizing access to powerful coding models, particularly in a field often dominated by closed models.
Weekly News Roundup
Trending Research