💡Will DoT Replace CoT, ByteDance Enters AI Video War, and China's Ready for 100K GPU Clusters
Weekly China AI News from September 23, 2024 to October 6, 2024
Hi, this is Tony! Welcome to this week’s issue of Recode China AI, a newsletter for China’s trending AI news and papers.
Three things to know
Chinese Researchers Propose Diagram of Thought to Rival the Reasoning Technique Behind OpenAI's o1
What’s New: Last week, researchers from Tsinghua University, led by the legendary Chinese computer scientist Andrew Chi-Chih Yao, introduced a novel framework, Diagram of Thought (DoT), designed to enhance the reasoning capabilities of LLMs.
Limitations of Chain-of-Thought: The paper’s authors argued that while LLMs have excelled in various tasks, their capacity for complex reasoning remains limited. For example, Chain-of-Thought (CoT), the reasoning technique behind OpenAI’s o1 model, represents reasoning as a linear sequence of steps. The model processes information in a straight line, similar to following a recipe. While this works well for some tasks, it doesn’t reflect how humans actually think, which is often non-linear and iterative.
While extensions like Tree-of-Thought (ToT) and Graph-of-Thought (GoT) introduce branching structures, they can be computationally expensive and challenging to implement within a single LLM, the authors argue.
What is Diagram of Thought (DoT): DoT addresses these limitations by modeling reasoning as the construction of a directed acyclic graph (DAG) within a single LLM. This DAG comprises nodes representing propositions, critiques, refinements, and verifications.
Think of a DAG as a flowchart that never loops back: the model can explore multiple ideas or solutions, and critique nodes let it evaluate its own reasoning and make corrections, much as we reflect on our own thoughts. Because information in a DAG flows in only one direction, the model cannot get trapped in circular reasoning.
DoT leverages auto-regressive next-token prediction with role-specific tokens (e.g., <proposer>, <critic>, <summarizer>) to manage the reasoning process within a single LLM. This eliminates the need for multi-LLM collaboration or external control mechanisms.
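To make the idea concrete, here is a minimal sketch of how role-specific tokens might drive a DoT-style loop inside one model. The `llm_generate` placeholder and the `Node`/`diagram_of_thought` names are my own illustrative assumptions, not code from the paper:

```python
# Minimal sketch of a DoT-style reasoning loop driven by role tokens.
# `llm_generate` is a hypothetical stand-in for a single LLM's auto-regressive
# generation; this is NOT the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class Node:
    role: str          # "proposer", "critic", or "summarizer"
    text: str
    parents: list = field(default_factory=list)  # DAG edges point forward only

def llm_generate(prompt: str) -> str:
    """Placeholder for next-token prediction by the one underlying LLM."""
    raise NotImplementedError

def diagram_of_thought(question: str, max_rounds: int = 3) -> str:
    dag: list[Node] = []
    context = question
    for _ in range(max_rounds):
        # <proposer> adds a candidate reasoning step as a new node.
        proposal = llm_generate(f"<proposer>{context}")
        p = Node("proposer", proposal, parents=dag[-1:])
        dag.append(p)

        # <critic> evaluates the proposal; the critique becomes its own node,
        # so the model can correct itself instead of looping in place.
        critique = llm_generate(f"<critic>{proposal}")
        dag.append(Node("critic", critique, parents=[p]))

        context = f"{context}\n{proposal}\n{critique}"

    # <summarizer> walks the DAG (already in topological order here)
    # and synthesizes the validated steps into a final answer.
    return llm_generate(f"<summarizer>{context}")
```

The point of the sketch is that a single model, switching roles via special tokens, can build and traverse the whole graph without any external controller.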
Another crucial aspect of the DoT framework is its strong foundation in mathematical logic. The paper formalizes DoT using Topos Theory, providing a mathematical framework that ensures logical consistency and soundness in the reasoning process.
ByteDance Joins AI Video War with Two New Models
What's New: On September 24, 2024, ByteDance released two new video generation AI models named Doubao-PixelDance and Doubao-Seaweed. The models aim to compete with OpenAI’s Sora and narrow the performance gap between Chinese and Western video generators.
The PixelDance model focuses on producing complex and sequential motions, capable of generating 10-second videos, while the Seaweed model extends that capacity up to 30 seconds. Both models are currently undergoing small-scale testing, but will soon be made available to enterprise markets and a broader user base.
How It Works: According to ByteDance, the two new video generators are both Diffusion Transformer (DiT) models, similar to Sora and Kuaishou’s Kling. The models leverage the video understanding capabilities of the Douyin (TikTok’s China version) and Jianying (CapCut’s China version) platforms. They also support multiple visual styles - ranging from 3D animation to traditional Chinese painting and watercolor - and can handle different aspect ratios.
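ByteDance has not published the new models' architectures, but a generic DiT block - the family Sora and Kling are also reported to belong to - looks roughly like the sketch below, where patchified video latents are processed by a transformer whose normalization is modulated by the diffusion timestep (and, in practice, text) conditioning. This is an illustration of the DiT design in general, not Doubao's implementation:

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """Generic Diffusion Transformer block: self-attention + MLP over patch
    tokens, modulated by a conditioning vector. Illustrative only."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        # adaLN-style modulation: conditioning predicts a scale and shift
        # for each sub-layer's normalization.
        self.ada = nn.Linear(dim, 4 * dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        s1, b1, s2, b2 = self.ada(cond).unsqueeze(1).chunk(4, dim=-1)
        h = self.norm1(x) * (1 + s1) + b1
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2) + b2
        return x + self.mlp(h)

# Usage: tokens come from patchified video latents.
block = DiTBlock(dim=512)
tokens = torch.randn(2, 1024, 512)   # (batch, patch tokens, dim)
cond = torch.randn(2, 512)           # timestep (+ text) embedding
out = block(tokens, cond)            # same shape as tokens
```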
One major selling point is the stability and consistency of video generation. The models can generate videos that transition smoothly from one shot to another - whether it involves zooming, panning, rotating, or tracking a target - something that remains a significant challenge for other video generators. The generated characters can perform a wide range of complex movements. Multiple characters can even interact naturally within the scene.
While ByteDance didn’t disclose technical details during the release, its researchers published a paper on PixelDance a year earlier. PixelDance is a diffusion-based approach that conditions on image instructions for the first and last frames of a video, alongside text instructions. This lets the model focus on learning motion dynamics and generating complex movements.
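Based on that earlier paper's description, the conditioning can be pictured as attaching the encoded first- and last-frame instructions to the noisy video latents before denoising. The sketch below is a simplified illustration of that idea under my own assumed shapes and naming, not the released models' code:

```python
import torch

def build_denoiser_input(noisy_latents: torch.Tensor,
                         first_frame_latent: torch.Tensor,
                         last_frame_latent: torch.Tensor) -> torch.Tensor:
    """Sketch of image-instruction conditioning in the spirit of PixelDance:
    the encoded first/last frame guides are placed at their frame positions
    (zeros elsewhere) and concatenated with the noisy latents channel-wise,
    so the denoiser mainly has to learn the motion in between.
    Text instructions would typically enter via cross-attention instead."""
    first_cond = torch.zeros_like(noisy_latents)
    first_cond[:, 0] = first_frame_latent          # guide the opening shot
    last_cond = torch.zeros_like(noisy_latents)
    last_cond[:, -1] = last_frame_latent           # guide how the clip ends
    return torch.cat([noisy_latents, first_cond, last_cond], dim=2)

# Usage with toy shapes: (batch, frames, channels, height, width)
z_t = torch.randn(1, 16, 4, 32, 32)
f_first = torch.randn(1, 4, 32, 32)
f_last = torch.randn(1, 4, 32, 32)
x_in = build_denoiser_input(z_t, f_first, f_last)   # channels: 4 -> 12
```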
Why It Matters: With only a limited number of video demos available, it’s still hard to tell whether ByteDance’s video generators are a genuine Sora challenger, but they are clearly among the strongest contenders in China and open up new creative possibilities for Chinese content creators.
China Tech Giants Ready to Support 100,000 GPU Clusters
What’s New: Chinese tech giants are entering the race for 100,000 GPU clusters - a critical milestone for AI model development as these clusters provide the computational power necessary to train increasingly complex and large-scale AI models like GPT-5.
On September 25, Baidu’s Executive Vice President Shen Dou announced at the Baidu AI Cloud Summit that “soon more 100,000-GPU clusters will emerge,” as demand for larger-scale model training continues to rise. Baidu unveiled its upgraded heterogeneous computing platform, Baige 4.0, designed to manage this ambitious scale. Alibaba Cloud and Tencent are similarly scaling up to support 100,000-GPU clusters.
The move comes on the heels of recent developments by other global players. Elon Musk’s xAI deployed a 100,000 Nvidia H100 GPU cluster, with plans to double it within months. Meanwhile, Meta CEO Mark Zuckerberg revealed that Meta’s compute power will reach the equivalent of 600,000 H100 GPUs.
Technical Challenges: Building a 100,000-GPU cluster involves more than simply scaling up existing setups. Such clusters pose hard problems in high-performance computing, power consumption, and physical space. Shen highlighted that housing a cluster of this size requires about 100,000 square meters - roughly 14 soccer fields - and consumes nearly 3 million kWh of electricity per day, comparable to the daily power usage of a Beijing district with over 700,000 residents.
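For a sense of scale, that 3 million kWh figure implies an average draw of roughly 1.25 kW per GPU once cooling, networking, and host servers are folded in. A quick back-of-envelope check (my own arithmetic, not a figure from the summit):

```python
# Back-of-envelope check of the power figure (my arithmetic, not Baidu's).
gpus = 100_000
kwh_per_day = 3_000_000

avg_cluster_mw = kwh_per_day / 24 / 1000   # ~125 MW sustained
kw_per_gpu = kwh_per_day / 24 / gpus       # ~1.25 kW per GPU

# An H100 alone is rated around 0.7 kW, so the remainder reflects
# cooling, networking, and host-server overhead.
print(f"{avg_cluster_mw:.0f} MW sustained, {kw_per_gpu:.2f} kW per GPU")
```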
Beyond the physical infrastructure, using the GPUs efficiently is crucial. Key challenges include inter-GPU networking, hardware-software optimization, and system stability: the probability of a hardware failure grows with the number of GPUs involved. Shen noted that Meta’s 16,000-GPU cluster used to train Llama experiences a failure roughly every two to three hours, and such problems are magnified when operating clusters as large as 100,000 GPUs.
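If failures scale roughly in proportion to GPU count, the implication is easy to see. This is a rough estimate built on that assumption, not a measured figure:

```python
# Rough failure-rate scaling (my estimate, not a measured figure).
meta_gpus = 16_000
meta_hours_between_failures = 2.5   # "every 2-3 hours" per Shen's remarks

target_gpus = 100_000
# Assume each GPU fails independently at the same rate, so the cluster-wide
# failure interval shrinks roughly in proportion to the GPU count.
est_hours = meta_hours_between_failures * meta_gpus / target_gpus
print(f"~{est_hours * 60:.0f} minutes between failures at {target_gpus:,} GPUs")
# -> roughly one failure every ~24 minutes
```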
Why It Matters: 100,000-GPU clusters are becoming the entry ticket to the future of generative AI. As models grow in size and complexity, traditional 10,000-GPU clusters are no longer sufficient. Companies without the compute power of at least 10,000 GPUs are increasingly seen as lagging behind in the AI race.
The pursuit of massive clusters comes with immense costs, both in terms of money and energy. For AI to truly advance, addressing cost efficiency and effective GPU utilization is key. Oracle Founder Larry Ellison once said that training costs for AI models, which today reach $1 billion, could grow to $100 billion in the coming years.
Weekly News Roundup
Trending Research