September 2024 Top XR & AI News
Reality Vision: XR and AI News (Vol.6)
This month, the XR and AI spaces are buzzing with exciting updates. Meta Connect is set to reveal major advancements, while Pico launched its new product, the Pico 4 Ultra, heating up the XR competition. Qualcomm, Google, and Samsung are actively developing a mixed reality device, and Sony is preparing to enter the XR market with its enterprise-focused headset. On the AI front, MiniMax’s Hailuo AI is emerging as a strong rival to Sora in text-to-video generation, and Mistral AI’s Pixtral 12B is bringing new multimodal capabilities. Let’s dive into the latest developments in XR and AI!
Meta Connect 2024
Meta has announced the program for Meta Connect 2024, scheduled for September 25-26. The conference will focus on advancements in Mixed Reality, with one of the key announcements being the new Spatial App Framework. This framework is designed to simplify the development of immersive apps for Meta Quest, allowing developers to use familiar mobile development tools. One of Meta's current challenges, compared to the Apple Vision Pro, is the limited availability of mobile apps in the Meta Horizon Store. While Apple Vision Pro supports apps from the App Store, Meta Quest lacks a similar ecosystem, limiting content for its users. The new Spatial App Framework aims to bridge this gap by enabling developers to bring 2D mobile apps to Meta’s platform more easily, increasing accessibility and flexibility.
There are also rumors that Meta may announce the Meta Quest 3S, a more affordable, entry-level XR headset expected to replace the Meta Quest 2 at the bottom of the lineup. There is also speculation that the headset could ship without controllers to reduce costs, though it would still support Meta's Touch Plus controllers. Either way, it's best to wait for the official announcement to confirm the details, to understand any compromises made for the Quest 3S, and to see whether Meta also reveals a new pair of AR smart glasses.
Pico 4 Ultra: ByteDance’s Powerful New Competitor in the VR Market
Last month, Pico introduced its latest standalone virtual reality headset, the Pico 4 Ultra. Powered by Qualcomm's Snapdragon XR2 Gen 2 chipset, the Pico 4 Ultra features 12 GB of RAM, 256 GB of storage, and dual 2.56-inch displays with a resolution of 2,160 × 2,160 pixels per eye, offering a 90 Hz refresh rate. It also boasts full-color passthrough with 3D environment meshing and redesigned controllers for an improved user experience.
Pico is positioning the Pico 4 Ultra as a strong competitor to Meta's Quest 3, with significant hardware upgrades designed to enhance the mixed reality experience through better performance and user-friendly features. Early testers have praised its balanced design, which makes it comfortable for extended use, and the precision of its full-body motion tracking. However, some users have noted that while the passthrough clarity is impressive, depth perception feels lacking compared to Meta Quest 3. Overall, the Pico 4 Ultra is viewed as a strong contender in the VR market, despite some limitations with its older lenses and higher price point.
Samsung, Qualcomm, and Google: Teaming Up for Next-Gen Mixed Reality Glasses
Since the official announcement of the joint Qualcomm, Samsung, and Google project early last year, details have remained scarce. The project was initially believed to focus on an XR (Extended Reality) headset, but recent leaks suggest it has shifted toward mixed reality glasses that connect to smartphones. Such glasses would offload most of the processing and battery load to the phone, making them lighter and more practical for daily use, and a more attractive option than bulkier devices like Meta Quest 3 and Apple Vision Pro.
Recent details indicate that Samsung's XR device is expected to be released in late 2024 or 2025. The device, developed in partnership with Google and Qualcomm, will integrate into Samsung’s Galaxy ecosystem and be based on Qualcomm's Snapdragon XR2+ Gen 2 platform. This collaboration positions Samsung as a serious competitor in the XR space, offering powerful hardware with an accessible design for a broader audience.
Sony Enters XR Market with New Enterprise Headset
Sony is preparing to enter the XR market with its upcoming mixed reality headset, designed specifically for the B2B enterprise sector. First announced at CES 2024, the device is expected to feature high-quality 4K OLED microdisplays and Qualcomm's Snapdragon XR2+ Gen 2 chipset, offering immersive and seamless mixed reality experiences. The headset is aimed at industries like manufacturing and design, enabling advanced 3D modeling, digital twin creation, and other enterprise applications, positioning Sony as a strong contender in the professional XR space.
This marks a significant shift from Sony’s previous focus on consumer gaming with PlayStation VR. Collaborations with industry leaders like Siemens underscore Sony’s commitment to enhancing productivity and innovation in sectors such as engineering, architecture, and industrial design. Sony originally announced the product for 2024, and it is still expected to launch by the end of the year.
Loopy: A New Frontier in Audio-Driven Avatar Animation
Loopy, a new model introduced by a team from ByteDance and Zhejiang University, advances audio-driven portrait animations. It creates more natural and expressive talking head videos by leveraging long-term motion dependencies, resulting in smoother and more lifelike animations. Unlike other models, Loopy doesn’t rely on fixed spatial templates and can seamlessly match facial movements to audio input.
Built on the Stable Diffusion framework, Loopy uses a dual U-Net architecture and an audio-to-latents module to strengthen the connection between audio and motion. The model is trained with both audio and facial movement features but requires only audio at inference time, making it highly efficient. This approach allows for high-quality, flexible portrait animations, representing a significant step forward in AI-generated avatars. Further details are available in the team's published research.
MiniMax's Hailuo AI: The New Kid on the Block
MiniMax, a rising Chinese AI startup, has introduced its text-to-video generator, Hailuo AI, marking its entry into the competitive AI-generated video market. Released in early September 2024, the model can generate six-second video clips with realistic human and animal movements. While still in its early stages, Hailuo AI shows promise, especially in rendering human-like actions, though it sometimes struggles with more complex scenes. Backed by Alibaba and Tencent, MiniMax plans to enhance Hailuo AI’s capabilities, including longer video durations and new features like image-to-video conversion.
In addition to MiniMax, another key player in the Chinese AI scene is Kuaishou, which recently launched its text-to-video platform, Kling AI. Both MiniMax and Kuaishou are pushing the boundaries of generative AI, offering innovative text-to-video solutions that highlight China’s growing focus on AI-driven content creation. These developments position Chinese startups as key contributors to the rapidly evolving global AI landscape.
Mistral AI's New Model: A Potential GPT-4 Rival?
Mistral AI has ventured into the multimodal space with Pixtral 12B, a model that processes both text and images. While it is not yet available through a public web interface, developers can already download the model from Hugging Face or GitHub. Mistral plans to make it available soon through its web chatbot and API, giving developers an easier way to explore its capabilities. Pixtral 12B lets users combine images with text prompts, for example to ask questions about the content of a picture.
Mistral has been rapidly expanding since its launch, partnering with companies like Microsoft and AWS. Recently, it raised $640 million and continues to push boundaries with new models like Pixtral 12B, aimed at making advanced visual applications more accessible.
Don't forget to subscribe to my Newsletter and follow me on LinkedIn to stay updated on the latest news & insights in Spatial Computing, Extended Reality (XR), Augmented Reality (AR) and Artificial Intelligence (AI). All images that haven't been sourced were created with Midjourney.