Artificial Intelligence tools for video processing and encoding
Introduction
Artificial intelligence (AI) is an exciting, emerging technology that is reshaping how people live their lives. There are new AI-powered applications being launched every day.
At Amazon Go, a new revolutionary type of convenience store, customers can purchase items and exit the store without having to go through the regular checkout process. This is made possible with Amazon’s AI-powered, “Just Walk Out” technology. This innovative technology streamlines the customer’s in-store experience while increasing operational efficiency for the store owner.
Nvidia has developed video conferencing technology that leverages generative adversarial networks to synthesize realistic talking-head videos using a single 2D image of a person. This application of AI lowers both bandwidth requirements and costs for video conferencing applications.
Figure 1. (Left) Nvidia Maxine sends key point data that allows the receiving computer to re-create the face using a neural network. (Right) Video conferencing concept.
One of the most exciting announcements recently is the coming of the Metaverse. The Metaverse can be described as “a manifestation of actual reality but one grounded in a virtual (often theme park-like) world”. The Metaverse’s underlying technologies include the use of AR and VR, which themselves utilize AI technologies to provide an even more immersive user experience.
There is an ever-growing list of applications for AI, including video and image processing in real-time for a smart city, video conferencing, and medical imaging applications.
AI & Video Analytics
With the advancement of the internet and 5G networks, video has become one of the most important mediums for information exchange.
Figure 2. Internet traffic growth
In the past, without AI, the only way to analyze video data was to label the video content manually, which is a tedious and inefficient process. However, everything changed with the introduction of AI. Google and Facebook use large amounts of data to help machines imitate humans through algorithms and perform video analysis to detect temporal and spatial events in videos automatically. AI has enabled hyper-scale video analytics, creating massive amounts of valuable video data.
Figure 3. Global Video Analytics Market Share breakdown
At the end of 2021, the global video analytics market was valued at $5.32 billion USD. It is projected to grow from $6.35 billion in 2022 to $28.37 billion by 2029, making video analytics one of the most important emerging technologies in the world.
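As a quick sanity check on these figures, the implied compound annual growth rate (CAGR) can be worked out directly. This is a minimal sketch that assumes seven compounding years between 2022 and 2029; the function name is illustrative and not taken from any market report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market-size figures quoted in the text, in billions of USD.
# Assumption: growth compounds once per year from 2022 to 2029.
growth = cagr(6.35, 28.37, 2029 - 2022)
print(f"Implied CAGR 2022-2029: {growth:.1%}")
```

Under that assumption, the projection corresponds to growth of just under 24% per year.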
A New Approach: Video Transcoding + AI
The traditional approach to video analytics involves many components, including a bitstream decoder, a bitstream encoder, data transfer functionality, AI inferencing, and pre/post-processing algorithmic computation. It may also require different types of hardware, such as GPUs, CPUs, and hardware accelerators. Orchestrating these components to perform the inferencing and transcoding tasks can be difficult. In addition, generalized hardware, although it can perform the primary functions when combined, is much less efficient and very power hungry.
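To make the orchestration problem concrete, the sketch below models a pipeline as a list of stages, each tagged with the device it runs on, and counts how often a frame has to cross a device boundary. The stage names follow the components listed above, but the device assignments, the `Stage` class, and both helper functions are hypothetical and chosen only for illustration; they do not describe any specific product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str                        # e.g. "decode"
    device: str                      # hardware the stage is assumed to run on
    run: Callable[[object], object]  # the work performed on a frame


def count_device_hops(stages: List[Stage]) -> int:
    """Count how many times data moves between different devices."""
    return sum(1 for a, b in zip(stages, stages[1:]) if a.device != b.device)


def run_pipeline(stages: List[Stage], frame: object) -> object:
    """Pass a frame through every stage in order."""
    for stage in stages:
        frame = stage.run(frame)
    return frame


def passthrough(frame: object) -> object:
    """Placeholder for real work; returns the frame unchanged."""
    return frame


# Hypothetical device assignment for a pipeline built from general hardware.
traditional = [
    Stage("decode", "cpu", passthrough),
    Stage("pre-process", "cpu", passthrough),
    Stage("inference", "gpu", passthrough),
    Stage("post-process", "cpu", passthrough),
    Stage("encode", "accelerator", passthrough),
]

# The same stages when every step is assumed to run on one device.
integrated = [Stage(s.name, "single-device", s.run) for s in traditional]

print("traditional hops:", count_device_hops(traditional))
print("integrated hops:", count_device_hops(integrated))
```

Each hop in this model stands for a data transfer that must be scheduled and that adds latency, which is why a pipeline spread across several device types is harder to orchestrate.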
NETINT is a pioneer in ASIC-based video transcoding solutions. By combining this expertise with its deep understanding of AI-based video processing, NETINT is enabling a new generation of AI-based video analytics applications, including frame-to-frame video analysis, perceptual video quality improvement, and bandwidth and power savings.
NETINT’s newest AI-powered Video Processing Unit (VPU), Quadra, uses an innovative architecture to bind AI and video codecs together, enabling an integrated, end-to-end video analytics and video processing workflow with minimal interaction from the host CPU.
Figure 4. NETINT Quadra VPU functional workflow
This approach integrates AI and video processing on NETINT’s Codensity G5 ASIC, eliminating the bandwidth bottlenecks and latency issues that arise from transferring data between the ASIC and the host CPU, which are common in many other architectures.
Figure 5. Comparison of AI performance: Quadra versus CPU. NETINT’s Quadra FPS-per-watt performance is more than 20 times higher than the CPU’s.
The performance chart above compares a high-end CPU with Quadra. The model used for testing is MobileNetV2, run with different input sizes to demonstrate their effect on power consumption. In these test cases, Quadra’s FPS per watt is more than 20 times that of the high-end CPU. This reduction in power enables a Quadra system to significantly lower operational costs.
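For readers unfamiliar with the metric, FPS per watt is simply inference throughput divided by power draw. The sketch below shows the arithmetic only. The throughput and power values are placeholders, not measurements from Figure 5 or from any real device, and were picked to keep the numbers easy to follow.

```python
def fps_per_watt(frames_per_second: float, power_watts: float) -> float:
    """Energy efficiency of inference: frames processed per second per watt."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return frames_per_second / power_watts


# Placeholder values for two hypothetical devices (NOT measured data).
device_a = fps_per_watt(frames_per_second=200.0, power_watts=100.0)
device_b = fps_per_watt(frames_per_second=800.0, power_watts=20.0)

efficiency_ratio = device_b / device_a
print(f"device A: {device_a:.1f} FPS/W")
print(f"device B: {device_b:.1f} FPS/W")
print(f"device B is {efficiency_ratio:.0f}x more efficient")
```

Because the metric is a ratio, a device can lead either by processing more frames per second, by drawing less power, or both.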
Analyzing Videos and Preserving Useful Information
In many video-related applications, the user needs to automatically extract useful information from video frames as part of their business transaction process.
A straightforward example is highway toll collection. The video analytics task is to extract frames from a camera and apply object detection and OCR to recognize the license plate number for automatic toll fee billing.
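The toll collection flow can be sketched as three steps: locate the plate, read its text, and emit a billing record. In the minimal sketch below, the detector and OCR steps are passed in as plain callables so that any real model could be plugged in. All names (`TollEvent`, `process_frame`, `fake_detector`, `fake_ocr`) and the fee amount are hypothetical, and the two stub functions stand in for real object detection and OCR models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]          # toy frame: rows of pixel values
Box = Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class TollEvent:
    plate: str
    fee: float


def process_frame(
    frame: Frame,
    detect_plate: Callable[[Frame], Optional[Box]],
    read_plate: Callable[[Frame], str],
    fee: float,
) -> Optional[TollEvent]:
    """Detect a plate, crop it, run OCR, and return a billing record."""
    box = detect_plate(frame)
    if box is None:
        return None  # no vehicle or plate visible in this frame
    x, y, w, h = box
    crop = [row[x:x + w] for row in frame[y:y + h]]
    plate = read_plate(crop).strip().upper()
    if not plate:
        return None  # OCR could not read anything usable
    return TollEvent(plate=plate, fee=fee)


# Stubs standing in for real models (illustration only).
def fake_detector(frame: Frame) -> Optional[Box]:
    return (1, 1, 2, 1)


def fake_ocr(crop: Frame) -> str:
    return " abc123 "


frame = [[0] * 4 for _ in range(3)]
event = process_frame(frame, fake_detector, fake_ocr, fee=2.50)
print(event)
```

Keeping detection and OCR behind simple function interfaces means either step can be swapped for a different model without changing the billing logic.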
Video Enhancement
For video streaming applications, the end-user always prefers high-quality video with low bandwidth. Conversely, the platform operator prefers to support more clients while minimizing hardware and operational costs.
One example of how AI can serve the needs of both the end user and the operator is online gaming. The game service provider generates low-resolution game streams rendered by a GPU, then applies AI upscaling and other enhancements, including denoising, deblocking, and video augmentation (e.g., background replacement), to increase the resolution while maintaining video quality.
Figure 6. Video analytics example
Figure 7. Video enhancement example using AI to denoise content
Use Case: Video Conference with Ultra-low Bandwidth
Driven by global events, including the COVID-19 pandemic, more and more meetings are moving from in-person to online. In many workplaces, video conferencing has entirely replaced in-person meetings. NETINT’s Quadra VPU, with its AI-powered video processing capabilities, enables advanced video conferencing features.
Figure 8. Smart video conferencing workflow in Quadra
Final Thoughts
The Quadra VPU is the only AI-powered VPU available today that combines the advantages of hardware video encoding with integrated AI acceleration. The Quadra VPU family of U.2 and PCIe products is well suited to new hyperscale video analytics applications, including massively scaled 1:1 video apps, cloud gaming, and any video streaming application with widely varying usage where individual stream values are low, such as social media video.
QUEENIE QIU
Deep Learning Hardware Solution Engineer at NETINT
B.S.E University of Toronto
Double major in Mathematics and Statistics
I’m obsessed with the power of AI right now, and I’m passionate about exploring this category. I’m convinced that AI will be the future of technology! But what I’m really transfixed by is the potential of AI combined with video transcoding at NETINT. That is why I’ve chosen to work as a Deep Learning Hardware Solution Engineer at NETINT.
My job is to bring NETINT AI hardware solutions to state-of-the-art deep learning models to improve efficiency and reduce overall operational costs. I am specifically working on model optimization for video analytics to discover the full potential of Quadra. My second focus is implementing traditional video enhancements with our hardware acceleration to increase video quality.
The main goal of our AI Team is to offer a bridge for customers to translate, quantize, infer, and export deep learning models onto NETINT Neural Network Processing Units (NETINT NPUs). We seamlessly combine the AI engine with the transcoding 2D engine to provide an end-to-end solution for video analytics and video enhancement. At the same time, we focus on optimizing performance, increasing throughput, and reducing latency.