Edge AI: Model Compression Techniques for Convolutional Neural Networks

Edge devices such as Internet of Things (IoT) devices are becoming increasingly important and widely used in our daily lives and in industrial facilities. The IoT is a network of things empowered by sensors, identifiers, software intelligence, and internet connectivity; it can be viewed as the intersection of the internet, things/objects (anything and everything), and data. The number of these devices is expected to grow even further, and they have the potential to perform complex Artificial Intelligence (AI) tasks locally, without relying heavily on cloud infrastructure.

The rapid advancement of AI has led to the development of complex deep learning models that achieve high performance across many domains. Deploying AI models on edge devices offers many advantages, including low latency, privacy and data security, bandwidth optimization, and reduced network dependence. Low latency comes from real-time processing: data is analyzed instantly on the edge without waiting for a remote server. Analyzing data on the edge also reduces how much data is transmitted to the cloud, which strengthens security against breaches, cuts bandwidth consumption, and lessens dependence on the network.

Convolutional Neural Network (CNN) models are a subset of Deep Neural Network (DNN) models whose architecture typically consists of convolutional layers, activation layers, pooling layers, fully connected (dense) layers, and an output layer. CNN models are effective for image- and video-related tasks because they learn relevant features directly from the data, recognizing patterns, shapes, and structures in images that are challenging for traditional machine learning models to capture. CNN models are therefore used for tasks such as image classification, object detection, image segmentation, facial recognition, and medical image analysis.
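
As an illustration of how these layer types fit together, here is a minimal sketch in PyTorch; the filter counts, kernel sizes, and the 10-class, 32x32 RGB input are arbitrary assumptions for demonstration, not any specific published architecture:

```python
import torch
import torch.nn as nn

# A minimal CNN: conv -> activation -> pooling, repeated, then dense layers.
# All sizes here are illustrative choices, not from the article.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),                                    # activation layer
    nn.MaxPool2d(2),                              # pooling layer: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 128),                   # fully connected (dense) layer
    nn.ReLU(),
    nn.Linear(128, 10),                           # output layer (10 classes)
)

x = torch.randn(1, 3, 32, 32)                     # one dummy RGB image
print(model(x).shape)                             # torch.Size([1, 10])
```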

In 1989, a neural network architecture with convolutional layers was introduced for recognizing handwritten digits in the context of ZIP code recognition. That architecture consisted of an input layer, three hidden layers, and an output layer. Since then, CNN models have grown much deeper; below are some well-known CNN architectures:

 

·       AlexNet: It consisted of multiple convolutional and fully connected layers and was trained on the ImageNet dataset. A significant milestone in the advancement of deep learning and CNNs, it demonstrated the power of deep learning for large-scale image classification tasks.

·       Inception (GoogLeNet): It is known for its utilization of inception modules and its efforts to balance model depth with computational efficiency.

·       Visual Geometry Group (VGG): It is known for its simplicity and uniformity in design with varying depths (the most well-known being VGG16 and VGG19, with 16 and 19 layers, respectively). It utilized small 3x3 convolutional filters stacked on top of each other, which allowed the network to learn more complex features.

·       ResNet: It introduced the concept of the residual block, in which the output of an earlier layer is added to the output of one or more convolutional layers, creating a "shortcut" connection. This approach enables training very deep networks (with over a hundred layers) without suffering from the vanishing gradient problem; a minimal sketch of a residual block appears after this list.

·       DenseNet: Each layer receives feature maps from all preceding layers and passes its own feature maps to all subsequent layers. This dense connectivity enhances feature reuse and enables efficient gradient flow during training. Both DenseNet and ResNet address the vanishing gradient problem by letting information from earlier layers flow to later layers, but they differ in their connectivity patterns, number of parameters, and computational efficiency.

·       EfficientNet: It introduced a compound scaling method that uniformly scales the depth, width, and resolution of the network via a scaling coefficient called "phi" that controls the model's size. This approach yields models that are more accurate and efficient than models scaled along only one dimension.

·       RegNet: It is an approach that employs a quantized linear function to determine width and depth choices for stages in a network. The regular networks capture features at various levels of abstraction and emphasize simplicity and interpretability. Designed for generalization, RegNet models achieve competitive performance across metrics and settings, offering an innovative approach to effective and efficient network design.

·       Vision Transformer (ViT): It operates on sequences of data and captures long-range dependencies effectively. The input image is divided into fixed-size non-overlapping patches that are then linearly embedded and processed by transformer layers. The architecture shows impressive performance on image classification tasks when trained on large-scale datasets.

·       Swin Transformer: It combines the strengths of both convolutional neural networks (CNNs) and transformers. It introduces a shifted-window approach, computing self-attention within non-overlapping local windows and shifting the window partition between successive layers so that neighboring windows share information, along with a hierarchical architecture that handles both local and global information. This design helps the model scale to high-resolution images while remaining computationally efficient.

·       ResNeSt: Based on ResNet, ResNeSt architecture introduces a split-attention block that enhances the network's ability to capture informative features from different parts of the receptive field. The split-attention mechanism allows the network to attend to multiple, potentially disjoint, sets of features, leading to better feature representation.

·       Hybrid (CNN-LSTM): It combines two types of deep learning architecture: Long Short-Term Memory (LSTM) and CNN. The LSTM handles sequence data, such as time-series solar irradiation data, while the CNN excels at capturing spatial patterns.
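
To make the ResNet shortcut mentioned above concrete, here is a minimal sketch of a residual block in PyTorch; the channel count and layer sizes are illustrative assumptions, not taken from any specific paper:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x (the "shortcut" connection)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # add the input back: the residual shortcut

block = ResidualBlock(16)
x = torch.randn(1, 16, 8, 8)
print(block(x).shape)               # torch.Size([1, 16, 8, 8])
```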

 

Deploying CNN models on edge devices such as IoT hardware has a wide range of practical and industrial applications across various sectors. Here are some specific examples:

·       Surveillance and Security: IoT cameras equipped with CNN models can perform real-time object detection and facial recognition for security monitoring, identifying intruders, and managing access control.

·       Manufacturing and Quality Control: CNN-equipped IoT devices can inspect products on assembly lines for defects, ensuring quality control and minimizing errors.

·       Agriculture: CNN-equipped sensors and drones can monitor crops and detect pests, diseases, and nutrient deficiencies, enabling precision agriculture.

·       Healthcare and Wearables: Wearable devices with embedded CNNs can continuously monitor vital signs, detect anomalies, and even diagnose certain health conditions.

·       Smart Cities: IoT-connected traffic cameras can analyze real-time traffic patterns, optimize traffic flow, and detect accidents using CNN-based video analytics.

·       Retail: In-store IoT cameras can track customer behavior, optimize store layouts, and offer personalized shopping experiences using CNN-powered image recognition.

·       Energy Management: IoT devices equipped with CNN models can monitor energy usage, optimize consumption patterns, and identify areas for energy-efficiency improvements.

·       Environmental Monitoring: CNN-equipped sensors can monitor air quality, pollution levels, and weather conditions, providing valuable insights for urban planning.

·       Remote Sensing: CNN-equipped IoT devices in remote locations can monitor natural resources, wildlife, and environmental changes.

·       Industrial Automation: CNN-equipped sensors can monitor equipment health, predict maintenance needs, and reduce downtime.

·       Logistics and Inventory Management: CNN-equipped IoT devices can automate package sorting, inventory tracking, and warehouse management.

·       Autonomous Vehicles: IoT-connected vehicles can process real-time data from cameras and sensors using CNNs, aiding in autonomous navigation and collision avoidance.

·       Oil and Gas: CNN-equipped IoT devices can monitor equipment integrity, detect leaks, and assess environmental impacts in the oil and gas industry.

·       Infrastructure Monitoring: CNN-equipped sensors can monitor bridges, tunnels, and other critical infrastructure for signs of deterioration or damage.

·       Construction: IoT-enabled construction equipment with embedded CNNs can enhance safety by detecting potential hazards and ensuring compliance with safety protocols.

 

Deploying AI models on edge devices, such as smartphones, IoT devices, and embedded systems, is challenging due to the limited computational resources, memory capacity, and energy budgets of these devices. The large size and computational requirements of AI models hinder their effective deployment on edge devices, which often have limited bandwidth and processing power.

Considerable effort has therefore been focused on the effective compression of AI models for edge deployment. Model compression aims to reduce the size of models without significantly sacrificing their accuracy, enabling efficient inference on resource-constrained edge devices. By compressing AI models, we can achieve smaller model sizes, faster inference times, lower memory usage, and decreased energy consumption, making it feasible to deploy AI applications on edge devices and enabling real-time, on-device intelligence.

The primary challenge in model compression for edge AI lies in finding the right balance between model size reduction and maintaining accuracy and performance. Various compression techniques have been proposed to address this challenge, including pruning, quantization, low-rank approximation, knowledge distillation, and hashing. Each technique offers unique benefits and trade-offs in terms of compression rate, computational efficiency, inference speed, and accuracy preservation. Some work has combined these techniques, such as Pruning, Quantization, and Knowledge distillation (PQK), while other work has mixed them with parallelization to set up a distributed-systems view of DNNs for IoT.

 

Pruning involves removing unnecessary and less relevant connections and weights from AI models to create a smaller model. There are different techniques for pruning, such as manual thresholding, hyperbolic and exponential biases, second-order derivatives, regularization, and Structured Sparsity Learning. Reinforcement Learning has also been used for pruning. Recent work pruned Generative Pre-trained Transformer (GPT) models to at least 50% sparsity in one shot using the SparseGPT method.
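
As a minimal sketch of the simplest variant, magnitude-based thresholding, the idea is to zero out weights whose absolute value falls below a cutoff; the layer shape and 50% sparsity target below are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Minimal magnitude-based pruning sketch: zero out the weights whose
# absolute value falls below a threshold chosen to hit a target sparsity.
layer = nn.Linear(256, 128)            # an arbitrary example layer
target_sparsity = 0.5                  # prune 50% of the weights

with torch.no_grad():
    w = layer.weight
    # Threshold = the magnitude below which target_sparsity of the weights fall.
    threshold = w.abs().flatten().quantile(target_sparsity)
    mask = (w.abs() > threshold).float()
    w.mul_(mask)                       # apply the pruning mask in place

print(f"sparsity: {(layer.weight == 0).float().mean():.2f}")  # ~0.50
```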

 

Quantization involves using less memory by changing the representation of each model parameter from a 32-bit float to 8 bits or fewer and by changing the activation functions. There are different techniques for quantization, such as fixed-point quantization, vector quantization, and replacing the activation functions with binary or hardcoded values. Other work has introduced quantization techniques in Federated Learning (FL) to improve the efficiency of data exchange between edge servers and a cloud node.
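
A minimal sketch of uniform 8-bit (affine) quantization of a weight tensor follows; it assumes nothing beyond standard PyTorch, and real quantization toolchains additionally quantize activations and fuse operations:

```python
import torch

def quantize_uint8(x: torch.Tensor):
    """Map float32 values to uint8 with an affine scheme: x ~ scale * (q - zero_point)."""
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / 255.0
    zero_point = torch.round(-x_min / scale).clamp(0, 255)
    q = torch.round(x / scale + zero_point).clamp(0, 255).to(torch.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.float() - zero_point)

w = torch.randn(128, 256)                # 32-bit weights: 4 bytes each
q, scale, zp = quantize_uint8(w)         # 8-bit weights: 1 byte each (4x smaller)
error = (w - dequantize(q, scale, zp)).abs().max()
print(f"max reconstruction error: {error:.4f}")
```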

 

Low-rank factorization involves replacing a model's weight matrix with smaller matrices using techniques such as singular value decomposition (SVD). Low-rank factorization has been combined with other techniques to achieve significant results in CNN compression.
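
A minimal SVD sketch (the matrix size and rank below are illustrative assumptions): a weight matrix W of shape m x n is approximated by two factors of rank k, cutting the parameter count from m*n to k*(m+n):

```python
import torch

m, n, k = 512, 1024, 64                 # illustrative sizes; rank k << min(m, n)
W = torch.randn(m, n)                   # original weight matrix: m*n parameters

# Truncated SVD: keep only the k largest singular values.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
A = U[:, :k] * S[:k]                    # shape (m, k)
B = Vh[:k, :]                           # shape (k, n)
W_approx = A @ B                        # W ~ A @ B

old_params = m * n                      # 524,288
new_params = k * (m + n)                # 98,304 (about 5.3x fewer)
print(old_params, new_params)
print(f"relative error: {torch.norm(W - W_approx) / torch.norm(W):.3f}")
```

In a network, the single dense layer with weight W would be replaced by two smaller dense layers with weights B and A, so the approximation error depends on how quickly the singular values of the trained weights decay.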

 

Knowledge Distillation (KD) involves transferring knowledge from a big, heavy model called the "teacher," which generalizes well on unseen data because it was trained on a large dataset, to a small, light model called the "student." A theoretical method was proposed using KD to build a model for pothole detection for autonomous cars. In the medical field, KD has been used to compress medical diagnosis systems for both COVID-19 and malaria.
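
A minimal sketch of a standard distillation loss follows; the temperature and mixing weight are illustrative assumptions. The student is trained to match the teacher's temperature-softened outputs in addition to the hard labels:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Blend a soft-target KL term (teacher -> student) with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2                 # rescale gradients for the softened targets
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Dummy batch: 8 samples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)      # in practice: the frozen teacher's outputs
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```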

 

Hashing involves using low-cost hash functions to randomly cluster connection weights into hash buckets that share the same parameter value. Different works have used hashing for model compression, such as introducing a novel network architecture called HashedNets, introducing a cache-friendly model compression approach called Random Operation Access Specific Tile (ROAST) hashing, and proposing a novel approach named Binary Weight Networks via Hashing (BWNH) for CNN compression.
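
A minimal sketch in the spirit of HashedNets (the hash function, sizes, and bucket count are illustrative assumptions, not the published implementation): a virtual m x n weight matrix is backed by a small vector of shared parameters, and each position (i, j) is hashed to one of its entries:

```python
import torch

m, n, buckets = 128, 256, 2048          # virtual 128x256 matrix, 2048 real parameters

# Shared parameters: this is all that is stored (16x fewer than m*n = 32,768).
shared = torch.randn(buckets, requires_grad=True)

# A cheap deterministic hash from position (i, j) to a bucket index
# (a stand-in for the low-cost hash functions used in HashedNets).
i = torch.arange(m).unsqueeze(1)        # shape (m, 1)
j = torch.arange(n).unsqueeze(0)        # shape (1, n)
bucket_idx = (i * 2654435761 + j * 40503) % buckets   # shape (m, n)

# The virtual weight matrix is materialized by indexing the shared vector;
# gradients flow back into the shared parameters during training.
W_virtual = shared[bucket_idx]          # shape (m, n)

x = torch.randn(4, n)                   # a dummy batch of inputs
y = x @ W_virtual.T                     # use the virtual matrix as a linear layer
print(y.shape)                          # torch.Size([4, 128])
```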


With the enormous number of compression techniques proposed for CNNs, the rapid evolution of CNN architectures has created a gap in the field. This dynamic shift in architecture design requires a re-evaluation of existing compression methods, particularly in light of the demand to make these advanced CNN models suitable for deployment on edge devices. As CNN designs continue to advance, the challenge lies in adapting compression techniques to integrate smoothly with these modern architectures. This evaluation (whether of each technique individually or of techniques combined with one another) is important: it not only ensures the continued relevance of compression techniques but also addresses the urgent need to make resource-intensive CNN models accessible and deployable on edge devices.
