BxD Primer Series: Deep Belief Neural Networks
Hey there 👋
Welcome to the BxD Primer Series, where we cover topics such as Machine Learning models, Neural Nets, GPT, Ensemble models, and Hyper-automation in a 'one-post-one-topic' format. Today's post is on Deep Belief Neural Networks. Let's get started:
The What:
A Deep Belief Network (DBN) is a type of unsupervised learning model that has multiple hidden layers capable of learning hierarchical representations of data. It can be thought of as a 'Stacked Restricted Boltzmann Machine'.
Just like RBMs, DBNs are composed of two main types of layers: a visible layer and hidden layers.
The layers are connected through weighted connections, which are updated during training to optimize the network's performance.
Neurons are connected to neurons in adjacent layers, but there are no connections within a layer. This allows successive layers to learn increasingly complex features of the input data, starting from low-level features such as edges and textures and moving on to higher-level features such as object parts and shapes.
DBNs learn to represent data as a probabilistic model, just like regular RBMs. They are typically trained in a layer-wise manner, with each layer pre-trained using an unsupervised learning algorithm such as contrastive divergence. Each layer's hidden activations then serve as the training input for the next layer, the pre-trained weights initialize the corresponding layers of the network, and the entire network is fine-tuned using a supervised learning algorithm such as back-propagation.
This training technique is also known as greedy training, where 'greedy' means making locally optimal choices to reach a decent, but possibly not globally optimal, solution.
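As a rough sketch, the greedy layer-wise procedure looks like the following. The RBM class and its fit / hidden_activations methods are hypothetical placeholders for any RBM implementation trained with contrastive divergence (a concrete CD sketch appears later in this post); this is an illustration of the training flow, not a definitive implementation.

```python
import numpy as np

def pretrain_dbn(data, hidden_sizes, epochs=10):
    """Greedy layer-wise pre-training of a DBN as a stack of RBMs."""
    rbms = []
    layer_input = data                         # visible layer receives the raw data
    for n_hidden in hidden_sizes:
        # Hypothetical RBM class, trained with contrastive divergence
        rbm = RBM(n_visible=layer_input.shape[1], n_hidden=n_hidden)
        rbm.fit(layer_input, epochs=epochs)
        # The hidden activations of this layer become the input to the next RBM
        layer_input = rbm.hidden_activations(layer_input)
        rbms.append(rbm)
    # The pre-trained weights of each RBM initialize the corresponding layer of
    # the full network, which is then fine-tuned with back-propagation.
    return rbms
```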
Anatomy of a Deep Belief Neural Network:
Differences with other Architectures:
With Feed Forward Neural Network (FFNN): An FFNN is a purely discriminative model trained end-to-end with back-propagation, whereas a DBN is a generative, probabilistic model whose layers are pre-trained one at a time in an unsupervised manner before any supervised fine-tuning.
With Restricted Boltzmann Machine (RBM): An RBM has a single hidden layer, whereas a DBN stacks several RBMs so that each hidden layer learns features of the representation produced by the layer below it.
With Deep Boltzmann Machine (DBM): In a DBM all connections between layers are undirected and the layers are trained jointly, whereas in a DBN only the top two layers form an undirected RBM and the lower layers act as a directed, top-down generative model.
Role of Visible Layer in a DBN:
Visible layer is the input layer that receives raw data. This layer consists of a set of neurons, each representing a feature of input data. For example, in an image classification task, each neuron in visible layer may represent a pixel in the image.
The values of neurons in visible layer are used to compute the activation probabilities of neurons in first hidden layer, which in turn, are used to compute the activation probabilities of neurons in second hidden layer, and so on.
The visible layer also plays an important role in pre-training of a DBN. During the pre-training phase, the weights between the visible layer and the first hidden layer are learned using an unsupervised learning algorithm (contrastive divergence). The first hidden layer's activations, computed with these pre-trained weights, then serve as the input for pre-training the second hidden layer.
Role of Hidden Layers in a DBN:
The first hidden layer of a DBN extracts simple features from input data, such as edges and textures in images or phonemes in speech signals. Second hidden layer combines these simple features to learn more complex features, such as object parts or syllables. Subsequent hidden layers continue to learn increasingly complex features, until final hidden layer captures the most abstract and high-level concepts in data, such as object categories or spoken words.
Learned features are represented by activation values of hidden units. Activation values are computed based on the activation values of units in previous layer and the weights connecting them.
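In symbols (consistent with the notation used later in this post), with h^(0) = v denoting the visible layer, the activation probability of unit j in hidden layer l depends on the previous layer's activations and the connecting weights:

```latex
P\big(h^{(l)}_j = 1 \mid h^{(l-1)}\big) = \sigma\Big(b^{(l)}_j + \sum_i W^{(l)}_{ij}\, h^{(l-1)}_i\Big), \qquad h^{(0)} = v
```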
The How:
Consider a Deep Belief Neural Network with L layers, where each pair of adjacent layers can be treated as an RBM. The input layer is called the visible layer, and the remaining layers are called hidden layers.
Let x be the input vector, and h^(l) be the hidden layer activations at layer l.
The joint distribution of the visible layer v and hidden layer h for the RBM at layer l is defined through an energy function E(v, h) and a partition function Z, given below.
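For a binary RBM, these quantities take the standard form, where a_i and b_j are the biases of visible unit i and hidden unit j, and W_ij is the weight connecting them:

```latex
P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad
Z = \sum_{v}\sum_{h} e^{-E(v, h)}, \qquad
E(v, h) = -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i\, W_{ij}\, h_j
```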
The objective of training an RBM is to minimize the negative log-likelihood of the training data:
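For a training set of N examples v^(1), …, v^(N), this can be written as:

```latex
\mathcal{L} \;=\; -\sum_{n=1}^{N} \log P\big(v^{(n)}\big)
```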
Using the RBM, we can express the probability distribution of the visible layer by marginalizing over the hidden units:
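Concretely, summing the joint distribution over all hidden configurations gives:

```latex
P(v) \;=\; \sum_{h} P(v, h) \;=\; \frac{1}{Z} \sum_{h} e^{-E(v, h)}
```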
Where Z is the same partition function defined above.
The Contrastive Divergence (CD) algorithm, an approximation to maximum-likelihood estimation, is used to train each RBM:
Step 1: Sample hidden layer activations given the input data:
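In the notation above (b_j is the bias of the j'th hidden unit, W_ij the weight between visible unit i and hidden unit j), this conditional is:

```latex
P(h_j = 1 \mid v) \;=\; \sigma\Big(b_j + \sum_i W_{ij}\, v_i\Big)
```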
Where 𝜎(·) is the sigmoid function.
Step 2: Sample visible layer activations given the hidden layer activations:
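By symmetry, with a_i the bias of the i'th visible unit:

```latex
P(v_i = 1 \mid h) \;=\; \sigma\Big(a_i + \sum_j W_{ij}\, h_j\Big)
```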
Step 3: A Gibbs sampling step is used to sample from the probability distribution of the hidden units given the visible units, and vice versa.
This is done by alternately updating the hidden and visible units using the conditional probabilities calculated in Steps 1 and 2.
At each iteration of the Gibbs sampling, the hidden units are resampled from P(h_j = 1 | v) and the visible units from P(v_i = 1 | h), using the same equations as in Steps 1 and 2, where b_j is the bias of the j'th hidden unit and a_i is the bias of the i'th visible unit.
Repeat these two alternating updates for a fixed number of iterations (e.g., 1,000) to obtain a sequence of samples from the joint probability distribution of visible and hidden units.
Step 4: Use the samples to update the weights and biases of the RBM. The weight and bias updates are given by:
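(These are the standard contrastive-divergence update rules, restated in the notation used above.)

```latex
\Delta W_{ij} = \eta\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\big), \quad
\Delta a_i = \eta\big(\langle v_i\rangle_{\text{data}} - \langle v_i\rangle_{\text{recon}}\big), \quad
\Delta b_j = \eta\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{\text{recon}}\big)
```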
Where η is the learning rate, ⟨·⟩_data denotes an average over the training data, and ⟨·⟩_recon denotes an average over the reconstructions produced by the Gibbs chain.
Repeat the above four steps for a fixed number of epochs (e.g., 10) or until convergence is achieved (i.e., change in log-likelihood of training data is below a certain threshold).
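Putting Steps 1-4 together, here is a minimal NumPy sketch of CD-1 (a single Gibbs step per update, which is common in practice). Function and variable names are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, epochs=10, lr=0.1, seed=0):
    """Train a binary RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # weights
    a = np.zeros(n_visible)                                 # visible biases
    b = np.zeros(n_hidden)                                  # hidden biases

    for _ in range(epochs):
        for v0 in data:
            # Step 1: sample hidden units given the data
            p_h0 = sigmoid(b + v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            # Step 2: sample visible units given the hidden units (reconstruction)
            p_v1 = sigmoid(a + h0 @ W.T)
            v1 = (rng.random(n_visible) < p_v1).astype(float)
            # Step 3: one more hidden pass on the reconstruction (a single Gibbs step)
            p_h1 = sigmoid(b + v1 @ W)
            # Step 4: contrastive-divergence updates of weights and biases
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
            a += lr * (v0 - v1)
            b += lr * (p_h0 - p_h1)
    return W, a, b

# Toy usage: 4 binary examples with 6 visible units, compressed to 3 hidden units
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 0]], dtype=float)
W, a, b = train_rbm_cd1(X, n_hidden=3)
```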
Step 5: After training the first RBM, its hidden-layer activations are used as the input data for training the next RBM, and its learned weights initialize the corresponding layer of the full network. This is repeated up the stack.
Binary v/s Continuous DBN:
In a binary DBN, the input data is binary: each input variable can take only one of two possible values, 0 or 1. The energy function of the RBMs used in binary DBNs is defined in terms of binary variables, which allows for efficient computation of the probability distribution over the input data.
A continuous DBN is used to model continuous-valued data, where each input variable can take on any real value. The energy function of RBMs used in continuous DBNs is defined in terms of continuous variables, which requires different techniques for computing probability distribution over input data.
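For example, one common choice for real-valued inputs is the Gaussian-Bernoulli RBM energy, where each visible unit is modelled as a Gaussian with standard deviation σ_i (not to be confused with the sigmoid σ(·) used earlier). This is one standard option rather than the only one:

```latex
E(v, h) \;=\; \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, W_{ij}\, h_j
```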
The Why:
Reasons to use DBNs:
The Why Not:
Reasons not to use DBNs:
Time for you to support:
In the next edition, we will cover Convolutional Neural Networks.
Let us know your feedback!
Until then,
Have a great time! 😊