BxD Primer Series: Deep Belief Neural Networks
Hey there 👋
Welcome to the BxD Primer Series, where we cover topics such as Machine Learning models, Neural Nets, GPT, Ensemble models, and Hyper-automation in a 'one-post-one-topic' format. Today's post is on Deep Belief Neural Networks. Let's get started:
The What:
A Deep Belief Network (DBN) is a type of unsupervised learning model that has multiple hidden layers capable of learning hierarchical representations of data. It can be thought of as a 'Stacked Restricted Boltzmann Machine'.
Just like RBMs, DBNs are composed of two main types of layers: a visible layer and hidden layers.
The layers are connected through weighted connections, which are updated during training to optimize the network's performance.
Neurons are connected to neurons in adjacent layers, but there are no connections within a layer. This allows successive layers to learn increasingly complex features of the input data, starting from low-level features such as edges and textures and moving on to higher-level features such as object parts and shapes.
DBNs learn to represent data as a probabilistic model, just like regular RBMs. They are typically trained in a layer-wise manner, with each layer pre-trained using an unsupervised learning algorithm such as contrastive divergence. Each layer's hidden activations then serve as the training input for the next layer, the pre-trained weights initialize the corresponding layers of the network, and the entire network is fine-tuned using a supervised learning algorithm such as back-propagation.
This training technique is also known as greedy training, where 'greedy' means making locally optimal choices to reach a decent, but possibly not globally optimal, solution.
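As a rough sketch, the greedy layer-wise procedure looks like the following. The RBM class and its fit / hidden_activations methods are hypothetical placeholders for any RBM implementation trained with contrastive divergence (a concrete CD sketch appears later in this post); this is an illustration of the training flow, not a definitive implementation.

```python
import numpy as np

def pretrain_dbn(data, hidden_sizes, epochs=10):
    """Greedy layer-wise pre-training of a DBN as a stack of RBMs."""
    rbms = []
    layer_input = data                         # visible layer receives the raw data
    for n_hidden in hidden_sizes:
        # Hypothetical RBM class, trained with contrastive divergence
        rbm = RBM(n_visible=layer_input.shape[1], n_hidden=n_hidden)
        rbm.fit(layer_input, epochs=epochs)
        # The hidden activations of this layer become the input to the next RBM
        layer_input = rbm.hidden_activations(layer_input)
        rbms.append(rbm)
    # The pre-trained weights of each RBM initialize the corresponding layer of
    # the full network, which is then fine-tuned with back-propagation.
    return rbms
```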
Anatomy of a Deep Belief Neural Network:
Differences with other Architectures:
With Feed Forward Neural Network (FFNN): An FFNN is a purely discriminative model trained end-to-end with back-propagation, whereas a DBN is a generative, probabilistic model whose layers are pre-trained one at a time in an unsupervised manner before any supervised fine-tuning.
With Restricted Boltzmann Machine (RBM): An RBM has a single hidden layer, whereas a DBN stacks several RBMs so that each hidden layer learns features of the representation produced by the layer below it.
With Deep Boltzmann Machine (DBM): In a DBM all connections between layers are undirected and the layers are trained jointly, whereas in a DBN only the top two layers form an undirected RBM and the lower layers act as a directed, top-down generative model.
Role of Visible Layer in a DBN:
Visible layer is the input layer that receives raw data. This layer consists of a set of neurons, each representing a feature of input data. For example, in an image classification task, each neuron in visible layer may represent a pixel in the image.
The values of neurons in visible layer are used to compute the activation probabilities of neurons in first hidden layer, which in turn, are used to compute the activation probabilities of neurons in second hidden layer, and so on.
The visible layer also plays an important role in pre-training of a DBN. During the pre-training phase, the weights between the visible layer and the first hidden layer are learned using an unsupervised learning algorithm (contrastive divergence). The first hidden layer's activations, computed with these pre-trained weights, then serve as the input for pre-training the second hidden layer.
Role of Hidden Layers in a DBN:
The first hidden layer of a DBN extracts simple features from input data, such as edges and textures in images or phonemes in speech signals. Second hidden layer combines these simple features to learn more complex features, such as object parts or syllables. Subsequent hidden layers continue to learn increasingly complex features, until final hidden layer captures the most abstract and high-level concepts in data, such as object categories or spoken words.
Learned features are represented by activation values of hidden units. Activation values are computed based on the activation values of units in previous layer and the weights connecting them.
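In symbols (consistent with the notation used later in this post), with h^(0) = v denoting the visible layer, the activation probability of unit j in hidden layer l depends on the previous layer's activations and the connecting weights:

```latex
P\big(h^{(l)}_j = 1 \mid h^{(l-1)}\big) = \sigma\Big(b^{(l)}_j + \sum_i W^{(l)}_{ij}\, h^{(l-1)}_i\Big), \qquad h^{(0)} = v
```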
The How:
Consider a Deep Belief Neural Network with L layers, where each pair of adjacent layers can be treated as an RBM. The input layer is called the visible layer, and the remaining layers are called hidden layers.
Let x be the input vector, and h^(l) be the hidden layer activations at layer l.
The joint distribution of the visible layer v and hidden layer h for the RBM at layer l is defined through an energy function E(v, h) and a partition function Z, given below.
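For a binary RBM, these quantities take the standard form, where a_i and b_j are the biases of visible unit i and hidden unit j, and W_ij is the weight connecting them:

```latex
P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad
Z = \sum_{v}\sum_{h} e^{-E(v, h)}, \qquad
E(v, h) = -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i\, W_{ij}\, h_j
```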
The objective of training an RBM is to minimize the negative log-likelihood of the training data:
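For a training set of N examples v^(1), …, v^(N), this can be written as:

```latex
\mathcal{L} \;=\; -\sum_{n=1}^{N} \log P\big(v^{(n)}\big)
```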
Using the RBM, we can express the probability distribution of the visible layer by marginalizing over the hidden units:
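Concretely, summing the joint distribution over all hidden configurations gives:

```latex
P(v) \;=\; \sum_{h} P(v, h) \;=\; \frac{1}{Z} \sum_{h} e^{-E(v, h)}
```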
Where Z is the same partition function defined above.
The Contrastive Divergence (CD) algorithm, an approximation to maximum-likelihood estimation, is used to train each RBM:
Step 1: Sample hidden layer activations given the input data:
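In the notation above (b_j is the bias of the j'th hidden unit, W_ij the weight between visible unit i and hidden unit j), this conditional is:

```latex
P(h_j = 1 \mid v) \;=\; \sigma\Big(b_j + \sum_i W_{ij}\, v_i\Big)
```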
Where 𝜎(·) is the sigmoid function.
Step 2: Sample visible layer activations given the hidden layer activations:
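By symmetry, with a_i the bias of the i'th visible unit:

```latex
P(v_i = 1 \mid h) \;=\; \sigma\Big(a_i + \sum_j W_{ij}\, h_j\Big)
```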
Step 3: A Gibbs sampling step is used to sample from the probability distribution of the hidden units given the visible units, and vice versa.
This is done by alternately updating the hidden and visible units using the conditional probabilities calculated in Steps 1 and 2.
At each iteration of the Gibbs sampling, the hidden units are resampled from P(h_j = 1 | v) and the visible units from P(v_i = 1 | h), using the same equations as in Steps 1 and 2, where b_j is the bias of the j'th hidden unit and a_i is the bias of the i'th visible unit.
Repeat these two alternating updates for a fixed number of iterations (e.g., 1,000) to obtain a sequence of samples from the joint probability distribution of visible and hidden units.
Step 4: Use the samples to update the weights and biases of the RBM. The weight and bias updates are given by:
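(These are the standard contrastive-divergence update rules, restated in the notation used above.)

```latex
\Delta W_{ij} = \eta\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\big), \quad
\Delta a_i = \eta\big(\langle v_i\rangle_{\text{data}} - \langle v_i\rangle_{\text{recon}}\big), \quad
\Delta b_j = \eta\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{\text{recon}}\big)
```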
Where η is the learning rate, ⟨·⟩_data denotes an average over the training data, and ⟨·⟩_recon denotes an average over the reconstructions produced by the Gibbs chain.
Repeat the above four steps for a fixed number of epochs (e.g., 10) or until convergence is achieved (i.e., change in log-likelihood of training data is below a certain threshold).
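Putting Steps 1-4 together, here is a minimal NumPy sketch of CD-1 (a single Gibbs step per update, which is common in practice). Function and variable names are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, epochs=10, lr=0.1, seed=0):
    """Train a binary RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # weights
    a = np.zeros(n_visible)                                 # visible biases
    b = np.zeros(n_hidden)                                  # hidden biases

    for _ in range(epochs):
        for v0 in data:
            # Step 1: sample hidden units given the data
            p_h0 = sigmoid(b + v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            # Step 2: sample visible units given the hidden units (reconstruction)
            p_v1 = sigmoid(a + h0 @ W.T)
            v1 = (rng.random(n_visible) < p_v1).astype(float)
            # Step 3: one more hidden pass on the reconstruction (a single Gibbs step)
            p_h1 = sigmoid(b + v1 @ W)
            # Step 4: contrastive-divergence updates of weights and biases
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
            a += lr * (v0 - v1)
            b += lr * (p_h0 - p_h1)
    return W, a, b

# Toy usage: 4 binary examples with 6 visible units, compressed to 3 hidden units
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 0]], dtype=float)
W, a, b = train_rbm_cd1(X, n_hidden=3)
```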
Step 5: After training the first RBM, its hidden-layer activations are used as the input data for training the next RBM, and its learned weights initialize the corresponding layer of the full network. This is repeated up the stack.
Binary v/s Continuous DBN:
In a binary DBN, the input data is binary: each input variable can take only one of two possible values, 0 or 1. The energy function of the RBMs used in binary DBNs is defined in terms of binary variables, which allows for efficient computation of the probability distribution over the input data.
A continuous DBN is used to model continuous-valued data, where each input variable can take on any real value. The energy function of RBMs used in continuous DBNs is defined in terms of continuous variables, which requires different techniques for computing probability distribution over input data.
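For example, one common choice for real-valued inputs is the Gaussian-Bernoulli RBM energy, where each visible unit is modelled as a Gaussian with standard deviation σ_i (not to be confused with the sigmoid σ(·) used earlier). This is one standard option rather than the only one:

```latex
E(v, h) \;=\; \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, W_{ij}\, h_j
```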
The Why:
Reasons to use DBNs:
The Why Not:
Reasons not to use DBNs:
Time for you to support:
In the next edition, we will cover Convolutional Neural Networks.
Let us know your feedback!
Until then,
Have a great time! 😊