BxD Primer Series: Deep Belief Neural Networks

Hey there 👋

Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Deep Belief Neural Networks. Let’s get started:

The What:

A Deep Belief Network (DBN) is a type of unsupervised learning model with multiple hidden layers that can learn hierarchical representations of data. It can be thought of as a ‘Stacked Restricted Boltzmann Machine’.

Just like RBMs, DBNs are composed of two main types of layers:

  • Visible layer, which receives input from the data source
  • Hidden layers, which extract features from the input

The layers are connected through weighted connections, which are updated during training to optimize the network's performance.

Neurons are connected to adjacent layers, but there are no connections within a layer. This allows successive layers to learn increasingly complex features of the input data, starting from low-level features such as edges and textures, and moving on to higher-level features such as object parts and shapes.

DBNs learn to represent data as a probabilistic model, just like regular RBMs. They are typically trained in a layer-wise manner, with each layer pre-trained using an unsupervised learning algorithm such as contrastive divergence. The hidden activations of each pre-trained layer serve as the input for training the next layer, and the pre-trained weights of all layers are then used to initialize the full network, which is fine-tuned using a supervised learning algorithm such as back-propagation.

This training technique is also known as greedy training, where ‘greedy’ means making locally optimal choices to reach a decent, but possibly not optimal, answer.

Anatomy of a Deep Belief Neural Network:

[Image: anatomy of a Deep Belief Network, with a visible layer followed by a stack of hidden layers and weighted connections only between adjacent layers]

Differences with other Architectures:

With Feed Forward Neural Network (FFNN):

  • A DBN is designed to learn a hierarchical representation of the input data, allowing it to extract increasingly complex features as it progresses through the layers of the network.
  • An FFNN is designed to learn a direct mapping from the input to the output.

With Restricted Boltzmann Machine (RBM):

  • RBMs have a two-layer architecture consisting of a visible layer and a hidden layer. There are no connections within the same layer, and the learning is unsupervised.
  • DBNs are composed of multiple layers of hidden units, forming a deep neural network. Each layer is trained as an RBM, and the final model is fine-tuned using supervised learning.

With Deep Boltzmann Machine (DBM):

  • Like a DBN, a DBM is also composed of multiple layers of hidden units
  • In a DBM, all connections are undirected, and the hidden units in each layer are connected to the units in the layers directly above and below it
  • In a DBN, only the top two layers form an undirected RBM; the connections between the lower layers are directed, and each layer connects only to its adjacent layers

Role of Visible Layer in a DBN:

The visible layer is the input layer that receives raw data. It consists of a set of neurons, each representing a feature of the input data. For example, in an image classification task, each neuron in the visible layer may represent a pixel in the image.

The values of the neurons in the visible layer are used to compute the activation probabilities of the neurons in the first hidden layer, which in turn are used to compute the activation probabilities of the neurons in the second hidden layer, and so on.
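
As an illustration, here is a minimal NumPy sketch of this layer-by-layer computation. The dimensions, weights, and biases below are random stand-ins, not trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy dimensions: 784 visible units (e.g., a 28x28 image) and two hidden layers.
layer_sizes = [784, 256, 64]

# Random stand-in weights and biases; in a real DBN these come from pre-training.
weights = [rng.normal(0, 0.01, size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

v = rng.integers(0, 2, size=784).astype(float)  # one binary input vector

# Propagate activation probabilities upward, one layer at a time.
h = v
for W, b in zip(weights, biases):
    h = sigmoid(h @ W + b)  # activation probabilities of the next layer

print(h.shape)  # (64,): the top-level feature representation
```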

The visible layer also plays an important role in the pre-training of a DBN. During the pre-training phase, the weights between the visible layer and the first hidden layer are learned using an unsupervised learning algorithm (contrastive divergence). The hidden activations produced by this trained layer then serve as the input for training the second hidden layer.

Role of Hidden Layers in a DBN:

The first hidden layer of a DBN extracts simple features from the input data, such as edges and textures in images or phonemes in speech signals. The second hidden layer combines these simple features to learn more complex features, such as object parts or syllables. Subsequent hidden layers continue to learn increasingly complex features, until the final hidden layer captures the most abstract, high-level concepts in the data, such as object categories or spoken words.

Learned features are represented by the activation values of the hidden units. These activation values are computed from the activation values of the units in the previous layer and the weights connecting them.

The How:

Consider a Deep Belief Network with L layers, where each pair of adjacent layers can be treated as an RBM. The input layer is called the visible layer, and the remaining layers are called hidden layers.

Let v be the input (visible) vector, and h be the hidden layer activations at layer l.

The joint distribution of the visible layer v and hidden layer h is given by:

P(v, h) = (1/Z) e^{−E(v, h)}

Where E(v, h) is the energy function of the RBM at layer l, and Z is the partition function:

Z = Σ_{v, h} e^{−E(v, h)}

E(v, h) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_{i, j} v_i w_{ij} h_j

Where,

  • a is the bias term for visible layer
  • b is the bias term for hidden layer
  • W is the weight matrix connecting visible and hidden layers
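
With a, b, and W defined, the energy function is straightforward to compute directly. Here is a toy NumPy sketch (illustrative values only):

```python
import numpy as np

def energy(v, h, a, b, W):
    """E(v, h) = -sum_i a_i*v_i - sum_j b_j*h_j - sum_ij v_i*w_ij*h_j."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
v = rng.integers(0, 2, 6).astype(float)  # binary visible vector
h = rng.integers(0, 2, 3).astype(float)  # binary hidden vector
a, b = np.zeros(6), np.zeros(3)          # bias terms
W = rng.normal(0, 0.1, (6, 3))           # weight matrix

print(energy(v, h, a, b, W))  # lower energy corresponds to higher probability
```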

The objective of training an RBM is to minimize the negative log-likelihood of the training data:

L = −Σ_n log P(v^(n))

Where the sum runs over the training examples v^(n).

Using the RBM, we can estimate the probability distribution of the visible layer as:

P(v) = (1/Z) Σ_h e^{−E(v, h)}

Where Z is the partition function:

Z = Σ_{v, h} e^{−E(v, h)}
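
For a tiny RBM, Z can be computed by brute-force enumeration of all binary configurations, which makes the definition concrete. This is feasible only at toy sizes, since the sum grows exponentially; the energy helper from the earlier sketch is redefined here so the snippet stands alone:

```python
import numpy as np
from itertools import product

def energy(v, h, a, b, W):
    return -(a @ v) - (b @ h) - (v @ W @ h)

def partition_function(a, b, W):
    """Z = sum over all binary (v, h) configurations of exp(-E(v, h))."""
    n_v, n_h = W.shape
    Z = 0.0
    for v_bits in product([0.0, 1.0], repeat=n_v):
        for h_bits in product([0.0, 1.0], repeat=n_h):
            v, h = np.array(v_bits), np.array(h_bits)
            Z += np.exp(-energy(v, h, a, b, W))
    return Z
```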

The Contrastive Divergence (CD) algorithm, an approximation of maximum likelihood estimation, is used to train each RBM:

Step 1: Sample the hidden layer activations given the input data:

P(h_j = 1 | v) = 𝜎(b_j + Σ_i w_{ij} v_i)

Where 𝜎(·) is the sigmoid function.

Step 2: Sample the visible layer activations given the hidden layer activations:

P(v_i = 1 | h) = 𝜎(a_i + Σ_j w_{ij} h_j)
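
Steps 1 and 2 translate directly into two small NumPy helpers. This is a sketch; the function and variable names are mine, and W has shape (visible, hidden):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    """Step 1: P(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij * v_i), then sample."""
    p_h = sigmoid(v @ W + b)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible(h, W, a, rng):
    """Step 2: P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij * h_j), then sample."""
    p_v = sigmoid(h @ W.T + a)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)
```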

Step 3: A Gibbs sampling step is used to sample from the probability distribution of the hidden units given the visible units, and vice versa.

This is done by alternately updating the hidden and visible units, using the conditional probabilities from Steps 1 and 2.

At each iteration of the Gibbs sampling, update the hidden units using the following equation:

P(h_j = 1 | v) = 𝜎(b_j + Σ_i v_i w_{ij})

Where

  • w_{ij} is the weight between the i’th visible unit and the j’th hidden unit
  • b_j is the bias of the j’th hidden unit.

Similarly, we update the visible units using the following equation:

P(v_i = 1 | h) = 𝜎(a_i + Σ_j h_j w_{ij})

Where a_i is the bias of the i’th visible unit.

Repeat the above two steps for a fixed number of iterations (e.g., 1,000) to get a sequence of samples from the joint probability distribution of the visible and hidden units. In practice, even a single alternating step (known as CD-1) often works well for training.
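
Using the two helpers from Steps 1 and 2, the alternating Gibbs chain looks like this (a sketch; k is the number of alternating steps):

```python
def gibbs_chain(v0, W, a, b, rng, k=1):
    """Run k steps of alternating Gibbs sampling, starting from the data v0.
    Returns the data-driven hidden probabilities and the final
    visible/hidden probabilities (the 'reconstruction')."""
    p_h0, h = sample_hidden(v0, W, b, rng)
    for _ in range(k):
        p_v, v = sample_visible(h, W, a, rng)
        p_h, h = sample_hidden(v, W, b, rng)
    return p_h0, p_v, p_h
```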

Step 4: Use the samples to update the weights and biases of the RBM. The weight and bias updates are given by:

Δw_{ij} = 𝜖 (v_i(0) h_j(0) − v_i(k) h_j(k))

Δa_i = 𝜖 (v_i(0) − v_i(k))

Δb_j = 𝜖 (h_j(0) − h_j(k))

Where,

  • 𝜖 is the learning rate
  • v_i(0) and h_j(0) are the probabilities of the visible and hidden units computed from the input data
  • v_i(k) and h_j(k) are the probabilities of the visible and hidden units in the generated samples obtained from Gibbs sampling.

Repeat the above four steps for a fixed number of epochs (e.g., 10) or until convergence is achieved (i.e., the change in the log-likelihood of the training data falls below a certain threshold).
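
Putting Steps 1 through 4 together, a minimal CD-k update for one mini-batch might look like the sketch below (reusing sample_hidden, sample_visible, and gibbs_chain from above; in practice k = 1, i.e., CD-1, is the usual choice):

```python
def cd_update(v0, W, a, b, rng, lr=0.01, k=1):
    """One Contrastive Divergence update from a mini-batch of data rows v0."""
    p_h0, p_vk, p_hk = gibbs_chain(v0, W, a, b, rng, k)
    n = len(v0)
    # delta_w_ij = eps * ( v_i(0) h_j(0) - v_i(k) h_j(k) ), averaged over the batch
    W += lr * (v0.T @ p_h0 - p_vk.T @ p_hk) / n
    a += lr * (v0 - p_vk).mean(axis=0)    # visible bias update
    b += lr * (p_h0 - p_hk).mean(axis=0)  # hidden bias update

# Typical usage: loop over epochs and mini-batches.
# for epoch in range(10):
#     for batch in np.array_split(data, 100):
#         cd_update(batch, W, a, b, rng)
```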

Step 5: After training the first RBM, use its hidden activations as the input data for training the next RBM, and keep its learned weights to initialize the corresponding layer of the DBN. Repeat this for each layer in the stack.
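
In code, Step 5 amounts to feeding each trained RBM’s hidden activation probabilities into the next one. A sketch, reusing cd_update and sample_hidden from above:

```python
def pretrain_stack(data, hidden_sizes, rng, epochs=10, lr=0.01):
    """Greedy layer-wise pre-training: train one RBM with CD, then use its
    hidden activation probabilities as the training data for the next RBM."""
    params = []
    layer_input = data
    for n_hidden in hidden_sizes:
        n_visible = layer_input.shape[1]
        W = rng.normal(0, 0.01, (n_visible, n_hidden))
        a, b = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            cd_update(layer_input, W, a, b, rng, lr=lr)
        params.append((W, a, b))
        layer_input, _ = sample_hidden(layer_input, W, b, rng)  # feeds the next layer
    return params  # pre-trained weights, ready for supervised fine-tuning
```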

Binary v/s Continuous DBN:

In a binary DBN, the input data is binary: each input variable can take only one of two possible values, 0 or 1. The energy function of the RBMs used in binary DBNs is defined in terms of binary variables, which allows for efficient computation of the probability distribution over the input data.

A continuous DBN is used to model continuous-valued data, where each input variable can take on any real value. The energy function of the RBMs used in continuous DBNs is defined in terms of continuous variables (commonly Gaussian visible units, as in a Gaussian-Bernoulli RBM), which requires different techniques for computing the probability distribution over the input data.
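
At the code level, the main change in a common continuous variant, the Gaussian-Bernoulli RBM (assuming inputs standardized to zero mean and unit variance), is that the visible reconstruction step produces a Gaussian mean rather than a sigmoid probability:

```python
import numpy as np

def sample_visible_gaussian(h, W, a, rng):
    """Continuous visible units: v_i ~ Normal(a_i + sum_j w_ij * h_j, 1),
    assuming the input data is standardized to unit variance."""
    mean = h @ W.T + a
    return mean, mean + rng.normal(size=mean.shape)

# The hidden units stay binary; only the visible conditional changes.
```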

The Why:

Reasons to use DBNs:

  1. DBNs can learn hierarchical representations of input data without the need for explicit supervision.
  2. DBNs learn the probability distribution of the input data and are able to handle noisy and corrupted data.
  3. A pre-trained DBN model can be used as a starting point for a new task, allowing faster training and improved performance on tasks with limited labeled data.
  4. DBNs can be used with a variety of data types, including images, text, and audio, and for both supervised and unsupervised learning tasks.

The Why Not:

Reasons not to use DBNs:

  1. While DBNs can learn hierarchical representations of input data, the learned features are not directly interpretable by humans.
  2. DBNs can be slow to process data in real-time applications, such as video analysis or robotics, due to their computational requirements.
  3. Implementing a DBN can be challenging for those with limited experience in deep learning. This is because DBNs require a combination of unsupervised and supervised learning techniques, and the training process is also complex.
  4. Pre-trained DBN models are not available for most applications.

Time for you to support:

  1. Reply to this article with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  4. Engage with BxD on LinkedIn (here)

In the next edition, we will cover Convolutional Neural Networks.

Let us know your feedback!

Until then,

Have a great time! 😊

#businessxdata #bxd #Deep #Belief #neuralnetworks #primer
