Deep Learning in Action: Building and Training a Neural Network for MNIST Classification and Exploring Backpropagation Through Gradient Descent
In the dynamic and ever-evolving field of artificial intelligence, deep learning stands out as a revolutionary force, driving progress across numerous applications—from voice recognition systems to self-driving cars. At the heart of deep learning lies the neural network, a computational architecture inspired by the biological networks within our own brains. This article dives into the practical implementation of neural networks, demonstrating their power and versatility in image recognition tasks using the iconic MNIST dataset.
The MNIST dataset, a collection of handwritten digits, has long been a standard benchmark for image classification algorithms, giving beginners and experts alike a place to test the limits of algorithmic accuracy. We will build a deep learning model that classifies these digits, using techniques like batch normalization and the features of TensorFlow, a leading framework in the field.
Furthermore, we unravel the complexities of neural networks by manually computing the forward and backward propagation steps, a foundational concept that enables these models to learn from data. This exercise not only solidifies the understanding of how neural networks adjust their parameters to minimize error but also illustrates the mathematical intricacies underpinning these powerful tools.
Whether you are a seasoned data scientist or a curious enthusiast, this article aims to provide a clear and thorough walkthrough of creating a deep learning model for the MNIST dataset and a deeper understanding of the mechanics of neural networks. Join us as we embark on this computational adventure, where each line of code brings us closer to the frontier of artificial intelligence.
Note 1: This article is part of the following article:
Note 2: We will be using TensorFlow. For a quick start with TensorFlow, you may read the following article:
What is the MNIST dataset?
The MNIST dataset (Modified National Institute of Standards and Technology dataset) is a large database of handwritten digits that is commonly used for training various image processing systems. It's one of the most widely used datasets for benchmarking machine learning algorithms, especially in the field of computer vision.
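Here are some key points about the MNIST dataset: it contains 60,000 training images and 10,000 test images; each image is a 28x28 pixel grayscale picture of a single handwritten digit; and each image is labeled with one of 10 classes (the digits 0-9).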
The high accuracy score you've achieved suggests that your neural network model has learned to recognize the patterns of the handwritten digits quite well, which is typical for models trained on the MNIST dataset.
Before we dive into the coding part, I suggest you go through the following video:
_____________________________
1 - Exploratory Data Analysis
Let's first load the dataset and see what is inside.
1.1 Install TensorFlow
!pip install tensorflow
1.2 Load MNIST Dataset
import tensorflow as tf
mnist = tf.keras.datasets.mnist
1.3 Explore the content and structure of the MNIST dataset
You can explore the content and structure of the MNIST dataset by examining the arrays that are loaded into memory. When you load MNIST using tf.keras.datasets.mnist, it returns two tuples: one for the training data and one for the test data. Each tuple contains images and their corresponding labels.
Here's how you can inspect the content and structure:
import tensorflow as tf
# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Check the shape of the arrays
print("Training images shape:", train_images.shape) # Should be (60000, 28, 28)
print("Training labels shape:", train_labels.shape) # Should be (60000,)
print("Test images shape:", test_images.shape) # Should be (10000, 28, 28)
print("Test labels shape:", test_labels.shape) # Should be (10000,)
# Check the range of pixel values
print("Training images pixel values range from", train_images.min(), "to", train_images.max())
print("Test images pixel values range from", test_images.min(), "to", test_images.max())
# Check the first few labels
print("First 10 training labels:", train_labels[:10])
# Visualize the first image in the training dataset
import matplotlib.pyplot as plt
plt.imshow(train_images[0], cmap='gray')
plt.title(f"Label: {train_labels[0]}")
plt.show()
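The previous code loads the dataset, prints the shapes of the image and label arrays, prints the minimum and maximum pixel values, prints the first 10 training labels, and displays the first training image with its label as the title.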
This will give you a good sense of the structure and content of the MNIST dataset.
For fun, here are the first 10 images in the training set:
Here is the code to plot it:
import tensorflow as tf
import matplotlib.pyplot as plt
# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (_, _) = mnist.load_data()
# Plot the first 10 images in the training set
plt.figure(figsize=(20, 4))
for i in range(10):
    plt.subplot(1, 10, i + 1)  # one row, ten columns
    plt.imshow(train_images[i], cmap='gray')
    plt.title(f"Label: {train_labels[i]}")
    plt.axis('off')
plt.show()
An MNIST digit image is a 28x28 pixel grayscale image of a single handwritten digit (0-9). Each pixel in the image has a value between 0 (black) and 255 (white), representing the intensity of the grayscale color at that pixel.
Here's an example of an MNIST digit image of the number "2":
MNIST digit 2
As you can see, the image is indeed 28x28 pixels and contains only the shape of the digit "2" with all other pixels being black (0).
2 - Data Preparation
2.1 - Normalize the Data
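Let's normalize the pixel values to the range 0 to 1.
# Normalize the images (mnist_train_images and mnist_test_images are the arrays returned by load_data())
mnist_train_images = mnist_train_images / 255.0
mnist_test_images = mnist_test_images / 255.0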
mnist_train_images = mnist_train_images / 255.0
This line of code takes the training images from the MNIST dataset and divides each pixel value by 255. Pixel values in an image are typically in the range of 0 to 255, representing the intensity of a pixel in grayscale (0 being black, 255 being white, and values in between representing shades of gray).
mnist_test_images = mnist_test_images / 255.0
Similarly, this line scales the pixel values of the test images from the test dataset to a range between 0 and 1 by dividing each pixel value by 255.
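The effect of the division can be checked on a small array without downloading anything. The sketch below is only an illustration: `fake_images` is a hypothetical stand-in for a batch of MNIST images, not part of the original code.

```python
import numpy as np

# Hypothetical stand-in for a small batch of MNIST images: uint8 values in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 0      # make sure both extremes are present
fake_images[0, 0, 1] = 255

# Dividing by a float promotes the array to float64 and rescales it to [0, 1]
normalized = fake_images / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(normalized.dtype, normalized.min(), normalized.max())
```

The shape of the array is unchanged; only the data type and the value range differ after normalization.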
Importance of Normalization
Context
This normalization technique is particularly common in the preprocessing steps for deep learning models dealing with images, such as Convolutional Neural Networks (CNNs) used for image classification tasks.
Here is what it looks like for the number "1" after normalization. Every entry of the matrix now lies between 0 and 1.
The full training set is a 3D array with dimensions (samples, rows, columns). Training images shape: (60000, 28, 28)
2.2 - Flatten the images to 1D Vector
# Flatten the images to one-dimensional vectors
train_images = train_images.reshape((train_images.shape[0], 28 * 28))
test_images = test_images.reshape((test_images.shape[0], 28 * 28))
There are a few important reasons why we flatten images in the MNIST example to one-dimensional vectors before feeding them into the deep learning model:
Alternatively, you can use a Flatten layer, which does this automatically:
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization
# Build the model
model = tf.keras.Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dense(10, activation='softmax')  # 10 classes for the digits 0-9
])
Preprocessing with Flatten: The common practice is to use a Flatten layer as the first layer (after specifying input_shape) in a Sequential model when dealing with image data. This layer converts the 2D image data into a 1D array, making it compatible with the Dense layer's expectations. It effectively reshapes the input images from a 2D format to a format (1D vector) that the Dense layer can work with.
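To make the reshaping concrete, here is a minimal NumPy sketch of what flattening does to a batch of 28x28 arrays. The `batch` array is a hypothetical placeholder for real images, and the example assumes NumPy's default row-major layout.

```python
import numpy as np

# Hypothetical mini-batch: 3 "images" of 28x28 with distinct values
batch = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# Same reshape as in the article: keep the sample axis, merge the two pixel axes
flat = batch.reshape((batch.shape[0], 28 * 28))

# Row-major order: the pixel at (row r, column c) ends up at index r * 28 + c
r, c = 5, 7
same_pixel = bool(flat[1, r * 28 + c] == batch[1, r, c])

print(flat.shape)
print(same_pixel)
```

No pixel values are changed or dropped; the rows of each image are simply laid end to end.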
3 - Build the Model
We will build the following simple neural network (we will solve the same problem with a better CNN later):
# Build the model
model = tf.keras.Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dense(10, activation='softmax')  # 10 classes for the digits 0-9
])
The previous code constructs a neural network model using TensorFlow's Keras API. This model is designed for a classification task for the MNIST dataset, which consists of 28x28 pixel grayscale images of handwritten digits (0-9). Here's a step-by-step explanation of the model:
Code Breakdown
This layer applies batch normalization, a technique that normalizes the outputs of the previous layer before they are passed on. It helps to accelerate training, improve model stability, and reduce sensitivity to weight initialization.
During training, each feature is normalized using the mean and variance of the current mini-batch and then rescaled by a learned scale and shift, so the next layer sees activations with a more stable distribution.
The final layer is another Dense layer with 10 neurons, corresponding to the 10 classes of the digits (0-9) that the model is trying to classify.
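The batch normalization step in the breakdown above can be sketched in NumPy. This is an illustration of the training-mode computation only, not the Keras implementation: `batch_norm_forward` is a hypothetical helper, the input is random data standing in for the Dense layer's output, and the `eps` value is an assumption. At inference time a real layer uses running averages instead of the batch statistics.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, roughly unit variance
    return gamma * x_hat + beta                # learned scale and shift

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 128))  # hypothetical Dense output
gamma = np.ones(128)    # initial scale
beta = np.zeros(128)    # initial shift

out = batch_norm_forward(activations, gamma, beta)
print(out.mean(axis=0)[:3])   # close to 0 for every feature
print(out.std(axis=0)[:3])    # close to 1 for every feature
```

With `gamma` at 1 and `beta` at 0, the output is simply the standardized input; training then adjusts both vectors.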
Model Overview
This model architecture is relatively simple and consists of an input layer (the Flatten layer), one hidden layer (the first Dense layer), a batch normalization layer to improve training efficiency and stability, and an output layer (the second Dense layer). It's designed to classify 28x28 pixel images into one of 10 classes (digits 0-9). The use of relu activation functions in hidden layers helps to mitigate the vanishing gradient problem, and the softmax activation in the output layer makes it suitable for multi-class classification. Batch normalization is included to enhance the training dynamics.
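As a rough illustration of how data flows through this architecture, the following NumPy sketch runs a forward pass with randomly initialized weights. It is a simplification under stated assumptions: the batch normalization layer is left out, `images` is random data rather than MNIST, and the weight scale of 0.05 is an arbitrary choice.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
images = rng.random((5, 28, 28))           # hypothetical normalized images in [0, 1)

# Randomly initialized parameters with the same shapes as the two Dense layers
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 10)), np.zeros(10)

x = images.reshape(len(images), -1)        # Flatten: (5, 784)
h = relu(x @ W1 + b1)                      # Dense(128, relu): (5, 128)
probs = softmax(h @ W2 + b2)               # Dense(10, softmax): (5, 10)

print(probs.shape)
print(probs.sum(axis=1))                   # each row sums to 1
print(probs.argmax(axis=1))                # predicted class per image

# Weights and biases in the two Dense layers
dense_params = W1.size + b1.size + W2.size + b2.size
print(dense_params)
```

Because the weights are random, the predicted classes are meaningless here; training is what moves the probabilities toward the correct digit.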
Here is the full program:
import tensorflow as tf
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization
from tensorflow.keras.callbacks import ModelCheckpoint
# Load the dataset
(mnist_train_images, mnist_train_labels), (mnist_test_images, mnist_test_labels) = tf.keras.datasets.mnist.load_data()
# Normalize the images
mnist_train_images = mnist_train_images / 255.0
mnist_test_images = mnist_test_images / 255.0
# Build the model
model = tf.keras.Sequential([
Flatten(input_shape=(28, 28)),
Dense(128, activation='relu'),
BatchNormalization(),
Dense(10, activation='softmax') # 10 classes for the digits 0-9
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Set up model checkpoints
checkpoint_path = 'mnist_model_checkpoint.h5'
checkpoint = ModelCheckpoint(checkpoint_path, save_best_only=True, monitor='val_accuracy', mode='max')
# Fit the model
model.fit(mnist_train_images, mnist_train_labels, epochs=10, validation_split=0.2, callbacks=[checkpoint])
# Evaluate the model
test_loss, test_acc = model.evaluate(mnist_test_images, mnist_test_labels)
print(f'Test accuracy: {test_acc}')
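The compile step above uses sparse_categorical_crossentropy with integer labels. As a hedged illustration of what that loss measures, the NumPy sketch below computes the mean negative log-probability of the true class. The function is a simplified stand-in, not the Keras implementation, and `probs` and `labels` are made-up values.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    probs = np.clip(probs, eps, 1.0)                        # avoid log(0)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())

# Hypothetical softmax outputs for 3 samples and 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])
labels = np.array([0, 1, 2])      # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy(labels, probs)
accuracy = float((probs.argmax(axis=1) == labels).mean())
print(loss, accuracy)
```

The "sparse" variant takes the labels as plain integers, which is why the MNIST labels can be passed to fit without one-hot encoding.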
Here's a step-by-step guide again:
1. Load the MNIST dataset
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize the pixel values to be between 0 and 1
train_images = train_images / 255.0
test_images = test_images / 255.0
# Flatten the images to one-dimensional vectors
train_images = train_images.reshape((train_images.shape[0], 28 * 28))
test_images = test_images.reshape((test_images.shape[0], 28 * 28))
2. Define the deep learning model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation
model = Sequential([
Dense(512, activation='relu', input_shape=(784,)),
BatchNormalization(),
Dense(256, activation='relu'),
BatchNormalization(),
Dense(10, activation='softmax') # 10 units for 10 classes
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
3. Train the model with batch normalization and model checkpoints
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint('model.h5', save_best_only=True)
model.fit(train_images, train_labels, epochs=5, batch_size=32, validation_data=(test_images, test_labels), callbacks=[checkpoint])
4. Fit a regression line using TensorFlow's gradient tape
4.1. Prepare the regression dataset
We assume the regression dataset is already available as NumPy arrays x_data and y_data, so we can proceed with the following steps.
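The article does not show how x_data and y_data are built. If you do not have them yet, one possible way to create a small synthetic dataset is sketched below. The slope, intercept, noise level, and sample count are illustrative assumptions, not values from the original.

```python
import numpy as np

# Hypothetical synthetic data: y = 3x + 2 plus a little Gaussian noise
rng = np.random.default_rng(42)
true_slope, true_intercept = 3.0, 2.0     # illustrative values, not from the article

x_data = rng.uniform(-1.0, 1.0, size=(200, 1)).astype(np.float32)   # (n_samples, n_features)
noise = rng.normal(0.0, 0.1, size=(200, 1)).astype(np.float32)
y_data = true_slope * x_data + true_intercept + noise               # (n_samples, 1)

print(x_data.shape, y_data.shape, x_data.dtype, y_data.dtype)
```

Keeping both arrays two-dimensional, with one row per sample, avoids accidental broadcasting when predictions and targets are subtracted in the loss.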
4.2. Define the model and loss function
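import tensorflow as tf

# Trainable parameters, created once so the training loop can update them
# (assumes x_data has shape (n_samples, n_features) and y_data has shape (n_samples, 1))
weights = tf.Variable(tf.random.normal([x_data.shape[1], 1]))
bias = tf.Variable(0.0)

# Define the model as a simple linear regression function
def model(x):
    x = tf.cast(x, tf.float32)
    return tf.matmul(x, weights) + bias

# Define the loss function as mean squared error
def loss(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    return tf.reduce_mean((y_true - y_pred) ** 2)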
4.3. Implement gradient tape and training loop
# Create an optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
# Training loop
for epoch in range(100):
    with tf.GradientTape() as tape:
        # Get predictions
        y_pred = model(x_data)
        # Calculate loss
        loss_value = loss(y_data, y_pred)
    # Calculate gradients of the loss with respect to the parameters
    grads = tape.gradient(loss_value, [weights, bias])
    # Update weights (apply_gradients expects (gradient, variable) pairs)
    optimizer.apply_gradients(zip(grads, [weights, bias]))
    # Print loss
    print(f"Epoch {epoch+1}, Loss: {loss_value.numpy()}")
This code defines a simple linear regression model, calculates the mean squared error loss, and updates the weights using the Adam optimizer within a training loop.
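To see what the gradient tape computes, the same kind of model can be fit with gradients derived by hand. The NumPy sketch below is an illustration under assumptions: it uses plain gradient descent rather than Adam, noise-free synthetic data, and an arbitrary learning rate and step count.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))          # hypothetical inputs
y = 3.0 * x + 2.0                                   # noise-free targets, for a clear check

w = np.zeros((1, 1))
b = 0.0
learning_rate = 0.1                                 # illustrative choice

for step in range(500):
    y_pred = x @ w + b                              # forward pass
    error = y_pred - y
    loss = np.mean(error ** 2)                      # mean squared error
    grad_w = 2.0 * x.T @ error / len(x)             # dL/dw, derived by hand
    grad_b = 2.0 * error.mean()                     # dL/db
    w -= learning_rate * grad_w                     # plain gradient descent update
    b -= learning_rate * grad_b

print(w.ravel(), b, loss)
```

The two gradient lines are the backward pass for this model; tf.GradientTape produces the same quantities automatically by recording the operations in the forward pass.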
By following these steps, you'll have created a deep learning model for classifying MNIST images with batch normalization and saved checkpoints, as well as fit a regression line using TensorFlow's gradient tape.