Introduction to deep learning for beginners

Artificial intelligence (AI) is revolutionizing our world, yet terms like "artificial intelligence," "machine learning," and "deep learning" are often used interchangeably, which can be confusing.

What do these terms mean, and what is intelligence? Intelligence is the ability to make decisions based on a set of observations in the present. Artificial intelligence embeds this human-like intelligence in machines so that they can make predictions, which involves skills such as reasoning, problem solving, language, understanding, and interaction. Within AI, there are algorithms that help us make these predictions.

Machine learning (ML) is a subset of AI focused on algorithms that learn from data. Traditional ML requires manual feature extraction (e.g., using first and second derivatives of input data) to train models.

Deep learning, a subset of ML, automates feature extraction using neural networks, simplifying and enhancing the learning process.
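To make the contrast concrete, here is a minimal sketch of the kind of manual feature extraction that traditional ML relies on. It approximates the first and second derivatives of a signal with finite differences. The sample signal, the function name, and the two summary features are illustrative assumptions, not part of any particular pipeline.

```python
def finite_differences(signal):
    """Approximate a derivative by differencing consecutive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


# Hypothetical input: samples of x**2 at x = 0, 1, 2, 3, 4.
signal = [0.0, 1.0, 4.0, 9.0, 16.0]

first_derivative = finite_differences(signal)
second_derivative = finite_differences(first_derivative)

# Hand-crafted summary features that a traditional ML model could train on.
features = {
    "mean_slope": sum(first_derivative) / len(first_derivative),
    "mean_curvature": sum(second_derivative) / len(second_derivative),
}
print(features)
```

A deep learning model would instead receive the raw signal and learn its own features during training.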

A good example of an AI system is the Netflix recommender system. Although it relies heavily on individual ML techniques, it is an AI system as a whole because it integrates multiple intelligent behaviors and techniques, such as data collection, preprocessing, pattern recognition, and personalization, into a cohesive solution.

Deep learning algorithms have existed for decades, so what makes them so popular now? Deep learning algorithms are data hungry: they perform better with large datasets. The amount of available data has grown tremendously thanks to sensors, the Internet of Things, and publicly available datasets. Hardware developments (GPUs, TPUs) and techniques like parallelization have made training these networks much faster. On top of that, software tools like TensorFlow and PyTorch make the training process relatively easy.

Understanding some basic concepts in neural networks: Deep learning uses neural networks to make predictions. The basic idea is that the input data is multiplied by a set of weights, summed (typically together with a bias term), and passed through a non-linear activation function to produce the output. The goal is to get the output as close as possible to the truth.

  • Neurons/Perceptrons: A neuron, or perceptron, is the fundamental unit of a neural network, performing calculations to predict outcomes.
  • Loss Function: This measures the difference between the neural network's output and the actual data, guiding the optimization of the network weights. The goal of the neural network is to minimize the error between the true and the predicted labels. The loss averaged across all observations is called the cost function, objective function, or empirical loss. There are many types of loss function, depending on the task (regression, binary classification, or multi-class classification). Mean squared error is typically used for regression tasks and binary cross-entropy for binary classification tasks.
  • Forward and Backward Propagation: Forward propagation computes the output from the inputs: the inputs are multiplied by the weights, summed, and passed through the activation function. Backward propagation then works backward from the loss, computing its gradient with respect to each weight so that an optimization algorithm can update the weights to reduce the loss.
  • Non-linearity: Non-linear functions are crucial because they allow neural networks to model the complex, non-linear relationships inherent in real-world data. To introduce non-linearity, the outputs are passed through functions like ReLU, sigmoid, and tanh.
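The concepts in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the input values, weights, and bias are made up, and the function names are my own.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Pass positive values through and clip negatives to zero."""
    return max(0.0, z)


def neuron_forward(inputs, weights, bias, activation):
    """Forward pass of one neuron: weighted sum, plus bias, then activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def mse_loss(y_true, y_pred):
    """Mean squared error, typically used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy, typically used for binary classification."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical example values chosen so the weighted sum is exactly zero.
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.0

output = neuron_forward(inputs, weights, bias, sigmoid)
print("neuron output:", output)
print("MSE vs. target 1.0:", mse_loss([1.0], [output]))
print("BCE vs. target 1:", binary_cross_entropy([1], [output]))
```

Swapping `sigmoid` for `relu` in the call changes only the non-linearity; the weighted sum stays the same.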

Deep Neural Network Structures: These involve layers of neurons with varying depths and complexities, tailored to different tasks like image recognition or language processing.

Optimizing Network Weights for Minimal Loss: In machine learning, particularly in neural networks, finding the set of weights that minimizes the loss is crucial. The gradient descent algorithm is commonly used for this optimization. The process begins by assigning random values to the weights. The gradient, that is, the slope of the loss function with respect to each weight, indicates the direction of steepest ascent. By moving in the opposite direction (the direction of steepest descent), we aim to reach the point where the loss is minimized. The learning rate determines the size of the steps we take on this descent. For example, a learning rate of 0.1 (10%) means that each weight is adjusted by 10% of its gradient, in the downhill direction. This step size is critical: if it is too large, the algorithm can oscillate around the minimum without settling, or even diverge; if it is too small, convergence becomes slow, which increases computational cost significantly. In practice, plain gradient descent can be hard to get right because real-world neural networks may have millions or even billions of weights, and a poor choice of initial weights can leave the optimizer stuck in a local minimum rather than finding the global minimum. Enhanced variants of gradient descent, such as Adam and Adagrad, adapt the learning rate during training and can help avoid poor local minima.
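As a rough sketch of the procedure described above, the following plain-Python example fits a single weight w in the model y = w * x by gradient descent on the mean squared error. The toy data, the fixed starting weight of 0.0 (used instead of a random value so the result is reproducible), and the three learning rates are assumptions chosen for illustration.

```python
def mse_gradient(w, xs, ys):
    """Derivative of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate, steps, w_init=0.0):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    w = w_init
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


# Toy data generated by y = 2 * x, so the best weight is 2.0.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_good = gradient_descent(xs, ys, learning_rate=0.1, steps=50)
w_too_large = gradient_descent(xs, ys, learning_rate=0.3, steps=50)
w_too_small = gradient_descent(xs, ys, learning_rate=0.001, steps=50)

print("lr=0.1  :", w_good)       # settles very close to 2.0
print("lr=0.3  :", w_too_large)  # overshoots further on every step
print("lr=0.001:", w_too_small)  # still far from 2.0 after 50 steps
```

On this toy problem the middle learning rate converges, the large one diverges, and the small one has barely moved after the same number of steps. Optimizers such as Adam and Adagrad go further than this sketch by adapting the step size per weight.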

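Data Preparation: Typical steps involve normalizing the inputs, followed by convolution to extract features and ReLU to introduce non-linearity (the latter two usually happen inside the network's layers rather than as a separate preprocessing step).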
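Normalization is the easiest of these steps to show in isolation. Below is a minimal sketch of two common variants, min-max scaling and standardization (z-scores), using only the Python standard library. The sample values and function names are illustrative assumptions.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - lo) / span for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


raw = [10.0, 20.0, 30.0]  # hypothetical feature column
scaled = min_max_scale(raw)
z_scores = standardize(raw)
print(scaled)
print(z_scores)
```

In a real project the minimum, maximum, mean, and standard deviation would be computed on the training set only and then reused for the validation and test sets.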

Model training: When training neural network models, the dataset is usually split into three parts: training, validation, and test sets. The model is trained on the training set and its performance is assessed on the validation set. Based on its validation performance, adjustments are made to the model's settings (its hyperparameters, such as the learning rate or the number of layers) to improve its effectiveness. The final evaluation uses the test set, which consists of data the model has not seen during training. For regression tasks, model performance is usually evaluated using R-squared, mean absolute error, or root mean square error.
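The split and the three metrics mentioned above can be sketched as follows. The 70/15/15 proportions, the fixed random seed, and the sample predictions are assumptions for illustration; the article does not prescribe specific values.

```python
import math
import random


def train_val_test_split(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle a copy of the data and cut it into train, validation, and test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))

# Hypothetical targets and predictions on the test set.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]
print("MAE :", mean_absolute_error(y_true, y_pred))
print("RMSE:", root_mean_square_error(y_true, y_pred))
print("R^2 :", r_squared(y_true, y_pred))
```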

Overfitting: A key challenge in training deep learning models is "overfitting." Overfitting occurs when a model learns not just the general features of the training data but also its noise and minor intricacies. Such models are likely to perform poorly on new, unseen datasets because they fail to generalize well. Ideally, the model should learn to recognize the broad features. The opposite problem, underfitting, occurs when the model fails to capture essential non-linear features due to insufficient model depth or complexity.

Regularization: To mitigate overfitting, various regularization techniques are employed. These techniques prevent models from learning overly complex patterns and thus help them generalize to unseen test data. They include dropout, early stopping, and L1 and L2 regularization.

  • Dropout: This method randomly sets a proportion of neurons to zero at each training step. Because different neurons are deactivated each time, the network cannot form dependencies on specific patterns; it avoids learning intricate, noise-sensitive details and instead focuses on more general features. Dropout also discourages reliance on any single neuron.
  • Early Stopping: This technique monitors the model's performance during training. If the training loss continues to decrease but the validation loss starts to increase, it suggests that the model is beginning to overfit. Training is halted before the model becomes overly specific to the training data. Ideally, training stops at the point where the validation loss is at its lowest, before the training and validation losses start to diverge.
  • L1 regularization: This method adds a penalty proportional to the sum of the absolute values of the weights, which leads to models where some feature weights are exactly zero. This is particularly useful for feature selection in models with high dimensionality.
  • L2 regularization: L2 adds a penalty proportional to the sum of the squared weights. It typically does not drive weights to exactly zero but encourages them to be small. This method is effective in dealing with multicollinearity and model overfitting.
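The four techniques above can be sketched in plain Python. Several details here are assumptions rather than anything stated in the article: the dropout function uses the common "inverted dropout" variant, which rescales the surviving activations so their expected sum is unchanged; early stopping uses a patience counter; and the weights and loss values are made up.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`,
    and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l1_penalty(weights, lam):
    """L1 term added to the loss: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch): stop once the validation loss has
    failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = stop_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, stop_epoch


rng = random.Random(0)  # fixed seed so the dropout mask is reproducible
dropped = dropout([1.0] * 8, rate=0.5, rng=rng)
print("after dropout:", dropped)

weights = [1.0, -2.0, 0.0]  # hypothetical weights
print("L1 penalty:", l1_penalty(weights, lam=0.1))
print("L2 penalty:", l2_penalty(weights, lam=0.1))

# Hypothetical validation losses: improving until epoch 2, then rising.
val_losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
best_epoch, stop_epoch = early_stopping(val_losses, patience=2)
print("best epoch:", best_epoch, "stopped at epoch:", stop_epoch)
```

In practice the weights saved at the best epoch are the ones kept, and dropout is switched off when the model is used for prediction.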

Considerations for Data:

  • Precision: In my experience, working with lower precision, such as float32 instead of float64, increases model speed. This is primarily because lower precision reduces memory usage, and the smaller data sizes allow faster processing. The trade-off is that lower precision can lead to a loss of numerical accuracy and a higher risk of rounding errors.
  • Data Handling: Efficient data chunking and cleaning are crucial for handling massive datasets effectively.
  • Data cleaning: Data with inconsistencies, errors, or outliers can mislead the training process, causing the model to learn incorrect patterns and perform poorly on real-world data. Data cleaning and visualization are therefore among the most crucial steps in training machine learning models.
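The precision trade-off and the idea of chunking can both be illustrated with the standard library alone. This is only a sketch: the `array` and `struct` modules stand in for what a numerical library would do with float32 and float64 tensors, and the helper names and sample sizes are my own.

```python
import struct
from array import array

# Memory: the same 1000 values stored as 32-bit and as 64-bit floats.
values = [0.1] * 1000
as_float32 = array("f", values)
as_float64 = array("d", values)
bytes32 = as_float32.itemsize * len(as_float32)
bytes64 = as_float64.itemsize * len(as_float64)
print("float32 bytes:", bytes32, "float64 bytes:", bytes64)


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Accuracy: 0.1 cannot be stored exactly, and float32 is further off.
rounding_error = abs(to_float32(0.1) - 0.1)
print("extra rounding error in float32:", rounding_error)


def iter_chunks(items, chunk_size):
    """Yield consecutive slices so a large dataset is processed piece by piece."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


for chunk in iter_chunks(list(range(5)), chunk_size=2):
    print("processing chunk:", chunk)
```

Halving the bytes per value is where the memory saving comes from; the rounding error shows what is given up in return.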

Additional things/tools to keep in mind:

  • tqdm: tqdm is a versatile tool for displaying progress bars in loops; it gives a clear indication of the remaining time and the percentage completed.
  • GPU: Monitoring how the GPU handles different batch sizes or operations makes it possible to adjust them and improve the efficiency of algorithms or models.

