Artificial Intelligence #33: Implications of the computational cost of deep neural networks

Welcome to Artificial Intelligence #33

This week, we crossed 35K subscribers in about 7 months

Thanks for your support

I use this newsletter to share my ideas about teaching

In that sense, it will always be long form, and it will not carry advertisements or link-only content

Speaking of teaching, we started the cloud and edge course last week, and it was an insightful conversation as usual

In the class, I mentioned the Navier-Stokes equations. In computational fluid dynamics, the Navier-Stokes equations are one of the most widely used tools, with applications in aircraft design, automotive design, the study of blood flow, the design of power stations, the analysis of pollution and more.
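
For reference, one common (incompressible) form of the equations pairs a momentum balance with a mass-conservation constraint, where u is the velocity field, p the pressure, ρ the density, μ the viscosity and f any body forces:

\rho \left( \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right) = -\nabla p + \mu \nabla^{2}\mathbf{u} + \mathbf{f}, \qquad \nabla \cdot \mathbf{u} = 0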

Despite their utility, it has not been proven whether exact solutions always exist in three dimensions, which makes Navier-Stokes one of the seven Millennium Prize Problems in mathematics (along with the Riemann hypothesis, the Birch and Swinnerton-Dyer conjecture, the P versus NP problem, the Yang-Mills existence and mass gap problem, the Poincaré conjecture and the Hodge conjecture)

So, even though an exact solution is not known to exist (only approximate numerical solutions are possible), the equations are very useful

And of course that has parallels with neural networks

Understanding this situation also gives a deeper appreciation of the costs and the limitations of deep neural networks. I recently read an interesting paper that highlights exactly this

To summarise

  • Deep learning is making massive strides
  • The greater compute capability of today makes it possible to build networks with vastly more, and more deeply connected, neurons
  • Early AI systems were rule based. Deep neural networks are not rule based, i.e. their insights are based purely on connections between neurons
  • Early neural networks had few parameters
  • Today's neural networks have a vastly greater number of parameters
  • This is significant because if you can learn a very large number of parameters, the network can act as a universal function approximator
  • The flexibility of neural networks comes from taking the many inputs to the model and having the network combine them in myriad ways
  • Deep-learning models are overparameterized i.e. they have more parameters than there are data points available for training.
  • Classically, this would lead to overfitting (much as in the Navier-Stokes analogy, there is no exact solution to fall back on)
  • Deep learning avoids this trap by using a method called stochastic gradient descent (see the short sketch after this list)
  • Stochastic gradient descent also generalizes well
  • So the good news is that deep learning provides enormous flexibility. The bad news is that this flexibility comes at an enormous computational cost, for two reasons
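
To make the overparameterization and stochastic gradient descent bullets concrete, here is a minimal sketch. It is my own illustration rather than anything from the paper, and it assumes NumPy plus a made-up toy regression task: a one-hidden-layer network with about 3,000 parameters is fitted to just 20 data points using mini-batch SGD.

# Toy illustration: an overparameterized network (about 3,000 parameters)
# trained on only 20 data points with mini-batch stochastic gradient descent.
import numpy as np

rng = np.random.default_rng(0)

# 20 training points for a 1-D regression task
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * X) + 0.1 * rng.normal(size=(20, 1))

hidden = 1000                                   # one wide hidden layer
W1 = rng.normal(0, 1.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.05, (hidden, 1))
b2 = np.zeros(1)

lr, batch_size = 0.01, 4
for step in range(5000):
    idx = rng.choice(len(X), batch_size, replace=False)  # random mini-batch: the "stochastic" part
    xb, yb = X[idx], y[idx]

    # forward pass
    h = np.tanh(xb @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - yb

    # backward pass for a mean-squared-error loss
    dW2 = h.T @ err / batch_size
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1 = xb.T @ dh / batch_size
    db1 = dh.mean(axis=0)

    # gradient descent update on every parameter
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

train_mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)
print(f"train MSE after SGD: {train_mse:.4f}")

Every update touches all of the parameters, which is one way to see where the computational cost of scaling such models comes from.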

As per the paper

The first part is true of all statistical models: To improve performance by a factor of k, at least k² more data points must be used to train the model. The second part of the computational cost comes explicitly from overparameterization. Once accounted for, this yields a total computational cost for improvement of at least k⁴. That little 4 in the exponent is very expensive: A 10-fold improvement, for example, would require at least a 10,000-fold increase in computation.
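
A quick back-of-the-envelope check, simply restating the quoted k² (data) and k⁴ (compute) factors in code:

# Scaling factors quoted above: data grows like k**2, total compute like k**4
for k in (2, 10, 100):
    print(f"improvement x{k}: data x{k**2:,}, compute x{k**4:,}")

For k = 10 this reproduces the 10,000-fold increase in computation mentioned in the quote.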


  • The expert-system approach to this problem would be to have people who are knowledgeable in radiology and oncology specify the variables they think are important, allowing the system to examine only those.
  • Because we do not have such experts, we test as many combinations as possible (the short count after this list shows how quickly the number of combinations grows)
  • That’s where you need the computing power
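
To illustrate the combinatorial point (my own toy count, not a calculation from the paper): with n candidate input variables there are 2ⁿ possible subsets a model could, in principle, be asked to consider.

# Brute-force combination search grows exponentially with the number of variables
for n in (10, 30, 100):
    print(f"{n} variables -> {2 ** n:,} possible subsets")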

 

As per the paper

Work by scholars at the University of Massachusetts Amherst allows us to understand the economic cost and carbon emissions implied by this computational burden. The answers are grim: Training such a model would cost US $100 billion and would produce as much carbon emissions as New York City does in a month. And if we estimate the computational burden of a 1 percent error rate, the results are considerably worse.

 

How do you overcome this?

  • One strategy is to use processors designed specifically to be efficient for deep-learning calculations. That is why we see GPUs being used so widely
  • Another approach to reducing the computational burden focuses on generating neural networks that, when implemented, are smaller (a minimal pruning sketch follows this list)
  • Various other tactics have been explored to tackle this challenge, including meta-learning and neuro-symbolic AI
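
As an example of the "smaller networks" idea, here is a minimal magnitude-pruning sketch. It is my own NumPy illustration with arbitrary sizes and an arbitrary sparsity level, not the specific method discussed in the paper: the smallest 90 percent of weights in a stand-in trained layer are zeroed out, leaving a much sparser network to deploy.

# Magnitude pruning: zero out the smallest weights so the deployed model is sparser
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(1000, 1000))                 # stand-in for a dense trained weight matrix

sparsity = 0.9                                    # drop the smallest 90% of weights
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

kept = np.count_nonzero(W_pruned)
print(f"kept {kept:,} of {W.size:,} weights ({kept / W.size:.0%})")

In practice the pruned model is usually fine-tuned afterwards and stored in a sparse or lower-precision format to realise the savings.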

I am personally interested in combining Bayesian techniques with deep learning
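
One well-known way to give an ordinary network a Bayesian flavour is Monte Carlo dropout: dropout is left switched on at prediction time and many stochastic forward passes are averaged, giving a mean prediction and an uncertainty estimate. A minimal sketch follows; it uses NumPy only, and the weights are random stand-ins for a trained network.

# Monte Carlo dropout: average many stochastic forward passes to get a
# predictive mean and a rough uncertainty estimate.
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(0, 0.5, (1, 200))                 # stand-ins for trained weights
W2 = rng.normal(0, 0.5, (200, 1))

def mc_dropout_predict(x, n_samples=100, keep_prob=0.9):
    preds = []
    for _ in range(n_samples):
        h = np.tanh(x @ W1)
        mask = (rng.random(h.shape) < keep_prob) / keep_prob  # fresh inverted-dropout mask each pass
        preds.append((h * mask) @ W2)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and spread

mean, std = mc_dropout_predict(np.array([[0.5]]))
print(f"prediction {mean.item():.3f} +/- {std.item():.3f}")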

Finally, on a related note, Timnit Gebru has formed her own AI research centre

Such independent thinking is needed; thanks to her, we are all now more aware of the computational cost of deep neural networks

Also, this is the second time in two weeks that I have discussed Navier-Stokes at Oxford. The first was when Dr Robbie Stevens mentioned Navier-Stokes in his talk at the Digital Twins course

Paper source: Deep learning’s diminishing returns

Image: Navier-Stokes equations, NASA

Mark Allen

Fullstack Software Developer


Nevertheless, I still subscribe to the Luce Irigaray viewpoint that the historical difficulty in proving the existence or otherwise of exact solutions to the Navier-Stokes equations is down to the masculine science of physics promoting rigid-body mechanics and neglecting the fundamentally more feminine fluid dynamics ;)

Nitin Malik

PhD | Professor | Data Science | Machine Learning | Deputy Dean (Research)


The computational cost of a neural network depends on the number of arithmetic operations to be carried out, which in turn depends on the number of trainable weights and the number of training samples. The cost can be reduced by initializing with lots of zeros before training, pruning the model during training, and reducing the precision of the weights post-training, before deployment.

Raghu Ram

I champion cutting-edge cybersecurity and endpoint management solutions for MSPs and SMBs, translating complex IT challenges into scalable security strategies.


Adithya Premchandra

