Artificial Intelligence #33: Implications of the computational cost of deep neural networks

Welcome to Artificial Intelligence #33

This week, we crossed 35K subscribers in about 7 months

Thanks for your support

I use this newsletter to share my ideas about teaching

In that sense, it will always be long form, and it will not carry advertisements or link-only content

Speaking of teaching, we started the cloud and edge course last week, and it was an insightful conversation as usual

In the class, I mentioned the Navier-Stokes equations. In computational fluid dynamics, the Navier-Stokes equations are one of the most widely used tools, with applications in aircraft design, automotive design, the study of blood flow, the design of power stations, the analysis of pollution and more.
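
For reference, one common (incompressible) form of the equations pairs a momentum balance with a mass-conservation constraint, where u is the velocity field, p the pressure, ρ the density, μ the viscosity and f any body forces:

\rho \left( \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right) = -\nabla p + \mu \nabla^{2}\mathbf{u} + \mathbf{f}, \qquad \nabla \cdot \mathbf{u} = 0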

Despite their utility, it has not been proven whether exact solutions always exist in three dimensions, which makes Navier-Stokes one of the seven Millennium Prize Problems in mathematics (along with the Riemann hypothesis, the Birch and Swinnerton-Dyer conjecture, the P versus NP problem, the Yang-Mills existence and mass gap problem, the Poincaré conjecture and the Hodge conjecture)

So, even though an exact solution is not known to exist (only approximate numerical solutions are possible), the equations are very useful

And of course that has parallels with neural networks

Understanding this situation also gives a deeper appreciation of the costs and the limitations of deep neural networks. I recently read an interesting paper that highlights exactly this

To summarise

  • Deep learning is making massive strides
  • The greater compute capability of today makes it possible to build networks with vastly more, and more deeply connected, neurons
  • Early AI systems were rule based. Deep neural networks are not rule based, i.e. their insights are based purely on connections between neurons
  • Early neural networks had few parameters
  • Today's neural networks have a vastly greater number of parameters
  • This is significant because if you can learn a very large number of parameters, the network can act as a universal function approximator
  • The flexibility of neural networks comes from taking the many inputs to the model and having the network combine them in myriad ways
  • Deep-learning models are overparameterized i.e. they have more parameters than there are data points available for training.
  • Classically, this would lead to overfitting (much as in the Navier-Stokes analogy, there is no exact solution to fall back on)
  • Deep learning avoids this trap by using a method called stochastic gradient descent (see the short sketch after this list)
  • Stochastic gradient descent also generalizes well
  • So the good news is that deep learning provides enormous flexibility. The bad news is that this flexibility comes at an enormous computational cost, for two reasons
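
To make the overparameterization and stochastic gradient descent bullets concrete, here is a minimal sketch. It is my own illustration rather than anything from the paper, and it assumes NumPy plus a made-up toy regression task: a one-hidden-layer network with about 3,000 parameters is fitted to just 20 data points using mini-batch SGD.

# Toy illustration: an overparameterized network (about 3,000 parameters)
# trained on only 20 data points with mini-batch stochastic gradient descent.
import numpy as np

rng = np.random.default_rng(0)

# 20 training points for a 1-D regression task
X = rng.uniform(-1, 1, size=(20, 1))
y = np.sin(3 * X) + 0.1 * rng.normal(size=(20, 1))

hidden = 1000                                   # one wide hidden layer
W1 = rng.normal(0, 1.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.05, (hidden, 1))
b2 = np.zeros(1)

lr, batch_size = 0.01, 4
for step in range(5000):
    idx = rng.choice(len(X), batch_size, replace=False)  # random mini-batch: the "stochastic" part
    xb, yb = X[idx], y[idx]

    # forward pass
    h = np.tanh(xb @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - yb

    # backward pass for a mean-squared-error loss
    dW2 = h.T @ err / batch_size
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1 = xb.T @ dh / batch_size
    db1 = dh.mean(axis=0)

    # gradient descent update on every parameter
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

train_mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)
print(f"train MSE after SGD: {train_mse:.4f}")

Every update touches all of the parameters, which is one way to see where the computational cost of scaling such models comes from.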

As per the paper

The first part is true of all statistical models: To improve performance by a factor of k, at least k² more data points must be used to train the model. The second part of the computational cost comes explicitly from overparameterization. Once accounted for, this yields a total computational cost for improvement of at least k⁴. That little 4 in the exponent is very expensive: A 10-fold improvement, for example, would require at least a 10,000-fold increase in computation.
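
A quick back-of-the-envelope check, simply restating the quoted k² (data) and k⁴ (compute) factors in code:

# Scaling factors quoted above: data grows like k**2, total compute like k**4
for k in (2, 10, 100):
    print(f"improvement x{k}: data x{k**2:,}, compute x{k**4:,}")

For k = 10 this reproduces the 10,000-fold increase in computation mentioned in the quote.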


  • The expert-system approach to this problem would be to have people who are knowledgeable in radiology and oncology specify the variables they think are important, allowing the system to examine only those.
  • Because we do not have such experts, we test as many combinations as possible (the short count after this list shows how quickly the number of combinations grows)
  • That’s where you need the computing power
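
To illustrate the combinatorial point (my own toy count, not a calculation from the paper): with n candidate input variables there are 2ⁿ possible subsets a model could, in principle, be asked to consider.

# Brute-force combination search grows exponentially with the number of variables
for n in (10, 30, 100):
    print(f"{n} variables -> {2 ** n:,} possible subsets")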

 

As per the paper

Work by scholars at the University of Massachusetts Amherst allows us to understand the economic cost and carbon emissions implied by this computational burden. The answers are grim: Training such a model would cost US $100 billion and would produce as much carbon emissions as New York City does in a month. And if we estimate the computational burden of a 1 percent error rate, the results are considerably worse.

 

How do you overcome this?

  • One strategy is to use processors designed specifically to be efficient for deep-learning calculations. That is why we see GPUs being used so widely
  • Another approach to reducing the computational burden focuses on generating neural networks that, when implemented, are smaller (a minimal pruning sketch follows this list)
  • Various other tactics have been explored to tackle this challenge, including meta-learning and neuro-symbolic AI
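
As an example of the "smaller networks" idea, here is a minimal magnitude-pruning sketch. It is my own NumPy illustration with arbitrary sizes and an arbitrary sparsity level, not the specific method discussed in the paper: the smallest 90 percent of weights in a stand-in trained layer are zeroed out, leaving a much sparser network to deploy.

# Magnitude pruning: zero out the smallest weights so the deployed model is sparser
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(1000, 1000))                 # stand-in for a dense trained weight matrix

sparsity = 0.9                                    # drop the smallest 90% of weights
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

kept = np.count_nonzero(W_pruned)
print(f"kept {kept:,} of {W.size:,} weights ({kept / W.size:.0%})")

In practice the pruned model is usually fine-tuned afterwards and stored in a sparse or lower-precision format to realise the savings.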

I am personally interested in combining Bayesian techniques with deep learning
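
One well-known way to give an ordinary network a Bayesian flavour is Monte Carlo dropout: dropout is left switched on at prediction time and many stochastic forward passes are averaged, giving a mean prediction and an uncertainty estimate. A minimal sketch follows; it uses NumPy only, and the weights are random stand-ins for a trained network.

# Monte Carlo dropout: average many stochastic forward passes to get a
# predictive mean and a rough uncertainty estimate.
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(0, 0.5, (1, 200))                 # stand-ins for trained weights
W2 = rng.normal(0, 0.5, (200, 1))

def mc_dropout_predict(x, n_samples=100, keep_prob=0.9):
    preds = []
    for _ in range(n_samples):
        h = np.tanh(x @ W1)
        mask = (rng.random(h.shape) < keep_prob) / keep_prob  # fresh inverted-dropout mask each pass
        preds.append((h * mask) @ W2)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and spread

mean, std = mc_dropout_predict(np.array([[0.5]]))
print(f"prediction {mean.item():.3f} +/- {std.item():.3f}")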

Finally, on a related note, Timnit Gebru has formed her own AI research centre

Such independent thinking is needed; thanks to her, we are all now more aware of the computational cost of deep neural networks

Also, this is the second time in two weeks that I have discussed Navier-Stokes at Oxford. The first was when Dr Robbie Stevens mentioned Navier-Stokes in his talk at the Digital Twins course

Paper source: Deep learning’s diminishing returns

Image: Navier-Stokes equations, NASA

Mark Allen

Fullstack Software Developer


Nevertheless, I still subscribe to the Luce Irigaray viewpoint that the historical difficulty in proving the existence or otherwise of exact solutions to the Navier-Stokes equations is down to the masculine science of physics promoting rigid-body mechanics and neglecting the fundamentally more feminine fluid dynamics ;)

Nitin Malik

PhD | Professor | Data Science | Machine Learning | Deputy Dean (Research)


The computational cost of a neural network depends on the number of arithmetic operations to be carried out, which in turn depends on the number of trainable weights and the number of training samples. The cost can be reduced by initializing with lots of zeros before training, pruning the model during training, and reducing the precision of the weights post-training, before deployment.

Raghu Ram

I champion cutting-edge cybersecurity and endpoint management solutions for MSPs and SMBs, translating complex IT challenges into scalable security strategies.


Adithya Premchandra

