Artificial Intelligence #33: Implications of the computational cost of deep neural networks
Welcome to Artificial Intelligence #33
This week, we crossed 35K subscribers in about 7 months
Thanks for your support
I use this newsletter to share my ideas about teaching
In that sense, it will always be long form and will not carry advertisements, link-only posts and the like
Speaking of teaching, we started the cloud and edge course last week and, as usual, it was an insightful conversation
In the class, I mentioned the Navier-Stokes equations. In computational fluid dynamics, the Navier-Stokes equations are one of the most widely used techniques, with applications in aircraft design, automotive design, the study of blood flow, the design of power stations, the analysis of pollution and so on.
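For readers who have not seen them written out, the incompressible form of the equations (a standard textbook statement, not reproduced from the NASA image referenced at the end) is:

\[
\rho\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}\right) = -\nabla p + \mu \nabla^{2}\mathbf{u} + \mathbf{f},
\qquad \nabla \cdot \mathbf{u} = 0
\]

Here u is the fluid velocity field, p the pressure, ρ the density, μ the viscosity and f the external body forces; the second equation enforces conservation of mass for an incompressible fluid.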
Despite their utility, it has not been proven whether exact, smooth solutions always exist in three dimensions – making Navier-Stokes existence and smoothness one of the seven Millennium Prize Problems in mathematics (alongside the Riemann hypothesis, the Birch and Swinnerton-Dyer conjecture, the P versus NP problem, the Yang-Mills existence and mass gap problem, the Poincaré conjecture and the Hodge conjecture)
So, even though an exact solution may not exist (only approximate, numerically computed solutions are available), the equations are very useful in practice
And of course that has parallels with neural networks
Understanding this situation also gives a deeper appreciation of the costs and limitations of deep neural networks. I read an interesting paper that highlights exactly this
To summarise
As per the paper
The first part is true of all statistical models: To improve performance by a factor of k, at least k² more data points must be used to train the model. The second part of the computational cost comes explicitly from overparameterization. Once accounted for, this yields a total computational cost for improvement of at least k⁴. That little 4 in the exponent is very expensive: A 10-fold improvement, for example, would require at least a 10,000-fold increase in computation.
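A quick back-of-the-envelope sketch of that scaling argument, written in Python (the improvement factors below are illustrative, not taken from the paper):

```python
# Illustrative sketch of the scaling described above: a k-fold improvement in
# performance implies at least ~k^2 more training data and ~k^4 more computation.
def scaling_for_improvement(k):
    """Return the (data_factor, compute_factor) implied by a k-fold improvement."""
    return k ** 2, k ** 4

for k in (2, 5, 10):
    data_factor, compute_factor = scaling_for_improvement(k)
    print(f"{k}-fold improvement -> ~{data_factor:,}x more data, ~{compute_factor:,}x more compute")
# The last line recovers the paper's example: a 10-fold improvement needs ~10,000x more compute.
```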
As per the paper
Work by scholars at the University of Massachusetts Amherst allows us to understand the economic cost and carbon emissions implied by this computational burden. The answers are grim: Training such a model would cost US $100 billion and would produce as much carbon emissions as New York City does in a month. And if we estimate the computational burden of a 1 percent error rate, the results are considerably worse.
How do you overcome this?
I am personally interested in combining Bayesian techniques with deep learning
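As a minimal sketch of the general idea, here is one widely used way of extracting Bayesian-style uncertainty from a deep network, Monte Carlo dropout (Gal and Ghahramani), written in PyTorch. The network, data and hyperparameters are made up for illustration, this is not the specific combination I have in mind, and the training loop is omitted for brevity.

```python
import torch
import torch.nn as nn

# A tiny regression network with dropout. Keeping dropout active at prediction
# time and averaging many stochastic forward passes (Monte Carlo dropout)
# approximates sampling from a Bayesian posterior predictive distribution.
model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=100):
    """Predictive mean and standard deviation over n_samples stochastic passes."""
    model.train()   # train() keeps dropout switched on; no weights are updated here
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

x_new = torch.linspace(-3, 3, 50).unsqueeze(1)   # made-up query points
mean, std = mc_dropout_predict(model, x_new)
print(mean.shape, std.shape)                      # torch.Size([50, 1]) torch.Size([50, 1])
```

The appeal of approaches like this is that the uncertainty estimate comes almost for free from a standard network, rather than requiring a full (and far more expensive) Bayesian treatment of every weight.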
Finally, on a related note, Timnit Gebru has formed her own AI research centre
Such independent thinking is needed; thanks in part to her work, we are all now more aware of the computational cost of deep neural networks
Also, this is the second time in two weeks that we have discussed Navier-Stokes at Oxford. The first was when Dr Robbie Stevens mentioned the Navier-Stokes equations in his talk in the Digital twins course
Paper source: Deep learning’s diminishing returns
Image: Navier-Stokes equations (NASA)