Artificial Intelligence #48: How do we combine statistical thinking and machine learning?

Ajit Jaokar

Published Mar 22, 2022

If you work with machine learning algorithms, you often find statistics daunting.

The problem is not a lack of information. The problem is how to put it all together into a concise and pragmatic whole.

Christoph Molnar has a good post that summarises data modelling mindsets.

I summarise it below and then attempt to extend this thinking to deep learning.

Data modelling mindsets include Bayesian and frequentist statistics, machine learning and causal inference. While these approaches share common methods and models, they differ in assumptions about the data-generating process and when a model is a good generalization of the real world.

Machine learning minimizes a loss function L by finding the function f that best predicts target Y from features X. A good machine learning model has a low loss on the test data.
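As a minimal sketch of this mindset, assume f is restricted to straight lines and L is mean squared error. The synthetic data, the helper names fit_line and mse, and the 150/50 train/test split are all illustrative assumptions, not something taken from Molnar's post.

```python
import random

def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(params, xs, ys):
    """Loss L: mean squared error of the line on the given data."""
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumed for illustration): Y = 2X + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Fit on the training split, judge the model on held-out test data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

params = fit_line(train_x, train_y)
test_loss = mse(params, test_x, test_y)

# Baseline: always predict the training mean of Y (slope 0).
baseline_loss = mse((0.0, sum(train_y) / len(train_y)), test_x, test_y)
print(f"test loss: {test_loss:.3f}, baseline loss: {baseline_loss:.3f}")
```

The model is judged only by its loss on data it did not see during fitting, which is the criterion the machine learning mindset cares about.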

Statistical inference fits the best parameters of a chosen probability distribution for variables X. A good statistical model has a high goodness-of-fit: the data fits the distribution.
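A small sketch of this mindset, assuming the chosen distribution is a normal distribution and that "best parameters" means maximum likelihood estimates. The sample data and the function name normal_log_likelihood are illustrative assumptions.

```python
import math
import random
import statistics

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under a Normal(mu, sigma) distribution."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - squared_error / (2 * sigma ** 2)

# Synthetic sample (assumed for illustration): drawn from Normal(5, 2).
rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(500)]

# Maximum likelihood estimates for a normal distribution:
# the sample mean and the population (divide-by-n) standard deviation.
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)

ll_mle = normal_log_likelihood(data, mu_hat, sigma_hat)
ll_other = normal_log_likelihood(data, 4.0, 3.0)  # an arbitrary alternative
print(f"mu_hat={mu_hat:.2f}, sigma_hat={sigma_hat:.2f}")
print(f"log-likelihood at MLE: {ll_mle:.1f}, at (4, 3): {ll_other:.1f}")
```

Here the quality of the model is expressed as how well the data fit the assumed distribution, not as predictive loss on held-out data.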

Bayesian inference assumes that the distribution parameters θ are random variables with a prior distribution. A good Bayesian model has a high posterior probability; competing models are often compared using the Bayes factor.
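To make the idea of a prior on θ concrete, here is a minimal sketch using the Beta-Binomial conjugate pair, where θ is the success probability of a coin-like process. The Beta(2, 2) prior, the observed counts, and the helper name update_beta are assumptions chosen for illustration.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + failures

# Prior belief about theta: Beta(2, 2), centred on 0.5 (assumed).
prior = (2, 2)

# Observed data (assumed): 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

# The posterior mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = posterior[0] / (posterior[0] + posterior[1])
print(f"posterior: Beta{posterior}, mean: {posterior_mean:.3f}")
```

The posterior mean lands between the prior mean (0.5) and the observed fraction (0.7), which shows how prior information and data are combined.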

Causal inference operates on the principles of causality, intervention and counterfactuals. A good causal model has high goodness-of-fit and solid causal assumptions.
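The role of causal assumptions can be sketched with a simulated confounder. In the hypothetical setup below, Z influences both the treatment T and the outcome Y, and the true effect of T on Y is fixed at 1.0. The data-generating process, the variable names, and the choice of stratifying on Z (a simple backdoor adjustment) are assumptions made for this illustration.

```python
import random

rng = random.Random(42)
TRUE_EFFECT = 1.0  # assumed causal effect of treatment T on outcome Y

# Simulate rows of (z, t, y) where Z confounds the T -> Y relationship.
rows = []
for _ in range(20000):
    z = rng.random() < 0.5                   # confounder
    t = rng.random() < (0.8 if z else 0.2)   # Z makes treatment more likely
    y = TRUE_EFFECT * t + 3.0 * z + rng.gauss(0, 1)
    rows.append((z, t, y))

def mean_outcome(rows, t, z=None):
    """Average Y among rows with treatment t (and confounder value z, if given)."""
    ys = [y for (zi, ti, y) in rows if ti == t and (z is None or zi == z)]
    return sum(ys) / len(ys)

# Naive comparison: ignores Z, so it mixes the effect of T with the effect of Z.
naive = mean_outcome(rows, True) - mean_outcome(rows, False)

# Adjusted comparison: compare within each level of Z, then weight by P(Z = z).
adjusted = 0.0
for z in (False, True):
    p_z = sum(1 for (zi, _, _) in rows if zi == z) / len(rows)
    adjusted += p_z * (mean_outcome(rows, True, z) - mean_outcome(rows, False, z))

print(f"naive estimate: {naive:.2f}, adjusted estimate: {adjusted:.2f}")
```

Both estimates come from the same data. Only the adjusted one, which relies on the assumption that Z is the sole confounder, comes close to the true effect.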

He then recommends being pragmatic about modelling choices: if you need a causal interpretation, use causal models; if only predictive performance matters, pick machine learning; if you want to include prior information about model parameters, choose Bayesian statistics.

The above represents a very pragmatic view of unifying statistical thinking and machine learning – including the various paradigms of statistical thinking

But we could also extend it to deep learning

Essentially, the key characteristic of deep learning is representation learning

Deep learning itself has evolved rapidly into a few key areas.

1) Initially, autoencoders were not taken seriously, but over time we realised that their key feature is the ability to learn a representation. This became significant with variational autoencoders and then with GANs

2) Also, we saw transformers based on the attention mechanism. The ability to train transformers in parallel led to large language models like GPT-3

3) Reinforcement learning continues to evolve

4) Also we are seeing multimodal learning like CLIP

So, in many ways, three worlds are rapidly evolving, with some synergies between them: statistics, machine learning and deep learning.

Note that I say statistics is also rapidly evolving because I believe that Bayesian and causal models will play a key role in the future.

Post from Christoph Molnar on data modelling mindsets

Christoph Molnar's book on interpretable machine learning

Image source: Rutgers university