Massively Speed-Up your Learning Algorithm, with Stochastic Thinning

You have to see it to believe it! Imagine a technique where you randomly delete as many as 80% of the observations in your training set without decreasing predictive power (in many cases actually improving it), while reducing computing time by an order of magnitude. In its simplest version, that’s what stochastic thinning does. Here, performance is measured outside the training set, on the validation set, also called test data. I illustrate the method on a real-life dataset, in the context of regression and neural networks. In the latter case, it speeds up the training stage by a noticeable factor. The thinning process applies to the training set and may involve multiple tiny random subsets, called fractional training sets, which together represent less than 20% of the training data. It can also be used for data compression, or to measure the strength of a machine learning algorithm.

Prediction error using a fraction of your training set
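
To make the simplest version concrete, here is a minimal sketch, not the code from the full article. It assumes synthetic data, a plain least-squares regression, and a single uniformly random subset holding 20% of the training set; the function names (thin_training_set, fit_linear, mse) are hypothetical and chosen for illustration only.

```python
import numpy as np

def thin_training_set(X, y, keep_fraction, rng):
    """Keep a uniformly random fraction of the training rows (no replacement)."""
    n = X.shape[0]
    n_keep = max(1, int(round(keep_fraction * n)))
    kept = rng.choice(n, size=n_keep, replace=False)
    return X[kept], y[kept]

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    """Mean squared prediction error of a fitted linear model."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ coef - y) ** 2))

# Synthetic data (an assumption; the article uses a real-life dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=5000)

# Training set and validation set (test data).
X_train, y_train = X[:4000], y[:4000]
X_val, y_val = X[4000:], y[4000:]

# Drop 80% of the training observations at random, then fit on the rest.
X_thin, y_thin = thin_training_set(X_train, y_train, 0.2, rng)

mse_full = mse(fit_linear(X_train, y_train), X_val, y_val)
mse_thin = mse(fit_linear(X_thin, y_thin), X_val, y_val)
print(f"validation MSE, full training set:    {mse_full:.4f}")
print(f"validation MSE, thinned training set: {mse_thin:.4f}")
```

Both errors are computed on the same validation set, so they can be compared directly. How close they are depends on the data and the model; this toy setup says nothing about the results reported in the article.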

I also show the potential limitations of the new technique, and introduce the concepts of leading or influential observations (those kept for learning purposes) and followers (observations dropped from the training set). The term “influential observations” should not be confused with its usage in statistics, although in both cases it leads to explainable AI. The neural network used in this article offers replicable results by controlling all sources of randomness, a property rarely satisfied in other implementations.
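
The split into leaders and followers can be sketched as follows. This is an illustration under assumptions: leaders are drawn uniformly at random, which may be simpler than the selection used in the full article, and the function name split_leaders_followers is hypothetical. Seeding the random generator makes the split replicable; it covers only this one source of randomness, not those inside a neural network.

```python
import numpy as np

def split_leaders_followers(n_obs, keep_fraction, seed):
    """Return sorted index arrays: leaders (kept) and followers (dropped)."""
    rng = np.random.default_rng(seed)  # fixed seed gives the same split each run
    n_leaders = max(1, int(round(keep_fraction * n_obs)))
    perm = rng.permutation(n_obs)
    leaders = np.sort(perm[:n_leaders])
    followers = np.sort(perm[n_leaders:])
    return leaders, followers

leaders, followers = split_leaders_followers(1000, 0.2, seed=42)
leaders_again, _ = split_leaders_followers(1000, 0.2, seed=42)

print(len(leaders), len(followers))
print(np.array_equal(leaders, leaders_again))
```

Keeping the indices, rather than only the thinned arrays, makes it possible to inspect afterwards which observations drove the fit and which were dropped.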

If you are new to neural networks and deep learning, or manage a group of engineers developing or using such tools, the full technical article (13 pages, including 6 pages of Python code) will give you a quick overview of the issues and benefits surrounding these methods. It also offers a solid high-level introduction to the subject, including how to discover and overcome, or leverage, the problems faced.

To read more, access the full document, and learn how it works on a real-world use case, follow this link.

More articles by Vincent Granville
