Adaptive Boosting (In-depth intuition)

This post assumes you are already familiar with decision trees and bootstrapping.


AdaBoost is a boosting technique. In simple words, boosting is a sequential model: trees are grown one after another, each new tree taking the mistakes of the previous trees into account. AdaBoost adapts by changing the sample weights based on those past mistakes.

AdaBoost likes weak learners. It combines multiple weak learners and turns them into one super learner. ❤️🔥

AdaBoost is usually built on decision trees, so a working knowledge of decision trees is highly recommended.

Terminology!!

Stump - A tree with one node and two leaves!


  • Stumps are not good at making accurate classifications!
  • Stumps are technically "weak learners".
  • Why? A stump uses only one feature at a time to make a decision on the dataset!
  • That's why AdaBoost likes weak learners.

In a forest of trees made with AdaBoost, the trees are usually just a node and two leaves (stumps).

Is the order of the stumps important?

  • Usually, the order of the trees is not important in a random forest (or for independent decision trees), because the trees are independent of each other.
  • But in AdaBoost's forest of stumps, the order is very important!
  • The errors the first stump makes influence how the second stump is made, the errors the second stump makes influence how the third stump is made, and so on.


Three Main Ideas Behind AdaBoost!

  • AdaBoost combines a lot of weak learners to make classifications. These weak learners are almost always stumps.
  • Some stumps get more say in the classification than others.
  • All stumps are made by taking the previous stump's mistake into account. 

Creating AdaBoost!

Suppose we need to classify whether a patient has heart disease or not.


Step 1: (Initialize sample weights)

  • Give a sample weight to each row in the dataset.
  • It shows how important it is for that row to be correctly classified.

Formula:

Sample Weight = 1 / Total number of samples


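As a rough sketch of this step in Python (the dataset size of 8 rows is an assumption for illustration, not the article's actual table):

```python
import numpy as np

# A minimal sketch: assume a hypothetical heart-disease dataset with 8 rows.
n_samples = 8

# Every row starts with the same weight: 1 / total number of samples.
sample_weights = np.full(n_samples, 1.0 / n_samples)
print(sample_weights)   # [0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125]
```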

Step 2: (Create the first stump)

  • Selecting the first stump is a complicated process if you don't know about Gini impurity or entropy; it's very easy if you do. In simple words, you calculate the Gini impurity for each feature in your dataset, and the feature with the lowest Gini impurity becomes the node of the first stump (see the sketch below).

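Here is a minimal sketch of that idea. The feature values, labels, and the "Patient Weight <= 80" split below are made-up placeholders, not the article's actual data:

```python
import numpy as np

def gini(labels, weights):
    # Weighted Gini impurity of one leaf (assumes binary labels 0/1).
    total = weights.sum()
    if total == 0:
        return 0.0
    p1 = weights[labels == 1].sum() / total
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def stump_impurity(feature, threshold, labels, weights):
    # Weighted Gini impurity of a stump that splits on "feature <= threshold".
    left = feature <= threshold
    total = weights.sum()
    return sum(
        (weights[mask].sum() / total) * gini(labels[mask], weights[mask])
        for mask in (left, ~left)
    )

# Hypothetical "Patient Weight" feature and heart-disease labels.
weight_kg = np.array([88, 72, 65, 94, 58, 81, 90, 60])
has_disease = np.array([1, 1, 0, 1, 0, 1, 1, 0])
sample_weights = np.full(8, 0.125)

# The candidate split with the lowest impurity becomes the first stump.
print(stump_impurity(weight_kg, 80, has_disease, sample_weights))
```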

Calculate the amount of say for the first stump!

  • Wherever we have a wrong classification, we use it to find the amount of say of the stump.
  • In our stump, we have 1 wrong (incorrect) classification.
  • The Patient Weight stump has 1 wrongly classified sample.
  • The total error of the stump is the sum of the weights of the incorrectly classified samples.
  • The total error will always be between 0 for a perfect stump and 1 for a horrible stump.
  • We use the total error to determine the stump's amount of say in the final classification.

Formula:

Amount of Say = 1/2 x log((1 - Total Error) / Total Error)
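For our toy example (one misclassified row out of 8 equally weighted rows, as assumed above), the amount of say works out roughly like this:

```python
import numpy as np

# Total error = sum of the weights of the misclassified samples.
# With 8 equally weighted rows and 1 mistake, the total error is 1/8.
total_error = 0.125

# A tiny epsilon keeps the log finite for a perfect (total error 0)
# or completely wrong (total error 1) stump.
eps = 1e-10
amount_of_say = 0.5 * np.log((1 - total_error + eps) / (total_error + eps))
print(amount_of_say)   # ~0.97
```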
Now we need to learn how to change the sample weights, so that the next stump takes into account the errors the current stump made.

Step 3: (Increase and decrease the sample weights)

In simple words, we need to increase the sample weights of the wrongly classified samples and decrease the sample weights of the correctly classified samples.

The formula for increasing the sample weight

Increased Sample Weight = Sample Weight x e^(Amount of Say)   (for incorrectly classified samples)


The formula for decreasing the sample weight

Decreased Sample Weight = Sample Weight x e^(-Amount of Say)   (for correctly classified samples)

The major difference is that the exponent has a negative sign here, which shrinks the weights of the correctly classified samples.

If you look carefully, the weights of the wrongly classified samples increase, and the weights of the correctly classified samples decrease.

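Continuing the same toy example (which row is misclassified is an assumption for illustration), both updates can be applied in one step:

```python
import numpy as np

sample_weights = np.full(8, 0.125)            # the initial weights
misclassified = np.array([False, False, False, True,
                          False, False, False, False])
amount_of_say = 0.97                          # from the previous step

new_weights = np.where(
    misclassified,
    sample_weights * np.exp(amount_of_say),   # increase the wrong sample
    sample_weights * np.exp(-amount_of_say),  # decrease the correct samples
)
print(new_weights)   # misclassified row ~0.33, the others ~0.047
```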

These values are not normalized. The sample weights should add up to 1, but the sum of the new weights is not equal to 1. So, we normalize them.

The formula for normalizing:

Normalized Weight[i] = New Weight[i] / sum(New Weights)
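Applied to the toy numbers above (un-normalized weights of roughly 0.33 for the misclassified row and 0.047 for each of the others), normalization looks like this:

```python
import numpy as np

# New (un-normalized) weights from the previous step.
new_weights = np.array([0.047, 0.047, 0.047, 0.330,
                        0.047, 0.047, 0.047, 0.047])

# Normalize so that the weights add up to 1 again.
normalized_weights = new_weights / new_weights.sum()
print(normalized_weights.sum())      # 1.0
print(normalized_weights.round(2))   # misclassified row ~0.5, the others ~0.07
```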

Then we replace the old sample weights with the new normalized weights before creating the second stump.

Norm.weight -> Sample weight


Now we can use the modified sample weights to make the second stump in the AdaBoost forest.

In theory, to create the next stump we would use a weighted Gini index. But instead of using the weighted Gini index, we can make a new collection of samples that contains duplicate copies of the samples with the largest sample weights.

Process:

  • In simple words, we create a new empty dataset with the same shape as the old dataset. We then pick a random number between 0 and 1, see which bucket it falls into, and copy the corresponding row into the new dataset.
  • The buckets are built from the running totals of the normalized weights, e.g. one bucket ends at 0.07 + 0.07 = 0.14, the next at 0.14 + 0.07 = 0.21, the next at 0.21 + 0.07 = 0.28, the next at 0.28 + 0.49 = 0.77, and so on.

For Example:

The 1st random number is 0.72. It falls into the 5th bucket, right?

The 2nd random number is 0.42. It falls into the 4th bucket, right?

The 3rd random number is 0.83. It falls into the 6th bucket, right?

This random sampling happens automatically inside the algorithm!

We just continue this process until the new dataset and the old dataset have the same shape.
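A minimal sketch of this resampling step (the weight values, row order, and random seed below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized weights from the previous step (illustrative values;
# the 4th row is the heavily weighted, previously misclassified one).
normalized_weights = np.array([0.07, 0.07, 0.07, 0.49,
                               0.07, 0.07, 0.07, 0.07])
normalized_weights = normalized_weights / normalized_weights.sum()
n_samples = len(normalized_weights)

# The buckets are the running totals (cumulative sums) of the weights.
buckets = np.cumsum(normalized_weights)

# Pick a random number between 0 and 1 for each new row and find its bucket,
# i.e. which original row gets copied into the new dataset.
picked_rows = [int(np.searchsorted(buckets, rng.random())) for _ in range(n_samples)]
print(picked_rows)   # the heavily weighted row tends to appear several times
# np.random.choice(n_samples, size=n_samples, p=normalized_weights) does the same job.
```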


This is our updated dataset. Now, we can use this to make a new stump from the beginning.

Then we repeat the whole process from the beginning, and this goes on for n iterations.

How does AdaBoost find the answer?

  • Now that we have the AdaBoost forest of stumps, we separate the stumps that classify a patient as having heart disease from the stumps that classify a patient as not having heart disease.
  • Then we add up the amounts of say of the "heart disease" stumps and of the "no heart disease" stumps; whichever group has the higher total is taken as the answer (see the sketch below).

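In code, the final vote boils down to something like this (the stump outputs and amounts of say are made-up numbers):

```python
# A minimal sketch of the final weighted vote.
stump_votes = [
    ("heart disease", 0.97),
    ("no heart disease", 0.41),
    ("heart disease", 0.35),
    ("no heart disease", 0.66),
]

# Add up the amount of say for each class.
totals = {}
for label, say in stump_votes:
    totals[label] = totals.get(label, 0.0) + say

# Whichever class has the larger total amount of say is the prediction.
prediction = max(totals, key=totals.get)
print(totals)       # {'heart disease': 1.32, 'no heart disease': 1.07}
print(prediction)   # heart disease
```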

This is how AdaBoost works ❄️😌

Did you like this article?


Name: R.Aravindan

Position: Content Writer

Company: Artificial Neurons.AI

