Metrics in classification (RECAP) - 2
Hi Guys!
In the previous article, we discussed specificity, sensitivity, and accuracy. In this article, we will cover the remaining classification metrics.
This post assumes you are familiar with the confusion matrix. If not, check it out!
Quartiles:
Technically, there are several definitions of quantiles in the wild. Here I follow one source, which I will mention at the end of the article.
Quantiles are the data points (cut points) that divide the data into equally sized groups. Quartiles are the special case that divides the data into four equal groups, with cut points at the 25th, 50th, and 75th percentiles.
The "quart" in quartile means a quarter, that is, 25% of the data. In the same way, percentiles are just quantiles that divide the data into 100 equally sized groups.
In practice, the terms 'quantile' and 'percentile' are often used loosely, even when each data point ends up in its own group.
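To make the idea concrete, here is a minimal sketch using Python's standard `statistics` module. The mouse weights are made-up values for illustration only, and `statistics.quantiles` uses one particular interpolation rule (its default "exclusive" method), so other tools may give slightly different cut points.

```python
import statistics

# Hypothetical mouse weights in grams (illustrative values, not real data).
weights = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

# n=4 returns the three cut points that split the data into four equal groups.
quartiles = statistics.quantiles(weights, n=4)

# n=100 returns the 99 cut points that split the data into 100 equal groups.
percentiles = statistics.quantiles(weights, n=100)

print("Quartiles:", quartiles)                  # [22.0, 25.0, 28.0]
print("25th percentile:", percentiles[24])      # same as the first quartile
print("50th percentile:", percentiles[49])      # same as the median
```

The 25th and 50th percentiles land on the same values as the first and second quartiles, which is what "percentiles are just quantiles with 100 groups" means.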
Receiver Operating Characteristics:
"Assume that our task is to decide whether a mouse is obese or not obese."
For this, we will build a "Logistic Regression" model, which outputs a probability between 0 and 1.
To use logistic regression for classification, we set a threshold, 0.5 by default: a probability at or above the threshold is classified as obese, and anything below it as not obese.
Now consider that the confusion matrix, and with it TP, TN, FN & FP, changes as we move the threshold (for example, from 0.1 to 0.9).
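A small sketch of how the counts move with the threshold. The labels and probabilities below are invented for illustration (they do not come from a fitted model), and `confusion_counts` is a hypothetical helper name, not a library function.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (TP, FP, TN, FN) when probabilities >= threshold count as positive."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical data: 1 = obese, 0 = not obese, with assumed model probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

for threshold in (0.1, 0.5, 0.9):
    print(threshold, confusion_counts(y_true, y_prob, threshold))
```

With a low threshold almost every mouse is called obese (many false positives), and with a high threshold almost none are (many false negatives).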
To find the optimal threshold for our logistic regression model, we could build a separate confusion matrix for each threshold (0.1 for the first, 0.2 for the next ... n).
We can do this, but comparing that many confusion matrices is very time-consuming, so we use an alternative that helps us find the optimal threshold for logistic regression.
This is where the ROC comes in!
ROC!:
Instead of an overwhelming number of confusion matrices, the ROC graph summarizes all the information in one place.
The y-axis shows Sensitivity (the true positive rate: the fraction of actual positives correctly identified) and the x-axis shows 1 - Specificity (the false positive rate: the fraction of actual negatives incorrectly identified as positive).
The ROC graph summarizes all the confusion matrices that each threshold produced!
Process:
- Because the confusion matrix changes with the threshold, we first calculate the sensitivity and specificity for the 1st threshold value and plot that point on the ROC graph.
- Then we change the threshold, calculate the sensitivity and specificity for the 2nd threshold value, and plot that point too. This continues until the last threshold value.
- We can pick a good threshold by looking at the graph: points closer to the top-left corner give high sensitivity with few false positives.
- If you want to compare multiple models, you can draw the ROC curves of all of them on the same graph!
- The model with the highest Area Under the Curve (AUC) is considered the best model.
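The process above can be sketched in plain Python. This is only an illustration under assumptions: the labels and probabilities are invented, `roc_points` and `auc` are hypothetical helper names, and the area is approximated with the trapezoid rule over the plotted points.

```python
def roc_points(y_true, y_prob, thresholds):
    """Return sorted (1 - specificity, sensitivity) points, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        sensitivity = tp / positives           # true positive rate (y-axis)
        false_positive_rate = fp / negatives   # 1 - specificity (x-axis)
        points.append((false_positive_rate, sensitivity))
    return sorted(points)


def auc(points):
    """Approximate the area under the ROC points with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 = obese, 0 = not obese, with assumed model probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

# Thresholds 0.0, 0.1, ..., 1.1 so the curve runs from (1, 1) down to (0, 0).
thresholds = [i / 10 for i in range(12)]

points = roc_points(y_true, y_prob, thresholds)
area = auc(points)
print("ROC points:", points)
print("AUC:", round(area, 2))  # 0.96 for this toy data
```

To compare models, you would compute `auc` for each model's probabilities on the same labels and prefer the larger value.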
Precision and recall!
The two are closely connected:
Of all the samples we predicted as positive, how many are really positive - precision! (TP / (TP + FP))
Of all the samples that are really positive, how many did we predict as positive - recall! (TP / (TP + FN)). Recall is the same thing as sensitivity.
It's a very simple concept, but many other resources make it confusing. We hope this gave you a crystal-clear picture of precision and recall! - 🤣
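As a quick check of the two definitions, here is a minimal sketch. The counts are made up for illustration, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    # Precision: of everything predicted positive, how much is really positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print("Precision:", precision)          # 8 / 10 = 0.8
print("Recall:", round(recall, 3))      # 8 / 12 = 0.667
```

Note that true negatives do not appear in either formula, which is why precision and recall are useful when the negative class is very large.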
Name: R.Aravindan
Position: Content Writer
Company: Artificial Neurons.AI