Metrics in Classification! (RECAP)
Hi Guys!
Considering this is our dataset, we need to predict whether a patient has heart disease based on features such as chest pain, blood circulation, and so on. To solve this problem, we can use k-nearest neighbors, logistic regression, random forest, or other methods available in machine learning. To find which model performs well, we use metrics.
We classify metrics into two types:
1. Classification
2. Regression
In this article, we will discuss the metrics available for classification. The first metric is the confusion matrix.
Confusion Matrix:
It helps us understand how well our model performed in different situations! Before moving on to the confusion matrix, let's first look at what "true" and "false" mean here.
Every prediction falls into one of four buckets: true positive, true negative, false positive, and false negative.
Here, we need to find whether a person has heart disease or not.
True Positive:
Actual: The person has heart disease.
Predicted: The model predicted that the person has heart disease.
True Negative:
Actual: The person doesn't have heart disease.
Predicted: The model predicted that the person doesn't have heart disease.
False Positive:
Actual: The person doesn't have heart disease.
Predicted: The model predicted that the person has heart disease.
False Negative:
Actual: The person has heart disease.
Predicted: The model predicted that the person doesn't have heart disease.
We combine all of these to make a 2 x 2 confusion matrix (because here we are predicting only two outcomes: has or hasn't heart disease).
This 2x2 matrix holds TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative). The diagonal elements are the correct predictions and the off-diagonal elements are the wrong ones.
True Positive + True Negative = Total number of correct predictions.
False Positive + False Negative = Total number of wrong predictions.
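To make this concrete, here is a minimal sketch in Python (assuming scikit-learn is installed; the labels are made up, with 1 = has heart disease and 0 = doesn't):

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and model predictions (1 = has heart disease, 0 = doesn't)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# scikit-learn lays the 2x2 matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Correct predictions:", tp + tn)  # True Positive + True Negative
print("Wrong predictions:  ", fp + fn)  # False Positive + False Negative
```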
How can we find which model performs well? By using these counts, together with a domain expert. If the domain expert says false positives are the critical mistake, then we need to reduce false positives. If the expert says false negatives are the critical mistake, then we reduce false negatives. Based on the problem, we minimize the type of error that matters most.
Note: This confusion matrix is 2 x 2 only because we have two classes. As the number of classes grows, the matrix grows too.
In other words, if we want to classify 10 different species, then we have a 10 x 10 confusion matrix.
NOTE: The diagonal elements are the correct predictions; all the remaining cells are wrong predictions!
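As a quick sketch of the multi-class case (again with scikit-learn and made-up labels for three classes), the diagonal still holds the correct predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up 3-class example (say, three species labeled 0, 1, 2)
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)  # a 3 x 3 matrix here
print(cm)

correct = np.trace(cm)       # sum of the diagonal = correct predictions
wrong = cm.sum() - correct   # everything off the diagonal
print("Correct:", correct, "Wrong:", wrong)
```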
By using the confusion matrix, we can identify the true positive rate, false positive rate, precision, recall, and more. Let's recap them one by one!
Sensitivity (2x2):
Sensitivity tells what percentage of the YES (positive) class was identified correctly! - simplified for you.
Sensitivity = True Positive / (True Positive + False Negative)
A sensitivity of 90% (an assumed value) means 90% of the YES category was identified correctly!
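Here is a minimal sketch of that calculation (same made-up labels as before; note that scikit-learn calls this same quantity "recall"):

```python
from sklearn.metrics import confusion_matrix, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # fraction of actual YES cases caught
print("Sensitivity:", sensitivity)
print("Recall:     ", recall_score(y_true, y_pred))  # same value
```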
Specificity (2x2):
Specificity tells what percentage of the NO (negative) class was identified correctly! - too simplified.
Specificity = True Negative / (True Negative + False Positive)
A specificity of 80% (an assumed value) means 80% of the NO category was identified correctly!
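A matching sketch for specificity, computed straight from the confusion matrix counts (same made-up labels):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)  # fraction of actual NO cases caught
print("Specificity:", specificity)
```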
Summary!
If the positive class is more important, we go for sensitivity.
If the negative class is more important, we go for specificity.
Accuracy:
We combine everything we have learned so far! Accuracy is nothing but how often we predicted the class correctly.
In simple terms, it is the number of correct predictions, across both the positive and negative categories, out of the total number of predictions.
Accuracy = Number of Correct Predictions / Total Number of Predictions
Accuracy = (TP + TN) / (TP + TN + FP + FN)
By using this simple formula, we can measure what fraction of our predictions were correct.
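As a final sketch (same made-up labels), the manual formula and scikit-learn's built-in accuracy_score give the same number:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy (manual): ", (tp + tn) / (tp + tn + fp + fn))
print("Accuracy (sklearn):", accuracy_score(y_true, y_pred))
```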
We will see some more metrics in future articles!
Thank you so much!
Name: R.Aravindan
Company: Artificial Neurons.AI
Position: Content Writer