Confusion Matrix

A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary classifier, it is a table with the 4 possible combinations of predicted and actual values.

                        Actual Positive         Actual Negative
Predicted Positive      True Positive (TP)      False Positive (FP)
Predicted Negative      False Negative (FN)     True Negative (TN)

The confusion matrix is extremely useful for measuring Recall, Precision, Specificity, Accuracy and, most importantly, the AUC-ROC curve.

Let’s understand TP, FP, FN and TN in terms of a pregnancy analogy.


The following is a contingency table of the types of errors and successes (or hits) in a test for the presence of some anomaly (a tumour, a pathology, etc.), and in general for the output of a binary classifier, a decision process or a diagnostic procedure:

                    Anomaly present                   Anomaly absent
Test positive       True Positive (hit)               False Positive (Type I error)
Test negative       False Negative (Type II error)    True Negative (correct rejection)

The following measures quantify the performance of a test, classifier or decision process.

Sensitivity (Recall) = TP / (TP + FN)

Specificity = TN / (TN + FP)

Precision = TP / (TP + FP)

Accuracy = (TP + TN) / (TP + TN + FP + FN)
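As a minimal sketch in Python (the cell counts below are illustrative, not from the article), these measures can be computed directly from the four counts:

# Illustrative cell counts for a binary test.
tp, fp, fn, tn = 90, 10, 5, 95

sensitivity = tp / (tp + fn)                   # recall / hit rate
specificity = tn / (tn + fp)                   # true negative rate
precision   = tp / (tp + fp)                   # positive predictive value
accuracy    = (tp + tn) / (tp + fp + fn + tn)  # overall hit rate

print(sensitivity, specificity, precision, accuracy)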

True Positive:

Interpretation: You predicted positive and it’s true.

You predicted that a woman is pregnant and she actually is.

True Negative:

Interpretation: You predicted negative and it’s true.

You predicted that a man is not pregnant and he actually is not.

False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

You predicted that a man is pregnant but he actually is not.

False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

You predicted that a woman is not pregnant but she actually is.


We describe the predicted values as Positive or Negative, and prefix them with True or False depending on whether the prediction matches the actual value, as the short sketch below illustrates.
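Here is a minimal Python sketch (the function name is illustrative) that buckets a single (predicted, actual) pair into one of the four outcomes:

def outcome(predicted: bool, actual: bool) -> str:
    """Classify one prediction against the ground truth."""
    if predicted and actual:
        return "TP"   # predicted pregnant, actually pregnant
    if predicted and not actual:
        return "FP"   # Type 1 error: predicted pregnant, actually not
    if not predicted and actual:
        return "FN"   # Type 2 error: predicted not pregnant, actually is
    return "TN"       # predicted not pregnant, actually not

print(outcome(True, True), outcome(False, False), outcome(True, False), outcome(False, True))
# TP TN FP FN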

How to Calculate a Confusion Matrix for a 2-Class Classification Problem?

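Since the article's original worked image is not recoverable, here is a hypothetical 7-sample example (chosen so that accuracy works out to the 4/7 cited below), using scikit-learn's confusion_matrix:

from sklearn.metrics import confusion_matrix

# Hypothetical 7-sample binary example (illustrative only).
actual    = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 0, 0]

# Note: sklearn's layout is the transpose of the table above.
# With labels=[1, 0], rows = actual values, columns = predicted values.
cm = confusion_matrix(actual, predicted, labels=[1, 0])
tp, fn = cm[0]
fp, tn = cm[1]
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=2 FN=1 FP=2 TN=2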

Recall

Out of all the actual positive cases, how many did we predict correctly? It should be as high as possible: Recall = TP / (TP + FN).

Precision

Out of all the cases we predicted as positive, how many are actually positive? Precision = TP / (TP + FP).

Accuracy

Out of all the cases, how many did we predict correctly? In the example above this is 4/7. It should be as high as possible: Accuracy = (TP + TN) / (TP + TN + FP + FN).
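Continuing the hypothetical 7-sample example above, scikit-learn's metric helpers give the same numbers:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical 7-sample example from the confusion-matrix sketch above.
actual    = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 1, 0, 0]

print(recall_score(actual, predicted))     # 2/3 ≈ 0.67
print(precision_score(actual, predicted))  # 2/4 = 0.50
print(accuracy_score(actual, predicted))   # 4/7 ≈ 0.57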

F-measure

It is difficult to compare two models when one has low precision and high recall, or vice versa. To make them comparable, we use the F-score, which measures Recall and Precision at the same time. It uses the harmonic mean in place of the arithmetic mean, which punishes extreme values more.

F-score = 2 × Precision × Recall / (Precision + Recall)
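A minimal sketch of why the harmonic mean punishes extremes: with precision 1.0 but recall only 0.1, the arithmetic mean still looks respectable while the F-score stays low.

def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 1.0, 0.1
print((precision + recall) / 2)    # arithmetic mean: 0.55, hides the weak recall
print(f_score(precision, recall))  # harmonic mean: ~0.18, exposes it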






