A confusion matrix in machine learning is a technique for summarizing the performance of a classification algorithm.
A confusion matrix, also called an error matrix, is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.
It lets you visualize the performance of an algorithm. It also makes confusion between classes easy to spot, e.g. when one class is commonly mislabeled as another.
This article covers:
- What is the Confusion Matrix in Machine Learning?
- Why you need to use it
- Confusion matrix example
- Confusion Matrix in Machine Learning with Python code
- How to calculate Confusion Matrix for a 2-class classification problem?
- Recall
- Precision
- F-Measure
Understanding Confusion Matrix
A confusion matrix is a concise summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized with count values and broken down by class.
The confusion matrix shows the ways in which the classification model is confused when it makes predictions.
A simple guide to Confusion Matrix terminology
Let's look at the terms used in the matrix: TP, FP, FN, and TN. Each term combines two of the following letters:
- T: True
- F: False
- P: Positive
- N: Negative
- True Positive (TP): The actual class is positive and the model predicts positive.
- False Positive (FP): The actual class is negative, but the model predicts positive.
- False Negative (FN): The actual class is positive, but the model predicts negative.
- True Negative (TN): The actual class is negative and the model predicts negative.
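To make the four definitions concrete, here is a minimal sketch in plain Python that counts each outcome from two lists of binary labels. It assumes the positive class is encoded as 1 and every other value counts as negative. The function name `count_outcomes` is hypothetical, and the label lists are the ones used in the scikit-learn example.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem.

    Assumes any label that is not `positive` belongs to the negative class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # actually positive, predicted positive
        elif p == positive:
            fp += 1  # actually negative, predicted positive
        elif a == positive:
            fn += 1  # actually positive, predicted negative
        else:
            tn += 1  # actually negative, predicted negative
    return tp, fp, fn, tn


actual = [1, 0, 1, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 0, 1, 1, 1, 1]
print(count_outcomes(actual, predicted))  # (4, 2, 1, 2)
```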
Confusion matrix example
Let us illustrate these terms with an example: predicting whether a person is pregnant.
- True Positive (TP):
Your prediction is that a woman is pregnant and she actually is pregnant.
- True Negative (TN):
Your prediction is that a man is not pregnant and he actually is not pregnant.
- False Positive (FP):
Your prediction is that a man is pregnant and he actually is not pregnant.
- False Negative (FN):
Your prediction is that a woman is not pregnant but she actually is pregnant.
How to calculate Confusion Matrix for a 2-class classification problem?
The procedure for calculating a confusion matrix is as follows:
- You need a testing dataset with expected outcome values.
- Make a prediction for each row in the testing dataset.
- From the expected outcomes and predictions, count:
- The number of correct predictions for each class.
- The number of incorrect predictions for each class, organized by the class that was predicted.
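The steps above can be sketched in plain Python without any library. This is an illustrative implementation, not scikit-learn's, and the function name `build_confusion_matrix` is hypothetical. It follows the scikit-learn convention: rows are actual classes, columns are predicted classes, and classes are sorted.

```python
def build_confusion_matrix(expected, predicted):
    """Return (labels, matrix) where matrix[i][j] counts examples whose
    actual class is labels[i] and whose predicted class is labels[j]."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    # Collect every class that appears in either list, in sorted order.
    labels = sorted(set(expected) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    # Start from an all-zero square matrix: one row and one column per class.
    matrix = [[0] * len(labels) for _ in labels]
    # Diagonal cells collect correct predictions; off-diagonal cells collect
    # errors, organized by the class that was predicted (the column).
    for actual, guess in zip(expected, predicted):
        matrix[index[actual]][index[guess]] += 1
    return labels, matrix


labels, matrix = build_confusion_matrix(
    [1, 0, 1, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 1, 1, 1, 1],
)
print(labels)  # [0, 1]
print(matrix)  # [[2, 2], [1, 4]]
```

The same function works unchanged for more than two classes, because it builds one row and one column per label it sees.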
Confusion Matrix in Python with scikit-learn
The scikit-learn library for machine learning in Python can calculate a confusion matrix.
Given an array or list of expected values and a list of predictions from your machine learning model, the confusion_matrix() function calculates a confusion matrix and returns the result as an array. You can then print the array and interpret the results.
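```python
from sklearn.metrics import confusion_matrix

# Actual (true) class labels for nine test examples
expected = [1, 0, 1, 0, 0, 1, 1, 1, 0]
# Labels predicted by the model for the same nine examples
predicted = [0, 0, 1, 1, 0, 1, 1, 1, 1]
# Rows of the result are actual classes; columns are predicted classes
results = confusion_matrix(expected, predicted)
print(results)
```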
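Running this example prints the confusion matrix [[2 2] [1 4]] for the contrived two-class problem. In scikit-learn's layout, rows are the actual classes and columns are the predicted classes, both in sorted label order, so for labels 0 and 1 the cells read [[TN, FP], [FN, TP]].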
Learn more about the confusion_matrix() function in the scikit-learn API documentation.
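For a binary problem, the four cells can be unpacked directly from scikit-learn's result. This is a minimal sketch that assumes the labels are 0 (negative) and 1 (positive). With that ordering, `confusion_matrix` returns `[[TN, FP], [FN, TP]]`, so NumPy's `ravel()` flattens it in the order TN, FP, FN, TP. Passing `labels=[0, 1]` pins that ordering explicitly.

```python
from sklearn.metrics import confusion_matrix

expected = [1, 0, 1, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 0, 1, 1, 1, 1]

# Rows are actual classes, columns are predicted classes, in the order of `labels`,
# so flattening row by row yields TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(expected, predicted, labels=[0, 1]).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=4 FP=2 FN=1 TN=2
```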
Precision, Recall, and F-Measure from the Confusion Matrix
Recall
Recall is the number of correctly classified positive examples divided by the total number of actual positive examples.
Recall = TP / (TP + FN)
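With the counts from the scikit-learn example (TP = 4, FN = 1), recall works out to 4 / 5 = 0.8. Below is a minimal sketch of the formula. The guard that returns 0.0 when there are no actual positives is an assumption; scikit-learn's `recall_score` lets you choose this behavior through its `zero_division` parameter.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model labeled positive."""
    actual_positives = tp + fn
    # Assumption: report 0.0 instead of dividing by zero when there are no positives.
    return tp / actual_positives if actual_positives else 0.0


print(recall(tp=4, fn=1))  # 0.8
```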
Precision
Precision is the number of correctly classified positive examples divided by the total number of examples predicted as positive.
Precision = TP / (TP + FP)
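Using the same example counts (TP = 4, FP = 2), precision is 4 / 6, roughly 0.667. This sketch mirrors the recall helper. Its zero-division guard is likewise an assumption, not a fixed rule.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are actually positive."""
    predicted_positives = tp + fp
    # Assumption: report 0.0 instead of dividing by zero when nothing is predicted positive.
    return tp / predicted_positives if predicted_positives else 0.0


print(round(precision(tp=4, fp=2), 4))  # 0.6667
```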
F-measure
The F-measure combines precision and recall using the harmonic mean (HM) instead of the arithmetic mean (AM), because the harmonic mean penalizes extreme values more heavily.
The F-measure is therefore always closer to the smaller of precision and recall.
F-measure = 2 * Recall * Precision / (Recall + Precision)
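Below is a minimal sketch of the formula, often called the F1 score. It first uses the running example's precision (4/6) and recall (4/5), then a deliberately lopsided pair to show why the harmonic mean is used. Returning 0.0 when both inputs are zero is an assumption that avoids dividing by zero.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # assumption: define F as 0 when both inputs are 0
    return 2 * recall * precision / (recall + precision)


# Values from the running example: precision = 4/6, recall = 4/5.
print(round(f_measure(4 / 6, 4 / 5), 4))  # 0.7273

# A lopsided model: the harmonic mean stays near the weaker score,
# while the arithmetic mean would hide it.
p, r = 1.0, 0.1
print(round(f_measure(p, r), 4))  # 0.1818
print((p + r) / 2)                # 0.55
```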