sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)

Compute confusion matrix to evaluate the accuracy of a classification.

By definition a confusion matrix C is such that C_{i,j} is equal to the number of observations known to be in group i and predicted to be in group j. Thus in binary classification, the count of true negatives is C_{0,0}, false negatives is C_{1,0}, true positives is C_{1,1} and false positives is C_{0,1}.

Read more in the User Guide.
Parameters:

- y_true : array-like of shape (n_samples,)
  Ground truth (correct) target values.
- y_pred : array-like of shape (n_samples,)
  Estimated targets as returned by a classifier.
- labels : array-like of shape (n_classes,), default=None
  List of labels to index the matrix. This may be used to reorder or select a subset of labels. If None is given, those that appear at least once in y_true or y_pred are used in sorted order.
- sample_weight : array-like of shape (n_samples,), default=None
  Sample weights. New in version 0.18.
- normalize : {'true', 'pred', 'all'}, default=None
  Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None, confusion matrix will not be normalized.
Returns:

- C : ndarray of shape (n_classes, n_classes)
  Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.
Examples
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
>>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
>>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
>>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
In the binary case, we can extract true positives, etc as follows:
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
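The normalize parameter is not exercised in the examples above; as an added illustration (not from the original docs), normalizing over the true labels makes each row sum to 1:

>>> confusion_matrix([2, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 2], normalize='true')
array([[1.        , 0.        , 0.        ],
       [0.        , 0.        , 1.        ],
       [0.33333333, 0.        , 0.66666667]])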
Examples using sklearn.metrics.confusion_matrix

How to create a confusion matrix with Scikit-learn?
(MachineCurve, 2020-05-05, category: frameworks)
After training a supervised machine learning model such as a classifier, you would like to know how well it works.
This is often done by setting apart a small piece of your data called the test set, which is used as data that the model has never seen before.
If it performs well on this dataset, it is likely that the model performs well on other data too — if it is sampled from the same distribution as your test set, of course.
Now, when you test your model, you feed it the data and compare the predictions with the ground truth, measuring the number of true positives, true negatives, false positives and false negatives. These counts can subsequently be summarized in a visually appealing confusion matrix.
In today's blog post, we'll show you how to create such a confusion matrix with Scikit-learn, one of the most widely used machine learning frameworks. By means of an example created with Python, we'll show you step by step how to generate a matrix with which you can visually and easily determine the performance of your model.
All right, let’s go! 
A confusion matrix in more detail
Training your machine learning model involves its evaluation. In many cases, you have set apart a test set for this.
The test set is a dataset that the trained model has never seen before. Using it allows you to test whether the model has overfit (adapted to the training data too well) or whether it still generalizes to new data.
This allows you to ensure that your model does not perform very poorly on new data while still performing really well on the training set. That wouldn't really work in practice, would it?
Evaluation with a test set often happens by feeding all the samples to the model, generating a prediction. Subsequently, the predictions are compared with the ground truth — or the true targets corresponding to the test set. These can subsequently be used for computing various metrics.
But they can also be used to demonstrate model performance in a visual way.
Here is an example of a confusion matrix:
To be more precise, it is a normalized confusion matrix. Its axes describe two measures:
- The true labels, which are the ground truth represented by your test set.
- The predicted labels, which are the predictions generated by the machine learning model for the features corresponding to the true labels.
It allows you to easily compare how well your model performs. For example, in the model above, for all true labels 1, the predicted label is 1. This means that all samples from class 1 were classified correctly. Great!
For the other classes, performance is also good, but a little bit worse. As you can see, for class 2, some samples were predicted as being part of classes 0 and 1.
In short, it answers the question «For my true labels / ground truth, how well does the model predict?».
It’s also possible to start from a prediction point of view. In this case, the question would change to «For my predicted label, how many predictions are actually part of the predicted class?». It’s the opposite point of view, but could be a valid question in many machine learning cases.
Ideally, the set of predicted labels equals the set of true labels. In that case, you would see zeros everywhere except for the diagonal from the top left to the bottom right. In practice, however, this does not happen often. More likely, the plot is much more scattered, as with this SVM classifier, where many support vectors are necessary to draw a decision boundary that does not work perfectly, but adequately enough:
Creating a confusion matrix with Python and Scikit-learn
Let's now see if we can create a confusion matrix ourselves. We will be using Python and Scikit-learn, one of the most widely used frameworks for machine learning.
Creating a confusion matrix involves various steps:
- Generating an example dataset. This one makes sense: we need data to train our model on. We’ll therefore be generating data first, so that we can make an adequate choice for a ML model class next.
- Picking a machine learning model class. Obviously, if we want to evaluate a model, we need to train a model. We’ll choose a particular type of model first that fits the characteristics of our data.
- Constructing and training the ML model. The consequence of the first two steps is that we end up with a trained model.
- Generating the confusion matrix. Finally, based on the trained model, we can create our confusion matrix.
Software dependencies you need to install
Very briefly, but importantly: if you wish to run this code, you must make sure that you have certain software dependencies installed. Here they are:
- You need to install Python, which is the platform that our code runs on: version 3.6+.
- You need to install Scikit-learn, the machine learning framework that we will be using today: pip install -U scikit-learn.
- You need to install Numpy for numbers processing: pip install numpy.
- You need to install Matplotlib for visualizing the plots: pip install matplotlib.
- Finally, if you wish to generate a plot of decision boundaries (not required), you also need to install Mlxtend: pip install mlxtend.
Generating an example dataset
The first step is generating an example dataset. We will be using Scikit-learn for this purpose too. First, create a file called confusion-matrix.py, and open it in a code editor. The first thing we do is add the imports:
# Imports
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
The make_blobs function from Scikit-learn allows us to generate 'blobs', or clusters, of samples. Each blob is centered around some point, and its samples are scattered around that center with some standard deviation. This gives you flexibility over both the position and the structure of your generated dataset, in turn allowing you to experiment with a variety of ML models without having to worry about the data.
As we will evaluate the model, we need to ensure that the dataset is split between training and testing data. Scikit-learn also allows us to do this, with train_test_split. We therefore import that one too.
Configuration options
Next, we can define a number of configuration options:
# Configuration options
blobs_random_seed = 42
centers = [(0,0), (5,5), (0,5), (2,3)]
cluster_std = 1.3
frac_test_split = 0.33
num_features_for_samples = 4
num_samples_total = 5000
The random seed describes the initialization of the pseudo-random number generator used for generating the blobs of data. As you may know, no random number generator is truly random. What’s more, they are also initialized differently. Configuring a fixed seed ensures that every time you run the script, the random number generator initializes in the same way. If weird behavior occurs, you know that it’s likely not the random number generator.
The centers describe the centers in two-dimensional space of our blobs of data. As you can see, we have 4 blobs today.
The cluster standard deviation is the standard deviation of the distribution from which each blob's samples are drawn. We set it to 1.3; a lower number produces clusters that are more easily separable, and vice versa.
The fraction of the train/test split determines how much data is split off for testing purposes. In our case, that’s 33% of the data.
The number of features is set to 4, but note that because we pass explicit two-dimensional centers, make_blobs infers the number of features (two) from the centers, so this setting is effectively ignored here. The number of classes, 4, follows from the number of blobs.
Finally, the number of samples generated is pretty self-explanatory. We set it to 5000 samples. That’s not too much data, but more than sufficient for the educational purposes of today’s blog post.
Generating the data
Next up is the call to make_blobs and to train_test_split for actually generating and splitting the data:
# Generate data
# random_state makes the generated blobs reproducible, matching the fixed seed described above
inputs, targets = make_blobs(n_samples=num_samples_total, centers=centers, n_features=num_features_for_samples, cluster_std=cluster_std, random_state=blobs_random_seed)
X_train, X_test, y_train, y_test = train_test_split(inputs, targets, test_size=frac_test_split, random_state=blobs_random_seed)
Saving the data (optional)
Once the data is generated, you may choose to save it to file. This is an optional step, and I include it because I want to re-use the same dataset every time I run the script (e.g. because I am tweaking a visualization). If you use the code below, you can run it once; the data is then saved in the .npy file. When you subsequently comment out the np.save call, and possibly also the data generation calls, you'll always load the same data from file.
Then, you can tweak away at your visualization easily without having to deal with new data all the time.
# Save and load temporarily
np.save('./data_cf.npy', (X_train, X_test, y_train, y_test))
X_train, X_test, y_train, y_test = np.load('./data_cf.npy', allow_pickle=True)
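A slightly more convenient variant (my suggestion, not part of the original snippet) avoids the manual commenting by generating the data only when no saved file exists, using np.savez so that each split is stored under its own key:

import os.path

# Reuse a previously saved dataset if one exists; otherwise generate and save it
if os.path.exists('./data_cf.npz'):
    with np.load('./data_cf.npz') as data:
        X_train, X_test = data['X_train'], data['X_test']
        y_train, y_test = data['y_train'], data['y_test']
else:
    inputs, targets = make_blobs(n_samples=num_samples_total, centers=centers,
                                 cluster_std=cluster_std, random_state=blobs_random_seed)
    X_train, X_test, y_train, y_test = train_test_split(
        inputs, targets, test_size=frac_test_split, random_state=blobs_random_seed)
    np.savez('./data_cf.npz', X_train=X_train, X_test=X_test,
             y_train=y_train, y_test=y_test)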
Should you wish to visualize the data, this is of course possible:
# Generate scatter plot for training data
plt.scatter(X_train[:,0], X_train[:,1])
plt.title('Linearly separable data')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
Picking a machine learning model class
Now that we have our code for generating the dataset, we can take a look at the output to determine what kind of model we could use:
I can derive a few characteristics from this dataset (which, obviously, I also built in up front):
First of all, the number of features is low: only two — as our data is two-dimensional. This is good, because then we likely don’t face the curse of dimensionality, and a wider range of ML models is applicable.
Next, when inspecting the data from a closer point of view, I can see a gap between what seem to be blobs of data (it is also slightly visible in the diagram above):
This suggests that the data may be separable, and possibly even linearly so (yes, of course, I know this is the case, since we generated the data ourselves).
Third, and finally, the number of samples is relatively low: only 5,000 samples are present. Neural networks, with their relatively large number of trainable parameters, would likely start overfitting relatively quickly, so they wouldn't be my preferred choice.
However, traditional machine learning techniques to the rescue. A Support Vector Machine, which attempts to construct a decision boundary between separable blobs of data, can be a good candidate here. Let’s give it a try: we’re going to construct and train an SVM and see how well it performs through its confusion matrix.
Constructing and training the ML model
As we have seen in the post linked above, we can also use Scikit-learn to construct and train an SVM classifier. Let's do so next.
Model imports
First, we’ll have to add a few extra imports to the top of our script:
from sklearn import svm
from sklearn.metrics import plot_confusion_matrix
from mlxtend.plotting import plot_decision_regions
(The Mlxtend one is optional, as we discussed at ‘what you need to install’, but could be useful if you wish to visualize the decision boundary later.)
Training the classifier
First, we initialize the SVM classifier. I’m using a linear kernel because I suspect (actually, I’m confident, as we constructed the data ourselves) that the data is linearly separable:
# Initialize SVM classifier
clf = svm.SVC(kernel='linear')
Then, we fit the training data — starting the training process:
# Fit data
clf = clf.fit(X_train, y_train)
That’s it for training the machine learning model! The classifier variable, or clf, now contains a reference to the trained classifier. By calling clf.predict, you can now generate predictions for new data.
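For example (with a hypothetical new sample, just to illustrate the call):

# Predict the class of a new, unseen sample
print(clf.predict([[2.0, 3.0]]))  # e.g. array([3]), a point near the (2, 3) blob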
Generating the confusion matrix
But let's take a look at generating that confusion matrix now. As we discussed, it's part of the evaluation step, and we use it to visualize the model's predictive and generalization power on the test set.
Recall that we compare the predictions generated during evaluation with the ground truth available for those inputs.
The plot_confusion_matrix call takes care of this for us, and we simply have to provide it the classifier (clf), the test set (X_test and y_test), a color map and whether to normalize the data.
# Generate confusion matrix
matrix = plot_confusion_matrix(clf, X_test, y_test,
                               cmap=plt.cm.Blues,
                               normalize='true')
plt.title('Confusion matrix for our classifier')
plt.show()
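A note on versions: plot_confusion_matrix was deprecated in scikit-learn 1.0 and removed in 1.2. On recent versions, the equivalent call (to the best of my knowledge) goes through ConfusionMatrixDisplay:

from sklearn.metrics import ConfusionMatrixDisplay

# Modern replacement for plot_confusion_matrix (scikit-learn >= 1.0)
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test,
                                      cmap=plt.cm.Blues,
                                      normalize='true')
plt.title('Confusion matrix for our classifier')
plt.show()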
Normalization, here, means scaling each row so that its entries lie in [0, 1] and sum to 1, as in the plot above. If you leave out normalization, you get the raw number of samples in each cell instead:
Here are some other visualizations that help us explain the confusion matrix (for the boundary plot, you need to install Mlxtend with pip install mlxtend):
# Get support vectors
support_vectors = clf.support_vectors_
# Visualize support vectors
plt.scatter(X_train[:,0], X_train[:,1])
plt.scatter(support_vectors[:,0], support_vectors[:,1], color='red')
plt.title('Linearly separable data with support vectors')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
# Plot decision boundary
plot_decision_regions(X_test, y_test, clf=clf, legend=2)
plt.show()
It's clear that we need many support vectors (the red samples) to generate the decision boundary. Given that the blobs are not cleanly separable, this is not unexpected. I'm actually quite satisfied with the performance of the model, as demonstrated by the confusion matrix (a relatively blue diagonal).
The only class that underperforms is class 3, with a score of 0.68. That is still acceptable, but lower than preferred. This can be explained by looking at the class in the decision boundary plot: it's the middle class, the reds. As those samples are surrounded by the other classes, it's clear that the model had significant difficulty generating the decision boundary there. We might counter this by using a different kernel function that takes this into account, ensuring better separability; a sketch follows below. However, that's not the core of today's post.
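As a hedged sketch of that idea, we could swap in an RBF kernel, which can bend the decision boundary around the enclosed middle class (whether it actually helps here would need to be verified):

# Retrain with a nonlinear (RBF) kernel; gamma='scale' is the sklearn default
clf_rbf = svm.SVC(kernel='rbf', gamma='scale')
clf_rbf = clf_rbf.fit(X_train, y_train)

# Compare the resulting confusion matrix with the linear one
plot_confusion_matrix(clf_rbf, X_test, y_test, cmap=plt.cm.Blues, normalize='true')
plt.show()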
Full model code
Should you wish to obtain the full model code, that’s of course possible. Here you go 
# Imports
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import plot_confusion_matrix
from mlxtend.plotting import plot_decision_regions
# Configuration options
blobs_random_seed = 42
centers = [(0,0), (5,5), (0,5), (2,3)]
cluster_std = 1.3
frac_test_split = 0.33
num_features_for_samples = 4
num_samples_total = 5000
# Generate data
inputs, targets = make_blobs(n_samples=num_samples_total, centers=centers, n_features=num_features_for_samples, cluster_std=cluster_std, random_state=blobs_random_seed)
X_train, X_test, y_train, y_test = train_test_split(inputs, targets, test_size=frac_test_split, random_state=blobs_random_seed)
# Save and load temporarily
np.save('./data_cf.npy', (X_train, X_test, y_train, y_test))
X_train, X_test, y_train, y_test = np.load('./data_cf.npy', allow_pickle=True)
# Generate scatter plot for training data
plt.scatter(X_train[:,0], X_train[:,1])
plt.title('Linearly separable data')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
# Initialize SVM classifier
clf = svm.SVC(kernel='linear')
# Fit data
clf = clf.fit(X_train, y_train)
# Generate confusion matrix
matrix = plot_confusion_matrix(clf, X_test, y_test,
                               cmap=plt.cm.Blues,
                               normalize='true')
plt.title('Confusion matrix for our classifier')
plt.show()
# Get support vectors
support_vectors = clf.support_vectors_
# Visualize support vectors
plt.scatter(X_train[:,0], X_train[:,1])
plt.scatter(support_vectors[:,0], support_vectors[:,1], color='red')
plt.title('Linearly separable data with support vectors')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
# Plot decision boundary
plot_decision_regions(X_test, y_test, clf=clf, legend=2)
plt.show()
Summary
That’s it for today! In this blog post, we created a confusion matrix with Python and Scikit-learn. After studying what a confusion matrix is, and how it displays true positives, true negatives, false positives and false negatives, we gave a step-by-step example for creating one yourself.
The example included generating a dataset, picking a suitable machine learning model for the dataset, constructing, configuring and training it, and finally interpreting the results, i.e. the confusion matrix. This way, you should be able to understand what is happening and why I made certain choices.
I hope you’ve learnt something from today’s blog post! 
Thank you for reading MachineCurve today and happy engineering! 😎
References
Raschka, S. (n.d.). Mlxtend. https://rasbt.github.io/mlxtend/
Scikit-learn. (n.d.). scikit-learn: machine learning in Python. Retrieved May 3, 2020, from https://scikit-learn.org/stable/index.html
Scikit-learn. (n.d.). 1.4. Support vector machines. Retrieved May 3, 2020, from https://scikit-learn.org/stable/modules/svm.html#classification
Scikit-learn. (n.d.). Confusion matrix. Retrieved May 5, 2020, from https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
Scikit-learn. (n.d.). sklearn.metrics.plot_confusion_matrix. Retrieved May 5, 2020, from https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html#sklearn.metrics.plot_confusion_matrix
A Dependency Free Multiclass Confusion Matrix
# A Simple Confusion Matrix Implementation
def confusionmatrix(actual, predicted, normalize=False):
    """
    Generate a confusion matrix for multiple classification
    @params:
        actual      - a list of integers or strings for known classes
        predicted   - a list of integers or strings for predicted classes
        normalize   - optional boolean for matrix normalization
    @return:
        matrix      - a 2-dimensional list of pairwise counts
    """
    # Classes are taken from the actual vector; this assumes every predicted
    # label also occurs in actual, otherwise imap[p] raises a KeyError
    unique = sorted(set(actual))
    matrix = [[0 for _ in unique] for _ in unique]
    imap = {key: i for i, key in enumerate(unique)}
    # Generate Confusion Matrix: rows are predicted classes, columns are actual
    for p, a in zip(predicted, actual):
        matrix[imap[p]][imap[a]] += 1
    # Matrix Normalization: divide every cell by the total number of samples
    if normalize:
        sigma = sum(sum(row) for row in matrix)
        matrix = [[cell / sigma for cell in row] for row in matrix]
    return matrix
The approach here is to map each unique class found in the actual vector to a row/column index. From there, we simply iterate through the zipped predicted and actual vectors and increment the count at the matrix position given by those indices.
Usage
cm = confusionmatrix(
[1, 1, 2, 0, 1, 1, 2, 0, 0, 1], # actual
[0, 1, 1, 0, 2, 1, 2, 2, 0, 2] # predicted
)
# And The Output
print(cm)
[[2, 1, 0], [0, 2, 1], [1, 2, 1]]
Note: the actual classes are along the columns and the predicted classes are along the rows.
# Actual
# 0 1 2
# # #
[[2, 1, 0], # 0
[0, 2, 1], # 1 Predicted
[1, 2, 1]] # 2
Class Names Can be Strings or Integers
cm = confusionmatrix(
["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"] # predicted
)
# And The Output
print(cm)
[[2, 1, 0], [0, 2, 1], [1, 2, 1]]
You Can Also Return The Matrix With Proportions (Normalization)
cm = confusionmatrix(
["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"], # predicted
normalize = True
)
# And The Output
print(cm)
[[0.2, 0.1, 0.0], [0.0, 0.2, 0.1], [0.1, 0.2, 0.1]]
A More Robust Solution
Since writing this post, I’ve updated my library implementation to be a class that uses a confusion matrix representation internally to compute statistics, in addition to pretty printing the confusion matrix itself. See this Gist.
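The Gist itself is not reproduced here; as a rough, hypothetical sketch of what such a class might look like (the real implementation computes many more statistics):

class Performance:
    """Minimal sketch: builds a predicted-by-actual confusion matrix."""

    def __init__(self, actual, predicted):
        self.classes = sorted(set(actual) | set(predicted))
        self.imap = {c: i for i, c in enumerate(self.classes)}
        n = len(self.classes)
        self.matrix = [[0] * n for _ in range(n)]
        self.total = len(actual)
        # Rows are predicted classes, columns are actual classes
        for p, a in zip(predicted, actual):
            self.matrix[self.imap[p]][self.imap[a]] += 1

    def tabulate(self, normalized=False):
        print("=" * 35)
        # Header row lists the actual classes
        print("     " + "     ".join(f"{c}ᴬ" for c in self.classes))
        for c in self.classes:
            row = self.matrix[self.imap[c]]
            if normalized:
                cells = "  ".join(f"{v / self.total:.2%}" for v in row)
            else:
                cells = "      ".join(str(v) for v in row)
            print(f"{c}ᴾ   {cells}")
        print("Note: classᴾ = Predicted, classᴬ = Actual")
        print("=" * 35)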
Example Usage
# Actual & Predicted Classes
actual = ["A", "B", "C", "C", "B", "C", "C", "B", "A", "A", "B", "A", "B", "C", "A", "B", "C"]
predicted = ["A", "B", "B", "C", "A", "C", "A", "B", "C", "A", "B", "B", "B", "C", "A", "A", "C"]
# Initialize Performance Class
performance = Performance(actual, predicted)
# Print Confusion Matrix
performance.tabulate()
With the output:
===================================
Aᴬ Bᴬ Cᴬ
Aᴾ 3 2 1
Bᴾ 1 4 1
Cᴾ 1 0 4
Note: classᴾ = Predicted, classᴬ = Actual
===================================
And for the normalized matrix:
# Print Normalized Confusion Matrix
performance.tabulate(normalized = True)
With the normalized output:
===================================
Aᴬ Bᴬ Cᴬ
Aᴾ 17.65% 11.76% 5.88%
Bᴾ 5.88% 23.53% 5.88%
Cᴾ 5.88% 0.00% 23.53%
Note: classᴾ = Predicted, classᴬ = Actual
===================================
The confusion matrix is a performance metric for a classifying Machine Learning (ML) model.
When we receive data, then after cleaning and pre-processing, the first thing we do is pass it to the model and, of course, obtain results in the form of probabilities. But how can we measure the effectiveness of our model? This is exactly where the confusion matrix takes center stage.
The confusion matrix is a measure of classification success for two or more classes. In the binary case it is a table with 4 different combinations of predicted and actual values.
Let's look at the cell values (true positive, false positive, false negative, true negative) using a "pregnancy" analogy.
True Positive (TP)
You predicted a positive result, and the woman is indeed pregnant.
True Negative (TN)
You predicted a negative result, and the man is indeed not pregnant.
False Positive (Type I error, FP)
You predicted a positive result (the man is pregnant), but in reality this is not the case.
False Negative (Type II error, FN)
You predicted that the woman is not pregnant, but she actually is.
Let's work through the confusion matrix with some arithmetic.
Example. We have a dataset of patients being diagnosed for cancer. Knowing the correct diagnosis (the target column "Actual Y"), we want to improve diagnostics using a Machine Learning model. The model received training data, and on the test part, consisting of 7 records (in real problems, of course, more) and shown below, we evaluate how well the training went.
The model made its predictions for each patient and recorded probabilities between 0 and 1 in the "Predicted Y" column. We round these numbers to zero or one using a threshold of 0.6 (below this value means zero: the patient is healthy). The rounded results go into the "Predicted probability" column: for example, for the first record the model output 0.5, which corresponds to zero. In the last column we check whether the model guessed correctly.
Now, using the simplest formulas, we will calculate Recall, Precision and Accuracy, and finally understand the difference between these metrics.
Recall
Of all the values that are actually positive, how many did we predict correctly? Count the ones in the "Actual Y" column (4); this is the sum TP + FN. Now use the "Predicted probability" column to determine how many of them were diagnosed correctly (2); this is TP.
$$\text{Recall} = \frac{TP}{TP + FN} = \frac{2}{2 + 2} = \frac{1}{2}$$
Precision
In this equation the only unknown is FP. Only one record here was wrongly diagnosed as sick.
$$\text{Precision} = \frac{TP}{TP + FP} = \frac{2}{2 + 1} = \frac{2}{3}$$
Accuracy
The last value to extract from the table is TN. There are 2 healthy people correctly diagnosed by the model.
$$\text{Accuracy} = \frac{TP + TN}{\text{Total}} = \frac{2 + 2}{7} = \frac{4}{7}$$
F-score
These metrics are useful because they let us compute the F-score, a final indicator of model effectiveness.
$$F = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} = \frac{2 \cdot \frac{1}{2} \cdot \frac{2}{3}}{\frac{1}{2} + \frac{2}{3}} \approx 0.57$$
Our modest model scores only about 0.57, which is usually considered a middling result, and one would keep working on improving the model.
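The original table is not reproduced here, but any label arrays with TP = 2, FN = 2, FP = 1, TN = 2 reproduce these numbers; a hypothetical example, checked with scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical 7-record labels consistent with TP=2, FN=2, FP=1, TN=2
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0]

print(recall_score(y_true, y_pred))     # 0.5
print(precision_score(y_true, y_pred))  # 0.666...
print(accuracy_score(y_true, y_pred))   # 0.571...
print(f1_score(y_true, y_pred))         # 0.571...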
SkLearn
With the wonderful Scikit-learn library we can instantly compute a host of metrics, and the confusion matrix is no exception.
from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
The output will be an array consisting of three rows:
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
The diagonal values from top left to bottom right, [2, 0, 2], are the counts of correctly predicted values.
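As a small aside (not in the original), the sum of that diagonal divided by the total number of samples gives the accuracy:

import numpy as np

cm = confusion_matrix(y_true, y_pred)
print(np.trace(cm) / cm.sum())  # 4 of 6 predictions correct, ~0.667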
Confusion Matrix Guide
No matter how many times you read about the confusion matrix, after a while you forget which cell is the true positive, which the false negative, and so on. Even after implementing a confusion matrix with sklearn or TensorFlow, we still get confused about the components of the matrix.
But after reading this article, you will never forget the confusion matrix again.
So forget everything you have learned so far, and start fresh now.
The end goal of the machine learning journey is not just building a model. The real journey begins when we start measuring the performance of the model we built.
We have numerous ways to quantify the performance of a model. The confusion matrix is one such case: this performance metric is popularly used for quantifying the performance of classification models, which fall under the supervised learning algorithms.
Before we dive further, let me explain what you are about to learn in this article.
Why we need a confusion matrix
Before we learn about the confusion matrix, let's understand why we need it as a performance metric for classification models.
Accuracy is the popular evaluation method used for the majority of classification models in supervised learning. This leads us to the question below.
When we already have accuracy as a measure of a classification model's performance, why do we need another measure to quantify the performance of the model?
Let’s understand this with an example.
Suppose we are building a binary classification model on data with an imbalanced target class.
Let me give you an example of an imbalanced dataset. In a target-class-imbalanced dataset, the target classes are not properly balanced.
What is target class imbalance
Let’s say we are having two expected classes for the target variable.
Which are
- Positive
- Negative
95% of the time we get the positive class and only 5% of the time the negative class. Here the positive class dominates the negative class; this kind of skew within the target classes is called class imbalance.
Examples of the imbalanced dataset
Below are some examples of imbalanced datasets.
- Credit card fraud detection datasets
- Default customer prediction datasets
In the above examples, the distribution of the target classes is not equal. In general, target class imbalance means the target classes are not equally distributed: one class dominates the others.
Let's go back to our question: what is the need for the confusion matrix when we have accuracy to calculate the performance of a classification model?
In the above example, we have seen that the positive class occurs in 95% of cases whereas the negative class occurs only 5% of the time.
Without building any machine learning model, if we predict all the target classes as positive, what do you think our model accuracy would be?
Yes, you are correct: our model accuracy is 95%.
But do you think this is the correct way of quantifying the performance of the model?
A big No!
So when we are dealing with imbalanced datasets, accuracy is not the best performance measure.
In such cases, we will use the confusion matrix to measure the efficiency of the classification model.
In the next section of this article, we will learn more about the confusion matrix representation.
What is confusion matrix?
To understand the confusion matrix at a much deeper level, consider the example dataset below.
| S no. | Actual Target | Predicted Target |
|---|---|---|
| 1 | Positive | Positive |
| 2 | Positive | Negative |
| 3 | Positive | Positive |
| 4 | Positive | Negative |
| 5 | Positive | Positive |
| 6 | Negative | Negative |
| 7 | Negative | Positive |
| 8 | Positive | Positive |
| 9 | Positive | Positive |
| 10 | Positive | Positive |
The above table contains the actual target class and the predicted class information.
If we calculate the accuracy on this data, it will be 70%, as the predicted target column's values match the actual targets in 7 out of 10 cases.
Accuracy, however, is not able to answer the questions below.
- How many actual positive targets are predicted as positive?
- How many actual positive targets are predicted as negative?
- How many actual negative targets are predicted as positive?
- How many actual negative targets are predicted as negative?
We can answer all these questions with a confusion matrix; below is the pictorial representation that answers all of them.
A confusion matrix is a matrix representation showing how well the trained model predicts each target class, in terms of counts. The confusion matrix is mainly used for classification algorithms, which fall under supervised learning.
Below are the descriptions for the terms used in the confusion matrix
- True positive: Target is positive and the model predicted it as positive
- False negative: Target is positive and the model predicted it as negative
- False positive: Target is negative and the model predicted it as positive
- True negative: Target is negative and the model predicted it as negative
Using the above positive and negative target information, we will populate the matrix, which gives a much clearer understanding of how the confusion matrix is constructed.
Confusion matrix representation for a binary classification problem
The above image represents the confusion matrix for the binary classification problem; each cell value of the matrix is calculated from the example dataset we showed before.
- TP: Out of 8 actual positive cases, in 6 cases the model predicted positive.
- FN: The remaining 2 of the 8 actual positive cases (8 - 6) fall into the false negative cell.
- FP: Out of 2 actual negative cases, 1 was predicted as positive.
- TN: Out of 2 actual negative cases, the model predicted 1 negative case correctly.
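To make these counts concrete, here's a quick check with scikit-learn, encoding Positive as 1 and Negative as 0 (the arrays below transcribe the 10-row table above):

from sklearn.metrics import confusion_matrix

# Rows 1-10 of the example table: Positive -> 1, Negative -> 0
actual    = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
predicted = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1]

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)  # 1 1 2 6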
These cell values of the confusion matrix address the questions we listed above.
By looking at the matrix representation, we can see where the model is more accurate and where it fails to predict properly. This helps in tuning the right model parameters to reduce the false positives and false negatives.
By now we have a clear understanding of each component of the confusion matrix, but TP, TN, FP and FN are still hard to remember; we know the concepts, yet the terms really are a bit confusing.
So let's see how we can remember these terms forever.
In the image above we split each term into two characters. The second character represents what the model predicts: the positive class or the negative class. The first character represents whether the model's prediction is correct or not.
Let's understand this with an example.
Take TN: the second character N means the model predicted the negative class, and the first character T means the model predicted it correctly.
Hopefully this gives a clear picture of the individual components of the matrix.
What is the ideal model?
The ideal machine learning model always predicts the correct target values. If we use accuracy as the measure of model performance, the ideal model gets 100% accuracy.
In the same way, for a model to be called ideal under the confusion matrix metric, it should have zero cases of false positives and false negatives, which are known as Type 1 and Type 2 errors.
What are Type 1 and Type 2 errors?
Below are the two error types we can represent with the confusion matrix.
- Type 1 (False positive)
- Type 2 (False negative)
The image above clearly explains the difference between Type 1 and Type 2 errors. Below are the key differences.
Difference between Type 1 and Type 2 errors
Type 1 Error
- False positive from confusion matrix.
- Predicting negative as positive.
Type 2 Error
- False negative from confusion matrix.
- Predicting positive as negative.
By now we have clearly understood how the confusion matrix is built and are aware of its components.
Now we will learn how to implement the confusion matrix in different ways. Before that, below is a one-picture summary of everything we have learned.
Implementing confusion matrix in python
We are going to implement confusion matrix in two different ways.
- Confusion matrix with Sklearn
- Confusion matrix with Tensorflow
Confusion matrix implementation with sklearn
The scikit-learn confusion matrix representation may look different from diagrams you have seen elsewhere: scikit-learn places the actual target classes along the rows and the predicted classes along the columns, so the output can appear transposed relative to other conventions (such as the dependency-free implementation earlier on this page).
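The original code embed did not survive in this copy; a minimal sketch, reusing the 10-row example table from above (Positive encoded as 1, Negative as 0):

from sklearn.metrics import confusion_matrix

actual    = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
predicted = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# Rows are actual classes (0, 1); columns are predicted classes (0, 1)
print(confusion_matrix(actual, predicted))
# [[1 1]
#  [2 6]]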
Confusion matrix implementation with TensorFlow
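Again the embedded code is missing, so this is only a sketch; TensorFlow provides tf.math.confusion_matrix, which likewise places true labels along the rows and predictions along the columns:

import tensorflow as tf

actual    = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
predicted = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# tf.math.confusion_matrix: rows are true labels, columns are predictions
cm = tf.math.confusion_matrix(labels=actual, predictions=predicted)
print(cm.numpy())
# [[1 1]
#  [2 6]]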
How to plot the confusion matrix
Using the code below, we can easily plot the confusion matrix; we use a seaborn heatmap to visualize the confusion matrix in a more representative way.
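The plotting snippet itself is missing from this copy; the following is a hedged reconstruction of the usual seaborn approach:

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

actual    = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
predicted = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
cm = confusion_matrix(actual, predicted)

# Annotated heatmap of the confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted class')
plt.ylabel('Actual class')
plt.show()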
If we run the above code, we get a graph like the one below; this particular confusion matrix was created for an email spam classification model.
Confusion matrix
By now we know the different components of the confusion matrix, using these components we can derive multiple model performance metrics to quantify the performance of the trained model. We will cover that in another article.
Below are the different measures we can calculate using the confusion matrix.
- Accuracy
- Misclassification rate
- True positive rate or Recall or Sensitivity
- False positive rate
- True negative rate or Specificity
- Precision
- F1 score
We will learn about these measures in the upcoming article.
Complete Code
Below is the code for implementing the confusion matrix in sklearn and TensorFlow, along with the visualization code.
You can also clone this code in our Github.
Conclusion
In this article we learned why we need the confusion matrix, its different components, how to implement it with sklearn and TensorFlow, and we also saw the code to visualize it.
Logistic regression is a type of regression we can use when the response variable is binary.
One common way to evaluate the quality of a logistic regression model is to create a confusion matrix, which is a 2×2 table that shows the predicted values from the model vs. the actual values from the test dataset.
To create a confusion matrix for a logistic regression model in Python, we can use the confusion_matrix() function from the sklearn package:
from sklearn import metrics

metrics.confusion_matrix(y_actual, y_predicted)
The following example shows how to use this function to create a confusion matrix for a logistic regression model in Python.
Example: Creating a Confusion Matrix in Python
Suppose we have the following two arrays that contain the actual values for a response variable along with the predicted values by a logistic regression model:
# Define array of actual values
y_actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Define array of predicted values
y_predicted = [0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
We can use the confusion_matrix() function from sklearn to create a confusion matrix for this data:
from sklearn import metrics

# Create confusion matrix
c_matrix = metrics.confusion_matrix(y_actual, y_predicted)

# Print confusion matrix
print(c_matrix)

[[6 4]
 [2 8]]
If we’d like, we can use the crosstab() function from pandas to make a more visually appealing confusion matrix:
import pandas as pd

y_actual = pd.Series(y_actual, name='Actual')
y_predicted = pd.Series(y_predicted, name='Predicted')

# Create confusion matrix
print(pd.crosstab(y_actual, y_predicted))

Predicted  0  1
Actual
0          6  4
1          2  8
The columns show the predicted values for the response variable and the rows show the actual values.
We can also calculate the accuracy, precision, and recall using functions from the sklearn package:
# Print accuracy of model
print(metrics.accuracy_score(y_actual, y_predicted))

0.7

# Print precision value of model
print(metrics.precision_score(y_actual, y_predicted))

0.667

# Print recall value of model
print(metrics.recall_score(y_actual, y_predicted))

0.8
Here is a quick refresher on accuracy, precision, and recall:
- Accuracy: Percentage of correct predictions
- Precision: Correct positive predictions relative to total positive predictions
- Recall: Correct positive predictions relative to total actual positives
And here is how each of these metrics was actually calculated in our example:
- Accuracy: (6+8) / (6+4+2+8) = 0.7
- Precision: 8 / (8+4) = 0.667
- Recall: 8 / (2+8) = 0.8
Additional Resources
Introduction to Logistic Regression
The 3 Types of Logistic Regression
Logistic Regression vs. Linear Regression















