Why is a confusion matrix useful in machine learning?

By Justin Stoltzfus | Last updated: August 30, 2019

There are a number of ways to talk about why a confusion matrix is valuable in machine learning (ML) – but one of the simplest ways is to explain that the confusion matrix is a data visualization resource.

A confusion matrix allows viewers to see at a glance the results of using a classifier or other algorithm. By using a simple table to show analytical results, the confusion matrix essentially boils down your outputs into a more digestible view.

The confusion matrix uses specific terminology to arrange results. There are true positives and true negatives, as well as false positives and false negatives. For a more complicated confusion matrix, or one that compares two distinct classes rather than a positive and a negative outcome, the axes might instead be labeled with actual and predicted classes.
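To make the four terms concrete, here is a minimal Python sketch that counts them for a binary classifier. The function name `binary_confusion_counts`, the sample labels, and the choice of "spam" as the positive class are illustrative assumptions, not part of any particular library.

```python
def binary_confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN and TN, treating one class as 'positive'."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: right if the item really is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: wrong if the item really is positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels, invented for illustration.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

counts = binary_confusion_counts(actual, predicted, positive="spam")
print(counts)
```

Which class counts as "positive" is a choice made by the analyst; swapping it exchanges true positives with true negatives and false positives with false negatives.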

Regardless of the terminology, the results are grouped into a table, typically square, with one row and one column for each class.

This view makes it easier for analysts to see how accurate an algorithm was in classifying results.

The utility of the confusion matrix has to do with the complexity of ML projects, and also with the way that information is formatted and delivered to users. Imagine a long, flat list of individual results, each one a false positive, false negative, true positive or true negative.

A user would have to tabulate all of those individual results by hand to understand how the algorithm worked, and how accurate it was. With the confusion matrix, this information is presented in a single compact table.
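The tabulation step described here can be sketched in a few lines of Python. This is only one way to do it, using the standard library's `collections.Counter`; the helper names `tabulate` and `format_matrix`, the class names, and the sample results are all hypothetical. The sketch assumes rows are actual classes and columns are predicted classes.

```python
from collections import Counter


def tabulate(pairs, classes):
    """Turn a flat list of (actual, predicted) pairs into a matrix.

    Rows are actual classes and columns are predicted classes, both in
    the order given by `classes`.
    """
    tally = Counter(pairs)
    return [[tally[(a, p)] for p in classes] for a in classes]


def format_matrix(matrix, classes):
    """Render the matrix as a plain-text table."""
    width = max(len(c) for c in classes) + 2
    header = " " * width + "".join(c.rjust(width) for c in classes)
    rows = [
        name.ljust(width) + "".join(str(n).rjust(width) for n in row)
        for name, row in zip(classes, matrix)
    ]
    return "\n".join([header] + rows)


# Hypothetical results, one (actual, predicted) pair per classified item.
results = [
    ("cat", "cat"), ("cat", "dog"), ("dog", "dog"),
    ("dog", "dog"), ("bird", "cat"), ("bird", "bird"),
]
classes = ["bird", "cat", "dog"]

matrix = tabulate(results, classes)
print(format_matrix(matrix, classes))
```

Correct predictions land on the diagonal of the table, so any count off the diagonal points to a specific pair of classes that the classifier mixed up.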

For example, suppose the machine is asked to classify 20 images, of which 12 are vegetables and eight are fruits. If a confusion matrix holds the following contents (from top left clockwise): 7, 5, 3, 5, then the matrix is showing that seven were correctly identified as vegetables, while three were correctly classified as fruits.

The other 10 are results where the program failed to correctly identify the image: five vegetables labeled as fruits and five fruits labeled as vegetables.
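The arithmetic behind this example can be checked with a short Python sketch. It assumes that rows hold the actual class and columns the predicted class, a layout the example itself does not specify; because both off-diagonal cells are 5, the class totals come out the same either way.

```python
# Cells read clockwise from the top left in the example: 7, 5, 3, 5.
# Assumption: rows are actual classes, columns are predicted classes.
classes = ["vegetable", "fruit"]
matrix = [
    [7, 5],  # actual vegetables: 7 predicted vegetable, 5 predicted fruit
    [5, 3],  # actual fruits:     5 predicted vegetable, 3 predicted fruit
]

total = sum(sum(row) for row in matrix)
# Correct predictions sit on the diagonal of the matrix.
correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / total

print(f"total images: {total}")
print(f"correct: {correct}, misclassified: {total - correct}")
print(f"accuracy: {accuracy:.0%}")

for i, name in enumerate(classes):
    actual_count = sum(matrix[i])
    print(f"{name}: {matrix[i][i]} of {actual_count} identified correctly")
```

Run as written, this reports 10 of 20 images classified correctly, an accuracy of 50%, with 7 of 12 vegetables and 3 of 8 fruits identified correctly.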

The confusion matrix is useful across many kinds of ML analytics. By seeing which classes an algorithm gets wrong and how often, users get a starting point for investigating problems like dimensionality and overfitting, and for finding other ways to optimize an algorithm.


Written by Justin Stoltzfus | Contributor, Reviewer

Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.
