Why is data annotation important in some machine learning projects?

By Justin Stoltzfus | Last updated: September 10, 2018

Data annotation is important in machine learning because, in many cases, labeled examples show the program what the desired output looks like, which makes its learning task much easier.

This has to do with the difference between supervised and unsupervised machine learning. With supervised machine learning, the training data is already labeled, so the machine has examples of the desired results. For example, if the purpose of the program is to identify cats in images, the system starts with a large number of photos tagged as cat or not cat. It then compares new data against those examples to produce its results.
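
As a rough illustration of how labeled examples drive a supervised result, here is a minimal sketch of a nearest-neighbor classifier. The two numeric features per photo and the `classify` helper are hypothetical stand-ins, not something described in this article; a real image system would work with far richer features.

```python
import math

# Hypothetical labeled training set: each entry is (features, label).
# The two numbers are made-up stand-ins for values extracted from a photo.
LABELED_EXAMPLES = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.2, 0.1), "not cat"),
    ((0.1, 0.3), "not cat"),
]


def classify(features, examples=LABELED_EXAMPLES):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# A new, unlabeled photo gets the label of the most similar annotated one.
prediction = classify((0.85, 0.75))
```

Without the labels in `LABELED_EXAMPLES`, the function would have nothing to return: the annotation is what turns a similarity measure into an answer.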

With unsupervised machine learning, there are no labels, so the system has to rely on attributes of the data itself, for example by grouping images that share visual features such as whiskers or tails. Engineers can steer the program toward those features, but the process is hardly ever as straightforward as it is in supervised machine learning, where the labels state the desired answer directly.
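
To make the contrast concrete, the sketch below groups unlabeled points with a small k-means loop, one common unsupervised technique chosen here purely for illustration. The feature values, the `k_means` helper and the starting centroids are all assumptions made for this example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Assign points to their nearest centroid, then recompute each centroid."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assign every point to the index of its closest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments


# Hypothetical feature values with no labels attached.
UNLABELED = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.1), (0.1, 0.3)]
clusters = k_means(UNLABELED, centroids=[UNLABELED[0], UNLABELED[-1]])
```

The result is only a list of group numbers. Nothing in the output says which group contains the cats, so a person still has to inspect the groups and name them.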

Data annotation is the process of affixing labels to training data sets. Labels can be applied in many different ways. The example above is binary data annotation (cat or not cat), but other kinds of data annotation are important as well. For example, in the medical field, data annotation may involve tagging specific biological images with labels identifying pathology, disease markers or other medical properties.
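
One way to picture the difference between a binary scheme and a richer one is as a small record type plus an agreed label set. This is only a sketch: the `Annotation` class, the `unknown_labels` helper, the file names and the medical label names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One annotated item: the file being described plus the labels affixed to it."""
    item: str
    labels: list = field(default_factory=list)


def unknown_labels(annotation, allowed_labels):
    """Return any labels on the annotation that are not in the agreed label set."""
    return [label for label in annotation.labels if label not in allowed_labels]


# Binary scheme: every photo gets exactly one of two labels.
BINARY_LABELS = {"cat", "not cat"}
photo = Annotation("photo_001.jpg", ["cat"])

# Richer scheme: one image may carry several labels (names are hypothetical).
MEDICAL_LABELS = {"pathology_present", "disease_marker", "normal"}
scan = Annotation("scan_001.png", ["pathology_present", "disease_marker"])

# A label outside the agreed set is reported so a reviewer can correct it.
mistake = unknown_labels(Annotation("photo_002.jpg", ["dog"]), BINARY_LABELS)
```

Checking labels against a fixed set is one simple way for annotation teams to keep their output consistent before it is used for training.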

Data annotation takes work – and is often done by teams of people – but it is a fundamental part of what makes many machine learning projects function accurately. It provides the initial setup for teaching a program what it needs to learn and how to discriminate between various inputs to come up with accurate outputs.

Written by Justin Stoltzfus | Contributor, Reviewer

Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.
