Data Wrangling

Last updated: November 1, 2021

What Does Data Wrangling Mean?

Data wrangling is a process that data scientists and data engineers use to locate new data sources and convert the acquired information from its raw data format to one that is compatible with automated and semi-automated analytics tools.

Data wrangling, which is sometimes referred to as data munging, is arguably the most time-consuming and tedious aspect of data analytics.

The exact tasks required in data wrangling depend on what transformations the analyst requires to make a dataset usable. The basic steps involved in data wrangling include:

Discovery -- learn what information is contained in a data source and decide if the information has value.

Structuring -- standardize the data format for disparate types of data so it can be used for downstream processes.

Cleaning -- remove incomplete and redundant data that could skew analysis.

Enriching -- decide if you have enough data or need to seek out additional internal or third-party sources.

Validating -- conduct tests to expose data quality and consistency issues.

Publishing -- make wrangled data available to stakeholders in downstream projects.
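
The six steps above can be sketched as a small pipeline. This is a minimal illustration using only the Python standard library, not a description of any particular wrangling tool. The sample data, the column names, the `REGION_NAMES` lookup table and every function name are hypothetical and chosen only for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: inconsistent headers, mixed date formats,
# a duplicate row, and a row with a missing amount.
RAW_CSV = """Order ID, Order Date ,Amount,Region Code
1001,2021-11-01,19.99,EU
1002,11/02/2021,5.00,US
1002,11/02/2021,5.00,US
1003,2021-11-03,,EU
"""

# Hypothetical reference table used for the enriching step.
REGION_NAMES = {"EU": "Europe", "US": "United States"}


def discover(text):
    """Discovery: report the column names found in the source."""
    header = next(csv.reader(io.StringIO(text)))
    return [name.strip() for name in header]


def parse_date(value):
    """Accept ISO or US-style dates and return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def structure(text):
    """Structuring: normalize header names and date formats."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip().lower().replace(" ", "_") for h in next(reader)]
    rows = []
    for values in reader:
        row = dict(zip(header, (v.strip() for v in values)))
        row["order_date"] = parse_date(row["order_date"])
        rows.append(row)
    return rows


def clean(rows):
    """Cleaning: drop incomplete rows and exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row.items())
        if all(row.values()) and key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def enrich(rows):
    """Enriching: add a readable region name from the lookup table."""
    return [
        {**row, "region": REGION_NAMES.get(row["region_code"], "Unknown")}
        for row in rows
    ]


def validate(rows):
    """Validating: fail loudly on quality or consistency problems."""
    ids = [row["order_id"] for row in rows]
    assert len(ids) == len(set(ids)), "duplicate order IDs"
    assert all(float(row["amount"]) >= 0 for row in rows), "negative amount"
    return rows


def publish(rows):
    """Publishing: serialize the wrangled rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


columns = discover(RAW_CSV)
wrangled = validate(enrich(clean(structure(RAW_CSV))))
published = publish(wrangled)
```

Each function maps to one step, so a step can be swapped out or tested on its own. In practice the publishing step would more likely write to a file, database or shared store than return a string.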

In the past, wrangling required the analyst to have a strong background in scripting languages such as Python or R. Today, an increasing number of data wrangling tools use machine learning (ML) algorithms to carry out wrangling tasks with very little human intervention.

Techopedia Explains Data Wrangling

It may sound like an informal term, like cowboy coding, but data wrangling actually occupies a particular space in data management.

One helpful way to understand data wrangling is to contrast it with the often more formal extract, transform and load (ETL) methodology. Data wrangling differs from ETL in its aspects and use cases. It is often done by skilled data scientists or others close to the pipeline. In some ways, data wrangling could be called a type of "open source" ETL, in that the engineers dealing with the data may be more hands-on or use more manual methods of extraction.

For those who understand the refined processes by which diverse data gets culled, sorted and fed into enterprise architectures, data wrangling is a very important topic. IT professionals draw on a vast array of tools, resources and techniques to extract value from messy, raw or unstructured data.