Skip to main content

Restructure Data

Learning Objectives

After completing this unit, you’ll be able to:

  • Identify data restructuring options.
  • Describe the purpose of pivoting data.
  • Describe the purpose of splitting data.

Introduction

You learned how to recognize “good” data by identifying the traits of high-quality, meaningful data, and discovered how well-structured data is organized. But what should you do if your data is not well-structured? In this unit, you explore some options for restructuring your data, such as using pivots and splitting variables (fields).

Options for Restructuring Data

Options you can use to restructure data include: 

  • Changing the underlying database
  • Using a programming language, such as R or Python
  • Using tools, such as pivoting and splitting data, within the Tableau Platform, including Tableau Prep Builder or Tableau Desktop
  • Using other ETL (Extract, Transform, Load) tools

In some cases, you customize metadata that describes the data, instead of modifying the actual data. Metadata may include information such as calculations, renamed fields, and default formatting. Metadata may also include what changes have been made to the data structure.

Review: Well-Structured Data

Think back to what you learned earlier in this module about how well-structured data is organized.

  • Each variable is in one column, with a column header.
  • Each different observation of that variable is in a different row.

Previously, we looked at the following simple table. This data is well-structured. The variables are Vendor, City, and State, and each one exists in its own column with a column header. Each row lists the observations (or values) for the variables (or fields), in this case, the vendor name and location by city and state.

Vendor
City
State

Polly’s Lollipops

Preston

WA

Lucy’s Lollies

Lansing

MI

Carlo Callazo’s Candy

Cambridge

MA

Ming’s Minty Meringues

Madison

WI

If your data isn't well-structured, you need to complete some data preparation tasks in order for the data set to be useful for analysis.

Some common tasks to turn poorly structured data into well-structured data include:

  • Pivoting column into rows or vice versa
  • Splitting fields

What Is a Pivot?

Simply stated, a pivot changes columns into rows, and in some cases, vice versa.

Use a pivot on “wide” data sets when the information is captured with many columns, and the columns contain similar information. This structure is possibly more convenient for a user report, but not as useful for analysis. For example, Tableau Desktop works best analyzing data in a “tall/narrow” table structure.

Consider this table that lists parking fees paid by employees. There is a row for each employee with a field (column) for each date (2/5/2020, 2/6/2020, and so on).

Employee
2/5/2020
2/6/2020
2/7/2020
2/8/2020
2/9/2020

Christine

10

10

10

10

10

Tristan

10





Lily

10




10

Jamal

10


10



If we bring this data structure into some tools, such as Tableau Desktop, we get a field for every column. There are five fields that all represent the parking fees paid for that day. This makes it very hard to do analysis across time as the data is stored in separate fields. The fields (variables) Employee, Date, and Parking Fee should all exist in their own columns, with column headers. Each row should list the values (observations of the variables)—in this case, the employee's name, the date, and the parking fee paid.

After a pivot, the desired data structure is achieved, as shown in the following table. You can now analyze this data by looking at trends over time because all the date values are in one column.

Employee
Date
Parking Fee

Christine

2/5/2020

10

Christine

2/6/2020

10

Christine

2/7/2020

10

Christine

2/8/2020

10

Christine

2/9/2020

10

Tristan

2/5/2020

10

Lily

2/5/2020

10

Lily

2/9/2020

10

Jamal

2/5/2020

10

Jamal

2/7/2020

10

What Is a Split?

Simply stated, a split separates a column that contains multiple pieces of information into multiple columns, one for each piece of information.

A split separates string (text) field values based on a delimiter (a character such as a comma, a colon, or a hyphen that is located between distinct pieces of information within a field value). Splits are useful for when field components have meaning that can be used to analyze your data.

In the following example, the values in the Airline field contain the airline name as well as a two-letter airline code. The colon between these two types of information is the delimiter.

Airline

American Airlines: AA

Delta Airlines: DL

JetBlue Airways: B6

United Airlines: UA

After the split, the airline names and codes are in separate columns in the table. Now it is easier to analyze this data by airline codes.

Airline
Airline Code

American Airlines

AA

Delta Airlines

DL

JetBlue Airways

B6

United Airlines

UA

You now have an understanding of data characteristics and organization, and some options to restructure data, including pivoting and splitting fields.

Use the following table to answer the first quiz question.

City State Q1 Q2 Q3 Q4

San Francisco

CA

$25,465

$15,389

$19,268

$28,491

New York

NY

$42,543

$39,642

$41,687

$44,594

Seattle

WA

$38,756

$32,174

$33,452

$39,892

Keep learning for
free!
Sign up for an account to continue.
What’s in it for you?
  • Get personalized recommendations for your career goals
  • Practice your skills with hands-on challenges and quizzes
  • Track and share your progress with employers
  • Connect to mentorship and career opportunities