Restructure Data
Learning Objectives
After completing this unit, you’ll be able to:
- Identify data restructuring options.
- Describe the purpose of pivoting data.
- Describe the purpose of splitting data.
Introduction
You learned how to recognize “good” data by identifying the traits of high-quality, meaningful data, and discovered how well-structured data is organized. But what should you do if your data is not well-structured? In this unit, you explore some options for restructuring your data, such as using pivots and splitting variables (fields).
Options for Restructuring Data
Options you can use to restructure data include:
- Changing the underlying database
- Using a programming language, such as R or Python
- Using tools, such as pivoting and splitting data, within the Tableau Platform, including Tableau Prep Builder or Tableau Desktop
- Using other ETL (Extract, Transform, Load) tools
In some cases, you customize metadata that describes the data, instead of modifying the actual data. Metadata may include information such as calculations, renamed fields, and default formatting. Metadata may also include what changes have been made to the data structure.
Review: Well-Structured Data
Think back to what you learned earlier in this module about how well-structured data is organized.
- Each variable is in one column, with a column header.
- Each different observation of that variable is in a different row.
Previously, we looked at the following simple table. This data is well-structured. The variables are Vendor, City, and State, and each one exists in its own column with a column header. Each row lists the observations (or values) for the variables (or fields), in this case, the vendor name and location by city and state.
| Vendor | City | State | 
|---|---|---|
| Polly’s Lollipops | Preston | WA | 
| Lucy’s Lollies | Lansing | MI | 
| Carlo Callazo’s Candy | Cambridge | MA | 
| Ming’s Minty Meringues | Madison | WI | 
If your data isn't well-structured, you need to complete some data preparation tasks in order for the data set to be useful for analysis.
Some common tasks to turn poorly structured data into well-structured data include:
- Pivoting column into rows or vice versa
- Splitting fields
What Is a Pivot?
Simply stated, a pivot changes columns into rows, and in some cases, vice versa.
Use a pivot on “wide” data sets when the information is captured with many columns, and the columns contain similar information. This structure is possibly more convenient for a user report, but not as useful for analysis. For example, Tableau Desktop works best analyzing data in a “tall/narrow” table structure.
Consider this table that lists parking fees paid by employees. There is a row for each employee with a field (column) for each date (2/5/2020, 2/6/2020, and so on).
| Employee | 2/5/2020 | 2/6/2020 | 2/7/2020 | 2/8/2020 | 2/9/2020 | 
|---|---|---|---|---|---|
| Christine | 10 | 10 | 10 | 10 | 10 | 
| Tristan | 10 | ||||
| Lily | 10 | 10 | |||
| Jamal | 10 | 10 | 
If we bring this data structure into some tools, such as Tableau Desktop, we get a field for every column. There are five fields that all represent the parking fees paid for that day. This makes it very hard to do analysis across time as the data is stored in separate fields. The fields (variables) Employee, Date, and Parking Fee should all exist in their own columns, with column headers. Each row should list the values (observations of the variables)—in this case, the employee's name, the date, and the parking fee paid.
After a pivot, the desired data structure is achieved, as shown in the following table. You can now analyze this data by looking at trends over time because all the date values are in one column.
| Employee | Date | Parking Fee | 
|---|---|---|
| Christine | 2/5/2020 | 10 | 
| Christine | 2/6/2020 | 10 | 
| Christine | 2/7/2020 | 10 | 
| Christine | 2/8/2020 | 10 | 
| Christine | 2/9/2020 | 10 | 
| Tristan | 2/5/2020 | 10 | 
| Lily | 2/5/2020 | 10 | 
| Lily | 2/9/2020 | 10 | 
| Jamal | 2/5/2020 | 10 | 
| Jamal | 2/7/2020 | 10 | 
What Is a Split?
Simply stated, a split separates a column that contains multiple pieces of information into multiple columns, one for each piece of information.
A split separates string (text) field values based on a delimiter (a character such as a comma, a colon, or a hyphen that is located between distinct pieces of information within a field value). Splits are useful for when field components have meaning that can be used to analyze your data.
In the following example, the values in the Airline field contain the airline name as well as a two-letter airline code. The colon between these two types of information is the delimiter.
| Airline | 
|---|
| American Airlines: AA | 
| Delta Airlines: DL | 
| JetBlue Airways: B6 | 
| United Airlines: UA | 
After the split, the airline names and codes are in separate columns in the table. Now it is easier to analyze this data by airline codes.
| Airline | Airline Code | 
|---|---|
| American Airlines | AA | 
| Delta Airlines | DL | 
| JetBlue Airways | B6 | 
| United Airlines | UA | 
You now have an understanding of data characteristics and organization, and some options to restructure data, including pivoting and splitting fields.
Use the following table to answer the first quiz question.
| City | State | Q1 | Q2 | Q3 | Q4 | 
|---|---|---|---|---|---|
| San Francisco | CA | $25,465 | $15,389 | $19,268 | $28,491 | 
| New York | NY | $42,543 | $39,642 | $41,687 | $44,594 | 
| Seattle | WA | $38,756 | $32,174 | $33,452 | $39,892 |