Learning Objectives
After completing this unit, you’ll be able to:
- Explain data structure in the context of analysis.
- Aggregate data using sum, average, and count.
- Work with multiple pills for the same field.
Connect Your Tableau Public Account
To get started, connect to your Tableau Public account in the Playground window to the right. If you don’t already have a Tableau Public account, sign up for one now, and be sure to activate your account before starting this interactive unit. You can find more detailed instructions in The Tableau Data Model.
Analyze the Data
In the previous modules, you’ve done some data exploration, and now you’re ready to start analysis! Here’s a guiding question to start.
Question for Analysis: What Are the Trends in Ratings?
Let’s find out!
- Click the new sheet tab icon (next to the Validation sheet tab at the bottom).
- If necessary, expand the Episodes.csv table.
- Double-click the My Rating field. This adds the field to the viz and automatically creates a bar chart.
The Rows shelf now has a copy of that field (each instance of a field in the viz is known as a pill). The pill doesn't just say the field name, though. It says SUM(My Rating).
Why? Well, there has to be some way of aggregating lots of data records into a single number, and the default is to sum them all. (For more information about aggregation, see Aggregation and Granularity.)
In this case, you have the sum of ratings for the data set, 1040, as the height of the bar. (Hover over the bar to see a tooltip with the precise value.)
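To see what that default aggregation does outside Tableau, here is a minimal Python sketch. The ratings are hypothetical values made up for illustration, not the contents of Episodes.csv, and the code is only an analogy for what the SUM(My Rating) pill computes.

```python
# Hypothetical per-episode ratings (out of 10); NOT the real Episodes.csv values.
ratings = [7, 8, 6, 9, 8]

# SUM collapses every record into a single number, which is what the
# default SUM(My Rating) pill does for the whole data set.
total = sum(ratings)

print(total)  # 38
```

With no other field in the view, every record lands in the same bucket, which is why you see a single bar.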
- Expand the Seasons.csv table in the Data pane if necessary, and drag Season to the Columns shelf.
Pause and think about what you have so far. It’s a bar chart with 14 bars, one for each season. Each bar shows the total rating for that season. As you start visualizing the data, it’s common for intermediate steps in the analysis to be messy or meaningless. Is that the case here? Let's ask some questions to see if this chart is something you can interpret.
Inspect the Data
Can you safely say that seasons 1 and 2 have the lowest ratings? Why or why not?
No! The rating field is in the Episodes table, not the Seasons table, which suggests that each rating belongs to a single episode within a season rather than to the season as a whole. But if you're not sure about the data structure, it's never a bad idea to inspect the data.
- Create a new sheet using the new sheet tab icon at the bottom.
- Expand the ChallengeBakes.csv table.
- Drag Season Episode to Rows.
- Expand the Episodes.csv table.
- Drag My Rating to the Text shelf on the Marks card. Tip: It’s often referred to as a shelf even though it looks like a tile.
Yep, those are realistic values—remember from the previous module, the field was originally named MyRating (out of 10). It definitely looks like these are ratings per episode—each row has information about a single episode, including a rating, and the ratings change per row.
- Right-click on the tab that says Sheet 3 and delete it to get back to your earlier viz. Tip: As a best practice, create and delete new sheets as much as you need to explore tangential questions or test things. But only keep the sheets you need so your workbook doesn't get cluttered.
Ask More Questions and Remove Misleading Visuals
Summing the rating might be misleading if the number of episodes isn’t constant for each season. You'll check that now.
How can you verify the number of episodes per season?
Each table in the Data pane has a Count field. These fields aren’t present in the original data. Tableau adds them (you can tell because their names are in italics) to count the number of records in each table.
- Drag Episodes.csv (Count) to the Label shelf on the Marks card. Now you can see that season 1 only had six episodes, and season 2 had eight. The rest of the seasons had 10 episodes each.
If the number of episodes isn’t consistent, is SUM the best aggregation to use here? If not, what should we use?
To control for the different number of episodes—and to keep the ratings value in a useful scale out of 10—it’s better to average the individual episode ratings for the season, not sum them. This makes it a useful value to compare across all seasons.
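The effect of unequal episode counts can be sketched in a few lines of Python. The rows below are hypothetical and chosen only to make the point visible. They are not taken from the real data set, and the variable names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical (season, rating) rows; NOT values from the real data set.
episodes = [
    (1, 9), (1, 9),           # short season with high ratings
    (2, 7), (2, 7), (2, 7),   # longer season with lower ratings
]

# Group the ratings by season, like putting Season on the Columns shelf.
by_season = defaultdict(list)
for season, rating in episodes:
    by_season[season].append(rating)

# Compute the three aggregations discussed in this unit for each season.
summary = {
    season: {
        "count": len(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
    }
    for season, values in by_season.items()
}

for season, stats in sorted(summary.items()):
    print(season, stats)
```

In this made-up example, season 2 has the larger sum (21 versus 18) only because it has more episodes, while season 1 has the higher average (9.0 versus 7.0). The average also stays on the original 0 to 10 rating scale.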
- Right-click the SUM(My Rating) pill on the Rows shelf to open the context menu.
- Hover over Measure in the menu, then choose Average. The pill updates to say AVG(My Rating), and the axis in the view updates to a scale of 0–8.
Why does the label still say 6, 8, 10, 10…? Because you didn’t change the field on the label shelf, it’s still the count of the number of episodes.
- Drag the CNT(Episodes.csv) field off the Marks card. Tip: Or you can right-click and select Remove.
- Drag My Rating from the Data pane to Label on the Marks card. Make sure you drag out a new copy of the rating field from the Data pane. Don’t move the pill from the Rows shelf.
The default is to aggregate by Sum, so this label is summed again. Change the aggregation to Average for this pill as well.
- Right-click the SUM(My Rating) pill on the Marks card and choose Measure | Average.
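One way to picture why each pill needs its own change is to treat a pill as a field name paired with its own aggregation. The sketch below is a toy model under that assumption. The `Pill` class, the `AGGREGATIONS` table, and the sample rows are all hypothetical and are not how Tableau is implemented.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical lookup from aggregation name to a Python function.
AGGREGATIONS = {"SUM": sum, "AVG": mean, "CNT": len}


@dataclass
class Pill:
    """Toy model of a pill: one field plus its own aggregation."""
    field: str
    agg: str = "SUM"  # new copies of a measure default to SUM

    def label(self) -> str:
        return f"{self.agg}({self.field})"

    def evaluate(self, rows):
        # Pull this pill's field out of every row, then aggregate.
        return AGGREGATIONS[self.agg]([row[self.field] for row in rows])


# Hypothetical rows; NOT values from the real data set.
rows = [{"My Rating": 6}, {"My Rating": 8}, {"My Rating": 10}]

rows_pill = Pill("My Rating")
rows_pill.agg = "AVG"           # changed on the Rows shelf earlier
label_pill = Pill("My Rating")  # fresh copy dragged to Label

print(rows_pill.label(), rows_pill.evaluate(rows))
print(label_pill.label(), label_pill.evaluate(rows))
```

In this model, changing `rows_pill` leaves `label_pill` at its default of SUM, which mirrors why the new label is summed until you change that pill too.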
Great, we can see that the values align with the axis. But those labels are a bit of visual clutter we don't need.
- Drag My Rating off the Marks card.
- When dragging a field off a shelf, drop it anywhere that's not outlined in orange as an active drop area. It doesn't need to go anywhere in particular. If you accidentally put it somewhere rather than removing it, click the Undo button in the toolbar and try again.
The labels should no longer be visible.
Updated Question for Analysis: What Are the Average Seasonal Trends in Ratings?
You started with the question "what are the trends in ratings?", but now you're looking at average ratings by season. It might seem like a subtle change, but as you dive deeper into analysis, it’s important to go back and make sure you’re asking the right questions and are as precise as possible. This enables you to communicate your findings and explain your reasoning to your stakeholders. It also makes it easier for others to use the vizzes you’re creating.
In the spirit of communication and collaboration, it’s always a good idea to name our sheets—and with a meaningful name that ties in with what the analysis is about.
- Double-click the tab for Sheet 2 at the bottom of the screen.
- There are alternative ways to do many actions in Tableau. You can also right-click | Rename.
- Rename the sheet Avg Seasonal Ratings, then click anywhere off the sheet tab to commit the change.