
Question the Data

Learning Objectives

After completing this unit, you’ll be able to:

  • Investigate the source of data and charts.
  • Evaluate the trustworthiness of data and analyses.
Note

Some of the images in this unit are used with permission from Alberto Cairo and are taken from his book How Charts Lie: Getting Smarter about Visual Information.

Get to Know the Source

“Distrust any publication that doesn’t clearly mention or link to the sources of the stories they publish.” –Alberto Cairo, data visualization expert and author of How Charts Lie

From business reports to social media posts, it’s important to identify and evaluate the source of the data and charts that you encounter. Knowing the source is the first (and most important!) step in determining if a chart is misleading. Investigate the source by asking questions that get to the heart of the who, what, where, when, and why behind the chart.

Topic to Investigate, Purpose, and Questions to Ask

Who

To verify the source of the data and whether there are vested interests in producing the information or presenting it in a certain way.

  • Who is responsible for the data, analyses, and resulting presentations?
  • Who collected and analyzed or created the data?
  • Was the author paid to collect, analyze, and present the data? If so, are there any potential conflicts of interest?
  • Is the author or publisher of the chart an authority on the subject?
  • Was a particular organization or multiple organizations involved in the work?

What

To understand the type of data collected and how it was measured so you can verify the accuracy of the data presented.

  • What methods were used in collecting and preparing the data?
  • What types of data does the author present?
  • What types of measurements were taken and how accurate are those measurements?
  • What was the sample size of the data?
  • How many people were surveyed in a poll and what was the demographic breakdown of those surveyed?

Where

To evaluate the data collection methods used and further assess the source’s credibility and context.

  • Was the data collected from a particular region, state, or country?
  • Was the data collected in different locations?
  • Is the data available on a credible source’s website?

When 

To determine the relevancy of the data used based on when it was collected.

  • Is the data current?
  • Does the data track back in time? If so, how far back?
  • Was the data collected yearly, monthly, daily, or even by the minute or second?

Why

To uncover any potential financial, political, scientific, philanthropic, or other motivations that might influence the objectivity of the data presented.

  • Did the authors have a particular motive behind gathering the data and doing the analysis?
  • Was a company paying a researcher to test a new product?
  • Was it a government study sponsored by the community?

To help you question the data, be sure to study the metadata, which is the data about the data. Read any accompanying documentation to gather useful insights from the metadata, such as when, how, and why the data was collected.

Note

For more tips on how to effectively evaluate the credibility of your sources, check out the University of Washington’s Savvy Info Consumers Research Guide.

Review the Numbers

Are you familiar with the saying, “garbage in, garbage out”? The concept behind this common phrase is that the quality of any output is determined by the quality of the input. This idea applies to a wide variety of things, including data consumption. When you look at visualizations like charts that are based on inaccurate data (garbage in), you’re at risk of forming inaccurate conclusions and taking ineffective actions (garbage out). To help determine the quality and accuracy of the data presented in visualizations, review as many of the underlying numbers as you can.

Check the Summary Values

Charts are often accompanied by summary values that describe and summarize the underlying data. Summary values, also known as summary statistics, communicate high-level information about the data, often revealing meaningful insights worth consideration and further investigation. The table below will help you become familiar with common summary values.

Summary Value: Description

  • Sum: The total of the values
  • Count: The number of values
  • Mean: The average of the numbers, calculated by dividing the total of the numbers (sum) by the number of values in the data set (count)
  • Median: The middle value in a data set in which the values have been placed in order of magnitude (smallest to largest)
  • Mode: The most frequent value
  • Range: The difference between the min and max
  • Min: The smallest value
  • Max: The largest value

Note

These are only a few common summary values that you may encounter, but there are many more. To learn more, check out the Data Distributions and Aggregation and Granularity modules.
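
As a quick way to check summary values yourself, here is a minimal Python sketch that computes each summary value described above using the standard library. The numbers in `values` are hypothetical and chosen only to make the arithmetic easy to follow; they don't come from any data set in this unit.

```python
import statistics

# Hypothetical sample values, invented purely to illustrate the calculations.
values = [4, 8, 15, 16, 23, 42, 8]

summary = {
    "sum": sum(values),                    # total of the values
    "count": len(values),                  # number of values
    "mean": statistics.mean(values),       # sum divided by count
    "median": statistics.median(values),   # middle value once sorted
    "mode": statistics.mode(values),       # most frequent value
    "min": min(values),                    # smallest value
    "max": max(values),                    # largest value
    "range": max(values) - min(values),    # difference between max and min
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

If a chart reports a mean that differs sharply from the median of the same data, that gap is often a sign that a few very large or very small values are pulling the average, which is worth investigating before trusting the chart's headline number.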

Check Rates and Percentages

Rates and percentages measure a relationship between values. For example, you may observe that three out of every five attendees of a presentation sign up for a newsletter. You can express this as ⅗, 0.60, or 60% of attendees signing up for the newsletter. As the next example shows, knowing the rate or percentage often helps you make better comparisons.

For example, an article headline said, “2,139 DACA recipients convicted or accused of crimes.” Without context, this could seem like a large number. The rate or percentage gives a different view. The same article shared that there are more than 800,000 DACA recipients, so the share is at most 2,139/800,000, or about 0.003 (roughly 0.3%). The percentage is the more informative value, and it shows that the number in the headline is a small fraction of total recipients.
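
The arithmetic behind both examples can be sketched in a few lines of Python. The helper name `rate` is an assumption made for this illustration. The article says there are "more than 800,000" recipients, so dividing by exactly 800,000 gives an upper bound on the true share.

```python
def rate(part, whole):
    """Return part / whole as a fraction (hypothetical helper for illustration)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole

# Newsletter example: three out of every five attendees signed up.
newsletter = rate(3, 5)

# Headline example: 2,139 out of (at least) 800,000 recipients.
daca = rate(2139, 800_000)

print(f"{newsletter:.2f} = {newsletter:.0%}")  # prints "0.60 = 60%"
print(f"{daca:.4f} = {daca:.2%}")              # prints "0.0027 = 0.27%"
```

Putting the raw count next to its denominator is the key step: the same 2,139 would mean something very different against a total of 5,000 than against a total of 800,000.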

Watch for Insufficient Data

“No chart can ever capture reality in all its richness. However, a chart can be made worse or better depending on its ability to strike a balance between oversimplifying that reality and obscuring it with too much detail.” –Alberto Cairo, How Charts Lie

When reading charts, ask yourself whether sufficient data is included to provide a meaningful and accurate version of the story. In his book How Charts Lie, Alberto Cairo explains how insufficient data can lead to incorrect conclusions. To prove this point, Cairo refers to an article that said that the 2017 Tax Cuts and Jobs Act (TCJA) promised that the typical family of four would save $1,182 annually, and he explains how that number doesn’t tell the whole story. To show a more complete story, Cairo uses a histogram, a chart that shows the distribution of quantitative data by grouping it into ranges.

Histogram showing the distribution of household family income in the United States in 2017.

Let’s dig into why this number is an oversimplification by looking at the histogram. The histogram shows a wide range of household incomes in the US: while 3% of households made less than $5,000 per year, more than 9% earned more than $200,000. With a range this wide, it’s very challenging to identify a “typical” household income. And when there is no clear “typical” household, a single figure for the typical family’s savings from the TCJA says little about what any particular household would actually save.
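
To see why a single "typical" number can mislead when values are spread widely, consider this minimal Python sketch. The incomes below are hypothetical and invented for illustration only; they are not the 2017 figures behind the histogram. The sketch compares the mean with the median and then groups the values into evenly sized ranges, the way a histogram does.

```python
import statistics
from collections import Counter

# Hypothetical household incomes in US dollars, invented for illustration.
# These are NOT the 2017 figures shown in the histogram.
incomes = [4_000, 18_000, 27_000, 35_000, 42_000,
           58_000, 61_000, 75_000, 110_000, 250_000]

mean_income = statistics.mean(incomes)      # pulled upward by the largest value
median_income = statistics.median(incomes)  # middle of the sorted values

# Group incomes into $50,000-wide ranges, as a histogram would.
bin_width = 50_000
bins = Counter((income // bin_width) * bin_width for income in incomes)

print(f"mean:   ${mean_income:,.0f}")
print(f"median: ${median_income:,.0f}")
for start in sorted(bins):
    end = start + bin_width - 1
    print(f"${start:>7,} to ${end:>7,}: {'#' * bins[start]}")
```

In this made-up sample the mean is $68,000 while the median is $50,000, and half of the households fall in the lowest range. Neither number on its own describes the household earning $4,000 or the one earning $250,000, which is why seeing the full distribution matters.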

Note

To learn more about distributions, complete the Data Distributions module.

In the next unit, learn the importance of using evenly sized intervals on axes. 

Resources