Understand Dataset Grain and When to Use a Lookup

Learning Objectives

After completing this unit, you’ll be able to:

  • Preserve and validate the definition of a dataset grain.
  • Determine when a lookup or join is needed for the job.

As we learned from the previous units, it’s important to have a base dataset that preserves the grain for your dataset analysis that includes, queries, charts, measures such as count of rows, and so on. You want to ensure that there are no duplicate rows or other misleading metrics. The creation of queries that feed dashboard widgets or standalone lenses should be simple and not require additional summarizations, such as “group by’s” or average across repeating rows. 

Why Preserving the Grain Matters

While trying out the lookup between Opportunity and Account, you observe that Jose and Candace are able to easily get the metrics they need for the business on their own without complex queries. Also, as Jose and Candace view a dashboard widget that contains a sum of amount by account, they can understand its measures and dimensions, and change them as needed. Jose and Candace trust the sum of amount metric, as shown in Illustration 6, as it appears to be correct.

Illustration 6: A sound Sum of Opportunity Amount lens.Illustration 6: A sound Sum of Opportunity Amount lens.

Now let’s have Candace try the left join dataset created with Opportunity and Cases, as shown in Illustration 7.

 

Illustration 7: Left join between Opportunity and Cases using the AccountId field.

Illustration 7: Left join between Opportunity and Cases using the AccountId field.

This dataset contains all Opportunity rows, and some rows repeat because of multiple matches to rows in the Cases dataset. If Sales were to use this dataset as their primary Opportunity dataset, a basic sum of amount by account won't be so simple because of repeating rows. Remember, there are repeating rows because of the multiple matches to the Cases dataset. For example, when Candace does her query she’ll have to incorporate some adjustments, such as using Average of Amount as the measure then a “group by” using Opportunity Name to overcome the repeating rows as shown in Illustration 8.

Illustration 8: Complicated calculations for the left join case.Illustration 8: Complicated calculations for the left join case.

 

Another option for Candace’s query is to create a compare table formula that starts by grouping the records by AccountId, then creating a calculation that divides the Sum of Amount measure by the Unique of [Opportunity] Name measure as shown in Illustration 9. A person would need a deep understanding of the dataset to know why this calculation is necessary.

 

Illustration 9: Another complicated calculation for the left join case.Illustration 9: Another complicated calculation for the left join case.

 

Note

To make the lookup and join scenarios easier to understand they are using Names instead of Ids. For real life instances, it is recommended to use unique fields such as Opportunity Id, Account Id, Case Id, and so on for lookups and joins.

Carry Out the Best Solution for a Business Need

As a dataset builder your objective is to enable business users and dashboard designers with the data to do their work. A recommended strategy for all interested parties is to use a lookup and create base datasets, then implement Connect Data Sources and Linked Dashboards to have multiple datasets on a dashboard.  

For example, an app that contains several datasets and associated dashboards provides source content for a new dashboard. A dashboard designer can configure Connect Data Sources to combine the app’s datasets on the new dashboard, then leverage existing dashboard content by linking them to the new dashboard. You could also use data blending or more advanced features, such as SAQL queries. When using a lookup to create datasets that will be used in Einstein Discovery stories, the dataset should have a unique, non-repeating grain. Grain level of the dataset is significant and has to be validated.

Consider creating joined datasets for the following scenarios.

  1. Use different join scenarios to address specific business queries. For example, set up a full outer join to have all opportunities and cases on the same dashboard. The business case should determine the join scenario then verify the outcome is accurate or expected.
  2. If end users are prohibited from exploring dashboard lenses, join base datasets into a new dataset to enable users to have their own data for building specific explorations and interactions.
  3. Leverage one joined dataset when dashboards that contain large multiple datasets are lagging in performance. Users get the advantage of querying a single dataset instead of multiple datasets associated with the dashboard.

Summary

With Tableau CRM, you can set up lookup and join tasks to create and merge datasets. It’s important to understand the business case and weigh the pros and cons of each approach since they’re used for specific use cases. Lookup-based datasets are reliable for easy exploration because they maintain unique rows. Joins offer flexible options to create a dataset for a particular requirement. Enhance lookups and joins with other data tasks such as aggregation or multi-value lookups to achieve powerful and enriched datasets.

Resources 

Keep learning for
free!
Sign up for an account to continue.
What’s in it for you?
  • Get personalized recommendations for your career goals
  • Practice your skills with hands-on challenges and quizzes
  • Track and share your progress with employers
  • Connect to mentorship and career opportunities