Understand Dataset Grain and When to Use a Lookup
After completing this unit, you’ll be able to:
- Preserve and validate the definition of a dataset grain.
- Determine when a lookup or join is needed for the job.
As we learned in the previous units, it’s important to have a base dataset that preserves the grain for your analysis, which includes queries, charts, and measures such as count of rows. You want to ensure that there are no duplicate rows or other misleading metrics. Queries that feed dashboard widgets or standalone lenses should be simple and not require additional summarizations, such as “group by” clauses or averages across repeating rows.
Why Preserving the Grain Matters
While trying out the lookup between Opportunity and Account, you observe that Jose and Candace are able to easily get the metrics they need for the business on their own without complex queries. Also, as Jose and Candace view a dashboard widget that contains a sum of amount by account, they can understand its measures and dimensions, and change them as needed. Jose and Candace trust the sum of amount metric, as shown in Illustration 6, as it appears to be correct.
Now let’s have Candace try the left join dataset created with Opportunity and Cases, as shown in Illustration 7.
This dataset contains all Opportunity rows, and some rows repeat because they match multiple rows in the Cases dataset. If Sales were to use this dataset as their primary Opportunity dataset, a basic sum of amount by account would no longer be simple, because each repeated row adds its amount to the total again. For example, Candace has to adjust her query, such as by using Average of Amount as the measure and grouping by Opportunity Name to overcome the repeating rows, as shown in Illustration 8.
Another option for Candace’s query is a compare table formula that groups the records by AccountId, then divides the Sum of Amount measure by the Unique of [Opportunity] Name measure, as shown in Illustration 9. A person would need a deep understanding of the dataset to know why this calculation is necessary.
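To see why repeating rows distort a sum, here is a minimal sketch in plain Python, outside of Tableau CRM. The sample rows, the field names, and the `left_join` helper are all hypothetical and only illustrate the idea. They are not how the product implements a join.

```python
from collections import defaultdict

# Hypothetical sample rows; field names are illustrative only.
opportunities = [
    {"OppName": "Opp A", "AccountId": "001", "Amount": 100},
    {"OppName": "Opp B", "AccountId": "002", "Amount": 200},
]
cases = [
    {"CaseId": "C1", "AccountId": "001"},
    {"CaseId": "C2", "AccountId": "001"},
    {"CaseId": "C3", "AccountId": "001"},
]

def left_join(left, right, key):
    """Keep every left row; emit one output row per matching right row."""
    matches = defaultdict(list)
    for row in right:
        matches[row[key]].append(row)
    joined = []
    for row in left:
        # An unmatched left row is still kept once, with no right-side fields.
        for match in matches.get(row[key]) or [{}]:
            joined.append({**row, **match})
    return joined

joined = left_join(opportunities, cases, "AccountId")

# Naive sum of Amount by account: Opp A is counted once per matching case.
naive_sum = defaultdict(int)
for row in joined:
    naive_sum[row["AccountId"]] += row["Amount"]

# Workaround: average Amount grouped by opportunity name.
amounts = defaultdict(list)
for row in joined:
    amounts[row["OppName"]].append(row["Amount"])
avg_by_opp = {name: sum(vals) / len(vals) for name, vals in amounts.items()}
```

In this sample, two opportunities become four joined rows. The naive sum for account 001 is 300 instead of 100, while averaging by opportunity name recovers 100 and 200.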
Carry Out the Best Solution for a Business Need
As a dataset builder, your objective is to give business users and dashboard designers the data they need to do their work. A recommended strategy is to use lookups to create base datasets, then implement Connect Data Sources and Linked Dashboards to bring multiple datasets onto a dashboard.
For example, an app that contains several datasets and associated dashboards provides source content for a new dashboard. A dashboard designer can configure Connect Data Sources to combine the app’s datasets on the new dashboard, then leverage existing dashboard content by linking the existing dashboards to the new one. You could also use data blending or more advanced features, such as SAQL queries. When you use a lookup to create datasets for Einstein Discovery stories, the dataset should have a unique, non-repeating grain. The grain of the dataset is significant, so validate it before you rely on the dataset.
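One way to think about validating the grain is to check that the field that defines it never repeats. The following is a minimal sketch in plain Python, assuming the grain is one row per opportunity. The `find_grain_violations` helper, the `OppId` field, and the sample rows are hypothetical.

```python
from collections import Counter

def find_grain_violations(rows, key_fields):
    """Return each key value that appears on more than one row, with its count."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical rows: one clean dataset, one with a repeated opportunity.
lookup_rows = [
    {"OppId": "006A", "Amount": 100},
    {"OppId": "006B", "Amount": 200},
]
joined_rows = [
    {"OppId": "006A", "Amount": 100},
    {"OppId": "006A", "Amount": 100},
    {"OppId": "006B", "Amount": 200},
]

clean = find_grain_violations(lookup_rows, ["OppId"])
repeated = find_grain_violations(joined_rows, ["OppId"])
```

An empty result, as in `clean`, means every key is unique and the grain holds. A non-empty result, as in `repeated`, lists the keys that break it.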
Consider creating joined datasets for the following scenarios.
- Use different join scenarios to address specific business queries. For example, set up a full outer join to have all opportunities and cases on the same dashboard. Let the business case determine the join scenario, then verify that the outcome is accurate and expected.
- If end users are prohibited from exploring dashboard lenses, join base datasets into a new dataset to enable users to have their own data for building specific explorations and interactions.
- Leverage one joined dataset when dashboards that contain multiple large datasets are lagging in performance. Users get the advantage of querying a single dataset instead of multiple datasets associated with the dashboard.
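As an illustration of the full outer join scenario in the list above, here is a minimal sketch in plain Python. The `full_outer_join` helper, the field names, and the sample rows are hypothetical, and the join key is assumed to be AccountId. It shows the behavior to verify: rows from both sides are kept, whether or not they match.

```python
def full_outer_join(left, right, key):
    """Keep every row from both sides, pairing rows that share a key value."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    joined, matched_keys = [], set()
    for row in left:
        matches = right_by_key.get(row[key])
        if matches:
            matched_keys.add(row[key])
            joined.extend({**row, **m} for m in matches)
        else:
            # Left row with no partner: kept without right-side fields.
            joined.append(dict(row))
    # Right-side rows with no left-side partner are kept as well.
    for k, rows in right_by_key.items():
        if k not in matched_keys:
            joined.extend(dict(r) for r in rows)
    return joined

# Hypothetical sample rows.
opportunities = [
    {"AccountId": "001", "OppName": "Opp A"},
    {"AccountId": "002", "OppName": "Opp B"},
]
cases = [
    {"AccountId": "001", "CaseId": "C1"},
    {"AccountId": "003", "CaseId": "C2"},
]
result = full_outer_join(opportunities, cases, "AccountId")
```

Checking the row count and the unmatched rows against what the business case expects is a simple way to verify the outcome of a join.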
With Tableau CRM, you can set up lookup and join tasks to create and merge datasets. It’s important to understand the business case and weigh the pros and cons of each approach, because each one suits specific use cases. Lookup-based datasets are reliable for easy exploration because they maintain unique rows. Joins offer flexible options to create a dataset for a particular requirement. Enhance lookups and joins with other data tasks, such as aggregation or multi-value lookups, to achieve powerful and enriched datasets.
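To illustrate why a lookup keeps exploration simple, here is a minimal sketch in plain Python. It assumes a lookup that adds fields from at most one matching row, so the row count never changes. The `lookup` helper, the field names, and the sample rows are hypothetical and don't represent the product's implementation.

```python
def lookup(left, right, key, fields):
    """Add selected right-side fields to each left row; row count is unchanged."""
    first_match = {}
    for row in right:
        first_match.setdefault(row[key], row)  # keep only the first match per key
    result = []
    for row in left:
        match = first_match.get(row[key], {})
        result.append({**row, **{f: match.get(f) for f in fields}})
    return result

# Hypothetical sample rows.
opportunities = [
    {"OppName": "Opp A", "AccountId": "001", "Amount": 100},
    {"OppName": "Opp B", "AccountId": "001", "Amount": 50},
    {"OppName": "Opp C", "AccountId": "002", "Amount": 200},
]
accounts = [
    {"AccountId": "001", "AccountName": "Acme"},
    {"AccountId": "002", "AccountName": "Globex"},
]

enriched = lookup(opportunities, accounts, "AccountId", ["AccountName"])

# A plain sum by account needs no adjustment because no opportunity repeats.
sum_by_account = {}
for row in enriched:
    name = row["AccountName"]
    sum_by_account[name] = sum_by_account.get(name, 0) + row["Amount"]
```

The enriched data still has one row per opportunity, so the sum of amount by account can be read directly from the rows.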