Skip to main content

Create Lookups with the Dataset Builder

Learning Objectives

After completing this unit, you’ll be able to:

  • Identify the dataset grain.
  • Create a dataset using a lookup.
  • Define the appropriate number of base-level datasets.

Welcome to Cloud Kicks

Cloud Kicks is a popular athletic and lifestyle brand for athletes, celebrities, and everyday people who want to stay in shape and look stylish.  

Welcome to Cloud Kicks!

Cloud Kicks’ primary goal is to maintain and expand the relationships it’s built. CRM Analytics helps Cloud Kicks with this. Jose Figueroa, the Cloud Kicks sales manager, appreciates CRM Analytics’s reporting and visualizations that enable him to stay on top of activities at each stage of the sales pipeline. 

Candace is Cloud Kicks’ top sales rep and uses CRM Analytics to prioritize opportunity interactions and nurture them into new accounts. Candace wants to be able to view dashboards and interact with visualizations. She wants to easily get counts of opportunities, amounts while adding filters, groupings, and formulas. Jose’s and Candace’s explorations often require interactions among related objects, such as Account and User. As the Cloud Kicks CRM Analytics admin, you want to make their work easier so they don’t need to wait for you or someone on your team to create the lenses they need. 

In this module you go through the steps to build the datasets that will help Jose and Candace. It’s important that the datasets contain accurate representations without any duplication or omission of important information.

Get Familiar with Building Datasets

A CRM Analytics dataset is a collection of related data that can be created from many sources, including Salesforce objects, external data sources, and other datasets. Datasets are the foundation to analysis, dashboards, models, and predictions. Building datasets according to an expected business need is a critical step. Tools that you use to create datasets include the Dataset Builder, which incorporates the instructions for building datasets using a dataflow, and Data Prep, which cleans and transforms data using a recipe. 

This module teaches best practices for creating a dataset and assumes you’re familiar with dataset construction using both of these methods. For step-by-step, hands-on practice with data integration, try the CRM Analytics Data Integration Basics module also in Trailhead.

What Is Your Dataset About?

You and your team meet to discuss what you’ve done to support Jose and Candace. You come away with learning that they’re all about analyzing opportunity pipeline, such as amount by account, owner, and close date.

Sales’ data comes from Salesforce objects that include Opportunity, Cases, Accounts, Campaigns, and so on. 

 

Note

This module talks about building datasets using the Dataset Builder and Data Prep. Learn more about each.

Your options to create the dataset include the Dataset Builder...

Dataflow lookup job.

...or through a recipe using Data Prep.

Data prep recipe.


Since Jose’s and Candace’s focus is opportunity pipeline, your strategy for a new dataset will consist of:
  1. Defining what the new dataset is about. You decide to select the Opportunity object and its fields as the grain.
  2. Linking the Account object to Opportunity so Jose and Candace can analyze insights by customer accounts. For this step, you add the Account object and selected fields based on the AccountId, a field that Account and Opportunity share.
  3. Linking the User object to Opportunity so Jose and Candace can view activity by opportunity owner. For this step you’ll add the User object plus selected fields based on the OwnerId field, a field that User and Opportunity share. You could also add the account owner by linking the Account and User objects.
Note

Dataset grain is the unit of data in each row of the dataset. Using the Opportunity object as an example, each dataset row is an opportunity, so the opportunity record is the grain. Understanding the grain is an important concept when you’re joining data together from different sources, as you'll discover in this module.

It’s important to establish the correct relationships among datasets to get the expected outcome. Dataset relationships can be implemented as either lookups or joins. However, don’t expect both operations to result in the same conclusion. It’s important to understand what each transformation delivers.

Creating lookups with the Dataset Builder

A lookup, for our purpose, is to find a single matching record. Opportunity and Account are used in this example, with AccountId as the common key between both objects.

Opportunity to Account lookup.

The recipe starts with one Opportunity row with AccountId as its unique identifier. 

Opportunity to Account lookup job.

The row initiates a “lookup” to one (and only one) matching AccountId from the Account object (right side of the lookup). If there are more than one matching Account record for that particular AccountId, the lookup brings back one random record and no more for a single lookup.

Since a dataset’s accuracy starts with selecting the object that meets your objective as the grain, it’s important to understand the role each object has in your lookup task. The left side object defines the grain, and it remains unique as more and more fields are added from related objects or datasets coming from the right side of the lookup. Here’s an example of the final result of a dataset. Remember we started with the Opportunity object as the lowest grain.

Unique rows in the Opportunity dataset.

The count of rows and sum of amount are both true values since every row of the Opportunity dataset is a unique opportunity. 

Count of Rows and Sum of Amount lens.

Why the Grain Matters

Let’s see what would happen if a right dataset is a lower grain or different grain from the left dataset in a single lookup?

Consider the following scenario where we want to augment Opportunity to Accounts. A lookup is initiated from one Account row (on the left) to multiple matching Opportunities (on the right).

Account and Opportunity datasets.

By looking at the datasets you can see there is: 

  • One matching Opportunity row for Abbott358 Inc in the Opportunity dataset.
  • Two matching Opportunity rows for account Adkins907 Inc.
  • Three matching Opportunity rows for Aguilar870 Inc.

Here is the result of a single lookup from the Account dataset to Opportunity dataset:

Data Prep recipe using a Lookup.

Do you notice what happened?

  • For account Abbot358 Inc, we got the one matching opportunity row.
  • For account Adams25 Inc, we had no matching opportunity row, hence we kept the initial grain (the account Adams25 Inc), and all the opportunity values at null or zero.
  • For account Adkins907 Inc, we got only one row (out of the two). We lost an opportunity!
  • Similarly, for account Aguilar870 Inc, we got only one row (out of three) and lost the details of the other two opportunities!

This is why you have to be very careful with lookups to datasets that have a different or lower grain. Later in this module, we discuss the join as another possible approach. 

Data for the Task or Ask

It’s perfectly acceptable to have multiple “base level” datasets for each of the main objects used by the business. For example, Cloud Kicks will need a base dataset for Opportunity, User, and Account to start with. From there you’ll determine how to combine them in the data layer or as needed in the dashboard layer.

Let’s say you’ve been asked to build datasets to help your users answer queries such as, opportunities by account and products by account. How many base objects are needed as shown in Illustration 1.

Illustration 1: How many base datasets are needed?Illustration 1: How many base datasets are needed?

You’ll have more options and flexibility with four base-level datasets, as shown in Illustration 2. If a user requires comprehensive Products analysis, the red circle illustrates how you can create a new Products dataset using related data by way of lookups from the Opportunity, Account, and User datasets. 


The green circle illustrates the needed datasets for an Opportunities dataset using lookups from Account, and User. The blue circle is an instance when a user wants to do basic analysis just on all Accounts only, and the purple circle is just for Users.

Illustration 2: Possibly four datasets for most scenarios.Illustration 2: Possibly four datasets for most scenarios.

 

If your user is also requesting an all Cases dataset by user and account, the number of base datasets increases to five, as highlighted by the magenta dotted line in Illustration 3.

Illustration 3: Five base level for most scenarios.Illustration 3: Five base level for most scenarios.

 

Summary

Starting with the lowest grain, or what the dataset is about, ensures an accurate target dataset with unique rows. Lookups return all rows from the lowest grain and a single, matching row from a related dataset. Evaluate results when using a lookup between datasets that have a different grain to avoid skewed results.

Resources

Keep learning for
free!
Sign up for an account to continue.
What’s in it for you?
  • Get personalized recommendations for your career goals
  • Practice your skills with hands-on challenges and quizzes
  • Track and share your progress with employers
  • Connect to mentorship and career opportunities