Skip to main content

Prepare Your Data

Learning Objectives

After completing this unit, you’ll be able to:

  • Describe how you can use a recipe to prepare data for use.
  • Prepare data using a recipe.

Prepare Data Overview

You’ve extracted all the data you need and you have two new datasets. The SIC Descriptions dataset contains data extracted from a CSV file. The Opportunities_with_Accounts_and_Users dataset has data extracted from objects in your Salesforce org. Your final task is to prepare this data and combine it into a dataset.

Prepare Data in a Recipe

We introduced recipes when we extracted Salesforce Object data, but it's in the data preparation stage where they really get cooking. 

When you create a recipe, you specify the transformations, or steps, that you want to perform on a source. The source can be one or more datasets or connected objects. When you run the recipe, it applies these transformations and outputs the results to a new target dataset.

To keep your target dataset up to date, you can schedule a recipe to run on a recurring basis.

Create a Recipe

Similar to the steps you followed when extracting Salesforce objects, you can now create a dataset recipe with Opportunities_with_Accounts_and_Users  and join it to SIC Descriptions. Here, Opportunities_with_Accounts_and_Users is what is known as a "base dataset." It's the dataset that contains the data to which we want to add fields and prepare. 

  1. From Data Manager, in the left-hand pane, click Recipes.
  2. Click New Recipe.
  3. Click Add Input Data.
  4. Uncheck External Connected Objects and Salesforce Objects, so that only datasets are visible. Click the dataset Opportunities_with_Accounts_and_Users.
  5. Click Next. The recipe opens in Data Prep.

When you click on the input node, you see the data Preview selected. By default, Preview displays up to 2000 rows of data.

The Recipe editor, showing a preview of the Opportunities_with_Accounts_and_Users dataset.

Hide Recipe Columns

If your dataset has many columns, you can switch to the Columns tab to search for columns or hide ones that you don’t need. Let’s hide a few columns to declutter the preview.

  1. Click the Columns tab. You see a list view of the columns in the recipe.
    The Recipe editor dataset preview Columns list, showing columns from the Opportunities_with_Accounts_and_Users dataset.
  2. To hide the Account ID column, select the column and click the Hide button () above the Preview and Column section. Hidden columns are listed below the displayed columns.
  3. You don’t really need the following columns right now, so hide each of these.
    • Billing Country
    • Billing City
    • Owner ID (see under API name, Account.OwnerId)
    • Created Date
    • Opportunity ID
  4. Click the Preview tab. The preview no longer includes the columns that you hid.

OK, now that you’ve decluttered the recipe a bit, let’s get back to adding SIC Description data to your base opportunity dataset.

Join Datasets in a Recipe

As you saw when combining the Opportunity, Account, and User objects into a dataset, to bring data sources together, you need to match fields between them. The join key here is the SIC Code, since SIC Description's SIC Code and the account SIC Code fields match. Let's join them in the recipe.

  1. In Data Prep, click the plus button (+) next to the Opportunities_with_Accounts_and_Users input node.
  2. Select Join.
  3. Click the SIC Descriptions dataset. This is the “lookup” dataset that has the columns that you want to add.
  4. Click Next. You see the Join window. You’re already familiar with this window from the previous lesson

If CRM Analytics sees a possible match, it selects the join keys for you. If it can’t find a match, it selects the first fields from the recipe and the lookup dataset. You can keep this selection or choose different keys. You can use up to 5 key pairs.

Under the Columns tab you select the columns you want in the recipe after you add the data. Columns that you’ve hidden and columns from the lookup dataset aren’t selected.  The Preview tab shows a preview of the data for the columns that you’ve included.

In this case CRM Analytics hasn't selected the correct join keys.

  1. In the Join Keys section, click the preselected values and update it.
    • Opportunities_with_Accounts_and_Users: SIC Code
    • SIC Description: SIC Code
  2. Click Add.
  3. Click Apply. The Join node and SIC Descriptions input node are added to the recipe.

You can continue to add columns from other datasets in the same way.

Configure a Recipe's Output

Let's select the final fields to include then add an output node so you can run the recipe exactly as you want it. . 

  1. In the Columns tab, notice that all of the hidden columns are included in the recipe. Also, the SIC Code appears twice, as Account.Sic and SICDescrip.SIC_Code, because they are the lookup keys. You only need one of them, so hide SIC Code at the bottom of the list.
    The Recipe editor dataset preview Columns list, showing SIC Code unchecked to exclude it from the output.
  2. Click Apply.
  3. Click the plus button () next to the Join node.
  4. Select Output.
  5. Set up the output node:
    1. Dataset Display Label: Opportunities with SIC Descriptions
    2. Dataset API Name: Opportunities_with_SIC_Descriptions
    3. App Location: Sales Performance Datasets
  6. In the Columns tab, notice that SIC Code appears only once.
  7. Click Apply.
  8. Click Save.
  9. Enter a recipe name. Call this recipe Opportunities with SIC Descriptions.
  10. Click Back to Data Manager.

Right now, you’ve done what you need to do in the recipe. But there’s a whole host of other data preparations you can do here. If you want to try some of them, take a look at the resources section for more details.

Run a Recipe

When you run a recipe, CRM Analytics performs the steps you added and creates the target dataset. You can also keep the target dataset fresh by scheduling the recipe to run regularly. Let’s create the dataset!

  1. In the data manager, click the Recipes tab.
  2. To run the recipe, click to the right of Opportunities with SIC Descriptions recipe and select Run Now. You'll see a green banner at the top of the data manager that confirms the run.
  3. Click again dropdown menu button and select Schedule.
  4. You want to schedule the recipe to run each weekday morning, after the dataflow runs. Schedule the dataflow to run every 24 hours at 2:00 AM each weekday by selecting these settings.
    • Schedule by: Hour
    • Start at: 2:00 am
    • Run every: 24 Hours
    • Select days: Mon, Tue, Wed, Thu, Fri
      The Schedule dialogue, showing the schedule for Opportunities with SIC Descriptions.
  5. Click Save. Your recipe is scheduled.

Monitor a Recipe Job and Verify the New Dataset

Behind the scenes, CRM Analytics creates a job for the recipe. You can go to the Monitor tab of the data manager to check on its progress.

It’s also a good idea to open the new dataset to check that all the fields are there.

  1. In the data manager, click the Data Assets tab.
  2. On the right of the Opportunities with SIC Descriptions dataset, click and select Explore. If you don’t see the dataset, try refreshing your browser.
  3. Under Bars, click the + button. You should see the SIC Description field in the list.

Congratulations!

You did it! You figured out the pieces of data, where they live, and got both external and Salesforce data into CRM Analytics. You then pulled them together, cleaned them up, and created a set of data with all the necessary fields, available instantly.

Resources

Keep learning for
free!
Sign up for an account to continue.
What’s in it for you?
  • Get personalized recommendations for your career goals
  • Practice your skills with hands-on challenges and quizzes
  • Track and share your progress with employers
  • Connect to mentorship and career opportunities