Prepare Your Data

Learning Objectives

After completing this unit, you’ll be able to:
  • Describe how you can use the dataflow to prepare data for use.
  • Prepare data using a dataset recipe.

Prepare Data Overview

You’ve extracted all the data you need and you have two new datasets. The SIC Descriptions dataset contains data extracted from a CSV file. The Opportunities with Accounts and Users dataset has data extracted from objects in your Salesforce org. Your final task is to prepare this data and combine it into a single dataset.

Here’s where you are on the data journey. Nearly there!

Data journey map with the Join External Data data recipe step highlighted

Prepare Data in the Dataflow

You used the dataflow in the previous unit to extract data from your Salesforce objects. The dataflow also did a little bit of preparation for you, if you remember. It added account and user fields to your opportunity data and created a dataset.

So in addition to extracting data, the dataflow is also a great data preparation tool. You can use it to filter data, add and remove fields, add or update rows from another dataset, and add calculations to your data. However, to prepare data in the dataflow, you have to add the instructions manually, either in the dataflow editor or by writing JSON. This approach isn’t for everyone, but if you want to try it out, see the Resources section for links to more details.

But for your DTC Electronics assignment you’re going to prepare your data without the dataflow, using a dataset recipe.

Prepare Data in a Dataset Recipe

Use Data Prep, a user interface tool, to create dataset recipes that take data from existing datasets, prepare it, and output the results to a new dataset. Use a recipe to combine data from multiple datasets, bucket the data, add formula fields, and cleanse the data by transforming field values. You can remove fields and filter rows that you don’t need before you create the target dataset.

When you create a recipe, you specify the transformations, or steps, that you want to perform on a source. The source can be one or more datasets or connected objects (connected objects aren’t covered here). When you run the recipe, it applies these transformations and outputs the results to a new target dataset.

Dataset recipe overview
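Data Prep is a point-and-click tool, so you don’t write code to build a recipe. If it helps to picture the source, steps, and target model, though, here’s a minimal Python sketch of the idea, assuming a dataset is just a list of rows. It’s an analogy, not how Analytics implements recipes: the step functions (`filter_rows`, `add_formula_field`, `bucket_field`, `run_recipe`) and the sample opportunity rows are hypothetical.

```python
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    """Hypothetical step: keep only the rows that match the predicate."""
    return lambda rows: [r for r in rows if predicate(r)]


def add_formula_field(name: str, formula: Callable[[Row], Any]) -> Step:
    """Hypothetical step: add a computed column to every row."""
    return lambda rows: [{**r, name: formula(r)} for r in rows]


def bucket_field(source: str, target: str,
                 buckets: list[tuple[float, str]], default: str) -> Step:
    """Hypothetical step: label each row with the first bucket whose upper bound exceeds the value."""
    def label(value: float) -> str:
        for upper, name in buckets:
            if value < upper:
                return name
        return default
    return lambda rows: [{**r, target: label(r[source])} for r in rows]


def run_recipe(source: list[Row], steps: list[Step]) -> list[Row]:
    """Apply each step in order; the result stands in for the target dataset."""
    rows = source
    for step in steps:
        rows = step(rows)
    return rows


opportunities = [
    {"Name": "Deal A", "Amount": 5000.0, "IsWon": True},
    {"Name": "Deal B", "Amount": 120000.0, "IsWon": True},
    {"Name": "Deal C", "Amount": 30000.0, "IsWon": False},
]

target = run_recipe(opportunities, [
    filter_rows(lambda r: r["IsWon"]),
    add_formula_field("AmountK", lambda r: r["Amount"] / 1000),
    bucket_field("Amount", "Size", [(10000.0, "Small"), (100000.0, "Medium")], "Large"),
])
print(target)
```

Each step works on the rows produced by the step before it, which is why step order matters: here the filter runs first, so the formula and bucket steps never see the lost opportunity.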

To keep your target dataset up to date, you can schedule a recipe to run on a recurring basis.

Create a Dataset Recipe

You create and manage dataset recipes in the data manager. Let’s head over there.

  1. In Analytics, click the gear icon and then click Data Manager. The data manager opens in a new browser tab.
  2. In the data manager, click the Dataflows & Recipes tab.
  3. On the Dataflows & Recipes tab, click the Recipes subtab. The Recipes subtab shows you a list of any recipes you already have.
  4. Click Create Recipe. You see a list of available datasets.
  5. Choose the dataset you want to use as the base dataset. For this recipe, click Opportunities with Accounts and Users.
     Not sure what your base dataset is? Ask yourself which dataset contains the data that you want to prepare or add fields to. That’s your base dataset. In your case, it’s the Opportunities with Accounts and Users dataset.
  6. Enter a recipe name. Call this recipe Opportunities with SIC Descriptions.
  7. Click Next. The recipe opens, displaying a preview of the data from the base dataset you selected.

There’s a lot going on in a dataset recipe. The process of creating a recipe can be bewildering, especially if it’s your first time. So let’s pause for a moment to let you get your bearings, then look at a few tricks to help you navigate.

By default, the central pane in a dataset recipe displays a real-time preview of your data as you prepare it.

Dataset recipe preview

If there are a lot of columns here, you can switch to the Columns tab to search for columns or hide ones that you don’t need. Let’s hide a few columns to declutter the preview.

  1. Click the Columns tab. You see a list view of the columns in the recipe.
  2. To the right of the Account ID column, click the menu button and select Hide Column. The column moves to the Not in Recipe list at the bottom of the Columns tab.
  3. You don’t really need the following columns right now, so hide each of these.
    1. AccountId.BillingCountry
    2. AccountId.BillingCity
    3. Owner ID
    4. Created Date
    5. Opportunity ID
  4. Click the Preview tab. The preview no longer includes the columns that you hid.

The recipe preview can display up to 100 columns. If you hide a column from the preview, by default it’s not included in the target dataset. However, you can include any column when you later create the dataset.
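In other words, hiding a column changes what’s selected by default when you create the dataset, not what the recipe can output. Here’s a minimal Python sketch of that rule, using a hypothetical `target_fields` helper. The hidden columns are the ones from the steps above; Opportunity Name and Amount are made-up column names for the example.

```python
def target_fields(recipe_columns: list[str], hidden: set[str],
                  reinclude: set[str]) -> list[str]:
    """Return the columns selected for the target dataset, in recipe order.

    A column is selected by default unless it was hidden from the preview;
    `reinclude` selects hidden columns again when you create the dataset.
    """
    unknown = (hidden | reinclude) - set(recipe_columns)
    if unknown:
        raise ValueError(f"not in recipe: {sorted(unknown)}")
    return [c for c in recipe_columns if c not in hidden or c in reinclude]


recipe_columns = [
    "Opportunity ID", "Opportunity Name", "Amount", "Account ID",
    "AccountId.SIC", "AccountId.BillingCountry", "AccountId.BillingCity",
    "Owner ID", "Created Date",
]
hidden = {"Account ID", "AccountId.BillingCountry", "AccountId.BillingCity",
          "Owner ID", "Created Date", "Opportunity ID"}

# What gets selected if you don't change anything at dataset creation time.
default = target_fields(recipe_columns, hidden, reinclude=set())

# Select every hidden column again: nothing is lost by hiding.
final = target_fields(recipe_columns, hidden, reinclude=hidden)
print(default)
print(final)
```

Hidden columns stay in the recipe; they just start out deselected.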

OK, now that you’ve decluttered the recipe a bit, let’s get back to adding SIC Description data to your base opportunity dataset.

Add Data in a Dataset Recipe

You can add columns from any other dataset to the data you already have in a recipe. You have to “match” the data so that Analytics can add the right values to the right rows in the new dataset. For example, your SIC Description dataset has an SIC Code field that you match to the account SIC Code field in the recipe. This “matched” field is called the lookup key. Let’s see it in action.
  1. On the dataset recipe page, click the Add Data button.
  2. Click the SIC Descriptions dataset. This is the “lookup” dataset that has the columns that you want to add. You see the Add Data window, which has 3 main sections.
    • Lookup Keys is where you choose how to match the data. If Analytics sees a possible match, it selects the lookup keys for you. If it can’t find a match, it selects the first fields from the recipe and the lookup dataset. You can keep this selection or choose different keys. You can use up to 5 key pairs.
    • Columns to Include is where you select which columns you want in the recipe after you add the data. Lookup keys are marked with a key icon. Columns that you’ve hidden and columns from the lookup dataset aren’t selected by default.
    • Lookup Results Preview shows a preview of the data for the columns that you’ve included.
  3. In the Lookup Keys section, confirm that Analytics has selected the AccountId.SIC and SIC Code lookup keys. If different keys are selected, click into each lookup key field to select the correct key.
  4. At the bottom of the Columns to Include list, make sure that the SIC Description column is selected. This is the only additional column that you need from the lookup dataset.
  5. Click Done. SIC Description appears as a new column in the recipe, and the change appears as a recipe step in the left pane.
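Conceptually, Add Data works like a lookup: for each row in the recipe, Analytics finds the lookup row whose key values match and copies the included columns across. The following Python sketch models that with a dictionary index. It illustrates the idea under stated assumptions rather than the product’s exact behavior: the `add_data` function is hypothetical, it assumes each key combination appears at most once in the lookup dataset (true for a code-to-description table like SIC Descriptions), and it assumes recipe rows with no match keep their data with empty (`None`) values in the new columns. It adds only the columns listed in `include`, and the sample codes and descriptions are illustrative.

```python
from typing import Any, Optional

Row = dict[str, Any]

MAX_KEY_PAIRS = 5  # the Add Data window accepts up to 5 lookup key pairs


def add_data(recipe_rows: list[Row], lookup_rows: list[Row],
             key_pairs: list[tuple[str, str]], include: list[str]) -> list[Row]:
    """Add the `include` columns from lookup_rows to each recipe row.

    key_pairs holds (recipe field, lookup field) pairs that must all match.
    Assumes each key combination appears at most once in lookup_rows.
    """
    if not 1 <= len(key_pairs) <= MAX_KEY_PAIRS:
        raise ValueError(f"use between 1 and {MAX_KEY_PAIRS} key pairs")

    # Index the lookup dataset by its key values so each match is a dict lookup.
    index: dict[tuple[Any, ...], Row] = {}
    for row in lookup_rows:
        index[tuple(row[lookup_field] for _, lookup_field in key_pairs)] = row

    result: list[Row] = []
    for row in recipe_rows:
        key = tuple(row[recipe_field] for recipe_field, _ in key_pairs)
        match: Optional[Row] = index.get(key)
        # Unmatched rows get None in each new column (an assumption of this sketch).
        added = {col: (match[col] if match is not None else None) for col in include}
        result.append({**row, **added})
    return result


opportunities = [
    {"Name": "Deal A", "AccountId.SIC": "3571"},
    {"Name": "Deal B", "AccountId.SIC": "7372"},
]
sic_descriptions = [
    {"SIC Code": "3571", "SIC Description": "Electronic Computers"},
]

joined = add_data(opportunities, sic_descriptions,
                  key_pairs=[("AccountId.SIC", "SIC Code")],
                  include=["SIC Description"])
for row in joined:
    print(row)
```

Using a tuple of all the key values as the index key is what lets the same sketch handle anywhere from 1 to 5 key pairs.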

You can continue to add columns from other datasets in the same way.

That’s all you need to do in the recipe for now. But there’s a whole host of other data preparations you can do here. If you want to try some of them, take a look at the Resources section for more details.

Run a Dataset Recipe

When you run a recipe, Analytics performs the steps you added and creates the target dataset. Select an app for the new dataset, and select the final fields to include. You can also keep the target dataset fresh by scheduling the recipe to run on a regular basis. Let’s create the dataset!

  1. To open the Run Recipe dialog, click Create Dataset on the dataset recipe page. By default, Analytics uses the recipe name for the target dataset that you’re creating. To use a different name, edit the Dataset Name field. Let’s leave the name as it is.
  2. From the App picklist, select Sales Performance Datasets.
  3. In the list of fields, notice that the columns that you hid from the preview are not selected. Select these columns to include them in the target dataset.
    1. Account ID
    2. AccountId.BillingCountry
    3. AccountId.BillingCity
    4. Owner ID
    5. Created Date
    6. Opportunity ID
  4. In the list of fields, the AccountId.SIC and SIC Code columns are both selected because they were the lookup keys. You only need one of them, so deselect SIC Code at the bottom of the list.
  5. Click Continue. You can now choose to schedule the recipe before you run it, or run it just once.
  6. You want to schedule the recipe to run each weekday morning, after the dataflow runs. So select Yes and click Schedule Recipe. The schedule options appear.
  7. Schedule the recipe to run every 24 hours at 2:00 AM each weekday by selecting these settings.
    1. Schedule by: Hour
    2. Start at: 2:00 am
    3. Run every: 24 Hours
    4. Select days: M, Tu, W, Th, F
  8. Click Schedule and Run. Your recipe is queued to run.
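To see which runs these settings produce, here’s a minimal Python sketch that lists the next few scheduled run times. It assumes an hourly schedule fires at the start hour and then every N hours until midnight on each selected day, and it uses naive local times with no time-zone handling. That’s a simplification for illustration, not a description of how Analytics evaluates schedules, and the `next_runs` function is hypothetical.

```python
from datetime import datetime, timedelta

WEEKDAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday is 0, Friday is 4


def next_runs(now: datetime, start_hour: int, every_hours: int,
              days: set[int], count: int) -> list[datetime]:
    """Return the next `count` run times strictly after `now`.

    Assumes runs fire at start_hour, start_hour + every_hours, ... before
    midnight on each selected day, in naive local time.
    """
    if every_hours < 1 or not 0 <= start_hour <= 23:
        raise ValueError("invalid start hour or interval")
    if not days or not days <= set(range(7)):
        raise ValueError("days must be a non-empty subset of 0..6")

    runs: list[datetime] = []
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(runs) < count:
        if day.weekday() in days:
            for hour in range(start_hour, 24, every_hours):
                candidate = day.replace(hour=hour)
                if candidate > now and len(runs) < count:
                    runs.append(candidate)
        day += timedelta(days=1)
    return runs


# A recipe scheduled on Friday afternoon, 2024-05-17 at 3:00 PM.
for run in next_runs(datetime(2024, 5, 17, 15, 0), start_hour=2,
                     every_hours=24, days=WEEKDAYS, count=3):
    print(run.strftime("%a %Y-%m-%d %H:%M"))
```

With a 24-hour interval, only the 2:00 AM slot falls on each selected day, so under these assumptions a recipe scheduled on Friday afternoon next runs on Monday at 2:00 AM, skipping the weekend.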

Monitor the Recipe Job and Verify the New Dataset

Behind the scenes, Analytics again creates a job for the recipe. You can go to the Monitor tab of the data manager to check on its progress.

It’s also a good idea to open the new dataset to check that all the fields are there.

  1. In the data manager, close the recipe tab.
  2. On the left of the data manager, click the Data tab.
  3. To the right of the Opportunities with SIC Descriptions dataset, click the menu button and select Explore. If you don’t see the dataset, try refreshing your browser.
  4. Under Bars, click the Add a group (+) button. You should see the SIC Description field in the list of dimensions.


You did it! You identified the data you needed and where it lives, and got both external and Salesforce data into Analytics. You then pulled it together, cleaned it up, and created a single dataset with all the necessary fields, ready for exploration.

Let’s see just how far you got.

Data journey map showing that you have reached the end of the journey. Well done!