Start tracking your progress
Trailhead Home
Trailhead Home

Prepare Your Data

Learning Objectives

After completing this unit, you’ll be able to:
  • Describe how you can use the dataflow to prepare data for use.
  • Prepare data using a dataset recipe.

Prepare Data Overview

You’ve extracted all the data you need and you have two new datasets. The SIC Descriptions dataset contains data extracted from a CSV file. The Opportunities with Accounts and Users dataset has data extracted from objects in your Salesforce org. Your final task is to prepare this data and combine it into a single dataset.

Here’s where you are on the data journey. Nearly there!

Data journey map with the Join External Data data recipe step highlighted

Prepare Data in a Dataflow

You used the dataflow in the previous unit to extract data from your Salesforce objects. The dataflow also did a little bit of preparation for you, if you remember. It added account and user fields to your opportunity data and created a dataset.

So in addition to extracting, the dataflow is also a great data preparation tool. You can use it to filter data, add and remove fields, add or update rows from another dataset, and add calculations to your data. However, to prepare data you manually add instructions to the dataflow, either through the dataflow editor or by writing JSON. This isn’t for everyone, but if you want to try it out, have a look at the Resources section and follow the links for more details.

But for your DTC Electronics assignment you’re going to prepare your data without the dataflow, using a dataset recipe.

Prepare Data in a Dataset Recipe

Use Data Prep, a user interface tool, to create dataset recipes that take data from existing datasets, prepare it, and output the results to a new dataset. Use a recipe to combine data from multiple datasets, bucket the data, add formula fields, and cleanse the data by transforming field values. You can remove fields and filter rows that you don’t need before you create the target dataset.

When you create a recipe, you specify the transformations, or steps, that you want to perform on a source. The source can be one or more datasets or—although not covered here—connected objects. When you run the recipe, it applies these transformations and outputs the results to a new target dataset.

Dataset recipe overview

To keep your target dataset up to date, you can schedule a recipe to run on a recurring basis.

Create a Dataset Recipe

You create and manage dataset recipes in the data manager. Let’s head over there.

  1. In Analytics, click the gear icon Wave gear icon used to open the Data Manager and then click Data Manager. The data manager opens in a new browser tab.
  2. In the data manager, click the Dataflows & Recipes tab.
  3. On the Dataflows & Recipes tab, click the RECIPES subtab. The Recipes subtab shows you a list of any recipes you already have. Prepare tab
  4. Click Create Recipe.
  5. Close the Data Prep welcome mat.
  6. Click Select Data.
  7. Choose the dataset you want to use as the base dataset. For this recipe, click Opportunities with Accounts and Users. Not sure what your base dataset is? Ask yourself which dataset contains the data that you want to prepare or add fields to. That’s your base dataset. In your case, it’s the Opportunities with Accounts and Users dataset.
  8. Click Next. The recipe opens in Data Prep.
In case this is your first time creating a recipe, let’s pause to look at a few tricks to help you navigate.

By default, the recipe steps, called nodes, are displayed at the top, while details of the selected node and a real-time preview of your data is displayed below. Every recipe starts with an input node.  The recipe preview can display up to 100 columns.

Dataset recipe preview

If there are a lot of columns here, you can switch to the Columns tab to search for columns or hide ones that you don’t need. Let’s hide a few columns to declutter the preview.

  1. Click the Columns tab. You see a list view of the columns in the recipe. Dataset recipe preview edit
  2. To hide the Account ID column, select the column and click the Hide button (Hide button next to the node's name) above the Preview and Column section. Hidden columns are listed below the displayed columns.
  3. You don’t really need the following columns right now, so hide each of these.
    • AccountId.BillingCountry
    • AccountId.BillingCity
    • Owner ID
    • Created Date
    • Opportunity ID
  4. Click the Preview tab. The preview no longer includes the columns that you hid.

OK, now that you’ve decluttered the recipe a bit, let’s get back to adding SIC Description data to our base opportunity dataset.

Add Data in a Dataset Recipe

You can add columns from any other dataset to the data you already have in a recipe. You have to “match” the data so that Analytics can add the right values to the right rows in the new dataset. For example, your SIC Description dataset has an SIC Code field that you match to the account SIC Code field in the recipe. This “matched” field is called the join key. Let’s see it in action.
  1. In Data Prep, click the plus button (Plus button next to the input node) next to the Opportunities with Accounts and Users input node.
  2. Select Join.
  3. Click the SIC Descriptions dataset. This is the “lookup” dataset that has the columns that you want to add.
  4. Click Next. You see the Join window, which has 4 main sections. The Join screen where the type, columns, and other settings are configured.The Join Type section (1) lets you control whether the join is a lookup or another type. The Join Keys section (2) is where you choose how to match the data. If Analytics sees a possible match, it selects the join keys for you. If it can’t find a match, it selects the first fields from the recipe and the lookup dataset. You can keep this selection or choose different keys. You can use up to 5 key pairs. The Columns section (3) is where you select which columns you want in the recipe after you add the data. Columns that you’ve hidden and columns from the lookup dataset aren’t selected. The Preview section (4) shows a preview of the data for the columns that you’ve included.
  5. In the Join Keys section, confirm that Analytics has selected the AccountId.SIC and SIC Code lookup keys. If different keys are selected, click into each lookup key field to select the correct key.
  6. At the bottom of the Columns section, make sure that the SIC Description column is selected. This is the only additional column that you need from the lookup dataset.
  7. Click Apply. The Join node and SIC Descriptions input node are added to the recipe. New field in dataset recipe

You can continue to add columns from other datasets in the same way.

Configure a Recipe's Output

Let's add an output node so we can run the recipe exactly like we want it. Select an app for the new dataset, and select the final fields to include. 

  1. Click the plus button (Plus button next to the input node) next to the Join node.
  2. Select Output. Output node configuration. Settings are on the left, preview and columns are on the right.
  3. Set up the output node (1):
    • Dataset Display Label: Opportunities with SIC Descriptions
    • Dataset API Name: Opportunities_with_Accounts_and_Users
    • App Location: Sales Performance Datasets
  4. In the list of columns (2), notice that the columns that you hid from the preview are not selected. Select these columns to include them in the target dataset.
    • Account ID
    • AccountId.BillingCountry
    • AccountId.BillingCity
    • Owner ID
    • Created Date
    • Opportunity ID
  5. In the list of columns, the AccountId.SIC and SIC Code columns are both selected because they were the lookup keys. You only need one of them, so hide SIC Code at the bottom of the list.
  6. Click Apply.
  7. Click Save.
  8. Enter a recipe name. Call this recipe Opportunities with SIC Descriptions.
  9. Click Back to Data Manager.

Right now, you’ve done what you need to do in the recipe. But there’s a whole host of other data preparations you can do here. If you want to try some of them, take a look at the resources section for more details.

Run a Dataset Recipe

When you run a recipe, Analytics performs the steps you added and creates the target dataset. You can also keep the target dataset fresh by scheduling the recipe to run on a regular basis. Let’s create the dataset!

  1. In the data manager, click the Dataflows & Recipes tab.
  2. On the Dataflows & Recipes tab, click the RECIPES subtab.
  3. To run the recipe, click Dataflow menu button in data manager to the right of Opportunities with SIC Descriptions recipe and select Run Now. You'll see a green banner at the top of the data manager that confirms the run.
  4. Click Dataflow menu button in data manager again and select Schedule.
  5. You want to schedule the recipe to run each weekday morning, after the dataflow runs. Schedule the dataflow to run every 24 hours at 2:00 AM each weekday by selecting these settings.
    1. Schedule by: Hour
    2. Start at: 2:00 am
    3. Run every: 24 Hours
    4. Select days: Mon, Tue, Wed, Thu, Fri
  6. Click Save. Your recipe is scheduled.

Monitor a Recipe Job and Verify the New Dataset

Behind the scenes, Analytics again creates a job for the recipe. You can go to the Monitor tab of the data manager to check on its progress.

It’s also a good idea to open the new dataset to check that all the fields are there.

  1. On the left of the data manager, click the Data tab.
  2. On the right of the Opportunities with SIC Descriptions dataset, click Dataflow menu button in data manager and select Explore. If you don’t see the dataset, try refreshing your browser.
  3. Under Bars, click the Add a group (+) button. You should see the SIC Description field in the list of dimensions.

Congratulations!

You did it! You figured out the pieces of data, where they live, and got both external and Salesforce data into Analytics. You then pulled them together, cleaned them up, and created a single set of data with all the necessary fields, available instantly.

Let’s see just how far you got.

Data journey map showing that you have reached the end of the journey. Well done!