Prepare Your Data
Prepare Data Overview
You’ve extracted all the data you need and you have two new datasets. The SIC Descriptions dataset contains data extracted from a CSV file. The Opportunities with Accounts and Users dataset has data extracted from objects in your Salesforce org. Your final task is to prepare this data and combine it into a single dataset.
Here’s where you are on the data journey. Nearly there!
Prepare Data in a Dataflow
You used the dataflow in the previous unit to extract data from your Salesforce objects. The dataflow also did a little bit of preparation for you, if you remember. It added account and user fields to your opportunity data and created a dataset.
So in addition to extracting, the dataflow is also a great data preparation tool. You can use it to filter data, add and remove fields, add or update rows from another dataset, and add calculations to your data. However, to prepare data you manually add instructions to the dataflow, either through the dataflow editor or by writing JSON. This isn’t for everyone, but if you want to try it out, have a look at the Resources section and follow the links for more details.
But for your DTC Electronics assignment you’re going to prepare your data without the dataflow, using a dataset recipe.
Prepare Data in a Dataset Recipe
Use Data Prep, a user interface tool, to create dataset recipes that take data from existing datasets, prepare it, and output the results to a new dataset. Use a recipe to combine data from multiple datasets, bucket the data, add formula fields, and cleanse the data by transforming field values. You can remove fields and filter rows that you don’t need before you create the target dataset.
When you create a recipe, you specify the transformations, or steps, that you want to perform on a source. The source can be one or more datasets or—although not covered here—connected objects. When you run the recipe, it applies these transformations and outputs the results to a new target dataset.
To keep your target dataset up to date, you can schedule a recipe to run on a recurring basis.
Create a Dataset Recipe
You create and manage dataset recipes in the data manager. Let’s head over there.
- In Analytics, click the gear icon and then click Data Manager. The data manager opens in a new browser tab.
- In the data manager, click the Dataflows & Recipes tab.
- On the Dataflows & Recipes tab, click the RECIPES subtab. The Recipes subtab shows you a list of any recipes you already have.
- Click Create Recipe.
- Close the Data Prep welcome mat.
- Click Select Data.
- Choose the dataset you want to use as the base dataset. For this recipe, click Opportunities with Accounts and Users. Not sure what your base dataset is? Ask yourself which dataset contains the data that you want to prepare or add fields to. That’s your base dataset. In your case, it’s the Opportunities with Accounts and Users dataset.
- Click Next. The recipe opens in Data Prep.
Add Data in a Dataset Recipe
- In Data Prep, click the plus button () next to the Opportunities with Accounts and Users input node.
- Select Join.
- Click the SIC Descriptions dataset. This is the “lookup” dataset that has the columns that you want to add.
- Click Next. You see the Join window, which has 4 main sections. The Join Type section (1) lets you control whether the join is a lookup or another type. The Join Keys section (2) is where you choose how to match the data. If Analytics sees a possible match, it selects the join keys for you. If it can’t find a match, it selects the first fields from the recipe and the lookup dataset. You can keep this selection or choose different keys. You can use up to 5 key pairs. The Columns section (3) is where you select which columns you want in the recipe after you add the data. Columns that you’ve hidden and columns from the lookup dataset aren’t selected. The Preview section (4) shows a preview of the data for the columns that you’ve included.
- In the Join Keys section, confirm that Analytics has selected the AccountId.SIC and SIC Code lookup keys. If different keys are selected, click into each lookup key field to select the correct key.
- At the bottom of the Columns section, make sure that the SIC Description column is selected. This is the only additional column that you need from the lookup dataset.
- Click Apply. The Join node and SIC Descriptions input node are added to the recipe.
You can continue to add columns from other datasets in the same way.
Configure a Recipe's Output
Let's add an output node so we can run the recipe exactly like we want it. Select an app for the new dataset, and select the final fields to include.
- Click the plus button () next to the Join node.
- Select Output.
- Set up the output node (1):
- Dataset Display Label: Opportunities with SIC Descriptions
- Dataset API Name: Opportunities_with_Accounts_and_Users
- App Location: Sales Performance Datasets
- In the list of columns (2), notice that the columns that you hid from the preview are not selected. Select these columns to include them in the target dataset.
- Account ID
- Owner ID
- Created Date
- Opportunity ID
- In the list of columns, the AccountId.SIC and SIC Code columns are both selected because they were the lookup keys. You only need one of them, so hide SIC Code at the bottom of the list.
- Click Apply.
- Click Save.
- Enter a recipe name. Call this recipe Opportunities with SIC Descriptions.
- Click Back to Data Manager.
Right now, you’ve done what you need to do in the recipe. But there’s a whole host of other data preparations you can do here. If you want to try some of them, take a look at the resources section for more details.
Run a Dataset Recipe
When you run a recipe, Analytics performs the steps you added and creates the target dataset. You can also keep the target dataset fresh by scheduling the recipe to run on a regular basis. Let’s create the dataset!
- In the data manager, click the Dataflows & Recipes tab.
- On the Dataflows & Recipes tab, click the RECIPES subtab.
- To run the recipe, click to the right of Opportunities with SIC Descriptions recipe and select Run Now. You'll see a green banner at the top of the data manager that confirms the run.
- Click again and select Schedule.
- You want to schedule the recipe to run each weekday morning, after the dataflow runs. Schedule the dataflow to run every 24 hours at 2:00 AM each weekday by selecting these settings.
- Schedule by: Hour
- Start at: 2:00 am
- Run every: 24 Hours
- Select days: Mon, Tue, Wed, Thu, Fri
- Click Save. Your recipe is scheduled.
Monitor a Recipe Job and Verify the New Dataset
Behind the scenes, Analytics again creates a job for the recipe. You can go to the Monitor tab of the data manager to check on its progress.
It’s also a good idea to open the new dataset to check that all the fields are there.
- On the left of the data manager, click the Data tab.
- On the right of the Opportunities with SIC Descriptions dataset, click and select Explore. If you don’t see the dataset, try refreshing your browser.
- Under Bars, click the Add a group (+) button. You should see the SIC Description field in the list of dimensions.
You did it! You figured out the pieces of data, where they live, and got both external and Salesforce data into Analytics. You then pulled them together, cleaned them up, and created a single set of data with all the necessary fields, available instantly.
Let’s see just how far you got.