Prepare Your Data
You’ve extracted all the data you need and you have two new datasets. The SIC Descriptions dataset contains data extracted from a CSV file. The Opportunities with Accounts and Users dataset has data extracted from objects in your Salesforce org. Your final task is to prepare this data and combine it into a single dataset.
Here’s where you are on the data journey. Nearly there!
You used the dataflow in the previous unit to extract data from your Salesforce objects. The dataflow also did a little bit of preparation for you, if you remember. It added account and user fields to your opportunity data and created a dataset.
So in addition to extracting, the dataflow is also a great data preparation tool. You can use it to filter data, add and remove fields, add or update rows from another dataset, and add calculations to your data. However, to prepare data you manually add instructions to the dataflow, either through the dataflow editor or by writing JSON. This isn’t for everyone, but if you want to try it out, have a look at the Resources section and follow the links for more details.
But for your DTC Electronics assignment you’re going to prepare your data without the dataflow, using a dataset recipe.
A dataset recipe is a user-interface tool that lets you take data from existing datasets, prepare it, and output the results to a new dataset. Use a recipe to combine data from multiple datasets, bucket the data, add formula fields, and cleanse the data by transforming field values. You can remove fields and filter rows that you don’t need before you create the target dataset.
When you create a recipe, you specify the transformations, or steps, that you want to perform on a specific source dataset. When you run the recipe, it applies these transformations and outputs the results to a new target dataset.
To keep your target dataset up to date, you can schedule a recipe to run on a recurring basis.
You create and manage dataset recipes in the data manager. Let’s head over there.
In Analytics, click the gear icon and then click Data
The data manager opens in a new browser tab.
- In the data manager, click the Dataflows & Recipes tab.
On the Dataflows & Recipes tab, click the DATASET RECIPES
The Dataset Recipes subtab shows you a list of any recipes you already have.
Click Create Recipe.
You see a list of available datasets.
Choose the dataset you want to use as the base dataset. For this recipe, click
Opportunities with Accounts and Users.
- Enter a recipe name. Call this recipe Opportunities with SIC Descriptions.
The recipe opens, displaying a preview of the data from the base dataset you selected.
- On the dataset recipe page, click the Add Data button ().
Click the SIC Descriptions dataset. This is the “lookup” dataset
that has the columns that you want to add.
You see the Add Data window, which has 3 main sections.The Lookup Keys section (1) is where you choose how to match the data. If Analytics sees a possible match, it selects the lookup keys for you. Here, Analytics can’t find a match, so it selects the first fields from the recipe and the lookup dataset. You can keep this selection or choose different keys. You can use up to 5 key pairs. The Columns to Include section (2) is where you select which columns you want in the recipe after you add the data. Lookup keys are marked with the icon. Columns that you’ve hidden and columns from the lookup dataset aren’t selected. The Lookup Results Preview section (3) shows a preview of the data for the columns that you’ve included.
- In the Lookup Keys section, click into the first lookup key field and select AccountId.SIC.
- Click into the second lookup key field and select SIC Code.
- At the bottom of the Columns to Include list, make sure that the SIC Description column is selected. This is the only additional column that you need from the lookup dataset.
SIC Description appears as a new column in the recipe, and the change appears as a recipe step in the left pane.
You can continue to add columns from other datasets in the same way.
Right now, you’ve done what you need to do in the recipe. But there’s a whole host of other data preparations you can do here. If you want to try some of them, take a look at the resources section for more details.
When you run a recipe, Analytics performs the steps you added and creates the target dataset. Select an app for the new dataset, and select the final fields to include. You can also keep the target dataset fresh by scheduling the recipe to run on a regular basis. Let’s create the dataset!
To open the Run Recipe dialog, click Create Dataset on the dataset
- From the App picklist (1), select Sales Performance Datasets.
In the list of fields (2), notice that the columns that you hid from the preview are not
selected. Select these columns to include them in the target dataset.
- Account ID
- Owner ID
- Created Date
- Opportunity ID
- In the list of fields, the AccountId.SIC and SIC Code columns are both selected because they were the lookup keys. You only need one of them, so deselect SIC Code at the bottom of the list.
You can now choose to schedule the recipe before you run it, or run it just once.
You want to schedule the recipe to run each weekday morning, after the dataflow runs. So
select Yes and click Schedule Recipe.
The schedule options appear.
Schedule the dataflow to run every 24 hours at 2:00 AM each weekday by selecting these
- Schedule by: Hour
- Start at: 2:00 am
- Run every: 24 Hours
- Select days: M, Tu, W, Th, F
Click Schedule and Run.
Your recipe is queued to run.
Behind the scenes, Analytics again creates a job for the recipe. You can go to the Monitor tab of the data manager to check on its progress.
It’s also a good idea to open the new dataset to check that all the fields are there.
- To return to the Analytics Studio, click the gear icon () and then click Analytics Studio.
- At the top of the Analytics Studio, click Datasets.
Click the Opportunities with SIC Descriptions dataset.
Under Bars, click the Add a group (+) button.
You should see the SIC Description field in the list of dimensions.
You did it! You figured out the pieces of data, where they live, and got both external and Salesforce data into Analytics. You then pulled them together, cleaned them up, and created a single set of data with all the necessary fields, available instantly.
Let’s see just how far you got.