Prepare Your Data
Prepare Data Overview
You’ve extracted all the data you need and you have two new datasets. The SIC Descriptions dataset contains data extracted from a CSV file. The Opportunities with Accounts and Users dataset has data extracted from objects in your Salesforce org. Your final task is to prepare this data and combine it into a single dataset.
Here’s where you are on the data journey. Nearly there!
Prepare Data in the Dataflow
You used the dataflow in the previous unit to extract data from your Salesforce objects. The dataflow also did a little bit of preparation for you, if you remember. It added account and user fields to your opportunity data and created a dataset.
So in addition to extracting, the dataflow is also a great data preparation tool. You can use it to filter data, add and remove fields, add or update rows from another dataset, and add calculations to your data. However, to prepare data you manually add instructions to the dataflow, either through the dataflow editor or by writing JSON. This isn’t for everyone, but if you want to try it out, have a look at the Resources section and follow the links for more details.
But for your DTC Electronics assignment you’re going to prepare your data without the dataflow, using a dataset recipe.
Prepare Data in a Dataset Recipe
Use Data Prep, a user interface tool, to create dataset recipes that take data from existing datasets, prepare it, and output the results to a new dataset. Use a recipe to combine data from multiple datasets, bucket the data, add formula fields, and cleanse the data by transforming field values. You can remove fields and filter rows that you don’t need before you create the target dataset.
When you create a recipe, you specify the transformations, or steps, that you want to perform on a source. The source can be one or more datasets or—although not covered here—connected objects. When you run the recipe, it applies these transformations and outputs the results to a new target dataset.
To keep your target dataset up to date, you can schedule a recipe to run on a recurring basis.
Create a Dataset Recipe
You create and manage dataset recipes in the data manager. Let’s head over there.
- In Analytics, click the gear icon and then click Data Manager. The data manager opens in a new browser tab.
- In the data manager, click the Dataflows & Recipes tab.
- On the Dataflows & Recipes tab, click the RECIPES subtab. The Recipes subtab shows you a list of any recipes you already have.
- Click Create Recipe. You see a list of available datasets.
- Choose the dataset you want to use as the base dataset. For this recipe, click Opportunities with Accounts and Users.
- Not sure what your base dataset is? Ask yourself which dataset contains the data that you want to prepare or add fields to. That’s your base dataset. In your case, it’s the Opportunities with Accounts and Users dataset.
- Enter a recipe name. Call this recipe Opportunities with SIC Descriptions.
- Click Next. The recipe opens, displaying a preview of the data from the base dataset you selected.
Add Data in a Dataset Recipe
- On the dataset recipe page, click the Add Data button ( ).
- Click the SIC Descriptions dataset. This is the “lookup” dataset that has the columns that you want to add. You see the Add Data window, which has 3 main sections. The Lookup Keys section (1) is where you choose how to match the data. If Analytics sees a possible match, it selects the lookup keys for you. If it can’t find a match, it selects the first fields from the recipe and the lookup dataset. You can keep this selection or choose different keys. You can use up to 5 key pairs. The Columns to Include section (2) is where you select which columns you want in the recipe after you add the data. Lookup keys are marked with the icon. Columns that you’ve hidden and columns from the lookup dataset aren’t selected. The Lookup Results Preview section (3) shows a preview of the data for the columns that you’ve included.
- In the Lookup Keys section, confirm that Analytics has selected the AccountId.SIC and SIC Code lookup keys. If different keys are selected, click into each lookup key field to select the correct key.
- At the bottom of the Columns to Include list, make sure that the SIC Description column is selected. This is the only additional column that you need from the lookup dataset.
- Click Done. SIC Description appears as a new column in the recipe, and the change appears as a recipe step in the left pane.
You can continue to add columns from other datasets in the same way.
Right now, you’ve done what you need to do in the recipe. But there’s a whole host of other data preparations you can do here. If you want to try some of them, take a look at the resources section for more details.
Run a Dataset Recipe
When you run a recipe, Analytics performs the steps you added and creates the target dataset. Select an app for the new dataset, and select the final fields to include. You can also keep the target dataset fresh by scheduling the recipe to run on a regular basis. Let’s create the dataset!
- To open the Run Recipe dialog, click Create Dataset on the dataset recipe page. By default, Analytics uses the recipe name for the target dataset that you’re creating. To use a different name, edit the Dataset Name field. Let’s leave the name as it is.
- From the App picklist (1), select Sales Performance Datasets.
- In the list of fields (2), notice that the columns that you hid from the preview are not selected. Select these columns to include them in the target dataset.
- Account ID
- Owner ID
- Created Date
- Opportunity ID
- In the list of fields, the AccountId.SIC and SIC Code columns are both selected because they were the lookup keys. You only need one of them, so deselect SIC Code at the bottom of the list.
- Click Continue. You can now choose to schedule the recipe before you run it, or run it just once.
- You want to schedule the recipe to run each weekday morning, after the dataflow runs. So select Yes and click Schedule Recipe. The schedule options appear.
- Schedule the dataflow to run every 24 hours at 2:00 AM each weekday by selecting these settings.
- Schedule by: Hour
- Start at: 2:00 am
- Run every: 24 Hours
- Select days: M, Tu, W, Th, F
- Click Schedule and Run. Your recipe is queued to run.
Monitor the Recipe Job and Verify the New Dataset
Behind the scenes, Analytics again creates a job for the recipe. You can go to the Monitor tab of the data manager to check on its progress.
It’s also a good idea to open the new dataset to check that all the fields are there.
- In the data manager, close the recipe tab.
- On the left of the data manager, click the Data tab.
- On the right of the Opportunities with SIC Descriptions dataset, click and select Explore. If you don’t see the dataset, try refreshing your browser.
- Under Bars, click the Add a group (+) button. You should see the SIC Description field in the list of dimensions.
You did it! You figured out the pieces of data, where they live, and got both external and Salesforce data into Analytics. You then pulled them together, cleaned them up, and created a single set of data with all the necessary fields, available instantly.
Let’s see just how far you got.