Load Your Local Sales Data

Design the Dataflow

Your first task is to create a US Sales dataset for the opportunities and related account and owner data in your local Salesforce org. To extract local Salesforce data in Analytics, you use the dataflow. Your data includes non-US opportunities, so you add a filter in the dataflow to include only US opportunities.

Here’s the design of the dataflow.

[Diagram: The transformations used to extract and combine local opportunity, account, and user data in the dataflow]

Create a Dataflow for US Sales Data

Your US org comes with default dataflows that are used in other Trailhead modules and projects. To start from scratch, you create your own dataflow.

  1. Log in to your US org. This is the first org that you signed up for.
  2. In the top-left corner, click the App Launcher icon, and then find and select Analytics Studio.
  3. In the left-hand pane, click Data Manager.
  4. In the left-hand pane, click Dataflows & Recipes.
  5. Click Create Dataflow.
  6. Name the dataflow US Sales Dataflow.
  7. Click Create.
  8. In the toolbar at the top of the dataflow editor, click the Dataset Builder button.
  9. In the Dataset Name field, enter US Sales.
  10. Click Continue.
  11. In the Pick a Salesforce object to start search box, enter opp.
  12. In the object list, click Opportunity.
  13. Hover over the object and click the plus (+).
    [Screenshot: The Opportunity object in the dataset builder with the plus (+) icon highlighted]
  14. Click to select these fields:
    • Amount
    • Close Date
    • Created Date
    • Lead Source
    • Name
    • Opportunity Type
    • Stage
  15. At the top of the field list, click the Relationships tab.
  16. In front of the Account ID relationship, click Join.
  17. In front of the Owner ID relationship, click Join.
  18. Hover over the Account object and click the plus (+).
  19. Select these fields:
    • Account Name
    • Account Type
    • Billing City
    • Billing Country
    • Billing State/Province
    • Industry
  20. To hide the field list, click the X to the right of the Account object.
  21. Repeat steps 18–19 to select these fields from the User object:
    • Full Name
    • Title
  22. Click Next. Analytics creates the sfdcDigest, augment, and register nodes needed to build the dataset in your dataflow.
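
Behind the scenes, the dataflow is stored as a JSON definition that you can download from the dataflow editor. The sketch below is a simplified, hypothetical rendering of the kind of nodes the dataset builder generates, with the Account and User digest nodes omitted for brevity; the actual node names and field lists in your org will differ.

    {
      "sfdcDigest_Opportunity": {
        "action": "sfdcDigest",
        "parameters": {
          "object": "Opportunity",
          "fields": [
            { "name": "Name" },
            { "name": "Amount" },
            { "name": "CloseDate" },
            { "name": "StageName" },
            { "name": "AccountId" },
            { "name": "OwnerId" }
          ]
        }
      },
      "augment_Opportunity_Account": {
        "action": "augment",
        "parameters": {
          "left": "sfdcDigest_Opportunity",
          "left_key": [ "AccountId" ],
          "relationship": "AccountId",
          "right": "sfdcDigest_Account",
          "right_key": [ "Id" ],
          "right_select": [ "Name", "Type", "BillingCity", "BillingCountry", "Industry" ]
        }
      },
      "augment_User": {
        "action": "augment",
        "parameters": {
          "left": "augment_Opportunity_Account",
          "left_key": [ "OwnerId" ],
          "relationship": "OwnerId",
          "right": "sfdcDigest_User",
          "right_key": [ "Id" ],
          "right_select": [ "Name", "Title" ]
        }
      },
      "register_US_Sales": {
        "action": "sfdcRegister",
        "parameters": {
          "source": "augment_User",
          "alias": "US_Sales",
          "name": "US Sales"
        }
      }
    }

You never have to edit this JSON for this project, but seeing its shape makes the next steps, adding a filter node and rewiring augment_User, easier to picture.
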
The dataflow extracts all opportunities from your US org. But you want only opportunities related to US accounts, so you need a filter. There are two places you can add a filter in a dataflow: in an sfdcDigest node or in a separate filter node.

Tip: Wondering where to put a filter in your dataflow? The benefit of adding a filter in the sfdcDigest node is that you reduce the amount of data that the rest of the dataflow has to process. However, if you're using that data to create more than one dataset, you might be removing data that's needed elsewhere. In that case, adding a separate filter node further downstream might be the solution. Another advantage of a separate filter node is that you can use it with data that has been combined from different sfdcDigest nodes. For example, here you want to filter the opportunities based on the account country, so you need a separate filter after the opportunity data has been augmented with the account data. Decision made. A separate filter node it is.
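
For comparison, a filter applied at extraction time lives inside the sfdcDigest node itself, as a filterConditions parameter. The snippet below is a hypothetical sketch (it filters opportunities by stage, which is not what this project needs); check the dataflow transformation reference for the exact syntax supported in your org.

    "sfdcDigest_Opportunity": {
      "action": "sfdcDigest",
      "parameters": {
        "object": "Opportunity",
        "fields": [
          { "name": "Name" },
          { "name": "Amount" },
          { "name": "StageName" }
        ],
        "filterConditions": [
          { "field": "StageName", "operator": "=", "value": "Closed Won" }
        ]
      }
    }

Because the US filter depends on BillingCountry, a field that is only available after the account data is joined in, it has to run on the combined data. That's the separate filter node you build next.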

  1. In the toolbar at the top of the dataflow editor, click the Filter button.
  2. Complete these fields:
    1. Node Name: filter_US_Accounts
    2. Source: augment_Opportunity_Account
    3. Use SAQL: Deselected. (When you enter a filter expression, you can use either standard or SAQL syntax; each has its own format. In this case, use a standard expression. For more information about filter expressions, see the resources at the end of this unit. A sketch of the finished filter node appears after these steps.)
    4. Filter: AccountId.BillingCountry:EQ:USA
      (Tip: To see a list of field names that you can use in the filter, click the Output Fields tab. You can copy field names here and paste them into your filter.)
  3. Click Create.
  4. Click and hold the > on the right of the filter node, then drag a path onto the > on the left of the augment_User node.
  5. Click Save.
  6. Click the augment_User node and, in the Left Key field, select OwnerId.
  7. Click Save.
  8. Click Save without Propagating. This option saves your change without propagating newly added fields to downstream transformations in the dataflow.
  9. Click Update Dataflow.
  10. To confirm, click Update Dataflow again.
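
In the dataflow's JSON definition, the new filter node and the rewired augment_User node now look roughly like the sketch below (a simplified, hypothetical rendering; node names in your org may differ). If you had selected Use SAQL instead, the expression would go in a saqlFilter parameter in SAQL syntax, something like 'AccountId.BillingCountry' == "USA".

    "filter_US_Accounts": {
      "action": "filter",
      "parameters": {
        "source": "augment_Opportunity_Account",
        "filter": "AccountId.BillingCountry:EQ:USA"
      }
    },
    "augment_User": {
      "action": "augment",
      "parameters": {
        "left": "filter_US_Accounts",
        "left_key": [ "OwnerId" ],
        "relationship": "OwnerId",
        "right": "sfdcDigest_User",
        "right_key": [ "Id" ],
        "right_select": [ "Name", "Title" ]
      }
    }

Only the left source and left key of augment_User change; the rest stays as the dataset builder generated it.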

Run the Dataflow and Explore the New Dataset

Before we run the dataflow, we're going to sync the data from the Opportunity, User, and Account objects in Salesforce. The data sync stages the Salesforce data in Einstein Analytics, making it available for the dataflow to use. 

  1. Click Data Manager in the top left corner to open Data Manager.
  2. Click the Connect tab in Data Manager.
  3. Click the dropdown to the right of SFDC_LOCAL, and select Run Now. This action runs data sync for all objects under the SFDC_LOCAL connection. (If you prefer, you can run an individual data sync for each object by clicking the dropdown next to that object and selecting Run Data Sync.)
  4. Click the Monitor tab in Data Manager. You can view the statuses of the sync jobs on the Jobs subtab.
  5. To refresh the view, click the refresh button at the top of the Monitor tab until the jobs finish. After all sync jobs complete successfully, move on to the next step.
  6. To run the dataflow, click the Dataflows & Recipes tab in Data Manager.
  7. Click the dropdown to the right of US Sales Dataflow, and click Run Now.
  8. Click the Monitor tab to view the status of the dataflow. The Jobs subtab shows current and historical job runs, including data syncs, dataflows, and recipes.
  9. If the dataflow status shows as Queued or Running, click the refresh button at the top of the Monitor tab to refresh the view until the job finishes.
  10. To view the statuses of dataflows only, click the Dataflows subtab. To view transformation-level details, click the arrow to the left of the dataflow name. The dataflow icon shows the status of the dataflow. If all's well, it's a checkmark, and you see Successful when you hover over it.
    [Screenshot: A dataflow with a status of Successful on the Monitor tab]
  11. When the dataflow has finished running, click the Data tab on the left of Data Manager.
  12. On the right of the new US Sales dataset, click the menu button and select Explore.
  13. Under Bars, click the Add a Group (+) button and select AccountId.BillingCountry. Notice that only USA opportunities are in the dataset. (The equivalent SAQL query is sketched after these steps.)
  14. Close the lens without saving.
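
Under the hood, a grouped lens like this compiles to a SAQL query. As a rough mental model, the bar chart you just built corresponds to something like the sketch below; the name in the load statement is the dataset alias Analytics assigned, assumed here to be US_Sales.

    q = load "US_Sales";
    q = group q by 'AccountId.BillingCountry';
    q = foreach q generate 'AccountId.BillingCountry' as 'Billing Country', count() as 'Count of Rows';
    q = order q by 'Count of Rows' desc;

With the filter node in place, every group that comes back should be USA.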

Schedule the Data Sync and the Dataflow

As you can see, your dataflow already has a lot to do: extracting data, augmenting, and filtering. You can lighten this load by extracting your Salesforce data in a separate job, called a data sync, which you can schedule to run before your dataflow. By scheduling data sync ahead of time, your dataflow has less to do and runs faster. To further lighten the load, Analytics syncs Salesforce data incrementally by default, meaning that only data that’s changed gets synced. Let’s schedule the local connection’s data sync and the dataflow to run one after the other.

  1. In the left-hand pane, click Data Manager.
  2. Click the Connect tab.
  3. To the right of the SFDC_LOCAL connection, click the menu button and select Schedule.
  4. Schedule the connection to run every 24 hours at 12:00 AM each weekday by selecting these settings:
    1. Schedule by: Hour
    2. Start at: 12:00 am
    3. Run every: 24 Hours
    4. Select days: M, Tu, W, Th, F
  5. Click Save.

This schedule ensures that the connection syncs first each day. You can now schedule your US Sales Dataflow to run after the data sync.

  1. Click the Dataflows & Recipes tab.
  2. To the right of US Sales Dataflow, click the menu button and select Schedule.
  3. Schedule the dataflow to run every 24 hours at 4:00 AM each weekday by selecting these settings:
    1. Schedule by: Hour
    2. Start at: 4:00 am
    3. Run every: 24 Hours
    4. Select days: M, Tu, W, Th, F
  4. Click Save.