
Load Your Local Sales Data

Design the Dataflow

Your first task is to create a US Sales dataset for the opportunities and related account and owner data in your local Salesforce org. To extract local Salesforce data in Analytics, you use the dataflow. Your data includes non-US opportunities, so you add a filter in the dataflow to include only US opportunities.

Here’s the design of the dataflow.

Diagram: the transformations used to extract and combine local opportunity, account, and user data in the dataflow.
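Behind the editor, a dataflow is just a JSON definition: each node is a keyed object with an `action` and its `parameters`. Here's a rough sketch of the design above (the node names and field lists here are illustrative assumptions, not necessarily what the dataset builder generates verbatim):

```json
{
  "Extract_Opportunity": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Opportunity",
      "fields": [
        { "name": "Name" },
        { "name": "Amount" },
        { "name": "AccountId" },
        { "name": "OwnerId" }
      ]
    }
  },
  "Extract_Account": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Account",
      "fields": [
        { "name": "Id" },
        { "name": "Name" },
        { "name": "BillingCountry" }
      ]
    }
  },
  "augment_Opportunity_Account": {
    "action": "augment",
    "parameters": {
      "left": "Extract_Opportunity",
      "left_key": [ "AccountId" ],
      "right": "Extract_Account",
      "right_key": [ "Id" ],
      "relationship": "AccountId",
      "right_select": [ "Name", "BillingCountry" ]
    }
  }
}
```

A filter node, a second augment for the owner's User record, and a register node that writes the dataset complete the design — you build all of them in the steps below.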

Create a Dataflow for US Sales Data

There’s already a default dataflow in your US org that’s used in other Trailhead modules and projects, but you want to start from scratch. You can’t delete this dataflow, but you can remove the existing nodes.

  1. Log in to your US org. This is the first org that you signed up for in step 1.
  2. In the top-left corner, click the App Launcher (App Launcher icon), and then find and select Analytics Studio.
  3. In the left-hand pane, click Data Manager.
  4. In the left-hand pane, click Dataflows & Recipes.
  5. Click the Default Salesforce Dataflow.
  6. Hover over the Extract_User node and click the trash icon that appears.
  7. Repeat step 6 to remove all the other nodes.

Now that you have an empty dataflow, you can start adding your nodes. The fastest way to add nodes to extract and augment local Salesforce data in the dataflow editor is with the dataset builder.

  1. At the top of the dataflow editor, click the dataset builder button in the toolbar.
  2. In the Dataset Name field, enter US Sales.
  3. Click Continue.
  4. In the Pick a Salesforce object to start search box, enter opp.
  5. In the object list, click Opportunity.
  6. Hover over the object and click the plus (+).
  7. Click to select these fields:
    1. Amount
    2. Close Date
    3. Created Date
    4. Lead Source
    5. Name
    6. Opportunity Type
    7. Stage
  8. At the top of the field list, click the Relationships tab.
  9. In front of the Account ID relationship, click Join.
  10. In front of the Owner ID relationship, click Join.
  11. Hover over the Account object and click the plus (+).
  12. Select these fields:
    1. Account Name
    2. Account Type
    3. Billing City
    4. Billing Country
    5. Billing State/Province
    6. Industry
  13. To hide the field list, click the X to the right of the Account object.
  14. Repeat steps 11–12 to select these fields from the User object:
    1. Full Name
    2. Title
  15. Click Next.

    Analytics creates the sfdcDigest, augment, and register nodes needed to build the dataset in your dataflow.
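If you open the dataflow's JSON at this point, the register node that writes the US Sales dataset looks roughly like this (the node name and alias are assumptions; the builder picks its own):

```json
"Register_USSales": {
  "action": "sfdcRegister",
  "parameters": {
    "source": "augment_User",
    "alias": "US_Sales",
    "name": "US Sales"
  }
}
```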

The dataflow extracts all opportunities from your US org. But you want only opportunities related to US accounts. So you need a filter. There are two places you can add a filter in a dataflow: in an sfdcDigest node or in a separate filter node.



Wondering where to put a filter in your dataflow?

The benefit of adding a filter in the sfdcDigest node is that you’re reducing the amount of data that the rest of the dataflow has to process. However, if you’re using that data to create more than one dataset, you might be removing data that’s needed elsewhere. So adding a separate filter node further downstream might be the solution.

Another advantage to a separate filter node is that you can use it with data that has been combined from different sfdcDigest nodes. For example, here you want to filter the opportunities based on the account country, so you need a separate filter after the account data has been augmented with the opportunity data.

Decision made. A separate filter node it is.
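In JSON terms, the filter node you're about to create is small — one source and one filter expression. A sketch, using the node names from the steps that follow:

```json
"filter_US_Accounts": {
  "action": "filter",
  "parameters": {
    "source": "augment_Opportunity_Account",
    "filter": "AccountId.BillingCountry:EQ:USA"
  }
}
```

If you selected Use SAQL instead, the node would carry a SAQL expression such as `'AccountId.BillingCountry' == "USA"` in place of the structured filter parameter shown here.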

  1. Click the filter button in the toolbar at the top of the dataflow editor.
  2. Complete these fields:
    1. Node Name: filter_US_Accounts
    2. Source: augment_Opportunity_Account
    3. Use SAQL: Deselected
    4. Filter: AccountId.BillingCountry:EQ:USA


      To see a list of field names that you can use in the filter, click the Output Fields tab. You can copy field names here and paste them into your filter.

  3. Click Create.
  4. On the right of the filter node, click and hold the > and drag a path onto the > on the left of the augment_User node.
  5. Click Save.
  6. Click the augment_User node and, in the Left Key field, select OwnerId.
  7. Click Save.
  8. Click Save without Propagating.
  9. Click Update Dataflow.
  10. In the confirmation dialog, click Update Dataflow again.
  11. Click Run Dataflow.
  12. Click Go to Data Monitor.
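After you drag the filter's output into augment_User and set the Left Key to OwnerId, the node joins each filtered opportunity to its owner's User record. Roughly (a sketch — the digest node name and right_select list are assumptions based on the fields chosen earlier):

```json
"augment_User": {
  "action": "augment",
  "parameters": {
    "left": "filter_US_Accounts",
    "left_key": [ "OwnerId" ],
    "right": "Extract_User",
    "right_key": [ "Id" ],
    "relationship": "OwnerId",
    "right_select": [ "Name", "Title" ]
  }
}
```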

Monitor the Dataflow and Explore the New Dataset

  1. On the Monitor tab of the data manager, click the Dataflows subtab.
    The Dataflows subtab lets you see just your dataflow jobs.
  2. The dataflow icon shows the status of the dataflow. If all’s well, it’s a checkmark and you see “Successful” when you hover over it.
  3. If the status shows as Queued or Running, click the refresh button at the top of the Monitor tab until the dataflow finishes.
  4. When the dataflow has finished running, click the Data tab on the left of the data manager.
  5. On the right of the new US Sales dataset, click the menu button and select Explore.
  6. Under Bars, click the Add a group (+) button and select AccountId.BillingCountry.
    Notice that only USA opportunities are in the dataset.
  7. Close the lens without saving.

Schedule the Data Sync and the Dataflow

As you can see, your dataflow already has a lot to do: extracting data, augmenting, and filtering. You can lighten this load by extracting your Salesforce data in a separate job, called a data sync, which you can schedule to run before your dataflow. By scheduling data sync ahead of time, your dataflow has less to do and runs faster. To further lighten the load, Analytics syncs Salesforce data incrementally by default, meaning that only data that’s changed gets synced. Let’s schedule the local connection’s data sync and the dataflow to run one after the other.

  1. In the left-hand pane, click Data Manager.
  2. Click the Connect tab.


    If you can’t see the Connect tab, enable it now.

    1. At the top of the Analytics Studio, click the gear icon and then click Setup.
    2. In the Quick Find box, enter analytics.
    3. In the results, under Analytics, click Settings.
    4. Select Enable Data Sync and Connections.
    5. Click Save.
    6. Close the current browser tab to return to the tab with the data manager. Refresh your browser if you don’t see the Connect tab.
    7. Click the Connect tab.
  3. On the right of the SFDC_Local connection, click the menu button and select Schedule.
  4. Schedule the connection to run every 24 hours at 12:00 AM each weekday by selecting these settings.
    1. Schedule by: Hour
    2. Start at: 12:00 am
    3. Run every: 24 Hours
    4. Select days: M, Tu, W, Th, F
  5. Click Save.

This schedule ensures that the connection syncs first each day. You can now schedule your default dataflow to run after the data sync.

  1. Click the Dataflows & Recipes tab.
  2. On the right of the Default Salesforce Dataflow, click the menu button and select Schedule.
  3. Schedule the dataflow to run every 24 hours at 4:00 AM each weekday by selecting these settings.
    1. Schedule by: Hour
    2. Start at: 4:00 am
    3. Run every: 24 Hours
    4. Select days: M, Tu, W, Th, F
  4. Click Save.