Build Your CRM Analytics Dataset
After completing this unit, you’ll be able to:
- Explain the importance of preparing data for Einstein Discovery to analyze.
- Create a CRM Analytics dataset and populate it with sample data.
Prepare Data for Einstein Discovery
After you’ve selected the business outcome to improve, you need to gather and prepare the data for Einstein Discovery to analyze. So what’s relevant to the outcome, and what’s not?
To answer this question, it’s common for data scientists to employ their considerable expertise and business knowledge to research, experiment, and investigate data. Their upstream investment pays off with an optimized set of high-quality data to use for thorough analysis and model training.
Ideally, your dataset:
- Includes all the relevant factors associated with the business outcome you want to investigate and improve
- Omits extraneous columns that add complexity but no analytical value
- Contains high-quality data that is representative of the operational reality of the outcome you focus on
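As a rough illustration of the second point, here is a minimal Python sketch that flags columns unlikely to add analytical value because they are entirely empty or hold one repeated value. It is only one simple heuristic for a local pre-check, not how Einstein Discovery selects columns. The function name `find_low_value_columns` and the sample column names are hypothetical.

```python
import csv
import io

def find_low_value_columns(csv_text):
    """Return columns that are entirely empty or hold a single repeated value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    distinct = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name in distinct:
            # Short rows yield None for missing fields, so fall back to "".
            value = (row.get(name) or "").strip()
            if value:
                distinct[name].add(value)
    # Zero or one distinct non-empty value means the column cannot
    # help distinguish one row from another.
    return [name for name, values in distinct.items() if len(values) <= 1]

# Hypothetical sample; these column names are illustrative only.
sample = """Amount,Region,IsWon,Notes
1000,West,TRUE,
2500,West,FALSE,
400,West,TRUE,
"""
print(find_low_value_columns(sample))  # prints ['Region', 'Notes']
```

A flagged column is a candidate for review, not an automatic deletion: whether it is truly extraneous still depends on your knowledge of the business outcome.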
Einstein Discovery and the CRM Analytics data platform expedite this process with no-code, automated tools that do much of the heavy lifting for you. The platform's data engineering tools and mechanisms help you:
- Extract data from many different data sources.
- Load the data into CRM Analytics datasets that you design.
- Transform the data to maximize quality and readiness for analysis.
Einstein Discovery can analyze millions of rows and many columns of data. And it can help you select which columns have the strongest association with the outcome you want to improve.
In this module, we make it simple by providing you with a downloadable CSV file of sample opportunity history data that you ingest to create and populate a CRM Analytics dataset. That way, you can quickly start using Einstein Discovery to analyze this data, deploy a model, and get predictions and improvements.
Try Einstein Discovery with a Developer Edition Org
If you’d like to work through the steps in this Trailhead module, sign up for a free CRM Analytics Developer Edition org. This org is a safe environment where you can practice the skills you’re learning.
Note: For this trail, you can’t use an existing Developer Edition org. Instead, sign up for this special Developer Edition org, because:
- It comes provisioned with the CRM Analytics Plus license required for Einstein Discovery.
- It has the CRM Analytics Plus permission set required to access Einstein Discovery features. This includes the Manage Connected App permission needed to create a connected app for authentication of the REST client requests.
Even if you already have a CRM Analytics Developer Edition org, sign up for a new one now. The older CRM Analytics Developer Edition orgs don’t get recently released features. Signing up for a new one ensures that you get the latest and greatest capabilities.
Let’s get you set up so you can log in and get started.
- Go to trailhead.salesforce.com/promo/orgs/analytics-de.
- Fill out the form using an active email address.
- After you fill out the form, click Sign me up. A confirmation message appears.
- When you receive the activation email, open it and click the link.
- Complete your registration, and set your password and challenge question.
- Click Save. You are logged in to your CRM Analytics Developer Edition org and redirected to the Setup page.
Way to go! You now have a Salesforce org! Let's jump right in.
Note: You need your credentials later in the module. Be sure to save them somewhere safe so you can retrieve them when you need to.
Download the Sample Data
We’ve prepared a file with sample training data for opportunity history. Download the CSV file called opportunity_history.csv and save it to your computer.
Create and Populate a CRM Analytics Dataset
The next step is to get the data from the CSV file into a CRM Analytics dataset.
Note: For the best experience, make sure your browser allows pop-ups.
- If you’re not already logged in, log in to the Developer Edition org that you just signed up for.
- From the App Launcher, find and select Analytics Studio.
- On the Analytics Studio home tab, click Create, select Dataset, then select CSV File.
- In the file-selection window that opens, find the CSV file you downloaded, opportunity_history.csv, select it, and then click Next.
- In the Dataset Name field, change the default name (opportunity_history), if you want. By default, Analytics Studio uses the file name as the dataset name. The name cannot exceed 80 characters.
- Select the app where you want to create the dataset. By default, Analytics Studio selects My Private App.
- Click Next. The Edit Field Attributes screen appears. Here, you can preview the data, and view or edit the attributes for each field.
- For now, accept the defaults, and click Upload File. Analytics Studio uploads the data, prepares and creates the dataset, and shows you progress as it happens.
When finished, you see details about the dataset you created. If you don't see the dataset details, take a look at your Datasets or search for opportunity_history in Analytics Studio. On the dataset row, select Edit from the dropdown.
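The naming behavior described in the steps above can be mirrored locally if you script your file preparation. The following Python sketch derives the default dataset name from the file name and checks the 80-character limit. The function names are hypothetical, this is not Analytics Studio's own logic, and any other naming rules the product may enforce are not covered.

```python
from pathlib import Path

MAX_DATASET_NAME_LENGTH = 80  # limit stated in the steps above

def default_dataset_name(csv_path):
    """Mirror the described default: the file name without its extension."""
    return Path(csv_path).stem

def is_valid_dataset_name(name):
    """Check only the length rule; other product rules are not modeled here."""
    return 0 < len(name) <= MAX_DATASET_NAME_LENGTH

name = default_dataset_name("opportunity_history.csv")
print(name, is_valid_dataset_name(name))  # prints opportunity_history True
```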
Considerations About the Sample Data
This sample training data has been simplified so that you can focus on learning how to use Einstein Discovery. When using this sample data, keep these things in mind.
- Our example CSV file contains a small number of columns. In practice, your use cases may involve more columns of training data.
- Our example CSV file contains about 7,000 rows of data. In general, the more rows of data you have to analyze, the better the results. Einstein Discovery needs at least 400 rows with outcome values to build a model.
- When training models, Einstein Discovery ignores rows that don’t have outcome values. Powered by AI and machine learning, Einstein Discovery can analyze up to 20 million rows of data!
- The sample data is modeled on opportunities. In practice, your use cases may involve data in multiple Salesforce objects, data that is external to Salesforce, or a combination of both.
- The model built from this sample data shows the basics for the purposes of this Trailhead module. It’s designed to get you up and running quickly. However, this model is not highly accurate, and it isn’t representative of the quality of the models you ultimately deploy into production. The performance of your model depends on the quality of your training dataset. To learn more, see Prepare Data for Analysis in Salesforce Help.
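If you want to check your own CSV file against the row counts mentioned in the list above before uploading it, a sketch along these lines may help. It counts total rows and rows with a non-empty outcome value, then compares them with the 400-row minimum and 20-million-row figure quoted above. The column name `Outcome`, the function names, and the demo data are all assumptions for illustration; substitute the outcome column from your own file.

```python
import csv
import io

MIN_ROWS_WITH_OUTCOME = 400   # minimum stated in the list above
MAX_ROWS = 20_000_000         # upper figure stated in the list above

def count_rows(csv_file, outcome_column):
    """Return (total_rows, rows_with_outcome) for an open CSV file."""
    total = with_outcome = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # Rows with a blank outcome are ignored during training,
        # so they do not count toward the minimum.
        if (row.get(outcome_column) or "").strip():
            with_outcome += 1
    return total, with_outcome

def meets_row_requirements(total, with_outcome):
    return with_outcome >= MIN_ROWS_WITH_OUTCOME and total <= MAX_ROWS

# Synthetic demo data: 450 rows, 30 of them with a blank outcome.
lines = ["Amount,Outcome"]
for i in range(450):
    outcome = "" if i < 30 else ("Won" if i % 2 else "Lost")
    lines.append(f"{1000 + i},{outcome}")
demo_file = io.StringIO("\n".join(lines))

total, with_outcome = count_rows(demo_file, "Outcome")
print(total, with_outcome, meets_row_requirements(total, with_outcome))
# prints 450 420 True
```

To run the same check on a real file, pass `open("opportunity_history.csv", newline="")` in place of `demo_file`, along with the name of that file's outcome column.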
Now that you’ve got a CRM Analytics dataset, let's use it to create a model.