Create a Model
Learning Objectives
After completing this unit, you’ll be able to:
- Explain what an Einstein Discovery model is and why you’d use it.
- Explain key elements in an Einstein Discovery model.
- Configure and create a model in Einstein Discovery.
What Is a Model?
A model is a sophisticated custom equation based on a comprehensive statistical understanding of past outcomes that's used to predict future outcomes. An Einstein Discovery model is a collection of performance metrics, settings, predictions, and data insights. Einstein Discovery walks you through the steps to create a model based on the outcome you want to improve (your model's goal), the data you’ve assembled for that purpose (in the CRM Analytics dataset), and other settings that tell Einstein Discovery how to conduct the analysis and communicate its results.
Create a Model
Here’s how to create a predictive model using the CRM Analytics dataset you prepared in the previous unit.
1. If you're still viewing the dataset you loaded in the previous unit, click Create Model and go to step 4. Otherwise, on the Analytics Studio home page, click Create and select Model.
2. On the New Model screen, click Create from Dataset and click Continue.
3. Select the opportunity_history dataset you created in the previous unit, then click Next.
4. On the Create Model screen, specify your goal. The goal defines the outcome you want to analyze and train the model to predict. Specify whether you want to maximize or minimize the outcome result.
   In this module, your goal is to maximize opportunity wins. For I Want to Predict, select IsWon and, next to Maximize, choose IsWon: TRUE. Accept all the other default settings and click Next.
5. On the Configure Model Columns screen, accept the default (Automated), and click Create Model.
Einstein begins analyzing the data using statistical analysis, machine learning algorithms, and AI to build the predictive model.
When finished, Einstein displays the performance overview of the model.
Investigate Data Quality Alerts
During analysis and training, Einstein Discovery examines your data for quality issues, such as duplicate impact (referred to as the multicollinearity data alert), potential bias, and frequently missing values. Whenever a potential data quality issue is detected, Einstein notifies you with a data alert. To learn more about data alerts, check out Handle Quality Alerts.
In the model performance overview, view Assess Deployment Readiness and click the View All Alerts button to examine all alerts for your model.
The Data Alerts panel shows you each occurrence and gives you the option to either take action or ignore the alert.
In our model, Einstein detected multicollinearity in the data. The problem is that two or more variables (here, Amount and Lead Source) are highly correlated with each other, which could have a duplicate impact on the outcome. For this module, go ahead and select Ignore Alert for Amount and Lead Source.
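To make the idea of "highly correlated" variables concrete, here is a minimal sketch that computes the Pearson correlation between two numeric columns and flags the pair when the correlation is strong. It is only an illustration, not how Einstein Discovery detects multicollinearity: the column `discounted_amount`, the sample values, and the 0.9 threshold are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical sample data: the second column is almost a rescaled copy of the first.
amount = [10_000, 20_000, 30_000, 40_000, 50_000]
discounted_amount = [9_000, 18_500, 27_000, 36_500, 45_000]

# Hypothetical threshold chosen for this sketch only.
THRESHOLD = 0.9

r = pearson(amount, discounted_amount)
is_flagged = abs(r) > THRESHOLD
print(f"correlation = {r:.4f}, flagged = {is_flagged}")
```

When two variables carry nearly the same information like this, including both can count the same signal twice, which is the "duplicate impact" the alert describes.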
Einstein also detected an opportunity to categorize Amount values by different ranges (buckets). Under Suggested Buckets, select Apply the new buckets.
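Bucketing replaces a continuous value with the range it falls into. The sketch below shows the general technique with Python's standard `bisect` module. The bucket edges and labels are hypothetical and are not the buckets Einstein suggests for this dataset.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries (upper edges are exclusive).
BUCKET_EDGES = [10_000, 50_000, 100_000]
BUCKET_LABELS = ["< 10K", "10K to 50K", "50K to 100K", ">= 100K"]


def bucket_for(amount):
    """Return the label of the range that contains the given amount."""
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES, amount)]


for value in (5_000, 10_000, 75_000, 250_000):
    print(value, "->", bucket_for(value))
```

Grouping values this way lets a model treat all opportunities in the same range alike instead of reacting to every individual amount.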
Create a Model Version
Click Next. Einstein Discovery prompts you to describe the new model version.
Every time you make changes to the model, you need to re-run the analysis and retrain the model by creating a new version. In our situation, a new version is required because Einstein needs to analyze your data again using the latest settings. Earlier in this module, you learned that developing an Einstein Discovery solution is an iterative process. Model versions help you keep track of each iteration.
In the box, type Ignore multicollinearity alerts and apply buckets to Amount, and then click Train Model.
Einstein reanalyzes the data and retrains the model in a new version. When the new version is complete, you see the model performance overview again, now showing the new version number and no remaining alerts to review.
Edit Model Settings
To get started customizing your model, click Settings.
Now you can examine your model settings.
Dataset Details
You see the number of rows and columns in your dataset (1). In Einstein Discovery, we refer to each row in the dataset as an observation, and to each column as a variable.
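As a small illustration of this terminology, the following sketch counts observations (rows) and variables (columns) in a tiny CSV sample. The sample rows and the `LeadSource` column name are made up for the example and do not reflect the actual contents of the opportunity_history dataset.

```python
import csv
import io

# Hypothetical sample: one header line, then one line per observation.
SAMPLE_CSV = """IsWon,Amount,Industry,LeadSource
TRUE,12000,Technology,Web
FALSE,55000,Banking,Partner
TRUE,8000,Retail,Web
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)            # each row is one observation
variables = reader.fieldnames  # each column name is one variable

observation_count = len(rows)
variable_count = len(variables)
print(f"{observation_count} observations, {variable_count} variables")
```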
Variables Table
The variables table (2) shows you the variables in the model.
- The first variable (IsWon) is your outcome variable—the business outcome you’re trying to improve. Your goal is to maximize IsWon.
- Next are the explanatory variables—these are variables that you explore to determine whether, and to what degree, they influence the outcome variable for the model.
- The importance is the relative influence of a variable on the model's predicted outcome. Importance indicates how much the model chooses to use a variable when predicting the outcome. The level of importance is quantified as a percentage. The higher the percentage, the greater the impact. Importance is an advanced metric that considers interactions between variables. If two variables are highly correlated and contain similar information, the model chooses the better variable to use.
- Use the column dropdown to show correlation instead of importance. The correlation is simply the statistical association—or “co-relationship”—between explanatory variables and an outcome variable. The strength of the correlation is quantified as a percentage. The higher the percentage, the stronger the correlation. Keep in mind that correlation is not causation. Correlation merely describes the strength of association between variables, not whether they causally affect each other. Think of correlation as a measure of how well this field on its own would be able to predict the outcome.
- Data alerts appear if Einstein Discovery detected possible issues in your data that warrant special attention.
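One rough way to picture how well a single field on its own relates to the outcome is to compare the win rate for each of its values with the overall win rate. The sketch below does that for a hypothetical lead source field. It is a simplified stand-in for illustration and is not the correlation or importance calculation that Einstein Discovery performs.

```python
from collections import defaultdict

# Hypothetical observations: (lead source, whether the opportunity was won).
observations = [
    ("Web", True), ("Web", True), ("Web", False),
    ("Partner", False), ("Partner", False), ("Partner", True),
    ("Event", True), ("Event", True),
]


def win_rate_by_value(rows):
    """Return the fraction of won opportunities for each distinct value."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for value, is_won in rows:
        totals[value] += 1
        wins[value] += int(is_won)
    return {value: wins[value] / totals[value] for value in totals}


rates = win_rate_by_value(observations)
overall = sum(is_won for _, is_won in observations) / len(observations)

for value, rate in rates.items():
    print(f"{value}: {rate:.2f} (overall {overall:.3f})")
```

The further the per-value rates sit from the overall rate, the more that field alone tells you about the outcome. As noted above, such an association does not show that the field causes the outcome.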
General Settings
In the right panel, General Settings (3) shows you the dataset the model uses. You can also see, and change, the validation method and algorithm of the model.
Edit Variable Settings
In the variables table, click Industry. In the right panel, configure settings for the selected variable.
- Select Analyze for bias (1) if you suspect that the variable might indicate bias, which activates Einstein Discovery’s bias detection capabilities. To learn more, check out Ethical Model Development with Einstein Discovery: Quick Look.
- Select an available Transform option (2) if you want to transform values for this variable during analysis. Options include fuzzy matching, detect sentiment, text clustering, and replace missing values. Transformations change the data for the model only—the values in the dataset are unchanged. For example, fuzzy matching fixes slight typographic variations in text values (such as spelling or typos), so the model can make more accurate categorizations and better predictions.
- Include Only (3) shows you the values associated with the variable, starting with the most frequent value. If you clear the checkbox next to a value, Einstein can either omit the value from analysis, or it can merge the value into the “Other” group.
- The Histogram (4) shows you how frequently values occur in the dataset.
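The variable settings above can be pictured with a short sketch that uses only the Python standard library: it corrects near-miss spellings (similar in spirit to fuzzy matching), counts how often each value occurs (the histogram), and then merges every value that is not selected into an "Other" group. The industry values, the list of canonical spellings, the 0.8 similarity cutoff, and the selected values are all assumptions for illustration. Einstein Discovery's own transformations may work differently.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical list of correct spellings to match against.
CANONICAL = ["Technology", "Banking", "Retail", "Energy"]


def normalize(value, cutoff=0.8):
    """Map a value to its closest canonical spelling, or keep it unchanged."""
    matches = get_close_matches(value, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else value


# Hypothetical raw values containing two typos.
raw = ["Technology", "Technolgy", "Banking", "Bankng",
       "Retail", "Technology", "Energy"]

# Only this in-memory copy is changed; the raw list stays as it was.
cleaned = [normalize(v) for v in raw]

# Histogram: how frequently each value occurs, most frequent first.
histogram = Counter(cleaned)
for value, count in histogram.most_common():
    print(f"{value:<12}{'#' * count}")

# Include Only: keep the selected values and merge the rest into "Other".
included = {"Technology", "Banking"}
grouped = Counter(v if v in included else "Other" for v in cleaned)
print(dict(grouped))
```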
What Else Can You Do with Models?
In addition to everything you just learned about what to do with models, you can also:
- See version history, and go to another version.
- Bookmark insight charts.
- Compare to another model.
- View and copy R Code.
- Rename a model.
- Change the app the model is saved in.
- Delete a model.
What’s Next?
In this unit, you created a model, resolved data alerts, and created a new version of a model. In the next unit, you evaluate the model.