Understand Descriptive Insights
- Navigate to a story’s descriptive insights and explore them.
- View insights showing how one variable explains variations in the outcome variable.
- View insights showing how pairs of variables (the interaction effect) explain variations in the outcome variable.
About Descriptive Insights
After you create a story, the first insights you see are descriptive insights. Descriptive insights are the primary insights in your story. They help you explore, at an overview level, what factors contributed to the outcome, based on a statistical analysis of your dataset and augmented by machine learning. Einstein Discovery uses bar charts to help you visualize descriptive insights.
Your Story’s Outcome Variable and Goal
When you configured the story, you told Einstein Discovery to maximize the variable CLV in the Acquired Account data. CLV is the outcome variable in your story, and maximizing CLV is your goal. All the insights in this story show you how different variables, and combinations of variables, explain variations in CLV. The top insights in the list reflect the most statistically significant variations in the outcome variable.
Examine Correlations of Explanatory Variables
The Variables Panel lists the explanatory variables in your story and shows their correlation to the outcome variable, expressed as a percentage.
The correlation percentage represents the relative strength of the statistical association between an explanatory variable and the outcome variable. The higher the correlation, the stronger the relationship, and therefore the more interesting for our investigation.
At first glance, what does this list reveal about Einstein Discovery's analysis of the Acquired Account data? It shows us that Division is the variable that explains the most variation in CLV (14.16%). Einstein Discovery did a statistical calculation to find the coefficient of determination, R² (R-squared). R² tells you how much of the variation in the outcome variable (CLV) is explained by Division. In other words, it tells you how much predictive power the Division variable has. More observations describe other factors that affect CLV.
A second set of variables (Type, Rating, Industry, and Account Score) have weaker correlations but may still be worth investigating. Variables with the lowest correlation (AccountScore, BillingState, StartDate, Ownership, and CloseDate) may not warrant further consideration at this time.
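To make the idea of "variation explained" more concrete, here is a minimal sketch of how an R² value can be computed for a single categorical variable. It assumes the standard one-way decomposition (between-category sum of squares divided by total sum of squares). How Einstein Discovery actually performs its calculation is not described here, so treat this as an illustration only. The function name `r_squared_by_category` and the sample rows are hypothetical.

```python
from collections import defaultdict


def r_squared_by_category(rows, category_key, outcome_key):
    """Return the share of outcome variance explained by category averages.

    Assumption: R-squared = between-category sum of squares / total sum of
    squares. This is a textbook formula, not Einstein Discovery's documented
    method.
    """
    values = [row[outcome_key] for row in rows]
    grand_mean = sum(values) / len(values)

    # Group outcome values by category.
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[outcome_key])

    # Total variation of the outcome around the overall average.
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Variation of the category averages around the overall average,
    # weighted by the number of rows in each category.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total if ss_total else 0.0


# Made-up rows for illustration; these are not values from the story's dataset.
rows = [
    {"Division": "A", "CLV": 10.0},
    {"Division": "A", "CLV": 20.0},
    {"Division": "B", "CLV": 30.0},
    {"Division": "B", "CLV": 40.0},
]
r2 = r_squared_by_category(rows, "Division", "CLV")
print(f"{r2:.2%}")  # prints 80.00%
```

In this toy data, knowing the division accounts for 80% of the variation in CLV. A value such as 14.16% would mean that most of the variation is left to other factors.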
View a First-Order Analysis
Let's start by looking at the first insight in the list.
This type of insight, called a first-order analysis, examines how one variable (Division) explains variation in the outcome variable (CLV).
Let's look at the different parts of the insight.
The left side of the insight includes explanatory text.
Explanatory text includes:
- title of the insight: CLV by Division
- list of the most important observation summaries (associated with the blue bars in the graph) for which variations were statistically significant (above or below average).
Hovering over a hyperlink highlights the associated bar in the chart on the right.
Clicking a hyperlink drills down into a chart of the observation that shows the data filtered by your selection. Breadcrumbs above the insight show the filter for the insights list.
The bar graph on the right side provides a visualization of this insight:
In this graph:
- CLV is the vertical axis (outcome variable).
- Division is the horizontal axis (explanatory variable). It displays bars for each type (or category) of Division.
- The orange horizontal line in the chart shows the average CLV, which is 20,135.72.
- Blue bars show categories that extend farthest above or below the average CLV, making them the most statistically interesting categories. Of these divisions, Raw Materials and Mapping are the most significantly above average, and Standard Hardware is the most significantly below average.
- Gray bars show categories that are close to the average CLV. These divisions are statistically less significant and, therefore, are not described in the explanatory text on the left. When considering gray bars, you can't assume meaning in the differences from the other categories.
Hover Over a Bar to See Details
Hover over a bar in the graph to see a pop-up details box. For example, if you hover over Raw Materials, you see:
- Total shows you the total CLV for Raw Materials.
- Standard Deviation gives you an idea of how much the items in the category differ from the average. A smaller standard deviation tells you that most of the numbers are close to the average. In the example above, the standard deviation of the Raw Materials category is 8,440. Here’s an image that shows you two curves with different standard deviations. The average is in the middle, at the peak. In the blue curve, notice that more of the values are closer to the average. It has a smaller standard deviation. In the red curve, the values are more spread out, and therefore it has a larger standard deviation.
- Count is the number of things (rows in the dataset, or observations) that are in the category. In this example, our Raw Materials division has 417 customers.
- Difference from Average shows you how much the category average (Raw Materials) is above or below the Global Average. If the number is negative, then it's below the Global Average.
- CLV represents the average CLV for Raw Materials, which is the sum of every value in the category, divided by the number of values (Count).
- Global Average is the average CLV for all categories within Division.
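The figures in the details box can be reproduced from raw rows with a few lines of code. The sketch below is illustrative only: the helper name `category_details` is hypothetical, the sample values are made up, and it assumes that Global Average is the mean over all rows and that Standard Deviation is the population standard deviation. Einstein Discovery may compute these differently.

```python
from statistics import mean, pstdev


def category_details(rows, category_key, outcome_key, category):
    """Compute the hover-box figures for one category of one variable."""
    all_values = [row[outcome_key] for row in rows]
    in_category = [
        row[outcome_key] for row in rows if row[category_key] == category
    ]

    # Assumption: Global Average is the mean of the outcome over every row.
    global_average = mean(all_values)
    average = mean(in_category)

    return {
        "total": sum(in_category),
        "count": len(in_category),
        "average": average,
        # Assumption: population standard deviation (pstdev), not sample.
        "standard_deviation": pstdev(in_category),
        "global_average": global_average,
        "difference_from_average": average - global_average,
    }


# Made-up CLV values for illustration; not the figures shown in the story.
rows = [
    {"Division": "Raw Materials", "CLV": 10.0},
    {"Division": "Raw Materials", "CLV": 20.0},
    {"Division": "Mapping", "CLV": 30.0},
    {"Division": "Mapping", "CLV": 40.0},
]
details = category_details(rows, "Division", "CLV", "Raw Materials")
print(details)
```

With these toy rows, Raw Materials has a total of 30, a count of 2, an average of 15, and a difference from average of -10, because the global average is 25. The negative sign indicates that the category sits below the global average.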
View a Second-Order Analysis
Let’s look at the insights associated with Division. To filter the Insights List, click Division in the Variables panel.
Above the Insights List, the breadcrumbs show you how you’re currently filtering insights in the list.
This insight is a refinement of the first insight discussed previously, CLV by Division. It adds a second variable, When Type is Consulting, meaning that the combination of the two variables (Division and Type is Consulting) gives a strong signal. This type of insight, called a second-order analysis, shows how much the combined effect of this pair of variables explains variation in the outcome variable. This is also known as the interaction effect on the outcome.
Notice that what stands out first in the chart is the blue bar for Naval, which shows that CLV for Type is Consulting is highest when Division is Naval.
The chart displays bars of data side-by-side for comparison purposes. For Naval, the blue bar represents Type is Consulting and the gray bar represents all other types.
Now look at the explanatory text. As expected, the first (and therefore most significant) observation in the explanatory text is Naval is 6,780 higher. This result may have been worsened by Type is Customer.
Scroll down to the next insight, When Industry is Retail, Division: Standard Hardware and Standard Materials do better. It is also a second-order analysis chart, and it looks at the two-variable combination, Division and Industry is Retail. This insight is another statistically relevant pattern related to Division.
There are two bars for each division. The bar on the left represents the division's average value when only the retail industry is included. The bar on the right represents the average value for the division when all industries except retail are included. Comparing these bars lets you see how differently this pairing behaves.
The reason that Einstein Discovery flags this insight is that this particular industry, Retail, behaves differently from the rest of the population, with regard to Division. In this case, each bar refers to a Division when Industry is Retail. When we compare each division against the rest of the population, we compare that division in the retail industry versus that division in all other industries. If those two groups are statistically different, the bar is highlighted in blue.
In the graph, hover over the Standard Hardware blue bar.
As before, this box displays information about the Total, Average, Standard Deviation, Count, and Difference from Average for that category. In addition, Difference From Average for Other Buckets is shown to be 2,450. Why? Because that is the difference between Standard Hardware when Industry is Retail and Standard Hardware in all other industries.
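The side-by-side comparison behind a second-order insight can be sketched as follows. For each division, the code averages the outcome where the second variable has the selected value (for example, Industry is Retail) and where it has any other value, then takes the difference. The function name `compare_within_category` and the sample rows are hypothetical, and the sketch does not reproduce the statistical test that Einstein Discovery uses to decide which bars to highlight in blue.

```python
from collections import defaultdict
from statistics import mean


def compare_within_category(rows, first_key, second_key, second_value, outcome_key):
    """For each category of first_key, compare the average outcome where
    second_key equals second_value against the average for all other values."""
    selected = defaultdict(list)  # e.g. a division's rows in Retail
    others = defaultdict(list)    # e.g. the same division's rows elsewhere

    for row in rows:
        bucket = selected if row[second_key] == second_value else others
        bucket[row[first_key]].append(row[outcome_key])

    result = {}
    for category, values in selected.items():
        if category not in others:
            continue  # nothing to compare against
        selected_average = mean(values)
        others_average = mean(others[category])
        result[category] = {
            "selected_average": selected_average,
            "others_average": others_average,
            "difference": selected_average - others_average,
        }
    return result


# Made-up rows for illustration; not the figures shown in the story.
rows = [
    {"Division": "Standard Hardware", "Industry": "Retail", "CLV": 30.0},
    {"Division": "Standard Hardware", "Industry": "Retail", "CLV": 50.0},
    {"Division": "Standard Hardware", "Industry": "Banking", "CLV": 20.0},
    {"Division": "Standard Hardware", "Industry": "Banking", "CLV": 20.0},
    {"Division": "Naval", "Industry": "Retail", "CLV": 10.0},
    {"Division": "Naval", "Industry": "Banking", "CLV": 40.0},
]
comparison = compare_within_category(rows, "Division", "Industry", "Retail", "CLV")
print(comparison["Standard Hardware"]["difference"])  # prints 20.0
```

In this toy data, Standard Hardware averages 40 in Retail and 20 in all other industries, so the difference is 20. That difference plays the role of the figure shown in the details box for the division when Industry is Retail versus all other industries.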
In the Variables panel, click Type. The CLV by Type insight reflects another first-order analysis (single variable).
After Division, Type is the next most explanatory single variable, statistically speaking. In other words, Type is the second-strongest first-order term.