Understand What Happened Insights
Learning Objectives
- Navigate to a story’s What Happened insights and explore them.
- View insights showing how one variable explains variations in the outcome variable.
- View insights showing how pairs of variables (the interaction effect) explain variations in the outcome variable.
About What Happened Insights
After you create a story, the first insights you see are What Happened insights. What Happened insights are the primary insights in your story. They are descriptive insights that help you explore, at an overview level, what factors contributed to the outcome, based on a statistical analysis of your dataset and augmented by AI and machine learning. Einstein Discovery uses bar charts to help you visualize What Happened insights.
Your Story’s Outcome Variable and Goal
When you configured the story, you told Einstein Discovery to maximize the variable CLV in the AcquiredAccount data. CLV is the outcome variable in your story, and maximizing CLV is your goal. All the insights in this story show you how different variables, and combinations of variables, explain variations in CLV. The top insights in the list reflect the most statistically significant variations in the outcome variable.
View a First-Order Analysis
Let's start by looking at the first insight in the list.
What does Einstein Discovery's analysis of the AcquiredAccount data reveal? That Division is the variable that explains the most variation in CLV. This type of insight, called a first-order analysis, examines how one variable (Division) explains variation in the outcome variable (CLV).
Let's look at the different parts of the insight.
Explanatory Text
The left side of the insight includes explanatory text.
Explanatory text includes:
- title of the insight: CLV by Division
- summary of the insight: Division explains 14.2% of the variation in CLV.
- list of the most important observation summaries (associated with the blue bars in the graph) for which variations were statistically significant (above or below average).
Hovering over a hyperlink highlights the associated bar in the chart on the right.
Clicking a hyperlink drills down into a chart of the observation that shows the data filtered by your selection.
The key takeaway from this insight is that Division explains 14.2% of the variation in CLV. Einstein Discovery did a statistical calculation to find the coefficient of determination, R² (R-squared). R² tells you how much of the variation in the outcome variable (CLV) is explained by Division; in other words, how much predictive power the Division variable has. More observations describe other factors that affect CLV.
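To make the R² idea concrete, here is a minimal Python sketch. It assumes that R² for a single categorical variable is computed as the between-category sum of squares divided by the total sum of squares. That is a standard textbook definition, not a confirmed description of Einstein Discovery's internal calculation. The function name and the toy rows are hypothetical and are not taken from the AcquiredAccount data.

```python
from collections import defaultdict

def r_squared_by_category(rows, category_key, outcome_key):
    """Share of outcome variation explained by the category means
    (between-category sum of squares / total sum of squares)."""
    values = [row[outcome_key] for row in rows]
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)

    # Collect the outcome values for each category.
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[outcome_key])

    # Variation captured by replacing each value with its category mean.
    between_ss = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss if total_ss else 0.0

# Hypothetical toy rows, not the AcquiredAccount data.
accounts = [
    {"Division": "Raw Materials", "CLV": 30000},
    {"Division": "Raw Materials", "CLV": 26000},
    {"Division": "Standard Hardware", "CLV": 12000},
    {"Division": "Standard Hardware", "CLV": 16000},
]
r2 = r_squared_by_category(accounts, "Division", "CLV")
print(f"Division explains {r2:.1%} of the variation in CLV")
```

A value near 0 means the category averages barely differ from the overall average. A value near 1 means that knowing the category tells you almost everything about the outcome.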
Graph
The image on the right side of the insight is a bar graph:
In this graph:
- CLV is the vertical axis (outcome).
- Division is the horizontal axis. It displays bars for each type (or category) of Division.
- The orange horizontal line in the chart shows the average CLV, which is 20135.72.
- Blue bars show categories whose average CLV is significantly above or below the overall average CLV. These are the most interesting correlations. Of these divisions, Raw Materials and Mapping are the most significantly above average, and Standard Hardware is the most significantly below average.
- Gray bars show categories that are close to the average CLV. These divisions are statistically less significant and, therefore, are not listed in the explanatory text on the left. When considering gray bars, you can't read meaning into how they differ from the other categories.
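The blue and gray distinction can be illustrated with a rough Python sketch. The source does not say which significance test Einstein Discovery applies, so this example substitutes Welch's t statistic, with a cutoff of 2.0, purely as an assumption. The function names, the threshold, and the CLV values are all hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means.
    Each sample needs at least two values and some spread."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

def bar_color(category_values, other_values, threshold=2.0):
    """Hypothetical rule: blue when |t| reaches the threshold, gray otherwise."""
    t = welch_t(category_values, other_values)
    return "blue" if abs(t) >= threshold else "gray"

# Hypothetical CLV values for two categories and for all remaining rows.
far_from_average = [29000, 31000, 30000, 28000]
near_average = [19000, 23000, 17000, 22000]
everyone_else = [18000, 22000, 20000, 21000, 19000]

print(bar_color(far_from_average, everyone_else))
print(bar_color(near_average, everyone_else))
```

The point of the sketch is only that a bar's color depends on both the size of the difference and the spread and count of the underlying values, not on the bar height alone.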
Hover Over a Bar to See Details
Hover over a bar in the graph to see a pop-up details box. For example, if you hover over Raw Materials, you see:
- Total shows you the total CLV for Raw Materials.
- Average is the sum of every value in the category, divided by the number of values (Count).
- Standard Deviation gives you an idea of how much the items in the category differ from the average. A smaller standard deviation tells you that most of the numbers are close to the average. In the example above, the standard deviation of the Raw Materials category is 8,440. Here’s an image that shows you two curves with different standard deviations. The average is in the middle, at the peak. In the blue curve, more of the values are close to the average, so it has a smaller standard deviation. In the red curve, the values are more spread out, so it has a larger standard deviation.
- Count is the number of things (rows in the dataset, or observations) that are in the category. In this example, our Raw Materials division has 417 customers.
- Difference from Average shows you how far above or below the average the category is. If the number is negative, it's below average.
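The fields in the details box can be reproduced with a short Python sketch. The function name and sample values are hypothetical. The overall average of 20135.72 comes from the graph described earlier. The sketch uses the population standard deviation, which is an assumption, because the source does not say whether Einstein Discovery uses the population or sample formula.

```python
import statistics

def category_details(values, overall_average):
    """Compute the hover-box fields for one category's outcome values."""
    total = sum(values)
    count = len(values)
    average = total / count
    return {
        "Total": total,
        "Average": average,
        # Assumption: population standard deviation (divides by Count).
        "Standard Deviation": statistics.pstdev(values),
        "Count": count,
        # Negative means the category is below the overall average.
        "Difference from Average": average - overall_average,
    }

# Hypothetical CLV values for one division; not the real Raw Materials rows.
details = category_details([30000, 26000, 22000], overall_average=20135.72)
for field, value in details.items():
    print(f"{field}: {value:,.2f}")
```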
View a Second-Order Analysis
In the Insights Navigation Bar, click Type and Division from the dropdown.
This insight is a refinement of the first insight discussed previously, CLV by Division. It adds a second variable, When Type is Consulting, meaning that the combination of the two variables (Division and Type is Consulting) gives a strong signal. This type of insight, called a second-order analysis, shows how much the combined effect of this pair of variables explains variation in the outcome variable. This is also known as the interaction effect on the outcome.
What stands out first in the chart is the blue bar for Naval, which shows that CLV for Type is Consulting is highest when Division is Naval.
The chart displays bars of data side-by-side for comparison purposes. For Naval, the blue bar represents Type is Consulting and the gray bar represents all other types.
Now look at the explanatory text. As expected, the first, and therefore most significant, observation in the explanatory text is Naval is 6,780 higher. This result may have been worsened by Type is Customer.
Scroll down to the next insight, CLV by Division when Industry is Retail. It is also a second-order analysis chart, and it looks at the two-variable combination, Division and Industry is Retail. This insight is another statistically relevant pattern related to Division.
There are two bars for each division. The bar on the left represents the division's average value when only the retail industry is included. The bar on the right represents the average value for the division when all industries except retail are included. Comparing these bars lets you see how differently this pairing behaves.
The reason that Einstein Discovery flags this insight is that this particular industry, Retail, behaves differently from the rest of the population, with regard to Division. In this case, each bar refers to a Division when Industry is Retail. When we compare each division against the rest of the population, we compare that division in the retail industry versus that division in all other industries. If those two groups are statistically different, the bar is highlighted in blue.
In the graph, hover over the Standard Hardware blue bar.
As before, this box displays information about the Total, Average, Standard Deviation, Count, and Difference from Average for that category. In addition, Difference From Average for Other Buckets is shown to be 2,450. Why? Because that is the difference between Standard Hardware when Industry is Retail and Standard Hardware in all other industries.
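The segment-versus-rest comparison described here can be sketched in Python as follows. For each division, it averages the outcome when Industry is Retail, averages it for all other industries, and takes the difference. The function name, the non-Retail industry names, and the CLV values are hypothetical, and the sketch does not reproduce the 2,450 figure or the significance test that decides which bars turn blue.

```python
from collections import defaultdict

def segment_vs_rest(rows, group_key, filter_key, filter_value, outcome_key):
    """For each group, compare the outcome average inside the segment
    (filter_key == filter_value) with the average outside it."""
    inside = defaultdict(list)
    outside = defaultdict(list)
    for row in rows:
        bucket = inside if row[filter_key] == filter_value else outside
        bucket[row[group_key]].append(row[outcome_key])

    comparison = {}
    # Only groups that appear on both sides can be compared.
    for group in inside.keys() & outside.keys():
        segment_avg = sum(inside[group]) / len(inside[group])
        rest_avg = sum(outside[group]) / len(outside[group])
        comparison[group] = {
            "segment_avg": segment_avg,
            "rest_avg": rest_avg,
            "difference": segment_avg - rest_avg,
        }
    return comparison

# Hypothetical toy rows, not the AcquiredAccount data.
rows = [
    {"Division": "Standard Hardware", "Industry": "Retail", "CLV": 15000},
    {"Division": "Standard Hardware", "Industry": "Retail", "CLV": 17000},
    {"Division": "Standard Hardware", "Industry": "Banking", "CLV": 13000},
    {"Division": "Standard Hardware", "Industry": "Energy", "CLV": 14000},
    {"Division": "Naval", "Industry": "Retail", "CLV": 25000},
    {"Division": "Naval", "Industry": "Banking", "CLV": 24000},
]
result = segment_vs_rest(rows, "Division", "Industry", "Retail", "CLV")
for division, stats in sorted(result.items()):
    print(division, stats)
```

Each entry corresponds to one pair of side-by-side bars: the segment average is the left bar, the rest average is the right bar, and the difference is the gap between them.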
Click the x next to Division in the Insights Navigation Bar to clear this filter and show CLV by Type, another important insight reflecting a first-order analysis (single variable).
After Division, Type is the next most explanatory single variable, statistically speaking. In other words, Type is the second-strongest, first-order term. Click the x next to Type to clear the filter and show all the insights again.