Understand What Happened Insights
- Navigate to a story’s What Happened insights and explore them.
- View insights showing how one variable explains variations in the outcome variable.
- View insights showing how a two-variable combination explains variations in the outcome variable.
After you create a story, the first insights you see are What Happened insights. These are the primary insights in your story. They are descriptive insights that help you explore, at an overview level, what factors contributed to the outcome, based on a statistical analysis of your dataset. Einstein Discovery uses bar charts to help you visualize What Happened insights.
Your Story’s Outcome Variable and Goal
When you configured the story, you told Einstein Discovery to maximize the variable CLV in the AcquiredAccount data. CLV is the outcome variable in your story, and maximizing CLV is your goal. All the insights in this story show you how different variables and combinations of variables explain variations in CLV. The top insights in the list reflect the most statistically significant variations in the outcome variable.
For each category in the Einstein Analytics dataset, Einstein Discovery performs a statistical calculation called a t-test to find out whether the category is statistically significant. The t-test helps to identify categories that exhibit patterns that are statistically different from the other categories. For example, for the category called Naval, the first step is to split the data into two groups: Naval and not Naval. The second step is to use the t-test to determine whether these two groups are statistically different.
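To make the two steps concrete, here is a minimal sketch of the split-then-test idea in Python. It rests on assumptions: the rows are made-up sample data, and the Welch variant of the t-test is used only because it does not require equal variances. The source does not say which form of the t-test Einstein Discovery applies.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical (Division, CLV) rows, not taken from the AcquiredAccount data.
rows = [
    ("Naval", 31000), ("Naval", 28500), ("Naval", 30200), ("Naval", 29800),
    ("Mapping", 21000), ("Mapping", 19500), ("Mapping", 20500),
    ("Raw Materials", 22500),
    ("Standard Hardware", 15000), ("Standard Hardware", 16200),
]

# Step 1: split the data into two groups, Naval and not Naval.
naval = [clv for division, clv in rows if division == "Naval"]
not_naval = [clv for division, clv in rows if division != "Naval"]

# Step 2: test whether the two groups differ.
t_stat, dof = welch_t(naval, not_naval)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value (relative to the degrees of freedom) suggests that the category behaves differently from the rest of the data.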
Let's start by looking at the first insight in the list.
According to Einstein Discovery's statistical analysis of the AcquiredAccount data in the Einstein Analytics dataset, Division is the variable that explains the most variation in CLV. This type of insight, called a first-order analysis, examines how one variable (Division) explains variation in the outcome variable (CLV).
Let's take a look at the different parts of the insight.
The left side of the insight includes explanatory text.
Explanatory text includes:
- title of the insight: CLV by Division
- summary of the insight: Division explains 14.2% of the variation in CLV.
- list of the most important observation summaries (associated with the blue bars in the graph) for which variations were statistically significant (above or below average).
Hovering over a hyperlink highlights the associated bar in the chart on the right.
Clicking a hyperlink drills down into a chart of the observation that shows the data filtered by your selection.
The key takeaway from this insight is that Division explains 14.2% of the variation in CLV. Einstein Discovery did a statistical calculation to find the coefficient of determination, R² (R squared). R² tells you how much of the variation in the outcome variable (CLV) is explained by Division; in other words, how much predictive power the Division variable has. Additional observations describe other factors that affect CLV.
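As an illustration of what "explains X% of the variation" means, the sketch below computes R² for a single categorical variable as the between-category sum of squares divided by the total sum of squares. This is the standard one-way decomposition, shown here as an assumption. The source does not describe how Einstein Discovery computes the value, and the function name and rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r_squared_by_category(rows):
    """Share of outcome variance explained by category membership."""
    values = [value for _, value in rows]
    overall = mean(values)

    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)

    # Total variation of the outcome around the overall average.
    ss_total = sum((value - overall) ** 2 for value in values)
    # Variation accounted for by the category averages alone.
    ss_between = sum(
        len(group) * (mean(group) - overall) ** 2 for group in groups.values()
    )
    return ss_between / ss_total


# Hypothetical (Division, CLV) rows.
rows = [
    ("Raw Materials", 27000), ("Raw Materials", 24000),
    ("Mapping", 25000), ("Mapping", 22000),
    ("Standard Hardware", 15000), ("Standard Hardware", 19000),
]
r2 = r_squared_by_category(rows)
print(f"Division explains {r2:.1%} of the variation in CLV")
```

A value near 0 means the category averages barely differ from the overall average. A value near 1 means that knowing the category almost determines the outcome.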
The image on the right side of the insight is a bar graph:
In this graph:
- CLV is the vertical axis and Division is the horizontal axis.
- The orange horizontal line in the chart shows the average CLV, which is just above 20K.
- Blue bars show divisions whose average CLV extends furthest above or below the overall average CLV. These are the most interesting correlations. Of these divisions, Raw Materials and Mapping are the most significantly above average, and Standard Hardware is the most significantly below average.
- Gray bars show divisions that are close to the average CLV. These divisions are statistically less significant and, therefore, are not listed in the explanatory text on the left. When considering gray bars, you can't conclude that the difference from the other categories is meaningful.
Hover Over a Bar to See Details
Hover over a bar in the graph to see a pop-up details box. For example, if you hover over Raw Materials, you see:
Notice how, when a blue bar is selected, the corresponding explanatory text on the left is highlighted.
- Difference from Overall shows you how far above or below the average the category is. If the number is negative, it's below average.
- Total shows you the total CLV for Raw Materials.
- Average is the sum of every value in the category, divided by the number of values (Count).
- Standard Deviation gives you an idea of how much the items in the category differ from the average. A smaller standard deviation tells you that most of the numbers are close to the average. In the example above, the standard deviation of the Raw Materials category is 8,440.
Here’s an image that shows you two curves with different standard deviations. The average is in the middle, at the peak. In the blue curve, notice that more of the values are closer to the average. It has a smaller standard deviation. In the red curve, the values are more spread out, and therefore it has a larger standard deviation.
- Count is the number of things that are in the category. In this example, our Raw Materials division has 417 customers.
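The fields in the details box can be reproduced with a few lines of Python. The following is a minimal sketch under stated assumptions: `category_details` is a hypothetical helper, the rows are made-up data, and the sample standard deviation is used because the source does not say whether Einstein Discovery reports the sample or the population form.

```python
from statistics import mean, stdev


def category_details(rows, category):
    """Compute the hover-box fields for one category of (category, value) rows."""
    all_values = [value for _, value in rows]
    values = [value for name, value in rows if name == category]
    average = mean(values)
    return {
        # Negative when the category is below the overall average.
        "Difference from Overall": average - mean(all_values),
        "Total": sum(values),
        "Average": average,
        # Sample standard deviation (an assumption, see above).
        "Standard Deviation": stdev(values),
        "Count": len(values),
    }


# Hypothetical (Division, CLV) rows.
rows = [
    ("Raw Materials", 31000), ("Raw Materials", 24000), ("Raw Materials", 29000),
    ("Mapping", 26000), ("Mapping", 22000),
    ("Standard Hardware", 14000), ("Standard Hardware", 17000),
]
for field, value in category_details(rows, "Raw Materials").items():
    print(f"{field}: {value:,.0f}")
```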
Scroll down to the insight with the title CLV by Division when Type is Consulting.
This is a refinement of the first insight discussed previously, CLV by Division. It adds a second variable, when Type is Consulting, meaning that the combination of the two variables (Division and Type is Consulting) gives a strong signal. This type of insight, called a second-order drill-down, examines how a combination of two variables explains variation in the outcome variable.
Notice that what stands out first in the chart is the blue bar above Naval, which shows that CLV for Type is Consulting is highest when Division is Naval.
The chart displays bars of data side-by-side for comparison purposes. For Naval, the blue bar represents Type is Consulting and the gray bar represents all other types.
Look over at the explanatory text. As expected, the first (and therefore most significant) observation in the explanatory text is Naval is 6,780 higher. This result may have been worsened by Type is Customer.
Scroll down to the next insight, CLV by Division when Industry is Retail. It is also a second-order drill-down chart, and it looks at the two-variable combination, Division and Industry is Retail. This insight is another statistically relevant pattern related to Division.
There are two bars for each division. The bar on the left represents the division's average value when only the retail industry is included. The bar on the right represents the average value for the division when all industries except retail are included. Comparing these bars lets you see how differently this pairing behaves.
The reason that Einstein Discovery flags this insight is that this particular industry, Retail, behaves differently from the rest of the population, with regard to Division. In this case, each bar refers to a Division when Industry is Retail. When we compare each division against the rest of the population, we compare that division in the retail industry versus that division in all other industries. If those two groups are statistically different, the bar is highlighted in blue.
In the graph, hover over the Standard Hardware blue bar.
As before, this box displays information about the Difference from Overall, Total, Average, Standard Deviation, and Count for that category. In addition, Difference From Usual is shown. In this example, Difference From Usual for Standard Hardware when Industry is Retail is 2,450. Why? Because that is the difference between Standard Hardware when Industry is Retail and Standard Hardware in all other industries.
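The comparison behind Difference From Usual can be sketched as the average for a division inside the selected industry minus the average for the same division in all other industries. The function name and rows below are hypothetical. The figures were chosen only so that the result mirrors the 2,450 in the example and are not taken from the AcquiredAccount data.

```python
from statistics import mean


def difference_from_usual(rows, division, industry):
    """Average for division within industry minus its average elsewhere."""
    in_segment = [
        clv for div, ind, clv in rows if div == division and ind == industry
    ]
    elsewhere = [
        clv for div, ind, clv in rows if div == division and ind != industry
    ]
    return mean(in_segment) - mean(elsewhere)


# Hypothetical (Division, Industry, CLV) rows.
rows = [
    ("Standard Hardware", "Retail", 17000),
    ("Standard Hardware", "Retail", 17900),
    ("Standard Hardware", "Banking", 14500),
    ("Standard Hardware", "Energy", 15500),
    ("Naval", "Retail", 30000),  # other divisions are ignored
]
diff = difference_from_usual(rows, "Standard Hardware", "Retail")
print(f"Difference From Usual: {diff:,.0f}")
```

Rows from other divisions do not affect the result, because both groups are restricted to the division being compared.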
Now, scroll down through the insights until you see CLV by Type, another important first-order analysis.
After Division, Type is the next most explanatory single variable, statistically speaking. In other words, Type is the second-strongest first-order term.