Understand What Happened Insights

Learning Objectives

After completing this unit, you’ll be able to:
  • Navigate to a story’s What Happened insights and explore them.
  • View insights showing how one variable explains variations in the outcome variable.
  • View insights showing how pairs of variables (the interaction effect) explain variations in the outcome variable.

About What Happened Insights



The instructions in this unit assume that you have successfully created an Einstein Discovery story. Refer to the instructions in "Use Stories to Get the Big Picture," the first unit in this Trailhead module.

After you create a story, the first insights you see are What Happened insights. What Happened insights are the primary insights in your story. They are descriptive insights that help you explore, at an overview level, what factors contributed to the outcome, based on a statistical analysis of your dataset and augmented by AI and machine learning. Einstein Discovery uses bar charts to help you visualize What Happened insights.

Your Story’s Outcome Variable and Goal

When you configured the story, you told Einstein Discovery to maximize the variable CLV in the AcquiredAccount data. CLV is the outcome variable in your story, and maximizing CLV is your goal. All the insights in this story show you how different variables, and combinations of variables, explain variations in CLV. The top insights in the list reflect the most statistically significant variations in the outcome variable.




For each category in the Einstein Analytics dataset, Einstein Discovery performs a statistical calculation called a t-test to find out whether the category is statistically significant. The t-test helps to identify categories that exhibit patterns that are statistically different from the other categories. For example, for the category called Naval, the first step is to split the data into two groups: Naval and not Naval. The second step is to use the t-test to determine whether these two groups are statistically different.
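The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Einstein Discovery's actual implementation: the unit does not say which form of the t-test is used, so the sketch assumes Welch's unequal-variance version, and the rows and CLV values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical (Division, CLV) rows; the numbers are made up for illustration.
rows = [
    ("Naval", 27000.0), ("Naval", 29000.0), ("Naval", 31000.0),
    ("Mapping", 20000.0), ("Mapping", 22000.0),
    ("Standard Hardware", 18000.0), ("Standard Hardware", 20000.0),
]

# Step 1: split the data into two groups, Naval and not Naval.
naval = [clv for division, clv in rows if division == "Naval"]
not_naval = [clv for division, clv in rows if division != "Naval"]

# Step 2: compute the t statistic that compares the two groups.
t_stat = welch_t(naval, not_naval)
print(round(t_stat, 2))
```

A t statistic far from zero suggests that the Naval group differs from the rest. Turning it into a significance decision also requires degrees of freedom and a threshold, which this sketch leaves out.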

View a First-Order Analysis

Let's start by looking at the first insight in the list.

CLV by Division insight



Don’t worry if the images here differ slightly from the screens you see in Einstein Discovery. The interface elements are usually the same, but some of the details—including the data they show—can differ slightly.

What does Einstein Discovery's analysis of the AcquiredAccount data reveal? That Division is the variable that explains the most variation in CLV. This type of insight, called a first-order analysis, examines how one variable (Division) explains variation in the outcome variable (CLV).

Let's look at the different parts of the insight.

Explanatory Text

The left side of the insight includes explanatory text.

Explanation text for an insight.

Explanatory text includes:

  • title of the insight: CLV by Division
  • summary of the insight: Division explains 14.2% of the variation in CLV.
  • list of the most important observation summaries (associated with the blue bars in the graph) for which variations were statistically significant (above or below average).

Hovering over a hyperlink highlights the associated bar in the chart on the right.

Clicking a hyperlink drills down into a chart of the observation that shows the data filtered by your selection.

Graph of filtered data for an insight.



Click the x to remove the filter and return to the previous screen.

The key takeaway from this insight is that Division explains 14.2% of the variation in CLV. Einstein Discovery performed a statistical calculation to find the coefficient of determination, R² (R-squared). R² tells you how much of the variation in the outcome variable (CLV) is explained by Division; in other words, how much predictive power the Division variable has. Other observations describe other factors that affect CLV.
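To make the idea of "variation explained" concrete, here is a minimal sketch that computes R² for a single categorical variable as the between-category sum of squares divided by the total sum of squares. This is one common textbook way to compute it and is an assumption here; the unit does not describe Einstein Discovery's exact calculation. The function name and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r_squared_by_category(rows):
    """Share of outcome variation explained by category means.

    rows is a list of (category, outcome) pairs.
    """
    values = [outcome for _, outcome in rows]
    grand_mean = mean(values)

    groups = defaultdict(list)
    for category, outcome in rows:
        groups[category].append(outcome)

    # Total variation of the outcome around the overall average.
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    # Variation of the category averages around the overall average.
    ss_between = sum(
        len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    return ss_between / ss_total


# Hypothetical (Division, CLV) rows, invented for illustration.
rows = [
    ("Raw Materials", 10.0), ("Raw Materials", 12.0),
    ("Standard Hardware", 20.0), ("Standard Hardware", 22.0),
]
r2 = r_squared_by_category(rows)
print(f"{r2:.1%}")
```

In this toy data the two categories are far apart and tight internally, so the category explains almost all of the variation. A value such as 14.2% means that most of the variation comes from other factors.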


The image on the right side of the insight is a bar graph:

Graph for an insight.

In this graph:

  • CLV is the vertical axis (outcome).
  • Division is the horizontal axis. It displays bars for each type (or category) of Division.
  • The orange horizontal line in the chart shows the average CLV, which is 20135.72.
  • Blue bars show categories that extend further above or below the average CLV. These are the most interesting correlations. Of these divisions, Raw Materials and Mapping are the most significantly above average, and Standard Hardware is the most significantly below average.
  • Gray bars show categories that are close to the average CLV. These divisions are statistically less significant and, therefore, are not listed in the explanatory text on the left. When considering gray bars, you can't assume meaning in the differences from the other categories.

Hover Over a Bar to See Details

Hover over a bar in the graph to see a pop-up details box. For example, if you hover over Raw Materials, you see:

Hover over a bar in the graph to see details.

The pop-up shows you the underlying statistical details where Division is Raw Materials:
  • Total shows you the total CLV for Raw Materials.
  • Average is the sum of every value in the category, divided by the number of values (Count).
  • Standard Deviation gives you an idea of how much the items in the category differ from the average. A smaller standard deviation tells you that most of the numbers are close to the average. In the example above, the standard deviation of the Raw Materials category is 8,440. The following image shows two curves with different standard deviations. The average is in the middle, at the peak. In the blue curve, more of the values are closer to the average, so it has a smaller standard deviation. In the red curve, the values are more spread out, so it has a larger standard deviation.
    Two curves illustrating different standard deviations
  • Count is the number of things (rows in the dataset, or observations) that are in the category. In this example, our Raw Materials division has 417 customers.
  • Difference from Average shows you how far above or below the average the category is. If the number is negative, it's below average.
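The statistics in the pop-up can be reproduced by hand. The sketch below computes them for one category from a list of (Division, CLV) rows. It is an illustration under stated assumptions: the data is invented, the function name is hypothetical, and it uses the population standard deviation because the unit does not say whether Einstein Discovery uses the population or the sample formula.

```python
from statistics import mean, pstdev


def category_stats(rows, category):
    """Pop-up style statistics for one category of (category, outcome) rows."""
    values = [outcome for name, outcome in rows if name == category]
    overall_average = mean([outcome for _, outcome in rows])
    total = sum(values)
    count = len(values)
    average = total / count  # sum of the values divided by Count
    return {
        "Total": total,
        "Average": average,
        "Standard Deviation": pstdev(values),  # assumption: population formula
        "Count": count,
        # Negative when the category is below the overall average.
        "Difference from Average": average - overall_average,
    }


# Hypothetical (Division, CLV) rows, invented for illustration.
rows = [
    ("Raw Materials", 30000.0), ("Raw Materials", 26000.0),
    ("Mapping", 22000.0), ("Standard Hardware", 10000.0),
]
stats = category_stats(rows, "Raw Materials")
for label, value in stats.items():
    print(f"{label}: {value}")
```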

Let’s explore the next insight in the list.

View a Second-Order Analysis

In the Insights Navigation Bar, select Type and Division from the dropdown.

Scroll down to the insight with the title CLV by Division when Type is Consulting.

CLV by Division when Type is Consulting insight

This insight is a refinement of the first insight discussed previously, CLV by Division. It adds a second condition, when Type is Consulting, meaning that the combination of the two variables (Division and Type is Consulting) gives a strong signal. This type of insight, called a second-order analysis, shows how much the combined effect of this pair of variables explains variation in the outcome variable. This is also known as the interaction effect on the outcome.

Notice that what stands out first in the chart is the blue bar for Naval, which shows that CLV for Type is Consulting is highest when Division is Naval.

CLV by Division when Type is Consulting insight - chart

The chart displays bars of data side-by-side for comparison purposes. For Naval, the blue bar represents Type is Consulting and the gray bar represents all other types.

Now look at the explanatory text. As expected, the first (and therefore most significant) observation in the explanatory text is Naval is 6,780 higher. This result may have been worsened by Type is Customer.

Scroll down to the next insight, CLV by Division when Industry is Retail. It is also a second-order analysis chart, and it looks at the two-variable combination, Division and Industry is Retail. This insight is another statistically relevant pattern related to Division.

CLV by Division when Industry is Retail insight

There are two bars for each division. The bar on the left represents the division's average value when only the retail industry is included. The bar on the right represents the average value for the division when all industries except retail are included. Comparing these bars lets you see how differently this pairing behaves.

The reason that Einstein Discovery flags this insight is that this particular industry, Retail, behaves differently from the rest of the population, with regard to Division. In this case, each bar refers to a Division when Industry is Retail. When we compare each division against the rest of the population, we compare that division in the retail industry versus that division in all other industries. If those two groups are statistically different, the bar is highlighted in blue.
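The side-by-side comparison described above can be sketched as follows: for each division, take the average CLV when Industry is Retail, the average CLV in all other industries, and the difference between the two. This is a simplified illustration with invented rows and a hypothetical function name. It only computes the averages being compared; deciding whether the two groups are statistically different would require a significance test on top of it.

```python
from collections import defaultdict
from statistics import mean


def compare_industry(rows, industry="Retail"):
    """Per division: (average in industry, average elsewhere, difference).

    rows is a list of (division, industry, clv) tuples.
    """
    inside = defaultdict(list)   # rows where Industry is the chosen one
    outside = defaultdict(list)  # rows from all other industries
    for division, row_industry, clv in rows:
        bucket = inside if row_industry == industry else outside
        bucket[division].append(clv)

    comparison = {}
    for division in inside:
        if division in outside:  # both bars are needed for a comparison
            in_avg = mean(inside[division])
            out_avg = mean(outside[division])
            comparison[division] = (in_avg, out_avg, in_avg - out_avg)
    return comparison


# Hypothetical (Division, Industry, CLV) rows, invented for illustration.
rows = [
    ("Standard Hardware", "Retail", 16000.0),
    ("Standard Hardware", "Retail", 18000.0),
    ("Standard Hardware", "Banking", 14000.0),
    ("Standard Hardware", "Energy", 15000.0),
    ("Naval", "Retail", 25000.0),
    ("Naval", "Banking", 25500.0),
]
comparison = compare_industry(rows)
for division, (in_avg, out_avg, diff) in comparison.items():
    print(division, in_avg, out_avg, diff)
```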

In the graph, hover over the Standard Hardware blue bar.

Statistics for CLV by Division when Industry is Retail - Standard Hardware insight

As before, this box displays information about the Total, Average, Standard Deviation, Count, and Difference from Average for that category. In addition, Difference From Average for Other Buckets is shown to be 2,450. Why? Because that is the difference between Standard Hardware when Industry is Retail and Standard Hardware in all other industries.

Click the icon next to Division in the Insights Navigation Bar to clear this filter and show CLV by Type, another important insight reflecting a first-order analysis (single variable).

CLV by Type insight

After Division, Type is the next most explanatory single variable, statistically speaking. In other words, Type is the second-strongest first-order term. Click the icon next to Type to clear the filter and show all the insights again.
