Prepare Your Data for Case Classification

Learning Objectives

After completing this unit, you’ll be able to:

  • Prepare your data for case classification.
  • Identify considerations for case classification.

Gather and Review Closed Case Data

To accurately predict field values on new cases, someone at Ursa Major Solar needs to verify that field values on closed cases are accurate. Reviewing closed cases for accuracy can take time. A lot of time. But the more time spent auditing field values on closed cases, the more accurate field predictions will be. 

As an admin—not a support agent or manager—Maria has to ask for help. She needs a service expert to verify and confirm that Ursa Major Solar’s closed cases contain clean data. She turns to Ryan De Lyon, a friendly customer service manager located in the Phoenix office.

[Image: Maria and Ryan De Lyon chatting in the office.]

Ryan is happy to help. Auditing existing closed case data is an investment in his service team’s productivity—the more effort he puts in, the better the predictions. With Maria’s guidance, he lays out the audit process.
  1. Identify the most useful case fields to predict. Exclude fields that change over the life of a case, such as Case Status. For now, Ryan chooses Priority and Case Reason, because predicting their values saves agents time and doesn’t require their attention.
  2. Exclude from the audit any closed cases that don’t have Priority and Case Reason filled out.
  3. Export 1,000 closed cases from Salesforce to a spreadsheet or CSV file for a faster review of data. Ryan can review as few as 400 closed cases in Salesforce, but going below 400 closed cases can cause the predictive model to fail.
  4. Review the following fields on the exported cases: Subject, Description, Priority, Case Reason, and any field that will be used in a filter to narrow the scope of the predictive model (for example, Type).
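
The filtering in steps 2 through 4 can be sketched in a few lines of Python run against an exported CSV file. This is only an illustration, not part of Einstein Case Classification: the column names are assumptions and must be matched to the headers in your own export, and the sample rows are made up.

```python
import csv
import io

# Hypothetical column names; match them to the headers in your own export.
PREDICTED_FIELDS = ["Priority", "Case Reason"]
REVIEW_FIELDS = ["Subject", "Description", "Priority", "Case Reason", "Type"]


def select_cases_for_audit(rows, limit=1000):
    """Keep rows that have every predicted field filled in, up to `limit` rows."""
    selected = []
    for row in rows:
        if all((row.get(field) or "").strip() for field in PREDICTED_FIELDS):
            # Keep only the columns the reviewer needs to check.
            selected.append({field: row.get(field) or "" for field in REVIEW_FIELDS})
        if len(selected) == limit:
            break
    return selected


# Tiny in-memory sample standing in for a real export file (made-up data).
sample_export = io.StringIO(
    "Subject,Description,Priority,Case Reason,Type\n"
    "Panel offline,Inverter shows error,High,Equipment,Electrical\n"
    "Billing question,Charged twice,,Billing,Other\n"
    "Return request,Wrong size bracket,Low,Return,Mechanical\n"
)
audit_rows = select_cases_for_audit(csv.DictReader(sample_export))
print(len(audit_rows), "cases selected for audit")
```

The second sample row is dropped because its Priority is blank, which mirrors step 2. To run the same check on a real export, pass `csv.DictReader(open(path, newline=""))` in place of the in-memory sample.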

As Ryan audits closed cases, he also looks for data design issues to fix.

Verify That You Have the Right Data

Because the accuracy of closed case data is crucial to building an effective predictive model, it’s important to make sure that the fields you want to predict are populated and correct. Even when they’re based on a high volume of data, predictions might be inaccurate if the data contains incomplete or incorrect values.

Since Ursa Major Solar is training a predictive model from the text of cases, that text must have some of the right information—words or phrases—to classify cases correctly. Ryan has a few more things to keep an eye out for during the audit.

  • Adjust past data so that the model is built with the best, most correct data.
  • Make sure that both customers and support agents include adequate information—or use language that is unique enough—on cases to help classify them. If humans can’t classify cases, neither can Einstein Case Classification.
  • If you have fields with similar names or field values, consider unifying them into one field or value for clarity. For example, if values for Case Reason include Return, Return Issue, Return Slip, and Return Tracking, combine them. If support agents have trouble determining the correct field or value, the predictive model will have trouble determining the best recommendation.
  • If you have overloaded fields or values, consider separating them. Change a catch-all into a more specific set that describes cases in more detail. For example, split a value like Returns into Defect Returns, Gift Returns, and Refund Returns.
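
As a rough illustration of unifying similar values during the audit, the sketch below maps the Case Reason variants from the example above onto a single value and then counts the result. The mapping itself is an assumption; which values to merge is a judgment call for the service team, and the list of reasons is made-up sample data.

```python
from collections import Counter

# Hypothetical mapping: decide with your service team which values to merge.
CASE_REASON_MERGES = {
    "Return Issue": "Return",
    "Return Slip": "Return",
    "Return Tracking": "Return",
}


def unify_case_reason(value):
    """Trim whitespace and replace a known variant with its unified value."""
    cleaned = value.strip()
    return CASE_REASON_MERGES.get(cleaned, cleaned)


# Made-up Case Reason values as they might appear in an export.
reasons = ["Return", "Return Slip", " Return Issue", "Billing", "Return Tracking"]
counts = Counter(unify_case_reason(reason) for reason in reasons)
print(counts)
```

Counting the values after the merge also shows how many closed cases back each value, which is useful when checking the minimum data requirements.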

Get the Best Results

Along with auditing closed case data and verifying that you have the right data, there are a few more things to keep in mind to get the best results from the predictive model.

Case Volume

To accurately predict values on case fields, Einstein needs lots of closed cases to learn from. You must meet the following minimum data requirements, but the more closed cases you have, the better!

  • To build a predictive model, Einstein needs at least 400 closed cases*, but 10,000 or more is ideal. If you add criteria to limit which cases Einstein learns from, Einstein counts only the closed cases that match your criteria.
  • To predict a field’s value, Einstein needs at least 400 closed cases* with a value in that field.

* Here, “closed cases” means all closed cases that were created in the past six months and include a subject or description. Encrypted fields aren’t supported, so the subject and description must be unencrypted.
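
Before building a model, the requirements above can be checked against exported data with a short script. The following is a minimal sketch under stated assumptions, not Einstein’s own logic: the dictionary keys are hypothetical, six months is approximated as 183 days, and the cases are generated sample data.

```python
from datetime import date, timedelta

MIN_CLOSED_CASES = 400            # minimum stated in the requirements above
SIX_MONTHS = timedelta(days=183)  # assumption: six months approximated as 183 days


def counts_toward_minimum(case, today):
    """True if a case is closed, recent enough, and has a subject or description."""
    recent = today - case["created"] <= SIX_MONTHS
    has_text = bool(case["subject"].strip() or case["description"].strip())
    return case["closed"] and recent and has_text


def check_field(cases, field, today):
    """Compare eligible case counts against the minimum for one field."""
    eligible = [c for c in cases if counts_toward_minimum(c, today)]
    with_value = [c for c in eligible if c.get(field)]
    return {
        "eligible_cases": len(eligible),
        "cases_with_value": len(with_value),
        "can_build_model": len(eligible) >= MIN_CLOSED_CASES,
        "can_predict_field": len(with_value) >= MIN_CLOSED_CASES,
    }


# Generated sample data: 450 recent closed cases (every 9th has no priority)
# plus 100 older cases that fall outside the six-month window.
today = date(2024, 6, 30)
cases = [
    {"closed": True, "created": date(2024, 3, 1), "subject": "Panel issue",
     "description": "", "priority": "" if i % 9 == 0 else "High"}
    for i in range(450)
] + [
    {"closed": True, "created": date(2023, 1, 1), "subject": "Old case",
     "description": "", "priority": "Low"}
    for _ in range(100)
]
report = check_field(cases, "priority", today)
print(report)
```

Keeping the two counts separate reflects the two requirements: one threshold applies to the model as a whole, and the other applies to each field you want to predict.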

Ethics and Artificial Intelligence

Ask yourself these questions before you enable Einstein Case Classification:

  • Are you aware of any dataset biases? When data is incorrectly labeled, miscategorized, or oversimplified, it results in measurement bias. Measurement bias can be introduced when a person makes a mistake labeling data, or through machine error. A characteristic, factor, or group can be over- or underrepresented in your dataset.
  • Are you including diverse participants in the design process of the dataset? Before someone starts building a given system, they often make assumptions about what they should build, who they should build for, and how it should work, including what kind of data to collect from whom. This doesn’t mean that the creators of a system have bad intentions, but as humans, we can’t always understand everyone else’s experiences or predict how a given system will impact others.

As Ryan begins reviewing and adjusting closed case data to help build the best predictive model possible, Maria gets ready to implement Einstein Case Classification for Ursa Major Solar.