
Understand Why an Agent Can Be Confidently Wrong

Learning Objectives

After completing this unit, you’ll be able to:

  • Distinguish between hallucinations, incorrect responses caused by unreliable data, and interaction or guardrail issues.
  • Explain why investigating the cause of an AI response is critical to improving agent reliability.
  • Use a simple framework to analyze what went wrong before choosing mitigation strategies.
Note

This module was produced in collaboration with Dreamin’ in Data, a nonprofit and part of the Datablazers Community. Learn about partner content on Trailhead.

Meet Luna Stone, data architect, and Charlotte Liu, Salesforce architect.


Northern Trail Outfitters (NTO) created a customer journey map as part of its AI and data strategy. One use case stood out: using agentic AI to improve case deflection. Everyone is excited about the potential for faster 24/7 customer service, reduced costs, and agents focused on high-value work.

Executives also set a clear directive: Move quickly while ensuring the solution is trustworthy, scalable, and built on solid data.

So NTO launches a rapid proof‑of‑value project for a Case Deflection agent. Salesforce architect Charlotte Liu and data architect Luna Stone partner to run the pilot, where they need to ensure:

  • Business stakeholders feel confident that the use case will create value.
  • IT stakeholders can make informed technology investment and architectural decisions.
  • Finance stakeholders can validate the cost-benefit of larger investments.

Proof-of-Value Project Scope

Since NTO uses Salesforce, Charlotte limits the scope to the most impactful personas and data sources.

  • Head of Customer Service: As the head of customer service, I want to deliver the highest-quality service 24/7 by automatically answering the most frequent questions about orders and open support cases, so that customers receive fast, consistent responses, operational costs decrease, and human agents can focus on escalations and complex cases.

  • Consumer: As a consumer, I want to check my recent orders or case statuses, make updates as needed, or request help at any time, so I can resolve issues quickly with minimal effort, without having to repeat information.

  • Customer Service Rep: As a customer service rep, I want to address customer support needs quickly, with full visibility into customers’ orders, cases, and interaction history, so that I don’t have to search across systems or ask customers to repeat details.

Since this is an internal proof-of-value project, a select group of service reps test the agent by replicating typical customer interactions and share their feedback with Charlotte. To demonstrate results quickly and adjust based on pilot findings, Charlotte keeps the initial scope small. She uses available data in Salesforce CRM, with Agentforce powering the prompts, and defers long-term architecture decisions.

Pilot Findings

Early testing shows clear promise. The agent handles many routine customer requests quickly, helping customers check order status, review policies, and resolve simple support questions without waiting for a service representative.

However, as a group of pilot users and service reps begin testing the experience more deeply, they report several concerning responses. In some cases, the answers are incomplete or confusing. In others, the answers are simply wrong—especially when the agent interacts with long‑time customers whose history spans multiple systems.

At this point, Luna, NTO’s data architect, is brought in to work with Charlotte to design the right solution architecture to improve the case deflection agent’s responses.

Ask the Right Questions

Luna is responsible for ensuring the underlying data is reliable and ready for AI use. The first task is not to adjust prompts or add more data immediately. Instead, Luna and Charlotte must determine whether the agent’s issues are caused by the system design, the underlying data, or the way users interact with the agent.

Rather than treating every wrong answer the same, Luna and Charlotte begin by asking:

  • Did the agent generate an answer that does not exist in any verified data source?
  • Did the agent respond using available data, but that data was incomplete, outdated, duplicated, or inconsistent?
  • Did the prompt provide enough context to be answered correctly?
  • Did the agent have sufficient guardrails to answer the question?

These questions help uncover the cause of each issue and point to the right fix. Treating all failures the same leads to wasted effort, delayed adoption, and erosion of trust.
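The four questions can be read as an ordered triage checklist. The sketch below captures that idea in Python; the function name, parameters, and category labels are illustrative, not part of any Salesforce or Agentforce API.

```python
from enum import Enum, auto
from typing import Optional

class FailureType(Enum):
    HALLUCINATION = auto()      # answer not grounded in any verified source
    UNRELIABLE_DATA = auto()    # grounded, but the source data was bad
    INTERACTION_GAP = auto()    # the prompt lacked context to answer correctly
    GUARDRAIL_GAP = auto()      # the agent lacked guardrails for this question

def triage(grounded_in_verified_source: bool,
           source_data_reliable: bool,
           prompt_had_context: bool,
           guardrails_sufficient: bool) -> Optional[FailureType]:
    """Map the four diagnostic questions, in order, to a failure category.

    Returns None when every check passes (no issue detected).
    """
    if not grounded_in_verified_source:
        return FailureType.HALLUCINATION
    if not source_data_reliable:
        return FailureType.UNRELIABLE_DATA
    if not prompt_had_context:
        return FailureType.INTERACTION_GAP
    if not guardrails_sufficient:
        return FailureType.GUARDRAIL_GAP
    return None
```

The ordering matters: a response that was never grounded in verified data is a hallucination regardless of how reliable the underlying data is, so that check comes first.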

Note

This module focuses on determining data reliability risks for AI agents. To explore the importance of applying guardrails to agent behavior, check out Define Agent Guardrails and Explore Agentforce Guardrails and Trust Patterns on Trailhead.

AI Data Readiness and Reliability

AI agents need context, not just access to data. Ensuring context, consistency, and correctness involves determining data readiness and reliability.

Data readiness is defined as the state of your data as it is being prepared, made accessible, and structured for a specific, immediate purpose (for example, AI or analytics).

Data reliability is defined as the quality of that data relative to its purpose: it must be fit for the task at hand. For example, the data needed to market to a customer is not the same as the data needed to sell to or support that customer.

AI data readiness means having data that is accessible, complete, contextual, compliant, and correct, so an AI agent can use it to deliver business results.
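A readiness review can start with simple per-record checks. The sketch below scores one CRM record against a few of the readiness criteria; the field names and thresholds are hypothetical, and real checks for accessibility and correctness would need system context that a single record can't provide.

```python
# Hypothetical required fields for a customer record; adjust per use case.
REQUIRED_FIELDS = {"customer_id", "email", "consent_status"}

def readiness_report(record: dict) -> dict:
    """Score one record against simple readiness checks.

    These stand in for the fuller criteria (accessible, complete,
    contextual, compliant, correct) for illustration only.
    """
    # Treat None and empty strings as missing values.
    present = {k for k, v in record.items() if v not in (None, "")}
    return {
        "complete": REQUIRED_FIELDS <= present,          # all required fields populated
        "compliant": record.get("consent_status") == "opted_in",  # consent on file
        "contextual": "last_interaction" in present,     # has interaction history
    }
```

Running such checks across a sample of records gives a rough readiness baseline before any agent is pointed at the data.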

Data Reliability Examples from the NTO Pilot

In practice, most AI reliability issues fall into three categories. Luna and Charlotte review feedback from the proof‑of‑value project with these categories in mind to understand what went wrong and why.

Hallucination: The model generates information that is not grounded in trusted data.

  • Example: The agent states that the customer previously opted into a loyalty program based on a historical inquiry, even though no such field or record exists in the CRM.
  • Potential cause: Instead of using explicit rules that indicate loyalty membership, the agent misinterpreted the interaction transcript and filled in missing information.

Incorrect responses: The agent uses real data, but the data itself is incomplete, outdated, or fragmented.

  • Example: When the case deflection agent attempts to review order history, it retrieves incomplete information about a customer’s cases and purchases and refers to the wrong order in its response to the customer.
  • Potential cause: Many cases were related to duplicate Service Cloud contacts because NTO allowed guest orders. Without unified profiles, the agent lacked the complete customer context.

Interaction or guardrail gaps: The system allows prompts or queries that lead the agent to retrieve or present information in ways that create confusion or risk.

  • Example: When a user asks the agent to show all orders associated with a phone number, the agent returns orders belonging to multiple customers who share that number.
  • Potential cause: The system lacks prompt guardrails and identity verification checks to confirm that the phone number uniquely identifies a single customer before retrieving sensitive order information.
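The phone-number case shows the shape of a simple retrieval guardrail: refuse to return data when an identifier is ambiguous. The sketch below is a minimal illustration with hypothetical function names and in-memory lookups, not an Agentforce or Service Cloud API.

```python
def orders_for_phone(phone: str,
                     customers_by_phone: dict,
                     orders_by_customer: dict) -> list:
    """Return orders only when the phone number resolves to exactly
    one customer; otherwise refuse and require identity verification."""
    matches = customers_by_phone.get(phone, [])
    if len(matches) != 1:
        # Ambiguous or unknown: refusing is safer than leaking another
        # customer's orders. A real agent would escalate to verification.
        raise LookupError(
            f"{len(matches)} customers match this phone number; "
            "identity verification required")
    return orders_by_customer.get(matches[0], [])
```

The key design choice is failing closed: an ambiguous identifier produces an explicit refusal rather than a best-guess answer.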

Note

While hallucinations get much of the attention in the news, the real culprit is often bad data. To learn more, check out Are Hallucinations a Problem (Or Is It Just Your Bad Data)?

These examples show why investigating the cause of a wrong AI response is critical. Treating all failures as the same can lead to wasted effort, delayed adoption, and loss of trust in AI systems.

There are many reasons an AI agent can generate wrong responses, such as model limitations or gaps in prompt engineering. While these are important considerations, this badge focuses on the framework that Luna and Charlotte use to assess and remediate AI reliability risks due to data reliability.

The next units explore how data and metadata reliability affects AI agent responses and how improving that reliability can help agents deliver more accurate and trustworthy responses.
