Understand Why an Agent Can Be Confidently Wrong
Learning Objectives
After completing this unit, you’ll be able to:
- Distinguish between hallucinations, incorrect responses caused by unreliable data, and interaction or guardrail issues.
- Explain why investigating the cause of an AI response is critical to improving agent reliability.
- Use a simple framework to analyze what went wrong before choosing mitigation strategies.
Meet Luna Stone, data architect, and Charlotte Liu, Salesforce architect.
Northern Trail Outfitters (NTO) created a customer journey map as part of its AI and data strategy. One use case stood out: using agentic AI to improve case deflection. Everyone is excited about the potential for faster 24/7 customer service, reduced costs, and agents focused on high-value work.
Executives also set a clear directive: Move quickly while ensuring the solution is trustworthy, scalable, and built on solid data.
So NTO launches a rapid proof‑of‑value project for a Case Deflection agent. Salesforce architect Charlotte Liu and data architect Luna Stone partner to run the pilot, where they need to ensure:
- Business stakeholders feel confident that the use case will create value.
- IT stakeholders can make informed technology investment and architectural decisions.
- Finance stakeholders can validate the cost-benefit of larger investments.
Proof-of-Value Project Scope
Since NTO uses Salesforce, Charlotte limits the scope to the most impactful personas and data sources.
| Persona | User Story |
|---|---|
| Head of Customer Service | As the head of customer service, I want to deliver the highest-quality service 24/7 by automatically answering the most frequent questions about orders and open support cases, so that customers receive fast, consistent responses, operational costs decrease, and human agents can focus on escalations and complex cases. |
| Consumer | As a consumer, I want to check my recent orders or case statuses, make updates as needed, or request help at any time, so I can resolve issues quickly with minimal effort, without having to repeat information. |
| Customer Service Rep | As a customer service rep, I want to address customer support needs quickly and need full visibility into their orders, cases, and interaction history so that I don’t have to search across systems or ask customers to repeat details. |
Since this is an internal proof-of-value project, a select group of service reps replicate their typical customer interactions with the agent and share their feedback with Charlotte. To demonstrate results quickly and adjust based on pilot findings, Charlotte keeps the initial scope small. She uses available data in Salesforce CRM, with Agentforce powering the prompts, and defers long-term architecture decisions.
Pilot Findings
Early testing shows clear promise. The agent handles many routine customer requests quickly, helping customers check order status, review policies, and resolve simple support questions without waiting for a service representative.
However, as a group of pilot users and service reps begin testing the experience more deeply, they report several concerning responses. In some cases, the answers are incomplete or confusing. In others, the answers are simply wrong—especially when the agent interacts with long‑time customers whose history spans multiple systems.
At this point, Luna, NTO’s data architect, is brought in to work with Charlotte to design the right solution architecture to improve the case deflection agent’s responses.
Ask the Right Questions
Luna is responsible for ensuring the underlying data is reliable and ready for AI use. The first task is not to adjust prompts or add more data immediately. Instead, Luna and Charlotte must determine whether the agent’s issues are caused by the system design, the underlying data, or the way users interact with the agent.
Rather than treating every wrong answer the same, Luna and Charlotte begin by asking:
- Did the agent generate an answer that does not exist in any verified data source?
- Did the agent respond using available data, but that data was incomplete, outdated, duplicated, or inconsistent?
- Did the prompt provide enough context to be answered correctly?
- Did the agent have sufficient guardrails to answer the question?
The questions help uncover the cause of the issue and how to fix it. Treating all failures the same leads to wasted effort, delayed adoption, and erosion of trust.
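As a rough illustration, the four diagnostic questions can be sketched as a simple decision helper. The function and category names below are hypothetical and exist only to show the triage logic; they are not part of any Salesforce or Agentforce API.

```python
def triage_failure(grounded_in_verified_data: bool,
                   data_complete_and_consistent: bool,
                   prompt_had_context: bool,
                   guardrails_sufficient: bool) -> str:
    """Map answers to the four diagnostic questions onto a failure category.

    Hypothetical helper for illustration only. Each parameter corresponds
    to one of the questions Luna and Charlotte ask about a wrong answer.
    """
    if not grounded_in_verified_data:
        # The answer does not exist in any verified data source.
        return "hallucination"
    if not data_complete_and_consistent:
        # Real data was used, but it was incomplete, outdated, or duplicated.
        return "incorrect response"
    if not (prompt_had_context and guardrails_sufficient):
        # The data was fine, but the interaction or guardrails were not.
        return "interaction or guardrail gap"
    return "needs further investigation"

# Example: the agent cited real CRM data, but contact records were duplicated.
print(triage_failure(True, False, True, True))  # → incorrect response
```

Working through the questions in this order matters: a hallucination calls for better grounding, while an incorrect response calls for data remediation, so misclassifying one as the other wastes the fix.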
AI Data Readiness and Reliability
AI agents need context, not just access to data. Ensuring context, consistency, and correctness involves determining data readiness and reliability.
Data readiness is the state of your data once it has been prepared, made accessible, and structured for a specific, immediate purpose (for example, AI or analytics).
Data reliability is the degree to which that data is fit for its purpose. For example, the data needed to market to a customer is not the same as the data needed to sell to or support that customer.
AI data readiness means having data that is accessible, complete, contextual, compliant, and correct, so an AI agent can use it to deliver business results.
Data Reliability Examples from the NTO Pilot
In practice, most AI reliability issues fall into three categories. Luna and Charlotte review feedback from the proof‑of‑value project with these categories in mind to understand what went wrong and why.
| Issue Type | Example | Potential Cause |
|---|---|---|
| Hallucination: The model generates information that is not grounded in trusted data. | The agent states that the customer previously opted into a loyalty program based on a historical inquiry, even though no such field or record exists in the CRM. | Instead of using explicit rules that indicate loyalty membership, the agent misinterpreted the interaction transcript and filled in missing information. |
| Incorrect responses: The agent uses real data, but the data itself is incomplete, outdated, or fragmented. | When the case deflection agent attempts to review order history, it retrieves incomplete information about a customer’s cases and purchases and refers to the wrong order in its response to the customer. | Many cases were related to duplicate Service Cloud contacts because NTO allowed guest orders. Without unified profiles, the agent lacked the complete customer context. |
| Interaction or guardrail gaps: The system allows prompts or queries that lead the agent to retrieve or present information in ways that create confusion or risk. | When a user asks the agent to show all orders associated with a phone number, the agent returns orders belonging to multiple customers who share that number. | The system lacks prompt guardrails and identity verification checks to confirm that the phone number uniquely identifies a single customer before retrieving sensitive order information. |
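The guardrail gap in the last row can be illustrated with a minimal pre-retrieval check. This is a hypothetical sketch with invented names; in a real deployment, Agentforce would enforce this through identity verification and prompt guardrails rather than application code.

```python
from typing import Optional

def orders_for_phone(phone: str,
                     customers_by_phone: dict[str, list[str]],
                     orders_by_customer: dict[str, list[str]]) -> Optional[list[str]]:
    """Return orders only when the phone number maps to exactly one customer.

    Hypothetical guardrail sketch: refuse an ambiguous lookup instead of
    mixing orders from multiple customers who share a number.
    """
    matches = customers_by_phone.get(phone, [])
    if len(matches) != 1:
        # Ambiguous or unknown number: require further identity verification
        # before exposing any order information.
        return None
    return orders_by_customer.get(matches[0], [])

# Sample data: two customers share one phone number.
customers = {"555-0100": ["cust-1", "cust-2"], "555-0199": ["cust-3"]}
orders = {"cust-1": ["A-1"], "cust-2": ["A-2"], "cust-3": ["B-7"]}

print(orders_for_phone("555-0100", customers, orders))  # shared number → None
print(orders_for_phone("555-0199", customers, orders))  # unique number → ['B-7']
```

The design choice here is to fail closed: when identity is ambiguous, the agent should ask for additional verification rather than return a best guess over sensitive data.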
These examples show why investigating the cause of a wrong AI response is critical: each issue type calls for a different fix.
There are many reasons an AI agent can generate wrong responses, such as model limitations or gaps in prompt engineering. While these are important considerations, this badge focuses on the framework that Luna and Charlotte use to assess and remediate AI reliability risks due to data reliability.
The next units explore how data and metadata reliability affects AI agent responses and how improving that reliability can help agents deliver more accurate and trustworthy responses.
Resources
- Salesforce Blog: Less Hallucinations, More Trust: Salesforce’s Path to Consistent Enterprise-Ready AI
- Salesforce+ Video: How Salesforce Help Powers Agentforce with Data 360
- Salesforce Ben: Are Hallucinations a Problem (Or Is It Just Your Bad Data)?
- Trailhead: Grounding an Agent with Data
- Trailhead: Prompt Engineering Techniques
