Improve Agentic Prompts with Data and Iteration

Learning Objectives

After completing this unit, you’ll be able to:

  • Define success criteria for your AI agent.
  • Use a quality framework to evaluate agent responses against your criteria.
  • Describe how to iterate on agent instructions to improve performance.

Test Your Agent

Your agent design is a hypothesis. You've made your best guess about what users will say and how to guide the agent to respond effectively, but you won't know if you're right until people interact with it.

A robot in a lab testing computer equipment.

Testing helps you find problems early, before you invest time and resources into building the final product. You uncover:

  • Confusing language: phrases or terms that make sense to you but not to your users.
  • Roadblocks where the user gets stuck and doesn’t know what to do next.
  • Unnatural interactions that feel robotic or don't follow the natural back-and-forth flow of a real conversation.

What Does a Successful Conversation Look Like?

Before testing your agent, define what success means for it. A success criteria checklist is a simple set of goals the agent must achieve to be considered effective. Include two types of criteria in your checklist.

Task-Specific Criteria

What must the agent accomplish to fulfill its specific role? Consider whether any criteria are nonnegotiable, meaning they must pass for the agent to be considered successful. Assess those first; they must pass before you evaluate any other criteria.

  • Example for a weather agent: “Must include the current weather conditions and temperature in the unit that’s standard for the user's location.”
  • Example for a support agent: “Must provide the correct shipping status when given an order number.”
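To make the gating concrete, here's a minimal sketch in Python of a checklist that assesses nonnegotiable criteria first and skips the rest when one fails. Everything in it (the criteria names and the keyword checks) is a hypothetical illustration, not part of any Salesforce tool.

```python
# A sketch of a success criteria checklist where nonnegotiable
# criteria gate everything else. Criteria names and check functions
# are hypothetical examples.

def has_status(response: str) -> bool:
    return "shipped" in response.lower() or "delivered" in response.lower()

def is_polite(response: str) -> bool:
    return any(word in response.lower() for word in ("please", "happy to", "thanks"))

CRITERIA = [
    # Nonnegotiable criteria are assessed first and must pass.
    {"name": "Provides the shipping status", "nonnegotiable": True, "check": has_status},
    {"name": "Uses a polite, helpful tone", "nonnegotiable": False, "check": is_polite},
]

def evaluate(response: str) -> dict:
    results = {}
    # First pass: nonnegotiable criteria only.
    for criterion in (c for c in CRITERIA if c["nonnegotiable"]):
        passed = criterion["check"](response)
        results[criterion["name"]] = passed
        if not passed:
            return results  # Fail fast: don't score the remaining criteria.
    # Second pass: everything else.
    for criterion in (c for c in CRITERIA if not c["nonnegotiable"]):
        results[criterion["name"]] = criterion["check"](response)
    return results

print(evaluate("Happy to help! Order 12345 has shipped."))
```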

General Quality Criteria

How should the agent behave in every conversation? For this, consider four quality pillars.

  • Clear: Is the response easy to understand?
  • Context-aware: Is the response relevant to the utterance?
  • Respectful: Is the tone polite and helpful?
  • Action-oriented: Does the response help the user move forward?

Each of these four pillars includes additional dimensions and questions to consider.

  • Clear: Can I understand it?
    • Factual and reliable: Is the info right—not made up or vague?
    • Conversational: Does it sound like a real person, not a stiff robot?
    • Consistent: Does it use the same tone and terms throughout?
  • Context-aware: Does it respond to what I asked?
    • Responsive: Does it show it heard me and react accordingly?
    • Effective: Does it fit my task, now, in this place?
  • Respectful: Does it treat me well?
    • Approachable: Is it easy for anyone to understand and use?
    • Trusted: Does it stay honest about what it can and can’t do?
    • Teachable: If I correct it, does it learn or adapt?
  • Action-oriented: Does it move things forward?
    • Decisive: Does it guide me clearly to the next step?
    • Helpful: Does it reduce my effort with links, actions, or shortcuts?
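One way to keep the pillars in front of you while reviewing responses is to encode the rubric as data and print a blank scorecard. This is a sketch of one possible structure, not a feature of any tool; the questions are quoted from the list above.

```python
# The four quality pillars as a reviewable rubric. The structure and
# the scorecard format are illustrative assumptions.

RUBRIC = {
    "Clear": {
        "Factual and reliable": "Is the info right—not made up or vague?",
        "Conversational": "Does it sound like a real person, not a stiff robot?",
        "Consistent": "Does it use the same tone and terms throughout?",
    },
    "Context-aware": {
        "Responsive": "Does it show it heard me and react accordingly?",
        "Effective": "Does it fit my task, now, in this place?",
    },
    "Respectful": {
        "Approachable": "Is it easy for anyone to understand and use?",
        "Trusted": "Does it stay honest about what it can and can't do?",
        "Teachable": "If I correct it, does it learn or adapt?",
    },
    "Action-oriented": {
        "Decisive": "Does it guide me clearly to the next step?",
        "Helpful": "Does it reduce my effort with links, actions, or shortcuts?",
    },
}

# Print a scorecard to fill in (yes / partial / no) while testing.
for pillar, dimensions in RUBRIC.items():
    print(pillar)
    for dimension, question in dimensions.items():
        print(f"  [ ] {dimension}: {question}")
```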

How to Test

Because generative agents can create a wide variety of responses, the best way to test them is to interact with them directly in a tool like Agent Builder. The goal is to check whether your agent's instructions, the prompts and guidelines you wrote, are effective at generating appropriate responses to user utterances.

Your job is to act like a real user and try to push the agent’s boundaries. A key part of testing is to use a variety of utterances, not just the obvious ones (a sketch for turning these into a reusable test suite follows the list). Try to include:

  • Happy path utterances that are the simple, direct questions or statements you expect from users.
    Example: “What’s the status of order 12345?”
  • Synonym utterances with different ways of asking the same thing.
    Example: “Where's my stuff? My order number is 12345.”
  • Vague utterances that are less specific or incomplete phrases.
    Example: “My order.”
  • Off-topic utterances about something unrelated to the agent’s purpose.
    Example: “What's the weather today?”
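To make these utterances repeatable across test runs, you can keep them in a small suite. A minimal sketch; ask_agent is a hypothetical placeholder for however you actually send an utterance to your agent (for example, typing it into Agent Builder and copying the response).

```python
# A reusable suite of test utterances by category. ask_agent is a
# hypothetical stand-in for querying your real agent.

TEST_UTTERANCES = [
    {"category": "happy path", "utterance": "What's the status of order 12345?"},
    {"category": "synonym", "utterance": "Where's my stuff? My order number is 12345."},
    {"category": "vague", "utterance": "My order."},
    {"category": "off-topic", "utterance": "What's the weather today?"},
]

def ask_agent(utterance: str) -> str:
    # Placeholder: replace with a real call to your agent.
    return f"(agent response to: {utterance})"

for case in TEST_UTTERANCES:
    response = ask_agent(case["utterance"])
    print(f'[{case["category"]}] {case["utterance"]} -> {response}')
```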

The Improvement Loop

For each test utterance, you evaluate the agent’s response against your success criteria. A low score isn’t a failure. It’s an opportunity to improve your instructions.

Imagine a support agent with this simple instruction, then walk through the steps.

“You’re a customer support agent. When a customer asks for an order status, look up the order number and provide the status.”

  1. Test: You type a vague utterance, “Order status.”
  2. Observe: The agent responds, “Provide your number.”
  3. Evaluate:
    • Task-Specific (Get status). ⚠️ Partial. It doesn’t provide the status, but is asking for the required information to get the status.
    • Quality (Clear). ❌ No. The response is simple, but the language is stiff. It isn’t clear exactly what kind of number it’s asking for.
    • Quality (Context-Aware). ⚠️ Partial. It’s asking for the information it needs, but doesn't acknowledge it has understood the user intent to get the order status.
    • Quality (Respectful). ⚠️ Partial. It’s a bit blunt and demanding.
    • Quality (Action-Oriented). ✅ Yes. The agent asks for missing information needed to perform the task.
  4. Diagnose: The agent's response was too direct because its instructions were too simple. It didn’t include instructions or guidance for getting the order number.
  5. Improve: Revise the instructions to make the agent more helpful and polite. Add a task-specific success criterion that the agent should politely ask for the order number if the user doesn’t provide it.

Based on the improvements you identified, you revise the agent’s instructions.

“You are a friendly and helpful customer support agent. When a user asks for an order status, first check if they have provided their 12-digit order number. If they haven’t, politely ask for it so you can look up the order status.”

Now when you retest with the same “Order status” utterance, the agent might generate a much better response, like: “I’d be happy to check your order status! What’s your 12-digit order number?”

Make improvements and keep testing different utterances until you feel sure the agent always meets all of your success criteria. This continuous cycle is the key to turning a good agent design into a great one.
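That cycle is also easy to run as a skeleton regression check: after every revision to the instructions, rerun the whole suite so earlier wins don't quietly regress. A minimal sketch, again with ask_agent and evaluate as hypothetical stand-ins for querying your agent and scoring responses against your checklist.

```python
# A sketch of the improvement loop: test, evaluate, revise, retest.
# ask_agent and evaluate are hypothetical stand-ins; replace them with
# a real agent call and your real success criteria checks.

def ask_agent(utterance: str) -> str:
    return f"(agent response to: {utterance})"

def evaluate(response: str) -> dict[str, bool]:
    # Stub that passes everything; swap in your checklist logic.
    return {"task-specific": True, "clear": True, "respectful": True}

def run_suite(utterances: list[str]) -> bool:
    """Rerun every utterance and report any that miss a criterion."""
    failures = []
    for utterance in utterances:
        results = evaluate(ask_agent(utterance))
        if not all(results.values()):
            failures.append((utterance, results))
    for utterance, results in failures:
        print(f"Needs work: {utterance!r} -> {results}")
    return not failures

# Retest the same utterances after each instruction revision.
suite = ["Order status", "What's the status of order 12345?", "My order."]
if run_suite(suite):
    print("All test utterances meet the success criteria.")
```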

Onward and Upward

Way to go! You’ve learned great skills that help you not only create successful agentic conversations, but also test and refine the quality of your agent over time.
