
Explore Data Readiness for AI

Learning Objectives

After completing this unit, you’ll be able to:

  • Define AI data readiness.
  • Differentiate AI data readiness from general data quality.
  • Identify key dimensions of data readiness for AI.
  • Differentiate structured from unstructured data readiness.
Note

This module was produced in collaboration with Dreamin' in Data, a non-profit and part of the Datablazers Community. Learn about partner content on Trailhead.

Define the Scope

If you completed Data Reliability Risk Factors for AI, you followed data architect Luna Stone and Salesforce architect Charlotte Liu at Northern Trail Outfitters (NTO) as they ran a proof-of-value project for the Case Deflection Agent. The project was successful. It clearly demonstrated the business benefits of the agent and the speed of delivery of the Agentforce platform.

The project also highlighted the need for data that is fit for purpose. The pilot findings showed Charlotte and Luna the potential reliability risks posed by structured data, unstructured data, and incomplete or missing metadata. Their next step is a larger-scale, formal data readiness assessment. They must determine exactly what data, and at what quality, the Case Deflection Agent needs in order to provide faster 24/7 customer service and reduce costs at NTO.

Charlotte will design the overall solution architecture using Agentforce, Data 360, and Service Cloud as the technical foundation. Luna will focus on whether the underlying data can be trusted for autonomous agent behavior. Her goal as the data architect is to identify risks and recommend remediations.

Luna at her desk thinking about the flow of data across the user experience

She wants to avoid:

  • Consumers getting stuck in an automated experience and unnecessarily escalating issues to a service representative.
  • Service reps overwhelmed by extra work and unhappy customers.
  • Stakeholders losing trust in the AI investment because agents give incorrect responses and fail to deliver the intended business value.
  • Leaders facing compliance issues due to agents exposing sensitive business information.

Dimensions of AI Data Readiness

As mentioned in Data Reliability Risk Factors for AI, AI data readiness means having data that’s accessible, complete, contextual, compliant, and correct, so an AI agent can use it to deliver business results.

To ensure data readiness, Luna will:

  • Identify the data sources, both structured and unstructured, that give the agent the context it needs.
  • Determine when to extract structured data from unstructured documents to ensure information is presented consistently.
  • Assess the data quality of all structured data sources to minimize the risk of hallucinations or incorrect responses.
  • Quantify risks based on data profiling insights and make remediation recommendations: cleansing, standardization, enrichment, or profile unification.
  • Define data reliability KPIs to assess when information is sufficient for agentic engagement and when a representative should step in, informing the guardrails built into the solution.
  • Develop monitoring capabilities to ensure data remains reliable over time.
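The data reliability KPIs in the steps above can be expressed as simple threshold rules. The following sketch is a hypothetical illustration only (not Salesforce or Agentforce API code): it scores a record on a weighted set of critical fields and routes it to the agent or to a human. The field names, weights, and threshold are all assumptions for the example.

```python
# Hypothetical illustration: gate agentic engagement on a data reliability KPI.
# Field names, weights, and the 0.8 threshold are assumptions, not NTO's actual rules.

CRITICAL_FIELDS = {"email": 0.4, "case_subject": 0.3, "product_line": 0.3}
AGENT_THRESHOLD = 0.8  # below this, escalate to a service representative

def readiness_score(record: dict) -> float:
    """Weighted share of critical fields that are populated."""
    return sum(w for f, w in CRITICAL_FIELDS.items() if record.get(f))

def route(record: dict) -> str:
    """Return 'agent' when the record is reliable enough, else 'human'."""
    return "agent" if readiness_score(record) >= AGENT_THRESHOLD else "human"

complete = {"email": "a@nto.com", "case_subject": "Return", "product_line": "Hiking"}
partial = {"email": "a@nto.com", "case_subject": None, "product_line": "Hiking"}
print(route(complete))  # agent
print(route(partial))   # human
```

In practice such a rule would inform the guardrails built into the solution, so that records below the threshold trigger human intervention rather than an autonomous response.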

Luna must validate that the data is sufficiently reliable, governed, and contextually controlled for autonomous agent processing. This means confirming that only trustworthy, relevant data is accessible to agents, with guardrails that minimize hallucinations and incorrect responses.

AI data readiness builds on traditional data quality practices that ensure data is fit for the intended business outcome. To ensure that only trustworthy, relevant data is accessible to AI agents, Luna investigates multiple dimensions in her AI data readiness plan.

Availability: Is the data present?

  • Insufficient: Data objects or documents exist in at least one system and are available to technical teams when needed.
  • Agentic AI Ready: Structured and unstructured data are available through a curated, trusted system of reference designed for AI consumption, with a clearly defined update cadence (on-demand, batch, real-time) and version control for unstructured documents.

Accessibility: Does the agent see only what it should?

  • Insufficient: Data access is governed by standard user roles and permissions, but not specifically designed to provide an AI context or guardrails.
  • Agentic AI Ready: AI agent access is limited to permitted data and records using record- and field-level security controls and sensitivity policies.

Criticality: What fields are key to the business outcome? What records have sufficient data to make them actionable?

  • Insufficient: All fields are treated as equally important for the use case.
  • Agentic AI Ready: Data reliability KPIs identify critical data elements. Agents assess whether a record meets specified reliability levels and manage exceptions, such as through human intervention.

Completeness: Are fields populated when expected? Is there a complete view of customer interactions?

  • Insufficient: Required fields are generally populated across core objects.
  • Agentic AI Ready: A complete customer profile, linking all relevant structured records and unstructured documents needed for the intended outcome, is present.

Consistency: Do similar concepts appear in multiple formats (for example, email opt in = Yes/No versus email opt out = Y/N)? Are values standardized across sources to preserve meaning?

  • Insufficient: Field values and formats follow enterprise standards within individual systems.
  • Agentic AI Ready: A unified, consistent view exists across systems, objects, and document sources regardless of structure or labeling.

Correctness (or Accuracy): Has data been validated against authoritative systems of record? Can we detect potential inaccuracies based on patterns in the data?

  • Insufficient: Data is periodically reviewed and verified, or corrected by data stewards.
  • Agentic AI Ready: Data architecture delivers up-to-date information, with mechanisms to detect, flag, and prevent incorrect data from influencing AI outputs. Users have a loop-back mechanism to flag potentially incorrect data or responses.

Recency (or Timeliness): Is the data updated at a cadence appropriate for the use case?

  • Insufficient: Records are updated in accordance with integration service level agreements (SLAs).
  • Agentic AI Ready: Information is available at the cadence required by the use case. Monitoring can detect stale or missing data.

Uniqueness: Are there duplicate or overlapping data that could cause the agent to give the wrong answer?

  • Insufficient: Duplicate records are managed within a given source object. A field used for classification (for example, country, segment) is managed by reconciling multiple values that represent the same concept (for example, item type: hat versus cap).
  • Agentic AI Ready: Identity resolution links all related records within and across systems using available, reliable information. Intentional duplicates are maintained to preserve the business or security context. Data granularity is preserved across source systems while standardized to ensure a common, coherent understanding for AI interpretation.

Sensitivity: Is sensitive data protected from unauthorized use?

  • Insufficient: Data sensitivity is classified at the business application level or in a data catalog.
  • Agentic AI Ready: Metadata and data are continuously monitored and assessed to enforce sensitivity policies, ensuring agents access only what is appropriate within defined business, legal, and ethical guardrails.
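Several of these dimensions can be measured with straightforward data profiling. The sketch below is a minimal, hypothetical example (the records and field names are invented, not an NTO schema): it computes a completeness ratio, normalizes the Yes/Y and No/N opt-in variants noted under Consistency, and counts duplicate emails that identity resolution would need to reconcile.

```python
# Minimal data-profiling sketch for three readiness dimensions.
# Records and field names are hypothetical examples, not a real NTO data model.

records = [
    {"id": 1, "email": "kai@nto.com", "email_opt_in": "Yes"},
    {"id": 2, "email": "kai@nto.com", "email_opt_in": "Y"},  # duplicate email, variant flag
    {"id": 3, "email": None,          "email_opt_in": "No"},
]

# Completeness: share of records with a populated email field.
completeness = sum(1 for r in records if r["email"]) / len(records)

# Consistency: map variant encodings (Yes/Y, No/N) onto one standard.
CANONICAL = {"Yes": "Yes", "Y": "Yes", "No": "No", "N": "No"}
nonstandard = sum(1 for r in records if r["email_opt_in"] not in ("Yes", "No"))
for r in records:
    r["email_opt_in"] = CANONICAL[r["email_opt_in"]]

# Uniqueness: duplicate emails that identity resolution would need to reconcile.
emails = [r["email"] for r in records if r["email"]]
duplicates = len(emails) - len(set(emails))

print(f"completeness={completeness:.2f}, nonstandard_flags={nonstandard}, duplicates={duplicates}")
```

Profiling output like this is what lets Luna quantify risk per dimension instead of relying on anecdotal data-quality impressions.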

Note

“When I present, my sessions are often titled [Not only] AI Ready, as ensuring data is fit for purpose benefits not just AI agents but all business decisions.”

—Mehmet Orun, Datablazer

Unstructured and Structured Data Readiness

Historically, enterprise analytics and AI solutions, such as predictive models and next-best-action recommendations, have relied primarily on structured data. Today, generative AI has made unstructured data more accessible and actionable, unlocking insights from PDFs, email, chat transcripts, call logs, and knowledge articles. In fact, 80–90% of enterprise data exists in unstructured formats.

Luna’s goal is to unlock and responsibly activate the value of all data that’s relevant to NTO’s use case, whether structured or unstructured.

To do this, she begins by assessing the data sources that end users rely on to complete tasks manually today. She wants to ensure that those sources are accessible and actionable for agentic flows.

Assess Data Early to Architect for Reliability

NTO stakeholders want to be confident that their investment in AI and supporting architecture will deliver the intended business outcomes.

Luna can provide the evidence and insights that Charlotte needs for her solution architecture recommendations. By assessing data as part of solution planning, she ensures that NTO can prioritize what matters most and design an AI solution grounded in trustworthy, reliable data.

With her data assessment, Luna can:

  • Quantify data reliability risks.
  • Identify gaps in needed data.
  • Determine which architectural components may be required (such as identity resolution, enrichment, transformation, or governance controls).

Let’s Recap

You’ve explored the many dimensions of data readiness. Next, discover how to evaluate and prepare unstructured data for AI.
