Assess Structured Data Readiness for AI

Learning Objectives

After completing this unit, you’ll be able to:

Explain how structured data provides the business context that allows AI agents to interpret unstructured information correctly.
Identify common structured data reliability risks that can affect AI outcomes.
Evaluate whether structured data is reliable enough to support agentic AI use cases.

Structured Data: The Contextual Foundation for AI

While unstructured documents often contain the information needed to answer customer questions, structured data provides the context that makes those answers more accurate. Structured data defines who the customer is, what they purchased, which orders or cases are involved, and what information an agent is allowed to access. In this way, structured data provides the contextual foundation for AI.

To support the Case Deflection Agent, NTO must unify customer data across Service Cloud, Commerce Cloud, and other transactional systems. Relevant business documents and interaction content must also be associated with the correct customer or transaction.

Together, this structured data forms the trusted foundation for enabling agents to interpret unstructured content more accurately. The same foundation also supports other enterprise data activation use cases, including analytics, automation, and personalization.

Without reliable structured data, even well-grounded AI agent responses can be applied to the wrong customer, order, or transaction.

So What Exactly Is Structured Data?

Structured data is information organized according to a predefined data model, typically stored in relational tables or well-defined schemas (for example, CRM objects, CSV files, XML or JSON documents). Its structure enables consistent querying, reporting, governance, and integration across enterprise systems.

For the first phase of NTO’s Case Deflection Agent, structured data includes:

Contact and Case objects from Salesforce CRM
Customer and sales order data from Commerce Cloud

Structured Data Categories and Considerations

Content Type	Description	Example	Data Reliability Considerations
Master data	Core business entities that represent key actors or objects in the enterprise. Relatively stable and shared across systems.	Customer, supplier, partner, employee, product	Are there duplicate records within and across objects that store master data? Is there a complete view to understand the overall relationship? Is there a view that only shows relevant and permissible records?
Transactional data	Records of business events or activities tied to master data entities. High volume and time-sensitive.	Opportunity, order, shipment, invoice, case	Are records accessible consistently, even across different data sources? Are there orphaned transactions where the link to the necessary master data is missing?
Reference data	Controlled vocabularies or classification schemes used to standardize meaning across systems.	Geographic classification (region, country, state/province), foreign exchange reference	Are values standardized across systems? Are outdated codes deprecated? Is the semantic meaning consistent for AI interpretation?
Metadata	Data about data that describes structure, lineage, sensitivity, and governance attributes.	Field definitions, data classifications	Is metadata consistent and accurate? Can AI systems correctly interpret each field’s meaning and sensitivity?

Common challenges that you need to address with structured data:

Different data schemas with conflicting data types or inconsistent content values require transformation for consistency (for example, aligning and standardizing customer segments).
Inaccessible, incomplete, unused, or abandoned fields can increase implementation effort or the risk of hallucinations.
Disconnected records within and across systems (duplicate records, or transactions not explicitly related to a customer or product record) can give the agent an incomplete understanding of the customer.

If duplicates, disconnected records, or inconsistent metadata exist, agents might reason over fragmented or misleading context. Reliable structured data ensures that a return complaint in a chat log is linked to the correct case and order, and a delivery issue email is associated with the right contact and product.
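
The disconnected-record risk described above can be tested with a simple link check. The following is a minimal Python sketch, assuming cases and contacts have been exported as lists of dictionaries. The field names (`id`, `contact_id`) and the sample records are hypothetical and do not reflect NTO's actual schema.

```python
def find_orphaned_cases(cases, contacts):
    """Return cases whose contact_id is missing or matches no known contact."""
    known_ids = {contact["id"] for contact in contacts}
    return [case for case in cases if case.get("contact_id") not in known_ids]


# Hypothetical sample data, for illustration only.
contacts = [{"id": "C1"}, {"id": "C2"}]
cases = [
    {"id": "K1", "contact_id": "C1"},
    {"id": "K2", "contact_id": None},  # never linked to a contact
    {"id": "K3", "contact_id": "C9"},  # points to a contact that does not exist
]

orphans = find_orphaned_cases(cases, contacts)
orphan_rate = len(orphans) / len(cases)
print([case["id"] for case in orphans], f"{orphan_rate:.0%}")
```

With this sample data, two of the three cases are flagged. The same pattern could be applied to orders and products, or to any transaction that depends on a master data record.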

Evaluate Structured Data for AI

To determine whether structured data is reliable enough for AI use, Luna assesses a series of common reliability risks.

Question	Impact	What to Do Based on the Answer
Which fields or records are relevant to the agent? Luna profiles in-scope objects with a specific end user persona and with a focus on recent usage (such as the last 1 to 2 years).	AI agents accessing more fields or records than employee users can introduce security risks. Unused, abandoned (no longer used), or default-valued fields can mislead AI reasoning or introduce hallucination risk.	Ensure agent permissions are limited to the same records and fields, or a subset thereof, that human users can access. Inspect fields with only one value or heavy default-value usage for reliability. If found unreliable, remove them from agentic access. Additionally, remove agents’ access to unused or abandoned fields to mitigate hallucination risk.
What are the critical fields for the use case? Luna profiles Case and Contact objects to determine which fields were consistently populated over the past year when a case was successfully closed on the first attempt versus when escalation was needed.	Agents can determine when they have insufficient data to act upon and when to request additional information.	Propose data reliability KPI formulas to assess overall data reliability readiness for agentic interaction. Determine if first-party or third-party data sources can bridge data gaps to improve field-level data completeness. Design agentic guardrails that check record-level data reliability KPIs to determine if service rep assistance is necessary.
Are customer records unified across systems? Luna reviews the percentage of cases not linked to a contact, the percentage of guest orders, and the uniqueness percentage of contact fields, such as email or phone. These metrics help show whether customer data is fragmented and whether customer records should be unified.	If customer transactions or interactions are fragmented, it is highly likely that customer information will be outdated or inaccurate, leading to incorrect responses.	Use data profiling insights to quantify customer data completeness risk. Use Data 360’s identity resolution capabilities to create a unified profile.
Is the data standardized and consistent across sources? Luna uses data profiling insights to identify string fields with fewer than 1,000 distinct values to assess if data standardization within or across objects is necessary.	Different formats or meanings for the same concept can lead the AI to interpret data incorrectly.	Apply data standardization and normalization across systems, for example, by using Data 360 data transforms.
Is metadata sufficient and available to guide reasoning engines (field descriptions, sensitivity tags, source-of-truth designations)? Luna assesses the data dictionary and sensitivity classifications for completeness and consistency.	Incomplete or misleading metadata can lead to misinterpretation, the leakage of sensitive information, or unsafe AI behavior.	Identify when to enrich or augment data dictionaries, for example, CRM object descriptions or help texts, or the semantic layer in Data 360. Propose using automated tagging to identify sensitive fields. Use sensitivity classifications and governance policies.
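
Some of the field-level checks in the table above (how often a field is populated, and whether it holds only one value or mostly a default) can be approximated with a small profiling script. This is a minimal sketch in plain Python, not a Data 360 feature. The field names, sample records, and thresholds (`min_fill`, `max_top_share`) are assumptions chosen for illustration.

```python
from collections import Counter


def profile_field(records, field):
    """Summarize how one field is populated across a list of records."""
    values = [record.get(field) for record in records]
    populated = [value for value in values if value not in (None, "")]
    counts = Counter(populated)
    top_share = max(counts.values()) / len(populated) if populated else 0.0
    return {
        "fill_rate": len(populated) / len(values) if values else 0.0,
        "distinct": len(counts),
        "top_value_share": top_share,
    }


def flag_field(profile, min_fill=0.5, max_top_share=0.95):
    """Label a field for review. The thresholds are illustrative assumptions."""
    if profile["fill_rate"] < min_fill:
        return "sparsely populated"
    if profile["distinct"] == 1 or profile["top_value_share"] > max_top_share:
        return "single or default value"
    return "ok"


# Hypothetical case records, for illustration only.
cases = [
    {"status": "Closed", "priority": "Medium", "legacy_code": ""},
    {"status": "Closed", "priority": "Medium", "legacy_code": ""},
    {"status": "Escalated", "priority": "Medium", "legacy_code": "X1"},
    {"status": "Closed", "priority": "Medium", "legacy_code": None},
]

report = {
    field: flag_field(profile_field(cases, field))
    for field in ("status", "priority", "legacy_code")
}
print(report)
```

In this sample, `priority` is flagged because every record holds the same value, and `legacy_code` is flagged because it is rarely populated. A flag is a prompt for human review, not proof that a field is unreliable.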

NTO Investigates Structured Data Risks

Luna investigates an issue reported by pilot participants: The Case Deflection Agent could not find all related cases or orders for the NTO customer Rachel Rodriguez. She discovers that Rachel appears in the system as “Rachel Rodriguez” and “Rachael Rodriguez,” even though both records use the same email address.

Customer information from Service Cloud and Commerce Cloud for NTO customer Rachel Rodriguez

To understand how widespread the problem is, Luna plans to profile the Contact, Case, and Order objects. She looks for situations where the same contact point—such as an email address or phone number—appears in multiple records.

By comparing the number of unique contact values to the total number of populated values, Luna can estimate how much duplication exists. This helps her identify customer records and related transactions that should be linked or unified.
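
The comparison described above reduces to a ratio of unique populated values to total populated values. The sketch below shows one way to compute it in Python, along with the groups of records that share a contact point. The email addresses and the extra contacts are hypothetical, and the normalization (trimming whitespace and lowercasing) is an assumption about what counts as the same value.

```python
from collections import defaultdict


def normalize(value):
    """Trim and lowercase a contact point so trivial variants compare equal."""
    return (value or "").strip().lower()


def uniqueness_ratio(records, field):
    """Unique populated values divided by total populated values."""
    populated = [normalize(r.get(field)) for r in records if normalize(r.get(field))]
    if not populated:
        return None  # nothing to measure
    return len(set(populated)) / len(populated)


def duplicate_groups(records, field):
    """Map each shared contact point to the names of the records that use it."""
    groups = defaultdict(list)
    for record in records:
        key = normalize(record.get(field))
        if key:
            groups[key].append(record["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical contact records, for illustration only.
contacts = [
    {"name": "Rachel Rodriguez", "email": "rachel@example.com"},
    {"name": "Rachael Rodriguez", "email": "Rachel@example.com "},
    {"name": "Sam Lee", "email": "sam@example.com"},
    {"name": "Pat Kim", "email": ""},
]

ratio = uniqueness_ratio(contacts, "email")
print(f"uniqueness: {ratio:.2f}, duplication: {1 - ratio:.2f}")
print(duplicate_groups(contacts, "email"))
```

A ratio of 1.0 means every populated value is unique. The further it falls below 1.0, the more records share a contact point and are candidates for unification.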

Now that Luna has identified the important risk areas and a real-life problem from the Case Deflection Agent pilot, she proceeds to profile NTO’s data. Move on to the next unit to follow Luna’s data profiling process.
