Profile Data to Guide Architecture Decisions

Learning Objectives

After completing this unit, you’ll be able to:

  • Explain risks associated with relying only on discovery interviews to assess AI data readiness.
  • Describe how data profiling provides quantified evidence about data reliability.
  • Identify how profiling insights guide architecture decisions for AI solutions.

Discovery Interviews Versus the ‘Data Reality’

Discovery interviews are often the first step in designing AI solutions. Architects meet with business leaders, admins, and subject-matter experts to understand how data is expected to work within their use cases and processes. For example, stakeholders might explain:

  • Which objects support a business process
  • Which fields are important for a use case
  • How customer, product, or transaction records are related

These discussions help define scope and intent, but they do not always reflect the actual condition of the data.

Over time, systems evolve. Fields become unused, integrations change, and records can become fragmented across systems. As a result, the data can behave differently than expected.

This is where data profiling becomes essential.

Identify Data Reliability Risks with Data Profiling

Even in narrowly scoped agentic AI projects involving only a few objects, the scope of data assessment can span hundreds or thousands of fields and millions of records. Data profiling provides measurable evidence and concrete examples of data quality risks.

If you completed Data Profiling Fundamentals, you learned how Luna used profiling to uncover completeness gaps, data inconsistencies, and risk signals. To aid her process, Luna chooses a native data profiling solution from AgentExchange, so she can continue to work within the Salesforce trust boundaries.

Luna starts by assessing the CRM objects for the Case Deflection Agent. She quantifies the overall record volume and the number of recently created transactional records (for example, cases). This helps her estimate how many cases an agent can deflect in a given year and informs her cost modeling.

Luna discovers that only 201K cases were active over the past year, a fraction of the total 2.17 million cases present in the data. With this insight, she can further narrow the data scope that AI agents need access to, reducing consumption costs and ensuring the right context.
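The volume comparison above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular profiling tool works: the function name `recent_share`, the 365-day window, and the use of a last-activity date per record are assumptions made for the example. Only the 201K and 2.17 million figures come from the text.

```python
from datetime import date, timedelta


def recent_share(activity_dates, as_of, window_days=365):
    """Return (recent_count, total_count, share) for records active in the window.

    activity_dates is a list of last-activity dates, one per record
    (a hypothetical input shape chosen for this sketch).
    """
    cutoff = as_of - timedelta(days=window_days)
    total = len(activity_dates)
    recent = sum(1 for d in activity_dates if d >= cutoff)
    share = recent / total if total else 0.0
    return recent, total, share


# Luna's figures from the text: 201K active cases out of 2.17 million total.
active_share = 201_000 / 2_170_000
print(f"Active share of all cases: {active_share:.1%}")  # prints 9.3%
```

At roughly 9% of total volume, the active cases are the natural starting scope for what the agent needs to access.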

Agentforce Data Risk dashboard by Cuneiform, a native data profiling solution from AgentExchange, showing the number of objects analyzed for the Case Deflection Agent use case and the overall versus recent record volume, which guide business value and consumption planning.

Luna further analyzes the objects’ content and determines that only about 66% of the available fields (432 potentially relevant fields out of 652 profiled fields) should be accessible to the agent. She determines this by identifying and excluding abandoned fields (those not used in the past year), empty (unused) fields, and single-value fields.
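The three exclusion categories can be expressed as a simple classifier. The sketch below is illustrative only: the field names are made up, and treating a field as "abandoned" when every record that populates it was last modified before a cutoff date is an assumed proxy for field usage, not the rule any specific profiling product applies.

```python
from collections import Counter
from datetime import date


def classify_field(values, modified_dates, cutoff):
    """Label one field as 'empty', 'single_value', 'abandoned', or 'candidate'.

    values[i] is the field value on record i; modified_dates[i] is when that
    record was last modified (an assumed stand-in for field usage).
    """
    populated = [(v, d) for v, d in zip(values, modified_dates) if v not in (None, "")]
    if not populated:
        return "empty"
    if len({v for v, _ in populated}) == 1:
        return "single_value"
    if all(d < cutoff for _, d in populated):
        return "abandoned"
    return "candidate"


def summarize(fields, cutoff):
    """fields maps a field name to a (values, modified_dates) pair."""
    labels = {name: classify_field(vals, dates, cutoff) for name, (vals, dates) in fields.items()}
    counts = Counter(labels.values())
    candidate_share = counts["candidate"] / len(labels) if labels else 0.0
    return labels, counts, candidate_share


# Toy data with hypothetical field names.
cutoff = date(2024, 1, 1)
recent, old = date(2024, 6, 1), date(2021, 6, 1)
fields = {
    "Legacy_Code__c": (["A1", "B2"], [old, old]),
    "Fax": ([None, ""], [recent, recent]),
    "Region__c": (["EMEA", "EMEA"], [recent, old]),
    "Product__c": (["Router", "Modem"], [recent, old]),
}
labels, counts, candidate_share = summarize(fields, cutoff)
print(labels)
print(f"Candidate share: {candidate_share:.0%}")
```

Only fields labeled `candidate` would be considered for agent access; the rest are excluded for the reasons the classifier reports.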

Agentforce Data Risk dashboard by Cuneiform, a native data profiling solution from AgentExchange, showing a series of reliability risks, including unused or unreliable fields, customer data completeness, and data dictionary risk.

Note

For CRM orgs older than 5 years, a significant percentage of custom fields can be unused or abandoned.

Sources—ISV Research: Elements.cloud, Gearset, Hubbl Diagnostics, PeerNova

Luna also quantifies the risk of having incomplete data with the customer context risk estimate. The Fragmented Customer Data Risk indicator measures how much customer data is spread across duplicate or disconnected records. It accounts for how often customer contact points appear across different records and calculates a risk percentage.

Incomplete data directly affects whether an agent has enough information to provide reliable answers. At a 36.6% risk, roughly 4 out of every 10 customer interactions can have incomplete data and might not be linked to the right customer. If the Case Deflection Agent cannot connect customer interactions to the correct customer, it cannot fulfill its function properly.
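To make the idea of a fragmentation percentage concrete, here is a rough proxy in Python. It is not the formula behind the Fragmented Customer Data Risk indicator, which the text does not specify. The sketch assumes a single contact point (email), normalizes it, and reports the share of records whose email also appears on at least one other record. The function name and record shape are hypothetical.

```python
from collections import defaultdict


def fragmentation_risk(records):
    """Share of records whose normalized email appears on more than one record.

    records is an iterable of (record_id, email) pairs. Records without an
    email are ignored in this simplified proxy.
    """
    ids_by_email = defaultdict(set)
    for record_id, email in records:
        if email:
            ids_by_email[email.strip().lower()].add(record_id)
    with_contact = sum(len(ids) for ids in ids_by_email.values())
    shared = sum(len(ids) for ids in ids_by_email.values() if len(ids) > 1)
    return shared / with_contact if with_contact else 0.0


# Toy data: two records share one email once case and whitespace are normalized.
sample = [
    ("c1", "ana@example.com"),
    ("c2", "Ana@Example.com "),
    ("c3", "raj@example.com"),
    ("c4", None),
]
print(f"Fragmentation risk: {fragmentation_risk(sample):.1%}")
```

A production indicator would likely weigh several contact points (phone, address, account links) rather than email alone.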

Data profiling provides Luna with the evidence she needs to justify an investment in Data 360 as a system of context. It also highlights the need to incorporate additional data quality improvements, following Salesforce’s Data Quality Management framework.

The Data Quality Management Framework


Following this framework ensures that data isn’t just collected, but actively refined.

  • Data profiling: Analyze data and perform a statistical analysis of critical data elements across data sources.
  • Data cleaning and standardization: Cleanse and purge unnecessary data and fields and create standards to consistently format data.
  • Data enrichment: Enrich your data with additional data points for a more complete picture, building richer profiles.
  • Profile unification: Identify and manage duplicate or related records that might span multiple sources. Create contextual profiles through matching and reconciliation rules.
  • Data monitoring: Monitor data health through well-established governance processes and controls (data stewardship).

Note

This unit introduced the role of data profiling in assessing AI data readiness.

To learn more about the Salesforce Customer Success Group’s Data Quality Management Framework and how data profiling insights guide effective data cleansing, standardization, enrichment, and data unification decisions, complete the Explore Data Quality Fundamentals trail.

Profiling Supports Better Architecture Decisions

By combining what she learns from discovery interviews with evidence gathered during profiling, Luna can make better decisions about how to prepare data for AI.

For example, profiling results can lead to architecture choices, such as:

  • Implementing identity resolution to unify fragmented customer records
  • Adding data enrichment to fill important data gaps
  • Improving integration pipelines from systems of record
  • Defining agent guardrails when data reliability is uncertain

These decisions ensure AI agents have reliable context and help reduce the risk of incorrect responses.
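One way to picture how profiling metrics feed these choices is a small rule table. The sketch below is purely illustrative: the function name, the two input metrics, and both threshold values are assumptions for the example, not Salesforce guidance. In practice, an architect would set thresholds per use case.

```python
def recommend_actions(fragmentation_risk, completeness,
                      *, risk_threshold=0.2, completeness_threshold=0.9):
    """Map two profiling metrics (each between 0 and 1) to candidate actions.

    Thresholds are hypothetical defaults chosen only for this sketch.
    """
    actions = []
    if fragmentation_risk > risk_threshold:
        actions.append("identity resolution")
    if completeness < completeness_threshold:
        actions.append("data enrichment")
    if actions:
        # Any unresolved reliability issue also calls for agent guardrails.
        actions.append("agent guardrails")
    return actions


# Using the 36.6% fragmentation risk from the text and an assumed 95% completeness.
print(recommend_actions(0.366, 0.95))
```

The point of the sketch is the direction of reasoning: measured risk goes in, and a short list of architecture candidates comes out for discussion with stakeholders.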

In the next unit, you explore how data readiness insights guide architecture decisions.
