
Assess the Need for Customer Profile Unification

Learning Objectives

After completing this unit, you’ll be able to:

  • Determine whether customer profile unification is needed based on data profiling results.
  • Identify which fields and contact points are suitable for matching.

Start with the Data, Not Assumptions

Before you choose or build a profile unification solution, you should first confirm that fragmented customer data is actually impacting business processes or customer interactions.

Users might report:

  • Duplicate outreach
  • Inconsistent customer recognition
  • Missing transaction history
  • Disconnected service interactions
  • Incomplete customer context across systems

However, if you don’t understand how customer data is fragmented—or why—before you implement a fix, you risk incomplete solutions, unnecessary complexity, or, worse, incorrect combinations of different customers’ identities.

Instead, you should begin by profiling your data to understand:

  • How customer data is fragmented across systems or processes.
  • Whether records are disconnected, duplicated, or missing context.
  • Whether identifiers are reliable enough to support matching.
  • Which fields or contact points might create incorrect matches.
  • Whether cleanup, normalization, or enrichment is required before profile unification begins.

Investigate Customer Context Gaps

In Data Profiling Fundamentals, you were introduced to many data profiling insights commonly used in data quality management. Data profiling helps answer a fundamental question: Does the organization have a customer recognition problem—and, if so, why?

At Northern Trail Outfitters (NTO), service teams report that agents can’t always see a customer’s complete history across orders, support interactions, loyalty programs, and guest checkouts.

Before configuring match rules or choosing a unification approach, Luna first profiles the customer-related objects and systems involved in these interactions.

She reviews profiling insights such as:

  • Fill rates for key identifiers.
  • Shared or placeholder contact points.
  • Distinct value counts.
  • Frequency distributions.
  • Disconnected transactions and interactions.

These insights help Luna determine whether:

  • Customer context is fragmented.
  • Data from an object should be used to build or link to a customer profile.
  • Records are suitable for matching.
  • Additional cleanup, normalization, or enrichment is required before profile unification can begin.

Luna looks for the following key signals in data profiling results.

Distinct Ratio Quantifies Data Fragmentation Risk

A distinct ratio helps Luna identify fields that can indicate customer data fragmentation. Fields commonly used for customer interaction—such as email address, phone number, or loyalty ID—often contain many distinct values because they relate to individual customers rather than a smaller group of internal employees or operational users.

Distinct ratio measures the number of distinct values in a field relative to the total number of populated values in that field.

  • A 100% ratio means every value in the field is distinct.
  • A distinct ratio below 100% indicates that some values in that field are repeated. If the field is intended as a contact point or identifier, this indicates a data duplication or fragmentation risk.

When a contact-point field contains many distinct values but the distinct ratio is less than 100%, it can indicate that the same identifier appears across multiple records. This can signal potential duplicates, disconnected interactions, or fragmented customer history that should be investigated before profile unification begins.
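To make the definition concrete, here is a minimal sketch that computes a distinct ratio and lists the repeated values worth investigating. It assumes that blank or missing values are excluded from the denominator (profiling tools can define this differently) and that no normalization such as lowercasing has been applied yet. The sample emails are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def populated(values: Iterable[Optional[str]]) -> list[str]:
    """Keep only non-blank values; None and empty strings count as unpopulated."""
    return [v.strip() for v in values if v is not None and v.strip()]


def distinct_ratio(values: Iterable[Optional[str]]) -> float:
    """Distinct values divided by populated values, expressed as a percentage."""
    filled = populated(values)
    if not filled:
        return 0.0
    return 100.0 * len(set(filled)) / len(filled)


def repeated_values(values: Iterable[Optional[str]]) -> dict[str, int]:
    """Values that appear on more than one record: candidate duplicates to review."""
    counts = Counter(populated(values))
    return {value: n for value, n in counts.items() if n > 1}


# Illustrative sample only, not real NTO data.
emails = ["ana@example.com", "raj@example.com", "ana@example.com", None, "li@example.com"]
print(distinct_ratio(emails))   # 75.0
print(repeated_values(emails))  # {'ana@example.com': 2}
```

A ratio below 100% on an identifier field does not prove duplication by itself; the repeated values are a starting list for manual review.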

To start, Luna compares distinct ratios across contact-point and identifier fields to identify which fields are more likely to represent customer interactions versus internal operational activity. Fields with very low distinct ratios—such as less than 10%—often indicate shared, repeated, or operational values associated with employees, queues, or system processes. Fields with higher distinct ratios—such as above 25%—are more likely to represent customer-specific identifiers because they contain a wider variety of values across records.

These thresholds are not fixed rules. The expected distinct ratio for a field varies based on the organization, industry, business process, and data source. For example, a B2C organization might expect very high distinct ratios for customer email fields, while some B2B processes can legitimately reuse shared contact points across accounts.
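Because the thresholds vary by organization, a sketch of this triage is best written with the cutoffs as parameters. The defaults below simply mirror the example values in the text (10% and 25%); they are assumptions to tune, not fixed rules, and the field ratios are invented for illustration.

```python
def classify_field(ratio_pct: float, low: float = 10.0, high: float = 25.0) -> str:
    """Rough heuristic for triaging a field by its distinct ratio (in percent).

    The low/high cutoffs are illustrative defaults and should be tuned per
    organization, industry, business process, and data source.
    """
    if ratio_pct < low:
        return "likely shared/operational"
    if ratio_pct > high:
        return "likely customer-specific"
    return "review manually"


# Invented ratios for illustration only.
field_ratios = {"Email": 92.4, "Customer Owner Email": 0.8, "Phone": 18.0}
for field, ratio in field_ratios.items():
    print(f"{field}: {ratio}% -> {classify_field(ratio)}")
```

The middle band returns "review manually" on purpose: a B2B contact field with legitimately shared values can land there and still be useful.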

Top-Value Frequency Indicates Placeholder or Shared Identifiers

Top-value frequency shows how frequently the most common value appears in a field compared to all other values. When a small number of values appear disproportionately often, it can signal placeholder data, default entries, or shared identifiers that weaken customer recognition. Luna might see many records with the same values, such as:

  • test@test.com
  • unknown@email.com
  • N/A
  • A shared phone number like 555-555-5555

In NTO’s data, these values might appear in fields intended to uniquely identify a customer, such as email or phone number. While each record appears valid because it is populated, these shared values make it difficult or impossible to distinguish one customer from another or to reliably match records across objects and systems.

Luna treats high top-value concentration in identifier fields as a red flag. It suggests that the data might be technically complete but not meaningful enough to support accurate matching or unification decisions.
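A minimal sketch of both checks follows: the share of populated values taken by the single most common value, and a count of known placeholder values. The placeholder set is a hypothetical starting list built from the examples above; in practice it would be extended with whatever values your own profiling surfaces.

```python
from collections import Counter

# Hypothetical starting list, taken from the examples in this unit (lowercased).
KNOWN_PLACEHOLDERS = {"test@test.com", "unknown@email.com", "n/a", "555-555-5555"}


def top_value_frequency(values):
    """Return (most common value, its share of populated values as a percentage)."""
    filled = [v.strip() for v in values if v and v.strip()]
    if not filled:
        return None, 0.0
    value, count = Counter(filled).most_common(1)[0]
    return value, 100.0 * count / len(filled)


def flag_placeholders(values):
    """Count occurrences of known placeholder values, compared case-insensitively."""
    counts = Counter(v.strip().lower() for v in values if v and v.strip())
    return {v: n for v, n in counts.items() if v in KNOWN_PLACEHOLDERS}


# Illustrative sample only.
emails = ["test@test.com", "a@x.com", "test@test.com", "b@x.com", "test@test.com", "N/A"]
print(top_value_frequency(emails))  # ('test@test.com', 50.0)
print(flag_placeholders(emails))    # {'test@test.com': 3, 'n/a': 1}
```

A high top-value share flags concentration even for placeholders that are not yet on the list, so the two checks complement each other.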

Low Fill Rates in Identifier Fields Can Indicate Invisible Duplicates

Low fill rates in identifier fields can indicate that invisible duplicates might exist within the dataset. When important matching fields such as email address, phone number, loyalty ID, or mailing address are missing, related records might not contain enough shared information to be confidently matched.

Note

Fill rate helps determine whether the record includes the information, such as phone number, address, or loyalty ID, needed to support business activities.

Low fill rate doesn’t automatically mean a field is unimportant. Some sparsely populated fields can contain highly reliable or business-critical information because they’re populated only when relevant or required for a specific process.
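Fill rate itself is simple arithmetic: populated records divided by total records. The sketch below assumes records are plain dictionaries and treats None, missing keys, and whitespace-only strings as unpopulated. `Loyalty_ID__c` is a hypothetical custom field name used only for illustration.

```python
def fill_rate(records, field):
    """Percentage of records where `field` is present and not blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return 100.0 * filled / len(records)


# Illustrative records; "Loyalty_ID__c" is a hypothetical field name.
contacts = [
    {"Email": "ana@example.com", "Loyalty_ID__c": "L-100"},
    {"Email": "", "Loyalty_ID__c": None},
    {"Email": "raj@example.com"},
    {"Email": "li@example.com", "Loyalty_ID__c": ""},
]
print(fill_rate(contacts, "Email"))          # 75.0
print(fill_rate(contacts, "Loyalty_ID__c"))  # 25.0
```

Consistent with the note above, a 25% fill rate is not a reason to drop a field on its own; it is a prompt to ask when and why the field is populated.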

Disconnected Transactions Can Create Fragmented Customer Understanding

Not all customer context gaps are caused by duplicate records. Sometimes transactions or interactions exist, but they’re not connected to a customer profile.

For example:

  • A guest checkout order can contain an email address but no Contact reference.
  • A support case can include a phone number but isn’t linked to an Account or Contact.
  • A loyalty interaction can contain a loyalty ID without a connected customer profile.

These disconnected transactions can prevent teams from seeing a customer’s complete history across systems and channels.

Luna looks for this type of fragmentation by profiling engagement or transactional objects (such as Orders and Cases) and identifying records in which customer identifiers exist, but the related customer reference field is not populated.

For example, Luna might discover:

  • Orders with an email address but no Contact ID
  • Cases with a phone number but no Account relationship
  • Loyalty transactions with a loyalty number but no linked customer profile

When identifier fields are populated but relationships between records are missing, it can indicate a customer context gap and a profile unification opportunity.
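The check Luna runs can be sketched as a filter: keep records where at least one customer identifier is populated but the customer reference field is empty. The field names `Email` and `ContactId` below are placeholders for illustration, not necessarily the API names of the fields in your org.

```python
def has_value(record, field):
    """True if the field is present and not blank."""
    return bool(str(record.get(field) or "").strip())


def disconnected(records, identifier_fields, reference_field):
    """Records with an identifier populated but no link to a customer profile."""
    return [
        r for r in records
        if any(has_value(r, f) for f in identifier_fields)
        and not has_value(r, reference_field)
    ]


# Hypothetical order records; field names are illustrative only.
orders = [
    {"Id": "O1", "Email": "ana@example.com", "ContactId": "003A"},
    {"Id": "O2", "Email": "guest@example.com", "ContactId": None},
    {"Id": "O3", "Email": "", "ContactId": None},
]
gaps = disconnected(orders, ["Email"], "ContactId")
print([o["Id"] for o in gaps])  # ['O2']
```

Record O3 is excluded on purpose: with no identifier at all, there is nothing to link it by, so it is a data completeness problem rather than a unification opportunity.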

At NTO, Luna treats low population rates in customer reference fields as a signal that important customer interactions might remain disconnected from the broader customer profile.

Identify Fields to Use for Matching

After confirming the need for profile unification, the next step is to identify fields that can reliably help match customer records across objects and systems.

Good matching fields are not simply fields with unique values. They should also represent the customer rather than an internal employee or business process. Applying this logic, Luna automatically excludes Assistant Email.

Luna then uses data profiling to evaluate which fields are most appropriate for matching, based on distinct ratios and field fill rates. As a result, she excludes empty fields and fields that are likely to contain employee contact points such as Customer Owner Email.

Data profiling results from Cuneiform for Salesforce with contact point fields and metrics like population rate, record count, and distinct values.

By excluding empty or inaccessible fields, such as Prior Email and Business Fax, Luna includes only relevant, accessible fields in her match process design.

The goal is not to use every available field. The goal is to identify the most reliable customer identifiers that support accurate customer recognition across systems and business processes.
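The selection logic described above can be sketched as a filter over per-field profiling results. The `FieldProfile` structure and all the numbers are hypothetical, not values from the Cuneiform screenshot. The `customer_owned` flag stands in for Luna's judgment that a field describes the customer rather than an employee or process. Only empty fields are excluded on fill rate, since sparsely populated fields can still be reliable; the 25% distinct-ratio cutoff echoes the example threshold earlier in the unit.

```python
from dataclasses import dataclass


@dataclass
class FieldProfile:
    name: str
    fill_rate: float       # % of records where the field is populated
    distinct_ratio: float  # % of populated values that are distinct
    customer_owned: bool   # False for employee or process fields


def candidate_match_fields(profiles, min_fill=0.0, min_distinct=25.0):
    """Keep customer-owned, non-empty fields with a reasonably high distinct ratio."""
    return [
        p.name for p in profiles
        if p.customer_owned and p.fill_rate > min_fill and p.distinct_ratio > min_distinct
    ]


# Invented profiling results for illustration.
profiles = [
    FieldProfile("Email", 80.0, 90.0, True),
    FieldProfile("Assistant Email", 5.0, 95.0, False),
    FieldProfile("Customer Owner Email", 100.0, 0.5, False),
    FieldProfile("Prior Email", 0.0, 0.0, True),
    FieldProfile("Loyalty ID", 40.0, 99.0, True),
]
print(candidate_match_fields(profiles))  # ['Email', 'Loyalty ID']
```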

Note

Never match on a single field, such as name or email address. Instead, use a combination of identifiers to reduce the risk of false positives. Exclude shared, placeholder, or unusually common contact points and identifiers that could incorrectly link unrelated customers together.
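As a rough illustration of this guidance, the sketch below counts a pair of records as a match only when at least two usable identifiers agree, and it ignores placeholder values entirely. The field list, the two-field requirement, and the placeholder set are all hypothetical choices. Normalization here is limited to trimming and lowercasing; real phone and address normalization needs more work.

```python
PLACEHOLDERS = {"test@test.com", "unknown@email.com", "n/a", "555-555-5555"}


def norm(value):
    """Minimal normalization: trim whitespace and lowercase."""
    return (value or "").strip().lower()


def usable(value):
    """A value can support a match only if it is populated and not a placeholder."""
    v = norm(value)
    return bool(v) and v not in PLACEHOLDERS


def is_match(a, b, fields=("Email", "Phone", "LastName"), required=2):
    """Hypothetical rule: match only when `required` usable identifiers agree."""
    agreeing = sum(
        1 for f in fields
        if usable(a.get(f)) and usable(b.get(f)) and norm(a.get(f)) == norm(b.get(f))
    )
    return agreeing >= required


r1 = {"Email": "Ana@Example.com", "Phone": "415-555-0100", "LastName": "Silva"}
r2 = {"Email": "ana@example.com", "Phone": "415-555-0100", "LastName": "Silva"}
r3 = {"Email": "test@test.com", "Phone": "555-555-5555", "LastName": "Silva"}
r4 = {"Email": "test@test.com", "Phone": "555-555-5555", "LastName": "Lee"}
print(is_match(r1, r2))  # True
print(is_match(r3, r4))  # False
```

Without the placeholder check, r3 and r4 would agree on two fields and be merged even though they describe different people.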

Let’s Recap

With the need for profile unification confirmed and candidate matching fields identified, Luna is ready to prepare the data. Move on to the next unit to follow Luna as she prepares NTO’s data for profile unification.
