Assess the Need for Customer Profile Unification
Learning Objectives
After completing this unit, you’ll be able to:
- Determine whether customer profile unification is needed based on data profiling results.
- Identify which fields and contact points are suitable for matching.
Start with the Data, Not Assumptions
Before you choose or build a profile unification solution, you should first confirm that fragmented customer data is actually impacting business processes or customer interactions.
Users might report:
- Duplicate outreach
- Inconsistent customer recognition
- Missing transaction history
- Disconnected service interactions
- Incomplete customer context across systems
However, if you don’t understand how customer data is fragmented—or why—before you implement a fix, you risk incomplete solutions, unnecessary complexity, or, worse, incorrect combinations of different customers’ identities.
Instead, you should begin by profiling your data to understand:
- How customer data is fragmented across systems or processes.
- Whether records are disconnected, duplicated, or missing context.
- Whether identifiers are reliable enough to support matching.
- Which fields or contact points might create incorrect matches.
- Whether cleanup, normalization, or enrichment is required before profile unification begins.
Investigate Customer Context Gaps
In Data Profiling Fundamentals, you were introduced to many data profiling insights commonly used in data quality management. Data profiling helps answer a fundamental question: Does the organization have a customer recognition problem—and, if so, why?
At Northern Trail Outfitters (NTO), service teams report that agents can’t always see a customer’s complete history across orders, support interactions, loyalty programs, and guest checkouts.
Before configuring match rules or choosing a unification approach, Luna first profiles the customer-related objects and systems involved in these interactions.
She reviews profiling insights such as:
- Fill rates for key identifiers.
- Shared or placeholder contact points.
- Distinct value counts.
- Frequency distributions.
- Disconnected transactions and interactions.
These insights help Luna determine whether:
- Customer context is fragmented.
- Data from an object should be used to build or link to a customer profile.
- Records are suitable for matching.
- Additional cleanup, normalization, or enrichment is required before profile unification can begin.
Luna looks for the following key signals in data profiling results.
Distinct Ratio Quantifies Data Fragmentation Risk
A distinct ratio helps Luna identify fields that can indicate customer data fragmentation. Fields commonly used for customer interaction—such as email address, phone number, or loyalty ID—often contain many distinct values because they relate to individual customers rather than a smaller group of internal employees or operational users.
Distinct ratio measures the number of distinct values in a field relative to the total number of populated values in that field.
- A 100% ratio means every value in the field is distinct.
- A distinct ratio below 100% indicates that some values in that field are repeated. If the field is intended as a contact point or identifier, this indicates a data duplication or fragmentation risk.
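Here's a minimal Python sketch of the distinct ratio calculation. It assumes that missing and blank values are excluded before the ratio is computed, and that values are compared exactly as stored, with no case or format normalization. Profiling tools might handle either point differently. The function names and sample emails are hypothetical.

```python
from collections import Counter

def populated(values):
    """Drop missing (None) and blank values; this exclusion is an assumption."""
    return [v for v in values if v is not None and str(v).strip() != ""]

def distinct_ratio(values):
    """Distinct populated values divided by all populated values, as a percentage."""
    vals = populated(values)
    if not vals:
        return 0.0
    return 100.0 * len(set(vals)) / len(vals)

def repeated_values(values):
    """Map each value that appears on more than one record to its count."""
    counts = Counter(populated(values))
    return {v: n for v, n in counts.items() if n > 1}

# Hypothetical email field sampled from five records (one is empty).
emails = ["ana@example.com", "ben@example.com", "ana@example.com",
          "cy@example.com", None]
print(distinct_ratio(emails))   # 75.0
print(repeated_values(emails))  # {'ana@example.com': 2}
```

The 75% ratio here comes from one email address appearing on two records. That is exactly the kind of repeated identifier worth investigating before profile unification.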
When a contact-point field contains many distinct values but the distinct ratio is less than 100%, it can indicate that the same identifier appears across multiple records. This can signal potential duplicates, disconnected interactions, or fragmented customer history that should be investigated before profile unification begins.
To start, Luna compares distinct ratios across contact-point and identifier fields to identify which fields are more likely to represent customer interactions versus internal operational activity. Fields with very low distinct ratios—such as less than 10%—often indicate shared, repeated, or operational values associated with employees, queues, or system processes. Fields with higher distinct ratios—such as above 25%—are more likely to represent customer-specific identifiers because they contain a wider variety of values across records.
These thresholds are not fixed rules. The expected distinct ratio for a field varies based on the organization, industry, business process, and data source. For example, a B2C organization might expect very high distinct ratios for customer email fields, while some B2B processes can legitimately reuse shared contact points across accounts.
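The comparison Luna makes can be sketched as a simple classifier. The 10% and 25% cutoffs are the illustrative figures from this unit. They are exposed as parameters because they aren't fixed rules. The field names and ratios are hypothetical.

```python
def classify_field(ratio_pct, low=10.0, high=25.0):
    """Label a field by its distinct ratio (a percentage).

    The default cutoffs are the illustrative 10% and 25% figures from the text;
    treat them as assumptions to tune per organization and data source.
    """
    if ratio_pct < low:
        return "likely shared or operational"
    if ratio_pct > high:
        return "likely customer-specific"
    return "needs review"

# Hypothetical distinct ratios from a profiling run.
profile = {"Email": 92.0, "Mobile Phone": 64.5, "Queue Email": 0.4, "Fax": 18.0}
for field, ratio in profile.items():
    print(f"{field}: {classify_field(ratio)}")
```

Fields that fall between the two cutoffs aren't rejected. They're flagged for a closer look at their frequency distributions.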
Top-Value Frequency Indicates Placeholder or Shared Identifiers
Top-value frequency shows how frequently the most common value appears in a field compared to all other values. When a small number of values appear disproportionately often, it can signal placeholder data, default entries, or shared identifiers that weaken customer recognition. Luna might see many records with the same values, such as:
- test@test.com
- unknown@email.com
- N/A
- A shared phone number like 555-555-5555
In NTO’s data, these values might appear in fields intended to uniquely identify a customer, such as email or phone number. While each record appears valid because it is populated, these shared values make it difficult or impossible to distinguish one customer from another or to reliably match records across objects and systems.
Luna treats high top-value concentration in identifier fields as a red flag. It suggests that the data might be technically complete but not meaningful enough to support accurate matching or unification decisions.
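This Python sketch shows one way to compute top-value frequency and count known placeholders. The placeholder list is an assumption seeded with the examples in this unit. Real data usually needs a longer, organization-specific list. The names and sample values are hypothetical.

```python
from collections import Counter

# Assumed placeholder list, seeded with the examples above; extend it per org.
KNOWN_PLACEHOLDERS = {"test@test.com", "unknown@email.com", "n/a", "555-555-5555"}

def top_value_frequency(values):
    """Return the most common populated value and its share as a percentage."""
    vals = [v for v in values if v is not None and str(v).strip() != ""]
    if not vals:
        return None, 0.0
    value, count = Counter(vals).most_common(1)[0]
    return value, 100.0 * count / len(vals)

def placeholder_hits(values):
    """Count values matching a known placeholder, ignoring case and padding."""
    return sum(1 for v in values
               if v is not None and str(v).strip().lower() in KNOWN_PLACEHOLDERS)

# Hypothetical email field: eight populated values and one empty value.
emails = ["test@test.com", "ana@example.com", "test@test.com", "N/A",
          "ben@example.com", "Unknown@Email.com", "cy@example.com",
          "dee@example.com", None]
print(top_value_frequency(emails))  # ('test@test.com', 25.0)
print(placeholder_hits(emails))     # 4
```

What counts as a "high" top-value share depends on the field and the data source. For that reason, the function reports the share rather than a pass or fail verdict.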
Low Fill Rates in Identifier Fields Can Hide Duplicates
Low fill rates in identifier fields can indicate that invisible duplicates might exist within the dataset: records that describe the same customer but can't be recognized as such. When important matching fields such as email address, phone number, loyalty ID, or mailing address are missing, related records might not contain enough shared information to be confidently matched.
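Fill rate is the share of records in which a field is populated. The Python sketch below computes it. It also shows how a duplicate can stay invisible: two hypothetical records for the same person share no populated identifier, so an exact-match comparison on Email and Phone finds nothing to link them. Treating blank strings as empty is an assumption. The record layout and helper names are hypothetical.

```python
def is_filled(value):
    """Treat None and blank strings as empty (an assumption about the data)."""
    return value is not None and str(value).strip() != ""

def fill_rate(records, field):
    """Percentage of records in which `field` is populated."""
    if not records:
        return 0.0
    return 100.0 * sum(is_filled(r.get(field)) for r in records) / len(records)

def shared_identifiers(a, b, fields):
    """Fields that are populated in both records with identical raw values."""
    return [f for f in fields
            if is_filled(a.get(f)) and a.get(f) == b.get(f)]

# Hypothetical Contact-like records; the first two describe the same person.
records = [
    {"Name": "Ana Diaz", "Email": "ana@example.com", "Phone": ""},
    {"Name": "Ana Diaz", "Email": None, "Phone": "206-555-0100"},
    {"Name": "Ben Ito", "Email": "ben@example.com", "Phone": "206-555-0101"},
    {"Name": "Cy Park", "Phone": "206-555-0102"},
]
print(fill_rate(records, "Email"))  # 50.0
print(fill_rate(records, "Phone"))  # 75.0
print(shared_identifiers(records[0], records[1], ["Email", "Phone"]))  # []
```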
Disconnected Transactions Can Create Fragmented Customer Understanding
Not all customer context gaps are caused by duplicate records. Sometimes transactions or interactions exist, but they’re not connected to a customer profile.
For example:
- A guest checkout order can contain an email address but no Contact reference.
- A support case can include a phone number but not be linked to an Account or Contact.
- A loyalty interaction can contain a loyalty ID without a connected customer profile.
These disconnected transactions can prevent teams from seeing a customer’s complete history across systems and channels.
Luna looks for this type of fragmentation by profiling engagement or transactional objects (such as Orders and Cases) and identifying records in which customer identifiers exist, but the related customer reference field is not populated.
For example, Luna might discover:
- Orders with an email address but no Contact ID
- Cases with a phone number but no Account relationship
- Loyalty transactions with a loyalty number but no linked customer profile
When identifier fields are populated but relationships between records are missing, it can indicate a customer context gap and a profile unification opportunity.
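The check Luna runs on transactional objects can be sketched as a filter. It keeps records in which at least one customer identifier is populated but the customer reference field is empty. The field names (`OrderNumber`, `Email`, `ContactId`) and the sample orders are hypothetical stand-ins, not a specific object schema.

```python
def is_filled(value):
    """Treat None and blank strings as empty (an assumption about the data)."""
    return value is not None and str(value).strip() != ""

def disconnected(records, identifier_fields, reference_field):
    """Records with at least one identifier populated but no customer reference."""
    return [
        r for r in records
        if any(is_filled(r.get(f)) for f in identifier_fields)
        and not is_filled(r.get(reference_field))
    ]

# Hypothetical orders; field names are illustrative, not a real object schema.
orders = [
    {"OrderNumber": "1001", "Email": "ana@example.com", "ContactId": None},
    {"OrderNumber": "1002", "Email": "ben@example.com", "ContactId": "CONTACT-1"},
    {"OrderNumber": "1003", "Email": "", "ContactId": None},
]
gaps = disconnected(orders, ["Email"], "ContactId")
print([o["OrderNumber"] for o in gaps])  # ['1001']
```

Order 1003 isn't flagged because it carries no identifier at all, so it falls outside this check.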
At NTO, Luna treats low population rates in customer reference fields as a signal that important customer interactions might remain disconnected from the broader customer profile.
Identify Fields to Use for Matching
After confirming the need for profile unification, the next step is to identify fields that can reliably help match customer records across objects and systems.
Good matching fields are not simply fields with unique values. They should also represent the customer rather than an internal employee or business process. Applying this logic, Luna automatically excludes Assistant Email.
Luna then uses data profiling to evaluate which fields are most appropriate for matching, based on distinct ratios and fill rates. As a result, she excludes empty fields and fields that are likely to contain employee contact points, such as Customer Owner Email.

By excluding empty or inaccessible fields, such as Prior Email and Business Fax, Luna limits her match process design to fields that are both relevant and accessible.
The goal is not to use every available field. The goal is to identify the most reliable customer identifiers that support accurate customer recognition across systems and business processes.
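The selection logic can be expressed as a filter over per-field profiling results. In this Python sketch, the 50% minimum fill rate is an assumption. The 25% distinct-ratio floor echoes the illustrative threshold earlier in this unit. The `represents_customer` and `accessible` flags stand in for the judgment calls Luna makes by reviewing each field. Which of Prior Email and Business Fax is empty and which is inaccessible is also assumed, and all profile values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FieldProfile:
    name: str
    fill_rate: float       # percent of records in which the field is populated
    distinct_ratio: float  # percent of populated values that are distinct
    accessible: bool = True
    represents_customer: bool = True  # False for employee or process fields

def candidate_match_fields(profiles, min_fill=50.0, min_distinct=25.0):
    """Keep accessible, customer-owned fields that are well filled and varied.

    Both default thresholds are assumptions, not fixed rules.
    """
    return [
        p.name for p in profiles
        if p.accessible
        and p.represents_customer
        and p.fill_rate >= min_fill
        and p.distinct_ratio >= min_distinct
    ]

# Hypothetical profiling results for a Contact-like object.
profiles = [
    FieldProfile("Email", 88.0, 96.0),
    FieldProfile("Mobile Phone", 71.0, 90.0),
    FieldProfile("Assistant Email", 12.0, 80.0, represents_customer=False),
    FieldProfile("Customer Owner Email", 100.0, 0.5, represents_customer=False),
    FieldProfile("Prior Email", 0.0, 0.0),
    FieldProfile("Business Fax", 35.0, 60.0, accessible=False),
]
print(candidate_match_fields(profiles))  # ['Email', 'Mobile Phone']
```

The `represents_customer` flag is kept separate from the numeric checks. As with Assistant Email, some fields can be excluded on meaning alone, before their metrics are considered.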
Let’s Recap
With the need for profile unification confirmed and candidate matching fields identified, Luna is ready to prepare the data. Move on to the next unit to follow Luna as she prepares NTO's data for profile unification.
