Prepare Your Data
Learning Objectives
After completing this unit, you’ll be able to:
- Explain how to identify and resolve data challenges.
- Define data requirements for your project.
The Importance of Data Readiness
Your organization needs to be data-ready to start an AI project, which means the data for the project is accurate, available, accessible, and securely governed.
In many organizations, data quality is a huge barrier to implementing AI projects. And with good reason! Data is the foundation of AI algorithms, allowing them to learn, adapt, and make better decisions. High-quality data can improve the accuracy, efficiency, reliability, and fairness of AI systems.
It’s critical to address data quality issues before implementing your AI project. However, don’t let the idea of perfect data get in the way of the project. Many projects get stuck in data readiness because teams are trying to chase perfection. Instead, work with your team to identify reasonable goals for data readiness. You can use the Build stage to identify and address any gaps in your data that affect the AI output.
This unit gives an overview of how to assess your data quality and prepare your data for an AI project.
Create a Data Inventory
Becca knows the best way to get a complete view of the data for her project is to create a data inventory. A data inventory helps you manage diverse data assets and identify potential issues.
Follow these steps to create your data inventory.
- Identify what data you need in your project.
- Identify where the data is stored.
- Answer some questions about your data.
  - Is the data type structured, unstructured, or semi-structured? (Learn more about data classification in Data Fundamentals for AI.)
  - How often is your data refreshed?
  - Is the data updated in real time, hourly, daily, or monthly, or is it static?
  - How can the data be accessed?
  - Have governance standards been implemented for the data?
  - What are some data considerations that can cause challenges in your project?
Coral Cloud’s Data Inventory
Let’s continue with Becca’s AI project to automate the check-in process for Coral Cloud Resorts. As a refresher, here’s Becca’s implementation plan.
- Use a flow to create a Guest Event record based on the latest reservation data in Data Cloud.
- Teach Agentforce how to launch the flow through conversational language. For example, when guest Sofia Rodriguez arrives to begin her stay, the staff can simply ask Agentforce to “Check in Sofia Rodriguez” and Agentforce does the rest!
- Use Prompt Builder to generate a personalized welcome email that suggests excursions the guest might be interested in and send it.
Becca reviews her plan to figure out what data she needs to implement the solution.
- In step 1, she needs reservation data. Coral Cloud uses an external platform called Reserv-o-matic to store reservation data, so she uses Data Cloud to bring that data into Salesforce.
- In step 2, she needs to be able to retrieve reservation data based on the customer’s name. Customer data is available in Salesforce.
- In step 3, she needs data on previous excursions the guest purchased. Customer purchase history is also available in Salesforce.
After tracking down the required data sources, Becca creates a data inventory.
Data Name | Data Source | Type of Data | Update Cadence | Considerations |
---|---|---|---|---|
Contact records | CRM | Structured | Daily | Dates are in MM/DD/YY format |
Reservations | Reserv-o-matic | Structured | Real-time | Dates are in DD/MM/YY format |
Excursions | CRM | Structured | Daily | Dates are in MM/DD/YY format |
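If you'd rather keep an inventory like this somewhere a script can read it, here's a minimal sketch of the same table as Python data. The `InventoryEntry` class and its `date_format` field are hypothetical names made up for this illustration (the field is distilled from the Considerations column); they aren't part of any Salesforce API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    """One row of the data inventory (hypothetical structure)."""
    name: str
    source: str
    data_type: str
    update_cadence: str
    date_format: str  # assumption: taken from the Considerations column


# The three rows from Becca's inventory table.
INVENTORY = [
    InventoryEntry("Contact records", "CRM", "Structured", "Daily", "MM/DD/YY"),
    InventoryEntry("Reservations", "Reserv-o-matic", "Structured", "Real-time", "DD/MM/YY"),
    InventoryEntry("Excursions", "CRM", "Structured", "Daily", "MM/DD/YY"),
]


def names_by_date_format(entries):
    """Group data names by the date format they use."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.date_format, []).append(entry.name)
    return grouped


formats = names_by_date_format(INVENTORY)
# More than one key means the sources disagree on date format.
has_mismatch = len(formats) > 1
```

Grouping by a consideration like date format is one way to make a potential issue stand out before any cleaning starts.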
Capture the Project’s Data Requirements
A project’s data requirements are the minimum data, and the minimum level of data quality, that your project needs to be successful. Understanding your data requirements reduces unnecessary work.
Assess Data Quality
High-quality data creates reliable and effective AI projects. (Learn more about assessing data quality in Data Quality.) As you assess your data quality, identify where your data’s falling short. These are areas for data cleaning. Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. This includes closing data gaps. Data cleaning can be time-consuming, so don’t clean data that you don’t need for your project.
As Becca notes in her data inventory, reservation dates are in DD/MM/YY format, while Contact records and excursions use MM/DD/YY. The dates aren’t in a consistent format, so they don’t meet the quality criteria. Becca writes a quick program to convert all reservation dates to MM/DD/YY format.
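The unit doesn't show Becca's program, so here's one way a conversion like this might look in Python. It's a minimal sketch that assumes each date arrives as a plain DD/MM/YY string; the function name `convert_reservation_date` is made up for illustration.

```python
from datetime import datetime


def convert_reservation_date(value: str) -> str:
    """Convert a DD/MM/YY string to MM/DD/YY.

    Raises ValueError if the input isn't a valid DD/MM/YY date.
    """
    parsed = datetime.strptime(value.strip(), "%d/%m/%y")
    return parsed.strftime("%m/%d/%y")


converted = convert_reservation_date("25/12/25")
```

Parsing the string into a real date, instead of just swapping characters, means impossible dates are rejected rather than silently passed along. One caveat: a value like 03/04/25 is valid in both formats, so a script can't tell on its own whether a date has already been converted. Run the conversion once, on data you know is still in DD/MM/YY.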
As Becca cleans up a few more data issues, she starts to realize there’s way too much data for her to make it perfect. Coral Cloud is a world-class resort with thousands of guests per year. She feels discouraged until she sees that she was overestimating the data requirements of her project by cleaning reservations from past years. She only needs to clean reservations that are in the future, since those are the only reservations that will use the automated check-in. Becca filters the reservations by future dates. By understanding the project’s data requirements, she now has way fewer records to work through.
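Here's a rough sketch of what filtering by future dates could look like. It assumes each reservation is a dictionary with a `check_in` value in MM/DD/YY format, and it treats a check-in on today's date as still in scope; the field names and sample records are hypothetical.

```python
from datetime import date, datetime


def future_reservations(reservations, today):
    """Keep reservations whose check-in date is on or after `today`."""
    kept = []
    for reservation in reservations:
        check_in = datetime.strptime(reservation["check_in"], "%m/%d/%y").date()
        if check_in >= today:
            kept.append(reservation)
    return kept


# Made-up sample records for illustration only.
sample = [
    {"guest": "Sofia Rodriguez", "check_in": "06/15/25"},
    {"guest": "Past Guest", "check_in": "01/10/24"},
]
upcoming = future_reservations(sample, today=date(2025, 6, 1))
```

Passing `today` in as a parameter, instead of calling `date.today()` inside the function, keeps the filter easy to test with fixed dates.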
Migrate and Integrate Data
When you have data from multiple sources, you need to migrate your data. This means bringing data from one source into a central source. If your project is built in Salesforce, bring your external data into Salesforce. After you migrate data, integrate it by combining data from different sources into a unified, comprehensive view. Only migrate and integrate data that’s needed for your project. This helps keep your project manageable and avoid cluttering your system with unnecessary data.
Since Becca’s project involves creating a Guest Event record based on the reservation data in Reserv-o-matic and the Contact record in Salesforce, she knows she needs to link the reservation data to the Contact record. Otherwise, the flow won’t know which reservation belongs to which contact. Becca doesn’t want to integrate unneeded data, so she takes a look at the reservation records to identify which fields are unnecessary. She sees that reservations have a Notes field for customers to put in special requests. There’s no specific format, and many customers leave it blank. Becca doesn’t need the Notes field to create a Guest Event record, so she deletes this field before she migrates the reservations into Salesforce.
Becca sets up a Data Stream to bring data in from Reserv-o-matic. She then uses Identity Resolution to match the Sofia in Salesforce to the Sofia in Reserv-o-matic. Now, Sofia’s record has both her contact details from Salesforce and her reservation details from Reserv-o-matic.
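To get a feel for what linking records across systems involves, here's a deliberately simplified sketch. It is not how Data Cloud's Identity Resolution works; it just matches records on a normalized email address, and the field names and sample values are assumptions made for this example.

```python
def normalize(value: str) -> str:
    """Lowercase a value and collapse extra whitespace."""
    return " ".join(value.lower().split())


def link_reservations(contacts, reservations):
    """Attach each reservation to the contact with the same email.

    Returns (linked, unmatched) so gaps are visible instead of hidden.
    """
    contacts_by_email = {normalize(c["email"]): c for c in contacts}
    linked, unmatched = [], []
    for reservation in reservations:
        contact = contacts_by_email.get(normalize(reservation["email"]))
        if contact is None:
            unmatched.append(reservation)
        else:
            linked.append({**contact, "reservation": reservation})
    return linked, unmatched


# Made-up sample records for illustration only.
contacts = [{"name": "Sofia Rodriguez", "email": "sofia@example.com"}]
reservations = [
    {"email": " Sofia@Example.com", "check_in": "06/15/25"},
    {"email": "unknown@example.com", "check_in": "06/16/25"},
]
linked, unmatched = link_reservations(contacts, reservations)
```

Returning the unmatched reservations separately gives you a list of records to investigate, which is often where data gaps show up first.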
Establish Data Governance
The fewer people working with your data, the more consistent it will be. Limit data access to the people who need it. In Becca’s case, she only gives herself and her manager access.
Plan Analytics
Come up with an analytics plan to measure success. This is important for monitoring performance and demonstrating the return on investment (ROI) of your project. Demonstrating ROI is key to getting support for developing your project further or for future AI projects.
The analytics plan should align with the project goals you outlined in the previous unit. As a refresher, here are Becca’s project goals.
- Reduce check-in time by 50%.
- Maintain customer satisfaction at the same level as or higher than before the project.
She decides how to collect and analyze data to measure whether her project met these goals. Becca comes up with this plan.
- Calculate screen time on the front desk computers at the end of each day. Compare the average screen time before and after implementing the AI check-in process.
- Offer an optional survey at the end of each guest’s stay where they can rate their satisfaction. Compare the average satisfaction before and after implementing the AI check-in process.
Now Becca has a concrete way to demonstrate the impact of her project.
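The before-and-after comparisons in Becca's plan come down to simple averages. Here's a minimal sketch of that arithmetic, assuming the measurements are collected as plain lists of numbers. The function names are hypothetical and the sample figures are made up, not Coral Cloud results.

```python
from statistics import mean


def percent_reduction(before, after):
    """Percentage drop in the average from `before` to `after`."""
    baseline = mean(before)
    return (baseline - mean(after)) / baseline * 100


def goals_met(time_before, time_after, csat_before, csat_after,
              target_reduction=50.0):
    """Check both project goals against the collected measurements."""
    return {
        "check_in_time": percent_reduction(time_before, time_after) >= target_reduction,
        "satisfaction": mean(csat_after) >= mean(csat_before),
    }


# Made-up sample measurements for illustration only.
result = goals_met(
    time_before=[10, 12, 8],   # minutes of screen time before
    time_after=[4, 5, 6],      # minutes of screen time after
    csat_before=[4, 4, 5],     # survey ratings before
    csat_after=[5, 4, 5],      # survey ratings after
)
```

With these sample numbers, average screen time drops from 10 minutes to 5, which is exactly the 50% target, and average satisfaction goes up, so both checks pass.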
Resolve the Data Challenges
After defining the requirements for her project, Becca finishes resolving the most critical data challenges, which typically include quality issues, integration hurdles, gaps in the data, and sometimes even outdated data infrastructure. Becca knows that if she doesn’t resolve the issues early on, Coral Cloud’s new AI project might be built on unreliable or inaccurate data.
Becca is making a ton of progress on her project so far! She wrangles data like a real pro. In the next unit, learn how Becca assesses the risks of her AI project and implements the project in a trustworthy, responsible way.