
Explore Large Language Model Data Masking

Learning Objectives

After completing this unit, you’ll be able to:

  • Explain the importance of masking sensitive information from large language models (LLMs).
  • Describe Einstein Trust Layer functionality.

Before You Start

We recommend first completing content that covers the Einstein Trust Layer and creating prompt templates with Prompt Builder.

Introduction to Large Language Model Data Masking

Safeguarding your data while using generative AI technology can be challenging. The Einstein Trust Layer uses data masking to help prevent sensitive information from being exposed to third-party large language models (LLMs), hence the name LLM data masking. Let’s learn more about the process.

How LLM Data Masking Works

Detection and Masking Process

Einstein Trust Layer uses advanced pattern matching and machine learning techniques to detect sensitive data in prompts. Once identified, this data is masked, meaning it’s replaced with placeholder text. For example, let’s say the name Jim is the first instance of a person’s name in the prompt text. Jim is replaced by <Person_0>. By replacing real data with placeholder text in the prompt, the Einstein Trust Layer helps ensure that sensitive details aren’t exposed to the LLM. The sketch below illustrates the idea.
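Here’s a minimal Python sketch of placeholder-based masking. It uses simple regular expressions as a stand-in for the Trust Layer’s actual pattern matching and machine-learning detection; the PATTERNS table, mask function, and placeholder format are illustrative assumptions, not Salesforce’s implementation.

```python
import re

# Illustrative patterns only. The real Einstein Trust Layer combines
# pattern matching with machine-learning entity recognition.
PATTERNS = {
    "Person": re.compile(r"\b(?:Jim|Jane)\b"),            # stand-in for name detection
    "CreditCard": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # loose card-number pattern
}

def mask(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each detected value with a numbered placeholder and return
    the masked prompt plus the mapping needed later for demasking."""
    mapping: dict[str, str] = {}
    for entity, pattern in PATTERNS.items():
        count = 0

        def substitute(match: re.Match) -> str:
            nonlocal count
            placeholder = f"<{entity}_{count}>"
            mapping[placeholder] = match.group(0)
            count += 1
            return placeholder

        prompt = pattern.sub(substitute, prompt)
    return prompt, mapping

masked, mapping = mask("Jim paid with card 4111 1111 1111 1111.")
print(masked)  # Jim becomes <Person_0>; the card number becomes <CreditCard_0>.
```

Notice that the function returns the mapping alongside the masked prompt; that mapping is what makes demasking possible later.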

Demasking

After receiving the generated response from the LLM, the Einstein Trust Layer restores the masked data to its original form. Demasking helps ensure that the response you see is accurate and relevant to the task at hand.
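Continuing the sketch above, demasking is the reverse substitution, using the mapping captured during masking (again, a simplified illustration rather than the actual mechanism):

```python
def demask(response: str, mapping: dict[str, str]) -> str:
    """Swap each placeholder in the LLM response back to its original value."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response

# Using the mapping captured by mask() in the earlier sketch:
llm_response = "Confirmed: <Person_0>'s payment on card <CreditCard_0> went through."
print(demask(llm_response, mapping))
# Confirmed: Jim's payment on card 4111 1111 1111 1111 went through.
```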

How to Prepare for Data Masking

Before you configure data masking, evaluate the data security and privacy risks that generative AI poses to your business and your use cases. One of the key risks that LLM data masking helps mitigate is data leakage. Data leakage can come from exposing private or personal information, such as customer names, credit card numbers, or Social Security numbers, to third-party LLMs.

Salesforce has a zero data retention policy with third-party LLM providers like OpenAI and Azure OpenAI, which means the data you send through the prompt isn’t retained by the LLM. Even with the zero data retention policy in place, you may still want to make sure that sensitive data isn’t exposed to the LLM.

Take a look at some questions that can help you conduct a risk assessment and create a risk profile to determine your risk tolerance.

  • What are your current governance and security policies?
  • Is your company bound by data-handling regulations like General Data Protection Regulation (GDPR) or Payment Card Industry Data Security Standard (PCI DSS)? Some of these regulations require that you don’t expose certain types of sensitive data to LLMs.
  • What are your data residency requirements?
  • If your company is global, what country or region’s data do you need to protect?
  • Is the sensitive data necessary for grounding the prompt in order to generate a relevant and useful response?
  • What are your AI application use cases? Will these use cases include sensitive data in the prompts?

For a more detailed discussion of risk, mitigation strategies, and creating a risk profile, see AI + Data: Project Planning.

Also consider how masking certain types of data affects the quality of the generated response. Your specific use cases and testing are the key to striking the right balance between data security and response quality.

Let’s walk through some examples to understand the impact better.

  1. In the prompt text, you’re asking the LLM to help with downloading an Amazon app to your device. However, because Amazon is a company name, it’s masked from the LLM. The LLM is unable to provide a useful response because it doesn’t understand which app you need help downloading.
  2. In the prompt text, you’re asking the LLM to summarize a sales order. The sales order number is a 10-digit number with a pattern similar to a US phone number. As a result, the Einstein Trust Layer masks the sales order number as if it were a phone number before sending the prompt to the LLM. This can cause the LLM to generate a response that isn’t accurate or useful, as the sketch after this list shows.
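Here’s a short Python sketch of the over-masking pitfall from the second example. The phone-number pattern is hypothetical, chosen only to show how a 10-digit sales order number can collide with it:

```python
import re

# Hypothetical US phone-number pattern, for illustration only.
PHONE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")

prompt = "Summarize sales order 4155550123 for the quarterly report."
print(PHONE.sub("<Phone_0>", prompt))
# Summarize sales order <Phone_0> for the quarterly report.
# The LLM can no longer tell which sales order to summarize.
```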

How to Use the Einstein Trust Layer

You walk through the setup in detail in the next unit. For now, let’s gain insight into what’s enabled by default and what you can do to customize it further.

Einstein Trust Layer

After you set up Einstein AI with Data Cloud, LLM data masking is turned on by default for certain sensitive data types. This makes it easy to keep sensitive data safe. You can review the default settings and update them according to your organization’s governance policies and regulatory requirements.

To track data masking activities, the Einstein Trust Layer logs these actions in Data Cloud as part of an audit trail. You can use the prebuilt dashboards to verify data masking and demasking or create custom reports and dashboards.

Supported Data Entries and Languages

The Einstein Trust Layer supports masking for a variety of data types, including company names, credit card numbers, and email addresses. These data types are supported in multiple languages, so your global operations can comply with data masking requirements. Read the Einstein Trust Layer Region and Language Support help article to see which data types and languages are supported.

In this unit, you learned about the importance of data masking and how it relates to the Einstein Trust Layer. In the next unit, you learn how to configure data masking in the Einstein Trust Layer and view data masking at run time in Prompt Builder.

Resources

  • AI + Data: Project Planning
  • Einstein Trust Layer Region and Language Support (Salesforce Help)
