
Create a Search Index Configuration

Learning Objectives

After this unit, you’ll be able to:

  • Describe how search index configurations and grounding work in Data Cloud.
  • Create a search index configuration with Easy Setup.

Ground Search on Unstructured Data with Search Index Configurations

Grounding search on unstructured and structured data enhances your use of generative AI, analytics, and automation tools across the Salesforce platform. Grounded search brings customer-specific data into applications like Einstein Copilot, Tableau, and Flow Builder, ensuring that outputs are finely tuned to your users’ intents and contexts. This alignment results in more accurate and relevant AI-generated content, deeper insights from analytics, and more efficient automation workflows for your teams and customers.

To ground search, you must break your unstructured data into semantically appropriate chunks and then create vector embeddings (numerical representations of the chunked data) from those chunks. The chunked content is stored in the Data Cloud search index. From there it is searchable and can be used in Einstein generative AI applications (Prompt Builder and Einstein Copilot), automation applications (Flow Builder), and analytics applications (Tableau).

A graphical diagram shows the flow for creating and using a vector index.

Chunk Unstructured Data

In the previous unit, we covered how Data Cloud references unstructured data through unstructured data model objects (UDMOs). You can also chunk UDMOs or any DMOs with text fields, such as Salesforce Knowledge articles. This is what you’ll do in this unit.

When you chunk UDMOs or DMOs, you break their text into manageable, semantically meaningful units. Data Cloud stores these chunks in chunk data model objects (CDMOs), which are created from the source data model objects or unstructured data model objects.

Understand How Chunking Works

Data Cloud supports HTML-based passage extraction, which can be semantic-based or window-based, and token-based extraction.

Semantic-based passage extraction uses the semantic meaning inherent in HTML tags to chunk a document into passages. HTML elements such as headings (<h1>, <h2>), lists (<ul>, <ol>), or bold text (<strong>) acting as a subheading are considered logical boundaries for passages.
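To make the idea concrete, here's a minimal Python sketch of semantic-based passage extraction. It is not Data Cloud's implementation: the exact set of boundary tags and the rule "start a new passage whenever a boundary tag opens" are assumptions drawn from the examples above, and `SemanticPassageSplitter` and `semantic_passages` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed boundary tags, taken from the examples above: headings, lists, and
# bold text acting as a subheading. Data Cloud's actual rules may differ.
BOUNDARY_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "strong"}


class SemanticPassageSplitter(HTMLParser):
    """Hypothetical splitter that starts a new passage at each boundary tag."""

    def __init__(self) -> None:
        super().__init__()
        self.passages: list[str] = []
        self._text_parts: list[str] = []

    def _flush(self) -> None:
        # Collapse whitespace and keep the passage only if it has text.
        text = " ".join(" ".join(self._text_parts).split())
        if text:
            self.passages.append(text)
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        # An opening boundary tag closes the current passage.
        if tag in BOUNDARY_TAGS:
            self._flush()

    def handle_data(self, data):
        self._text_parts.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # emit whatever text follows the last boundary


def semantic_passages(html: str) -> list[str]:
    parser = SemanticPassageSplitter()
    parser.feed(html)
    parser.close()
    return parser.passages
```

For example, `<h1>Reset a password</h1><p>Open Settings.</p><h2>Troubleshooting</h2><p>Clear your cache.</p>` yields two passages, one per heading, each keeping the heading text together with the paragraph beneath it.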

The window-based extraction strategy uses block-level elements such as <div> and <p> tags, or raw text separated by line breaks, to chunk documents into passages. If a paragraph doesn't contain any HTML, the extraction is done at the sentence level.
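A rough Python sketch of window-based extraction follows, assuming a simple model: split the text into units (<div>/<p> blocks when HTML is present, otherwise line-separated paragraphs broken into sentences), then pack consecutive units into windows of up to `max_chars` characters. The character budget, the regex sentence splitter, and the function names are all hypothetical; Data Cloud's own window sizing isn't described in this unit.

```python
import re

# Block-level tags that mark unit boundaries in this sketch.
BLOCK_TAG = re.compile(r"</?(?:div|p)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")
# Assumed sentence boundary: whitespace after ., !, or ?.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def units(text: str) -> list[str]:
    """Split text into block-level units, or into sentences if it has no HTML."""
    if ANY_TAG.search(text):
        # Split at <div>/<p> boundaries, then drop any remaining inline tags.
        pieces = [ANY_TAG.sub("", piece) for piece in BLOCK_TAG.split(text)]
    else:
        # Plain text: paragraphs separated by line breaks, then sentences.
        pieces = [s for line in text.splitlines() for s in SENTENCE_END.split(line)]
    return [" ".join(piece.split()) for piece in pieces if piece.strip()]


def window_passages(text: str, max_chars: int = 500) -> list[str]:
    """Group consecutive units into passages of at most max_chars characters.

    A single unit longer than max_chars becomes its own passage, unsplit.
    """
    passages: list[str] = []
    window = ""
    for unit in units(text):
        candidate = f"{window} {unit}".strip()
        if window and len(candidate) > max_chars:
            passages.append(window)  # window is full: start a new one
            window = unit
        else:
            window = candidate
    if window:
        passages.append(window)
    return passages
```

The design choice here is that a window never cuts through a block or sentence, which keeps each passage readable at the cost of uneven passage lengths.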

With token-based extraction, chunked content is divided into tokens so that it can be vectorized, that is, turned into machine-readable numerical vectors. Sentences, words, letters, or punctuation marks can be tokens. AI models typically specify a token limit, and the maximum number of tokens in a chunk depends on the embedding model you're using.
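The Python sketch below shows the basic shape of token-based chunking under stated assumptions: a regex that treats each word and each punctuation mark as a token stands in for a real model's tokenizer (embedding models usually use subword tokenizers, so real counts differ), and `max_tokens` and the optional `overlap` are hypothetical parameters rather than Data Cloud settings.

```python
import re

# Stand-in tokenizer: words and individual punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)


def token_chunks(text: str, max_tokens: int, overlap: int = 0) -> list[list[str]]:
    """Split text into chunks of at most max_tokens tokens.

    overlap is the number of tokens each chunk repeats from the previous one.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = tokenize(text)
    step = max_tokens - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this chunk already reaches the end of the text
    return chunks
```

For example, `token_chunks("Vectors are numbers.", 2)` splits the four tokens into `[["Vectors", "are"], ["numbers", "."]]`. In practice you'd pick the limit from the embedding model's documented maximum.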

Let’s see what happens after your data is chunked.

Create Vector Embeddings from Chunked Content

After Data Cloud chunks your content, it creates a vector embedding—a numerical representation of the chunked content that can be retrieved or used in your Salesforce generative AI, automation, or analytics applications.

Vector embeddings are numerical representations of text that capture relationships between words or phrases. The embedding encodes the semantic meaning of the content, so chunks of content that are semantically similar have similar vector embeddings. These representations help machines process and understand language effectively.
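"Similar vector embeddings" is usually measured with cosine similarity. This unit doesn't say which similarity measure Data Cloud uses, so treat that as an assumption. The Python sketch below uses hand-picked, three-dimensional toy vectors rather than output from any real embedding model (real embeddings typically have hundreds of dimensions).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors for three hypothetical chunks.
reset_password = [0.9, 0.1, 0.0]
forgot_login = [0.8, 0.2, 0.1]
shipping_rates = [0.0, 0.1, 0.95]
```

With these toy values, the two sign-in related chunks score about 0.98 against each other, while the password and shipping chunks score about 0.01, which mirrors the idea that semantically related text ends up close together.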

In Data Cloud, vector embeddings are referenced by index data model objects (IDMOs), which we take a closer look at later in this unit.

Note

Read more about vector embeddings and chunked content in Salesforce Help.

Create Search Index Configurations

To get your unstructured data ready for search, you need to chunk and vectorize it. To do so, you create a search index configuration. Create a search index configuration for any data object with text fields that contain informational concepts, narratives, or detailed descriptions that your users search to find relevant results. Examples of such data include Salesforce Knowledge articles and other text documents, such as chat transcripts, stored in an external blob store like Amazon S3.

Data Cloud provides two methods for creating a search index configuration.

  • Easy Setup allows Data Cloud to automatically apply defaults for the chunking and vectorization strategies. We use this method in this unit.
  • Advanced Setup gives you more control over chunking and vectorization in the Search Index setup.

Create a Search Index Configuration from Knowledge Articles with Easy Setup

In the previous unit, you created a data stream and data lake object from the Knowledge bundle in the Salesforce CRM connector, which provides a handful of sample Knowledge articles.

The Knowledge Article Version object is a good candidate for indexing because you can use it to query, retrieve, or search across all types of articles by version. It includes these fields, which should be indexed for search:

  • Name: The name or title of the Knowledge article
  • Description: The description or summary of the Knowledge article, mapped from Summary
  • Custom text fields: Any rich text fields (131K-character limit) that hold unstructured data

Create a Search Index Configuration for the Knowledge Article Version DMO

You'll complete these steps in your Data Cloud org in order to pass the challenge at the end of this unit.

  1. If you haven’t already, launch your Data Cloud playground.
  2. From App Launcher, select Data Cloud.
  3. Click Search Index | New.
    If you don't see Search Index in the Data Cloud navigation, click the More drop-down menu, and then select Search Index.
  4. Click Easy Setup | Next.
  5. From the New Search Index Configuration page, select the Knowledge Article Version DMO, and click Next.
  6. When asked to supply a Search Index Configuration Name, replace the auto-generated name with My_kav. The Search Index Configuration API Name is populated automatically.
  7. Click Save.

That’s it! Your new search index configuration, My_kav, is listed on the Search Index tab.

View the Knowledge Article Version CDMO and IDMOs

After you create a search index configuration, its status changes to Submitted and then to In-progress as it processes data from the source DMO/UDMO. If there are no failures, the status changes to Ready. You won’t see any records in Data Explorer until the search index status is Ready.

Note

It can take several minutes for Data Cloud to process the data in the search index, so go grab a beverage or stretch your legs. When you come back, click refresh and check if the search index status is Ready.
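If you check the status from a script instead of clicking refresh, the wait boils down to a polling loop. This Python sketch is hypothetical throughout: `fetch_status` is any callable you supply that returns the current status string (it is not a Data Cloud API), the poll interval and limit are arbitrary, and treating any status other than the three named in this unit as a failure is an assumption.

```python
import time
from typing import Callable

# Statuses named in this unit; anything else is treated as a failure here.
KNOWN_STATUSES = {"Submitted", "In-progress", "Ready"}


def wait_until_ready(fetch_status: Callable[[], str],
                     poll_seconds: float = 30.0,
                     max_polls: int = 40) -> str:
    """Poll fetch_status until it reports Ready, a failure, or time runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status == "Ready":
            return status
        if status not in KNOWN_STATUSES:
            raise RuntimeError(f"search index stopped with status {status!r}")
        time.sleep(poll_seconds)  # still Submitted or In-progress
    raise TimeoutError("search index did not reach Ready in time")
```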

The most useful content in a Knowledge article is found in the Description field. The sample articles are usually short enough to fit in a single chunk, so each article produces one record in the Knowledge Article Version CDMO (its chunk) and one record in the IDMO (its vector). Lengthier content is split into more chunks, which produces more records in each object.
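The one-chunk, one-vector relationship can be pictured with a small Python model. The row classes, their field names, and the `embed` callable are hypothetical stand-ins for illustration, not the real CDMO or IDMO schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChunkRow:
    """Stands in for one record in the chunk DMO (CDMO)."""
    chunk_id: str
    source_id: str  # ID of the Knowledge Article Version record it came from
    text: str


@dataclass
class IndexRow:
    """Stands in for one record in the index DMO (IDMO)."""
    chunk_id: str  # points back to the ChunkRow this vector was built from
    vector: list[float]


def build_rows(source_id: str, chunks: list[str],
               embed: Callable[[str], list[float]]
               ) -> tuple[list[ChunkRow], list[IndexRow]]:
    """Create one chunk row and one matching index row per chunk."""
    chunk_rows: list[ChunkRow] = []
    index_rows: list[IndexRow] = []
    for n, text in enumerate(chunks):
        chunk_id = f"{source_id}-{n}"
        chunk_rows.append(ChunkRow(chunk_id, source_id, text))
        index_rows.append(IndexRow(chunk_id, embed(text)))
    return chunk_rows, index_rows
```

A short article passed in as a single chunk produces one row in each list; an article split into three chunks produces three rows in each, linked by `chunk_id`.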

Let’s take a quick look at the CDMO and IDMO we created for the Knowledge Article Version DMO.

  1. Confirm that the search index status is Ready.
  2. From Data Cloud, click Data Explorer.
  3. From the Object drop-down menu, select Data Model Object.
  4. From the Select an Object field, select My_kav chunk.
    Now you should be able to view a list of all the chunks Data Cloud created from the sample Knowledge articles.
  5. From the Select an Object field, select My_kav index.
    Now you should be able to view a list of all the vector records Data Cloud created from the sample Knowledge articles.

You can use the CDMO and IDMO contained in the search index throughout Salesforce in applications like Flow Builder, Einstein Copilot, Prompt Builder, and even Tableau. To learn more about running vector search queries, check out the vector search documentation.
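Conceptually, a vector search query embeds the search text and returns the chunks whose vectors are most similar to it. The brute-force Python sketch below illustrates that idea only; it is not Data Cloud's query syntax or its search algorithm (production systems usually use approximate nearest-neighbor indexes), and the chunk IDs and toy vectors are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector: list[float], index: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to query_vector."""
    scored = ((chunk_id, cosine(query_vector, vec)) for chunk_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy index mapping hypothetical chunk IDs to their vectors.
toy_index = {
    "kav-1-0": [0.9, 0.1, 0.0],
    "kav-2-0": [0.0, 0.1, 0.95],
    "kav-3-0": [0.8, 0.2, 0.1],
}
```

Calling `top_k([1.0, 0.0, 0.0], toy_index, k=2)` returns `kav-1-0` first and `kav-3-0` second; the retrieved chunk text is what a grounded prompt or flow would then use.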

Connecting unstructured data to Data Cloud allows you to ground search results on a wealth of data for a variety of customer-focused use cases. By chunking and vectorizing that data, you can use vector search in Einstein generative AI applications, Flow Builder, and even Tableau to enhance your AI, analytics, and automation capabilities.
