
Get to Know Search Index Types in Data Cloud

Learning Objectives

After completing this unit, you’ll be able to:

  • Describe the search indexes supported in Data Cloud.
  • Identify which search index to build for your use case.

Use Search in Data Cloud to Ground AI

Grounding AI on customer-specific data enhances the value of generative AI in applications, analytics, and automation tools across the Salesforce Platform. Grounding AI can be achieved with unstructured, semi-structured, or structured data. By using the user query to retrieve the relevant CRM data for grounding the AI model, applications like Agentforce, Tableau, and Flow Builder ensure that outputs are finely tuned to your users’ intent. Use search in Data Cloud to ensure accurate and relevant AI-generated content, deeper insights from analytics, and more efficient automation workflows for your teams and customers.

In Data Cloud you can build search indexes on any data, including unstructured data in knowledge bases. Data Cloud supports the following search index types.

  • Vector Search
  • Hybrid Search

To build search indexes in Data Cloud, first bring your data into Data Cloud. Data Cloud ingests unstructured data, maps it to standard data model objects (DMOs) or unstructured data model objects (UDMOs), and splits the data into meaningful content chunks. Data Cloud then creates vector embeddings for the chunks to build a search index that helps applications understand semantic and lexical similarities with the data.
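The chunk-and-embed flow can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the data flow only. This unit doesn't describe Data Cloud's actual chunking strategy or embedding model, so two stand-ins are assumed here. `chunk_text` uses fixed-size word windows with overlap. `embed` uses a hashed bag-of-words vector. A hashed vector captures shared words, not meaning; a real embedding model is what makes semantic matching possible.

```python
import hashlib
import math


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping windows of `chunk_size` words (assumed strategy)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str, dims: int = 64) -> list[float]:
    """Toy stand-in for an embedding model: hash words into buckets, then L2-normalize."""
    vec = [0.0] * dims
    for word in text.lower().split():
        # sha256 gives a deterministic bucket (Python's hash() is randomized per process).
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(record_id: str, text: str) -> list[dict]:
    """Return one hypothetical index row per chunk: source record, chunk text, vector."""
    return [
        {"SourceRecordId": record_id, "chunk": chunk, "vector": embed(chunk)}
        for chunk in chunk_text(text)
    ]


# Hypothetical sample record.
rows = build_index(
    "article-001",
    "Google Chrome is a web browser that renders pages, runs JavaScript, "
    "and isolates each tab in its own process",
)
print(len(rows), "chunks indexed")
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighboring chunks. This is a common reason to overlap chunks in retrieval pipelines.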

Note

To learn the definitions of vector embeddings and other Data Cloud terms, see the Data Cloud Glossary of Terms.

Select a Search Type

Before you decide which search type best suits your use case and data set, let's look at how the search types differ from each other and which kinds of search queries each one answers most relevantly.

Vector Search

Vector search, also known as semantic search, retrieves data (or data chunks) that is semantically similar to a given search query. This data can also include videos, audio, and call transcripts. Vector search works by chunking the data, creating a vector embedding for each chunk, and then searching for the embeddings that are semantically closest to the search query.

Data from various data sources is ingested into Data Cloud. Data Cloud chunks the data and creates vector embeddings to build a vector index. C360 applications, such as Tableau and Agentforce, then query this vector index and get relevant results.
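The retrieval step, "searching for the embeddings that are semantically closest," can be illustrated with cosine similarity. This is a hedged sketch, not Data Cloud's implementation. It makes three assumptions. It scores every chunk by brute force, whereas production vector indexes typically use approximate nearest-neighbor structures. It uses cosine similarity as the closeness measure. The 3-dimensional embeddings and chunk IDs are hypothetical values chosen for readability.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def vector_search(query_vec: list[float], chunk_index: dict, k: int = 3) -> list[tuple]:
    """Score every chunk against the query and return the top-k (chunk_id, score) pairs."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:k]


# Hypothetical embeddings for three chunks and for the user query.
chunk_index = {
    "chrome-architecture": [0.9, 0.1, 0.0],
    "browser-basics": [0.7, 0.3, 0.1],
    "quarterly-sales": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.0]

for chunk_id, score in vector_search(query_vec, chunk_index, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

The unrelated "quarterly-sales" chunk points in a different direction from the query vector, so it scores low and falls outside the top results.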

Vector search works well for long-form search queries where users are looking for general information. The search query retrieves data with a high vector search score, which corresponds to the closest semantic matches.

For example, here’s a query that looks for information about how the Google Chrome browser works. The query retrieves the chunks with the highest vector search scores, which correspond to the closest semantic matches with the search query.

Query:

select c.Chunk__c, v.score__c
from vector_search(table(WikiArticle_c_vector_search_2_index__dlm),
    'how does Google Chrome internet browser work', '', 100) as v
join WikiArticle_c_vector_search_2_chunk__dlm as c
    on v.SourceRecordId__c = c.RecordId__c
ORDER by v.score__c desc limit 3;

Result:

The image shows query results for a vector search in the descending order of vector search score. Chunks of data that have the closest semantic match to the search query are at the top of the results.
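To make the shape of the vector search query concrete, here is an in-memory Python sketch of its last three clauses. The join, `ORDER by`, and `limit` steps are re-created over made-up rows. The column names come from the query itself. Everything else is hypothetical: the hit rows, scores, and placeholder chunk text are invented for illustration, and inside Data Cloud the SQL engine performs this work.

```python
# Hypothetical rows returned by the search index function (the "v" alias):
# which chunk matched and its similarity score.
index_hits = [
    {"SourceRecordId__c": "chunk-2", "score__c": 0.81},
    {"SourceRecordId__c": "chunk-1", "score__c": 0.92},
    {"SourceRecordId__c": "chunk-3", "score__c": 0.40},
    {"SourceRecordId__c": "chunk-4", "score__c": 0.77},
]

# Hypothetical rows from the chunk DMO (the "c" alias) holding the chunk text.
chunk_rows = [
    {"RecordId__c": "chunk-1", "Chunk__c": "placeholder text for chunk 1"},
    {"RecordId__c": "chunk-2", "Chunk__c": "placeholder text for chunk 2"},
    {"RecordId__c": "chunk-3", "Chunk__c": "placeholder text for chunk 3"},
    {"RecordId__c": "chunk-4", "Chunk__c": "placeholder text for chunk 4"},
]


def top_chunks(hits: list[dict], chunks: list[dict], limit: int) -> list[tuple]:
    """Mimic: join on SourceRecordId__c = RecordId__c, order by score desc, limit."""
    text_by_id = {row["RecordId__c"]: row["Chunk__c"] for row in chunks}
    # Inner join: keep only hits whose chunk record exists.
    joined = [
        (text_by_id[hit["SourceRecordId__c"]], hit["score__c"])
        for hit in hits
        if hit["SourceRecordId__c"] in text_by_id
    ]
    joined.sort(key=lambda row: row[1], reverse=True)  # ORDER by score desc
    return joined[:limit]  # limit


for text, score in top_chunks(index_hits, chunk_rows, limit=3):
    print(score, text)
```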

Hybrid Search

Hybrid search combines the strengths of semantically aware vector search with the ability of keyword search to handle domain vocabulary. Hybrid search merges the retrieved information from both types of searches and then ranks the results using a fusion ranker function to show the most relevant information.

The default hybrid search fusion ranker function is optimized on internal benchmarks for a variety of search-based tasks. Its training and evaluation data is based on actual queries captured from Einstein Search and generative AI applications such as Einstein Search Answers.

Data from various sources is ingested into Data Cloud. Data Cloud chunks the data and creates vector embeddings. From the chunked and vectorized data, Data Cloud builds a vector search index and a keyword search index. The hybrid search fusion ranker function then ranks the retrieved results and provides the most relevant response to the C360 apps querying the data.
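This unit doesn't describe how the default fusion ranker works internally, so the following is only a stand-in. It shows one widely used rank-based way to merge a vector result list and a keyword result list: reciprocal rank fusion (RRF). It is not Salesforce's ranker. The chunk IDs are hypothetical. The constant `k=60` is the value commonly used with RRF, assumed here rather than taken from Data Cloud.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in (rank starts
    at 1), so chunks ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top results from each index for the same query.
vector_ranking = ["browser-basics", "chrome-architecture", "web-standards"]
keyword_ranking = ["chrome-architecture", "chrome-release-notes", "browser-basics"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
for chunk_id, score in fused:
    print(f"{chunk_id}: {score:.5f}")
```

RRF uses only rank positions, so it avoids comparing raw keyword scores and similarity scores that live on different scales.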

Hybrid search is a great option for long-form search queries that also include specific search terms. The search query retrieves data with a high keyword search score, which corresponds to exact keyword matches, and a high vector search score, which corresponds to the closest semantic matches. Together, these produce a high hybrid search score, which corresponds to the most relevant search results.
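The hybrid query returns `keyword_score__c`, `vector_score__c`, and `hybrid_score__c` side by side, but this unit doesn't give the formula that produces the hybrid score. As an assumption-labeled illustration, the sketch below uses a common score-level approach. It min-max normalizes each score set to [0, 1] and then takes a weighted sum. The weight `alpha` and all the scores are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}  # all equal: treat every score as the best
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(keyword_scores: dict[str, float],
                  vector_scores: dict[str, float],
                  alpha: float = 0.5) -> list[tuple]:
    """Blend normalized scores as alpha * keyword + (1 - alpha) * vector.

    A chunk missing from one result set contributes 0 for that component.
    """
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    combined = {
        chunk_id: alpha * kw.get(chunk_id, 0.0) + (1 - alpha) * vec.get(chunk_id, 0.0)
        for chunk_id in set(kw) | set(vec)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: keyword scores are unbounded, vector scores are similarities.
keyword_scores = {"chrome-architecture": 12.0, "chrome-release-notes": 9.0, "browser-basics": 3.0}
vector_scores = {"browser-basics": 0.91, "chrome-architecture": 0.88, "web-standards": 0.70}

for chunk_id, score in hybrid_scores(keyword_scores, vector_scores):
    print(f"{chunk_id}: {score:.3f}")
```

The chunk that scores well on both exact terms and meaning ends up first, even though it is not the top vector match.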

For the same query used in the vector search example, the keyword component of hybrid search ranks more relevant content higher, which provides the LLM with better grounding.

Query:

select c.Chunk__c, h.hybrid_score__c, h.keyword_score__c, h.vector_score__c
from hybrid_search(table(WikiArticle_c_hybrid_search_2_index__dlm),
    'how does Google Chrome internet browser work ?', '', 100) as h
join WikiArticle_c_hybrid_search_2_chunk__dlm as c
    on h.SourceRecordId__c = c.RecordId__c
ORDER by h.hybrid_score__c desc limit 2;

Result:

The image shows query results for a hybrid search in the descending order of hybrid search score. Chunks of data that have the closest semantic and keyword match to the search query are at the top of the results.

Note

For the Google Chrome query, hybrid search performs much better than pure vector search because it returns chunks that include both general information about how browsers work and specific details about the Google Chrome browser.

In Summary

Build search indexes in Data Cloud to ground AI on your organization’s unstructured, semi-structured, or structured data.

Select the search type that works best for the queries from your end users and applications. If your users' queries are mainly about general information, or the queries are long (more than five words), vector search is sufficient. Vector search provides relevant results when a user query contains contextual content, which usually means longer queries.

To get the most accurate and relevant results that combine both semantic search matches and keyword search matches for a query, create a hybrid search index.
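The guidance above can be condensed into a small rule-of-thumb helper. This is a hypothetical sketch, not a Data Cloud API. It counts words by splitting on whitespace. The `needs_exact_terms` and `general_information` flags are inputs you supply about your users' queries. Falling back to hybrid for short, specific queries is an assumption drawn from the point that vector search needs contextual content.

```python
def suggest_index_type(query: str,
                       needs_exact_terms: bool,
                       general_information: bool = False) -> str:
    """Hypothetical rule of thumb for choosing a search index type.

    needs_exact_terms: results must match specific terms such as product
    names, error codes, or other domain vocabulary.
    general_information: the query asks for broad, general information.
    """
    if needs_exact_terms:
        return "hybrid"  # keyword matching handles domain vocabulary
    if general_information or len(query.split()) > 5:
        return "vector"  # general or long, contextual queries suit semantic search
    return "hybrid"  # assumption: short, specific queries carry little context


print(suggest_index_type("how does the Google Chrome internet browser work", False))
print(suggest_index_type("Chrome error 503", True))
```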
