Get to Know Search Index Types in Data Cloud
Learning Objectives
After completing this unit, you’ll be able to:
- Describe the search indexes supported in Data Cloud.
- Identify which search index to build for your use case.
Use Search in Data Cloud to Ground AI
Grounding AI on customer-specific data enhances the value of generative AI in applications, analytics, and automation tools across the Salesforce Platform. Grounding AI can be achieved with unstructured, semi-structured, or structured data. By using the user query to retrieve the relevant CRM data for grounding the AI model, applications like Agentforce, Tableau, and Flow Builder ensure that outputs are finely tuned to your users’ intent. Use search in Data Cloud to ensure accurate and relevant AI-generated content, deeper insights from analytics, and more efficient automation workflows for your teams and customers.
In Data Cloud you can build search indexes on any data, including unstructured data in knowledge bases. Data Cloud supports the following search index types.
- Vector Search
- Hybrid Search
To build search indexes in Data Cloud, first bring your data into Data Cloud. Data Cloud ingests unstructured data, maps it to standard data model objects (DMOs) or unstructured data model objects (UDMOs), and splits it into meaningful content chunks. Data Cloud then creates vector embeddings from those chunks to build a search index that helps applications find content that is semantically and lexically similar to a search query.
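The chunking step can be illustrated with a minimal Python sketch. It assumes fixed-size word windows with a small overlap. This unit doesn't describe how Data Cloud actually chunks content, so the function name `chunk_text` and the parameters `max_words` and `overlap` are hypothetical and chosen only for illustration.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks.

    Illustrative only: a real pipeline may chunk by sentence,
    passage, or document structure instead of fixed word windows.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


# A 120-word document yields three chunks that overlap by 10 words.
sample = " ".join(f"w{i}" for i in range(120))
sample_chunks = chunk_text(sample, max_words=50, overlap=10)
```

Overlapping windows are a common way to avoid cutting a relevant passage in half at a chunk boundary. Each chunk would then be passed to an embedding model to produce the vectors stored in the index.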
Select a Search Type
Before you decide which search type best suits your use case and data set, let’s look at how the search types differ and which kinds of search queries produce the most relevant responses from each.
Vector Search
Vector search, also known as semantic search, involves retrieving semantically similar data (or data chunks) for a given search query. This data can also include videos, audio, and call transcripts. Vector search retrieval is done by chunking the data, creating vector embeddings, and searching for vector embeddings that have close semantic similarities to the search query.
Vector search works well for long-form search queries where users are looking for general information. The search query retrieves the data with the highest vector search scores, which correspond to the closest semantic matches.
For example, here’s a query looking for information about how the Google Chrome browser works. It retrieves the three chunks with the highest vector search scores, that is, the closest semantic matches to the search query.
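To make the idea of a vector search score concrete, here is a minimal Python sketch that ranks chunks by cosine similarity to a query vector. It rests on several assumptions: the three-dimensional embeddings are hand-made placeholders rather than output from a real embedding model, the chunk names are hypothetical, and cosine similarity is one common similarity measure, not necessarily the one Data Cloud uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 3):
    """Score every chunk against the query and return the top_k (chunk_id, score) pairs."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-made embeddings standing in for real model output.
index = {
    "chrome_architecture": [0.9, 0.1, 0.0],
    "firefox_history": [0.6, 0.4, 0.1],
    "banana_bread_recipe": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # placeholder embedding of the search query
results = vector_search(query_vec, index, top_k=2)
```

In this toy index, the two browser-related chunks score highest because their vectors point in nearly the same direction as the query vector, while the unrelated chunk scores close to zero.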
Query:
select c.Chunk__c, v.score__c from vector_search(table(WikiArticle_c_vector_search_2_index__dlm), 'how does Google Chrome internet browser work', '', 100) as v join WikiArticle_c_vector_search_2_chunk__dlm as c on v.SourceRecordId__c=c.RecordId__c ORDER by v.score__c desc limit 3;
Hybrid Search
Hybrid search combines the strengths of semantically aware vector search with the ability of keyword search to handle domain vocabulary. Hybrid search merges the retrieved information from both types of searches and then ranks the results using a fusion ranker function to show the most relevant information.
The default hybrid search fusion ranker function is optimized on internal benchmarks for a variety of search-based tasks. The training and evaluation data is based on actual queries captured from Einstein Search and generative AI applications like Einstein Search Answers.
Hybrid search is a great option for long-form search queries that also include specific search terms. The search query retrieves data that has both a high keyword search score, which reflects exact keyword matches, and a high vector search score, which reflects the closest semantic matches. Together, these produce a high hybrid search score, which marks the most relevant search results.
Applied to the same example query used for vector search, hybrid search uses keyword matches to rank the most relevant content higher, which gives the LLM better grounding.
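One way to picture how results from two searches can be merged is reciprocal rank fusion (RRF), a widely used rank-merging technique. The Python sketch below is an illustration only: this unit doesn't specify how the Data Cloud fusion ranker works, so RRF is a swapped-in stand-in, and the chunk IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60):
    """Merge several ranked lists of IDs into one list of (id, fused_score) pairs.

    Each list contributes 1 / (k + rank) for every ID it contains,
    so IDs ranked well by more than one search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical ranked results from a keyword search and a vector search.
keyword_ranking = ["chunk_b", "chunk_a", "chunk_d"]
vector_ranking = ["chunk_a", "chunk_c", "chunk_b"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `chunk_a` and `chunk_b` appear in both lists, so they outrank `chunk_c` and `chunk_d`, which each appear in only one. A production ranker can instead combine the raw keyword and vector scores, or use a learned function.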
Query:
select c.Chunk__c, h.hybrid_score__c, h.keyword_score__c, h.vector_score__c from hybrid_search(table(WikiArticle_c_hybrid_search_2_index__dlm), 'how does Google Chrome internet browser work ?', '', 100) as h join WikiArticle_c_hybrid_search_2_chunk__dlm as c on h.SourceRecordId__c=c.RecordId__c ORDER by h.hybrid_score__c desc limit 2;
In Summary
Build search indexes in Data Cloud to ground AI on your organization’s unstructured, semi-structured, or structured data.
Select the search type that works best for the search queries from your end users and applications. If your users’ queries are mainly about general information or are long (more than five words), vector search is sufficient. Vector search returns relevant results when a query carries contextual content, which is typical of longer queries.
To get the most accurate and relevant results that combine both semantic search matches and keyword search matches for a query, create a hybrid search index.
Resources
- Salesforce Help: Unstructured Data in Data Cloud
- Salesforce Help: Vector Search
- Salesforce Help: Hybrid Search (Beta)