Augment Prompts with Relevant Knowledge

Learning Objectives

After completing this unit, you’ll be able to:

  • Explain why retrieval augmented generation (RAG) improves the accuracy and relevance of LLM responses.
  • Describe what’s needed to set up and use RAG in your Salesforce org.

What Is Retrieval Augmented Generation?

Retrieval augmented generation is a popular framework for grounding large language model (LLM) prompts. Grounding means that you augment an LLM prompt with contextual, specific information in order to enhance the quality, accuracy, and relevance of the LLM-generated output.

RAG runtime flow: LLM prompt augmented with relevant information to instruct LLM response generation.

To break it down (a short code sketch follows these steps):

  1. RAG retrieves relevant information from a knowledge store containing structured and unstructured content.
  2. RAG augments the prompt by combining this information with the original prompt.
  3. The LLM, instructed by the augmented prompt, generates a response.
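
To make these steps concrete, here's a minimal, self-contained sketch in Python. Everything in it is a toy stand-in: the in-memory knowledge store, the word-overlap retriever, and the stubbed `call_llm` are illustrative, not Salesforce or Data Cloud APIs.

```python
import re

# Toy in-memory knowledge store; in practice this content would live
# in a platform like Data Cloud.
KNOWLEDGE_STORE = [
    "Returns are accepted within 30 days with a receipt.",
    "Premier support customers get a four-hour response SLA.",
    "Standard shipping takes 5-7 business days.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, store: list[str], top_k: int = 2) -> list[str]:
    # 1. Retrieve: rank passages by word overlap with the query.
    #    (Real retrievers use semantic vector search; see the sketches below.)
    ranked = sorted(store, key=lambda p: -len(tokens(query) & tokens(p)))
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"[response generated from a {len(prompt)}-character prompt]"

def rag_respond(question: str) -> str:
    passages = retrieve(question, KNOWLEDGE_STORE)               # 1. retrieve
    context = "\n".join(f"- {p}" for p in passages)
    augmented = f"Context:\n{context}\n\nQuestion: {question}"   # 2. augment
    return call_llm(augmented)                                   # 3. generate

print(rag_respond("When are returns accepted?"))
```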

Many LLMs were trained on static, publicly available content from across the Internet. RAG adds information to a prompt that is accurate, up to date, and not part of the LLM's trained knowledge. It's like supplementing the LLM's capabilities with relevant information retrieved from a knowledge store that contains the latest, best version of the facts. With RAG, prompt template users can bring proprietary data to the LLM without retraining or fine-tuning the model, resulting in generated responses that are more pertinent to their context and use case.

RAG in Data Cloud

RAG in Data Cloud enables you to augment your prompt templates in Prompt Builder with unstructured content in Data Cloud. Imagine how much more valuable your LLM responses could be if you grounded your prompts with contextually relevant business data, such as service replies, cases, knowledge articles, conversation transcripts, request for proposal (RFP) responses, emails, or meeting notes.

Offline Preparation

To implement RAG, start by connecting structured and unstructured data that RAG uses to ground LLM prompts. Data Cloud uses a search index to manage structured and unstructured content in a search-optimized way.

Offline preparation steps: ingest, chunk, vectorize, and index.

Offline preparation involves the following tasks in Data Cloud (a toy sketch of the chunk-vectorize-index steps follows the list).

  1. Connect your unstructured data.
  2. Create a search index configuration that chunks and vectorizes the content. Data Cloud supports two search options: vector search and hybrid search (beta), which combines vector and keyword search.
    • Chunking breaks the text into smaller units, reflecting passages of the original content, such as sentences or paragraphs.
    • Vectorization converts chunks into numeric representations of the text that capture semantic similarities.
  3. Store and manage the search index.
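
As a rough illustration of steps 2 and 3, here's a toy chunk-vectorize-index pipeline in Python. The word-count vectors stand in for the dense embeddings a real search index uses; nothing here reflects Data Cloud's actual implementation, and a hybrid index would additionally keep a keyword index alongside the vectors.

```python
from collections import Counter

def chunk(document: str) -> list[str]:
    # Chunking: split the document into paragraph-sized passages.
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def vectorize(text: str) -> Counter:
    # Vectorization: map text to a numeric representation. Real systems
    # use dense embeddings that capture semantic similarity; word counts
    # are a crude stand-in.
    return Counter(text.lower().split())

def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    # Indexing: store each chunk alongside its vector so that search
    # can compare query vectors against chunk vectors.
    return [
        (passage, vectorize(passage))
        for doc in documents
        for passage in chunk(doc)
    ]

index = build_index([
    "Reset your password from the login page.\n\n"
    "Contact support if your account is locked.",
])
```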

When you create a search index, Data Cloud automatically creates a default retriever for it. This retriever is the resource that you embed in a prompt template to search for and return relevant information from the knowledge store. To support a variety of use cases for search, you can create custom retrievers in Einstein Studio that focus your search on the relevant subset of information to add to the prompt.
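
Conceptually, the default retriever is a similarity search over that index. Continuing the toy sketch above (this `retrieve` replaces the word-overlap version from the first sketch and reuses `vectorize`), cosine similarity between the query vector and each chunk vector is one standard way to rank matches:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: higher means the vectors point the same way.
    dot = sum(a[word] * b[word] for word in a)
    norms = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return dot / norms if norms else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]],
             top_k: int = 3) -> list[str]:
    # Rank every indexed chunk by similarity to the query vector and
    # return the text of the best matches. A custom retriever would
    # first narrow the index to the relevant subset of content.
    q_vec = vectorize(query)
    ranked = sorted(index, key=lambda item: -cosine(q_vec, item[1]))
    return [passage for passage, _ in ranked[:top_k]]
```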

Integration and Runtime Use in Prompts

Once your offline preparation is complete, the final step is to embed the retriever in a prompt template and, optionally, to further customize search settings for that particular prompt.

RAG detailed runtime flow: query, vectorize, retrieve relevant content, augment, and submit to the LLM.

Each time a prompt template with a retriever is run (see the sketch after these steps):

  1. The retriever is invoked with a dynamic query initiated from the prompt template.
  2. The query is vectorized (converted to a numeric representation) so that search can find semantic matches in the search index, which is already vectorized.
  3. The query retrieves the relevant context from the indexed data in the search index.
  4. The original prompt is populated with the information retrieved from the search index.
  5. The prompt is submitted to the LLM, which generates and returns the response.
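
Putting the runtime steps together, here's a sketch that reuses the toy `index`, `retrieve`, and `call_llm` from the earlier snippets. The template and its placeholders are hypothetical, not Prompt Builder's actual merge-field syntax.

```python
# Hypothetical prompt template with a slot for retrieved context.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{retrieved_context}\n\n"
    "Question: {question}"
)

def run_prompt_template(question: str, index) -> str:
    # Steps 1-3: invoke the retriever, which vectorizes the query and
    # pulls the most relevant chunks from the search index.
    passages = retrieve(question, index, top_k=3)
    # Step 4: populate the prompt with the retrieved context.
    prompt = PROMPT_TEMPLATE.format(
        retrieved_context="\n".join(f"- {p}" for p in passages),
        question=question,
    )
    # Step 5: submit the augmented prompt to the LLM (stubbed earlier).
    return call_llm(prompt)

print(run_prompt_template("How do I reset my password?", index))
```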

See RAG in Action

Check out the following video to see how easy it is to augment a prompt template using RAG.

Conclusion

RAG in Data Cloud allows you to improve the accuracy and relevance of your LLM responses by safely grounding your prompts with proprietary data from a harmonized data model. RAG in Data Cloud is integrated with the Einstein generative AI platform so that you can natively incorporate RAG functionality into out-of-the-box apps such as Prompt Builder and Agent Builder.
