Learn the Basics of Large Language Models
Learning Objectives
After completing this unit, you’ll be able to:
- Describe a large language model (LLM).
- Explain how LLMs are trained.
- Explain how LLM fine-tuning works.
What’s a Large Language Model?
Imagine you have a supersmart digital assistant that has read vast amounts of text, including text from books, articles, websites, and other written content up to the year 2021. However, it doesn't “contain” entire books in the way that a library does. Instead, it has learned patterns from the textual data it was trained on.
You can ask this digital assistant any question, and it will try to give you an answer based on what it has “read.” It doesn't really “understand” like people do, but it's really good at remembering and connecting information.
That digital assistant is like a large language model (LLM). LLMs are advanced computer models designed to understand and generate humanlike text. They're trained on vast amounts of text data to learn patterns, language structures, and relationships between words and sentences.
How Do Large Language Models Work?
At their core, LLMs like GPT-3 predict one token (a word, part of a word, or a character) at a time, building a sequence from start to finish. Given a prompt, they predict the next token, then the next, and so on.
When we say LLMs make predictions, we mean they generate or complete text based on patterns they’ve seen during training. This is pattern recognition performed over vast amounts of text, and it lets them generate coherent and contextually relevant content across a wide range of topics.
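To make that token-by-token loop concrete, here's a deliberately tiny sketch in Python. It's not a real LLM, just a bigram counter that picks the most common follower of each word, but it runs the same loop an LLM does: look at the sequence so far, predict the next token, append it, and repeat. All the names here (training_text, predict_next, and so on) are illustrative, not part of any real LLM library.

```python
# A toy illustration, not a real LLM: a bigram model that, like an LLM,
# generates text one token at a time by predicting a likely next token.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat and the cat slept on the mat"
tokens = training_text.split()

# "Training": count which token follows each token in the text.
next_counts = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    next_counts[current][nxt] += 1

def predict_next(token):
    """Return the most frequent follower of `token` seen during training."""
    followers = next_counts.get(token)
    return followers.most_common(1)[0][0] if followers else None

# "Generation": start from a prompt and repeatedly append the predicted token.
sequence = ["the"]
for _ in range(6):
    nxt = predict_next(sequence[-1])
    if nxt is None:
        break
    sequence.append(nxt)

print(" ".join(sequence))  # the generated sequence, one token at a time
```

A real LLM replaces this simple counting with a neural network containing billions of parameters and predicts probabilities over a vocabulary of tens of thousands of tokens, but the generation loop has the same shape.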
The “large” in large language model refers to the size and complexity of these models. They require significant computational resources, such as powerful servers with multiple processors and a lot of memory. These resources let the model handle and process huge amounts of data, which improves its ability to understand and generate high-quality text.
LLMs vary in size, but they typically contain billions of parameters. Parameters are the variables that the model learns during its training process, representing the knowledge and understanding it gains from the data. The more parameters, the more capacity the model has to learn and capture intricate patterns in the data.
To give you an idea of how many parameters LLMs use, GPT-3, an earlier model in the GPT (generative pre-trained transformer) family, has around 175 billion parameters. That's considered quite large, and models at this scale have significantly advanced the capabilities of language processing. GPT-4 is said to have over 1 trillion parameters.
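If you're comfortable with a little code, here's a minimal sketch (assuming the PyTorch library) of what “parameters” means in practice: they're simply the learnable weights of the network, and you can count them. The tiny two-layer model below is hypothetical; it has about ten thousand parameters, and GPT-3 scales the same idea to 175 billion.

```python
# A minimal sketch, assuming PyTorch is installed. Parameters are the
# learnable weights and biases of the network; counting them shows scale.
import torch.nn as nn

tiny_model = nn.Sequential(
    nn.Linear(128, 64),  # 128 x 64 weights + 64 biases = 8,256 parameters
    nn.ReLU(),           # activations have no parameters
    nn.Linear(64, 32),   # 64 x 32 weights + 32 biases = 2,080 parameters
)

num_params = sum(p.numel() for p in tiny_model.parameters())
print(num_params)  # 10336 -- versus roughly 175,000,000,000 for GPT-3
```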
These numbers are indeed impressive, but the sheer size of these models also brings challenges, like the computational resources required to train them, their environmental impact, potential biases, and more.
Large language models are like incredibly knowledgeable virtual assistants that can help with a wide range of language-related tasks. They can assist in writing, provide information, offer creative suggestions, and even engage in conversation. The goal of their creators is to make interactions with technology more natural and humanlike. However, users should be aware of LLMs' limitations and treat them as a tool rather than an infallible source of truth.
What Is LLM Training?
Training an LLM is like teaching a robot how to understand and use human language. And how do you train a robot to understand and use human language? Here's one way you could do it.
- Gather books and articles. Imagine collecting a massive pile of books, articles, and other writings to teach the robot.
- Practice reading. You make the robot read a sentence, and then ask it to guess the next word. At first, it might guess randomly since it's still learning.
- Check answers. After the robot makes a guess, you show it the correct word from the actual text. If the robot's guess is wrong, you give it feedback, like saying, “Oops! That's not right.”
- Repeat. You keep doing this “guess and check” over and over, with tons of sentences. The robot starts getting better at guessing the next word as it reads more.
- Test. Occasionally, you test the robot with sentences it hasn't seen before to see if it's really learning or just memorizing.
- Specialize. If you want the robot to be especially good at, say, medical language, you might give it extra lessons with medical books.
- Graduate. Once the robot gets really good at understanding and generating text, say, “Great job!” and let it help people with various language tasks.
And that’s it! Training is like a mix of reading practice, quizzes, and special lessons until the robot becomes a language expert. The same basic idea applies to LLMs.
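If you're curious what that “guess and check” loop looks like as code, here's a minimal sketch, assuming PyTorch. The toy model and the random tokens standing in for a real corpus are illustrative only; the point is the shape of the loop: guess, check, get feedback, repeat.

```python
# A minimal sketch of the "guess and check" training loop, assuming PyTorch.
# The model and data are toy stand-ins, not a real LLM or corpus.
import torch
import torch.nn as nn

vocab_size = 100  # a toy vocabulary of 100 tokens

# The "robot": embeds the current token and guesses logits for the next one.
model = nn.Sequential(
    nn.Embedding(vocab_size, 32),
    nn.Linear(32, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # the "Oops! That's not right." feedback

# Pretend corpus: pairs of (current token, correct next token).
inputs = torch.randint(0, vocab_size, (64,))
targets = torch.randint(0, vocab_size, (64,))

for step in range(100):              # "Repeat": guess and check, many times
    logits = model(inputs)           # "Practice reading": guess the next token
    loss = loss_fn(logits, targets)  # "Check answers": compare with the text
    optimizer.zero_grad()
    loss.backward()                  # feedback flows back through the model
    optimizer.step()                 # adjust parameters; guesses improve
```

The “Test” step corresponds to evaluating the model on held-out sentences it never saw in this loop, and the “Specialize” step is the fine-tuning covered next.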
How Does Fine-Tuning Work?
Fine-tuning is the process of further training a pre-trained model on a new dataset that is smaller and more specific than the original training dataset.
Imagine you've taught a robot to cook dishes from all over the world using the world's biggest cookbook. That's the basic training. Now, let's say you want the robot to specialize in making just Italian dishes. You'd then give it a smaller, detailed Italian cookbook and have it practice those recipes. This specialized practice is like fine-tuning.
Fine-tuning is like taking a robot (or model) that knows a little bit about a lot of things, and then training it further on a specific topic until it becomes an expert in that area.
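Here's what that looks like as a minimal sketch, again assuming PyTorch. The “pre-trained” model below is a hypothetical stand-in (in practice you'd load real pre-trained weights); the key moves are keeping what the model already knows and continuing training on a much smaller, specialized dataset.

```python
# A minimal fine-tuning sketch, assuming PyTorch. The model and dataset are
# hypothetical stand-ins for a real pre-trained LLM and a specialty corpus.
import torch
import torch.nn as nn

# Stand-in for the "world's biggest cookbook" model; in practice you'd
# load weights learned during the original large-scale training.
pretrained = nn.Sequential(
    nn.Embedding(100, 32),  # general knowledge lives in these weights
    nn.Linear(32, 100),
)

# Freeze the general-purpose layer so its knowledge is preserved.
for param in pretrained[0].parameters():
    param.requires_grad = False

# Continue training only the unfrozen parameters on the smaller,
# specialized dataset (the "Italian cookbook").
optimizer = torch.optim.Adam(
    [p for p in pretrained.parameters() if p.requires_grad], lr=1e-4
)
loss_fn = nn.CrossEntropyLoss()

specialty_inputs = torch.randint(0, 100, (16,))   # toy specialty examples
specialty_targets = torch.randint(0, 100, (16,))

for step in range(50):  # far fewer steps and examples than original training
    loss = loss_fn(pretrained(specialty_inputs), specialty_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Freezing layers is just one common approach; depending on the task, practitioners also fine-tune all the parameters at a lower learning rate.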
Why Is Fine-Tuning Important?
- Transfer learning: Pre-trained models have already learned a lot of generic features from their extensive training datasets. Fine-tuning allows these models to transfer that general knowledge to specific tasks with relatively small datasets.
- Efficiency: Training a deep learning model from scratch requires a lot of data and computational resources. With fine-tuning, you’re starting with a model that already knows a lot, so you can achieve good performance with less data and time.
- Better performance: Models fine-tuned on specific tasks often outperform models trained from scratch on those tasks, as they benefit from the broader knowledge captured during their initial training.
What’s in a Version?
For each version, the underlying architecture might remain similar, but the scale, training data, or certain parameters can change. Each new version aims to improve upon the weaknesses of the previous one, handle a broader range of tasks, or reduce biases and errors. Here’s a simplified explanation.
Version 1 (for example, OpenAI’s GPT-1 or Google’s BERT-base)
- The beginning: The first release of the model. It works well, but it's like the first draft of a novel—there's room for improvement.
- Size and data: Uses a certain amount of data and has a particular number of parameters (like the “brain cells” of the model).
Version 2 (OpenAI’s GPT-2)
- Improvements: Based on the learnings from the first version, adjustments are made. It's like editing your novel based on feedback.
- Size and data: Often bigger with more parameters. Might be trained on more diverse or larger datasets.
Version 3 (OpenAI’s GPT-3)
- Even better: Incorporates more feedback, research, and technological advancements.
- Size and data: Much larger. For instance, GPT-3 has 175 billion parameters, making it much more capable but also requiring more resources.
Fine-tuned versions:
- After the main versions are released, sometimes there are specialized versions fine-tuned for specific tasks. It’s like taking a general novel and adapting it into a mystery, romance, or sci-fi version.
Other iterations:
- Models like BERT have variations (RoBERTa, DistilBERT, and so on) that are essentially different “versions” with tweaks in training strategy or architecture.
LLM versions are like consecutive editions of a book series, with each new release aiming to be a more refined, expansive, and captivating read.
Next, let’s look at how LLMs can be used with Salesforce.