
Explore Image Generation Models

Learning Objectives

After completing this unit, you’ll be able to:

  • Describe the advantages of using diffusion models over generative adversarial networks.
  • Identify popular generative AI tools and describe their uses.

Moving from Words to Imagery

While generative artificial intelligence (gen AI) is a relatively new technology, it’s already helping people and organizations work more efficiently. Maybe you’ve used it to summarize meeting notes, make a first-pass outline for a writing project, or create some code. These applications of generative AI tools all have something in common: They focus only on creating text in one form or another.

There’s another world of gen AI tools that can create high-quality images, 3D objects, and animations, all using the power of large language models (LLMs). So if you’ve begun using gen AI to supercharge writing tasks, it’s likely you can benefit from using gen AI to enhance your work with imagery and animations.

In this badge you learn about some of the current, rapidly improving capabilities of generative AI in the multimedia space. You discover ways to effectively incorporate gen AI into your workflow. And you reflect on some of the challenging questions surrounding the responsible use of generative AI for imagery creation.

Note

This module references concepts such as AI model training/machine learning, large language models, and data quality/bias. If you need a review of those concepts, check out the Get Started with Artificial Intelligence trail.

Advances in AI Models

Let’s take a moment to appreciate how this world has been affected by large language models. Researchers had been training AI to produce imagery for years before LLMs really took off, but those models were limited in some pretty significant ways.

For example, one type of neural network architecture that showed promise was the generative adversarial network (GAN). In short, two networks were set up to play a cat-and-mouse game. One would try to create realistic images, while the other would try to distinguish between those generated images and real images. Over time, the first network became very good at tricking the second.
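To make the cat-and-mouse game concrete, here is a minimal sketch of how each network could be scored. It assumes the common binary cross-entropy formulation of GAN training, which this unit doesn’t specify, and the function names are hypothetical. The second network outputs a probability that an image is real, and the first network is rewarded when that probability is high for its generated images.

```python
import math

def discriminator_loss(p_real, p_fake):
    """Loss for the second network (the judge).

    p_real: probability it assigned to a real image being real.
    p_fake: probability it assigned to a generated image being real.
    The loss is low when real images score high and generated ones score low.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Loss for the first network (the forger), in the non-saturating form.

    The loss is low when the judge is fooled, that is, when p_fake is near 1.
    """
    return -math.log(p_fake)

# Early in training: the judge easily spots generated images.
print(discriminator_loss(p_real=0.9, p_fake=0.1), generator_loss(p_fake=0.1))
# Later: the forger has improved and the judge is fooled.
print(discriminator_loss(p_real=0.9, p_fake=0.9), generator_loss(p_fake=0.9))
```

As the forger’s loss falls, the judge’s loss rises, which is the adversarial tension the two networks are trained under.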

This method is capable of generating very convincing images of all sorts of subjects, including people. However, GANs usually excel at creating images of just one kind of subject. So a GAN that’s great at creating images of cats would be terrible at creating images of mice. There’s also the possibility that a GAN will experience “mode collapse,” where the first network creates the same image again and again, because that image is known to always trick the second. An AI that only creates one image isn’t exactly useful.

What would be really useful is an AI model that could create images of a variety of subjects, whether we ask for a cat, a mouse, or a cat in a mouse costume.

[Image: A cute, hand-drawn image of a cat wearing a mouse costume.]

AI-generated image using DreamStudio at stability.ai with the prompt: “A cute, hand-drawn image of a cat wearing a mouse costume.”

As the above AI-generated image demonstrates, those models already exist! They’re known as diffusion models because the underlying math relates to the physical phenomenon of something diffusing, like a drop of dye in a glass of water. As with most AI models, the technical details are the stuff of incredibly complex research papers.

The important thing to know is that diffusion models are trained to make connections between images and text. It helps that there are a lot of captioned cat pictures on the internet. With enough samples, a model can extract the essence of “cat,” “mouse,” and “costume.” Then, it embeds that essence into a generated image using diffusion principles. It’s complicated, but the results are often stunning.
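The dye-in-water analogy can be illustrated with a toy example. The sketch below shows only the forward, noise-adding half of a diffusion process on a handful of pixel values. It assumes a fixed noise rate (`beta`) and hypothetical function names, and it is not how any particular product is implemented. Diffusion models are trained to reverse this kind of gradual noising, and text-to-image models let a prompt guide that reversal.

```python
import math
import random

def forward_diffusion(pixels, steps=200, beta=0.02, seed=0):
    """Gradually mix Gaussian noise into a list of pixel values.

    Returns every intermediate state, starting with the untouched input.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    current = list(pixels)
    history = [list(current)]
    for _ in range(steps):
        # Shrink the existing signal slightly and add a little fresh noise.
        current = [
            math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for x in current
        ]
        history.append(current)
    return history

def signal_fraction(steps, beta=0.02):
    """Fraction of the original pixel value that survives after `steps` steps."""
    return math.sqrt(1.0 - beta) ** steps

history = forward_diffusion([1.0, -1.0, 0.5], steps=200)
print("start:", history[0])
print("end:  ", history[-1])
print("surviving signal:", signal_fraction(200))
```

After enough steps, almost none of the original image survives, much like dye that has fully dispersed in water. Generation runs the other way: it starts from noise and removes it step by step.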

The number of available diffusion models is growing by the day, but four of the most well-known are DALL-E, Imagen, Stable Diffusion, and Midjourney. Each differs in the data used for training, the way it embeds language details, and how users can interact with it to control output. So results differ significantly from tool to tool. And what one model does well today, another might do better tomorrow as research and development speeds forward.

Uses of Generative AI for Imagery

Generative AI can do more than just create cute cat cartoons. Often gen AI models are fine-tuned and combined with other algorithms and AI models. This allows artists and tinkerers alike to create, manipulate, and animate imagery in a variety of ways. Let’s check out some examples.

Text-to-Image

You can achieve an incredible amount of artistic variety using text-to-image gen AI. In our example, we chose a hand-drawn style of a cat. But we could have gone for hyperrealistic, or represented the scene as a tiled mosaic. If you can imagine it, diffusion models can interpret your intention with some success.

In the next unit you learn tips for how to get the best results, but for now understand that the first limit to what you can create is what you can imagine. Browse what others are creating with the different diffusion models.

The ability to use image generation inline with text generation has emerged recently. So, as you develop a story with some GPT tools, they can use the context to generate an image. Even better, if you need another image that includes the same subject, like our costume cat, those models can use the first image as reference to maintain character consistency.

Text-to-3D Model

Typically, the tools to create 3D models are technical and require a high level of skill to master. Yet, we’re at a time when 3D models are appearing in more places than ever, from commerce, to manufacturing, to entertainment. Let generative AI help meet some of the demand. Models like the one used for DreamFusion can generate amazing 3D models, along with supporting resources to describe the coloring, lighting, and material properties of the models.

Image-to-Image

If a picture is worth a thousand words, imagine how useful it is as part of the prompt for a generative AI model! Some models are trained to extract meaning from pictures, using similar training that allows for text-to-image generation. This two-way translation is the basis for the following use cases.

  • Style transfer: Start with a simple sketch and a description of what’s happening in the scene and let gen AI fill in all of the details. The output can be in a specific kind of artistic style, like a Renaissance painting or an architectural drawing. Some artists do this iteratively to build an image.
  • Paint out details: Imagine you visit the Leaning Tower of Pisa and get a great photo of yourself pretending to hold up the tower with your own strength. Unfortunately, 20 other people are in the picture doing the same thing. No worries, now you can cut them out and let AI fill in the gaps with realistic grass and sky for a pristine photo.
  • Paint in details: What might it look like to put a party hat on a panther? There’s a dangerous way of finding out, or the much safer way of using generative AI. You use selection tools to mark where an item should go in a scene, and like magic, the item appears as if it were always there.
  • Extend picture borders: Generative AI uses the context of the picture to continue what is likely to appear beyond the border of a scene.
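The paint-out and paint-in cases above share one idea: a mask marks which pixels the model is allowed to change. The sketch below is a simplified illustration with hypothetical helper names. It uses nested lists as stand-ins for images, and the `generated` grid stands in for whatever a real model would produce.

```python
def rectangle_mask(width, height, left, top, right, bottom):
    """Build a grid with 1 inside the selected rectangle and 0 elsewhere.

    `right` and `bottom` are exclusive, like Python slice bounds.
    """
    return [
        [1 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]

def composite(original, generated, mask):
    """Take generated pixels where the mask is 1; keep the original elsewhere."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

original = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]   # the photo as taken
generated = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]  # placeholder for model output
mask = rectangle_mask(3, 3, left=1, top=1, right=2, bottom=2)

for row in composite(original, generated, mask):
    print(row)
```

Only the masked pixel in the center changes. Everything outside the selection is copied from the original photo untouched.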

Animation

Because there’s a certain amount of randomness inherent to every generated image, creating a series of images that differ only slightly from frame to frame is its own challenge for generative AI. When you play one image after the other, the unintended variations jump out, with lines and shapes shifting and shimmering. But researchers have developed methods of reducing that effect so generated animations have an acceptable level of consistency.
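One way to picture the shimmering effect is to measure how much each pixel changes between consecutive frames. The sketch below uses hypothetical helper names and a simple exponential moving average as a stand-in smoothing step. The methods researchers actually use are more sophisticated and are not described in this unit.

```python
def flicker(frames):
    """Mean absolute per-pixel change between consecutive frames."""
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += abs(b - a)
            count += 1
    return total / count if count else 0.0

def smooth(frames, alpha=0.5):
    """Blend each frame with the running average of the frames before it."""
    if not frames:
        return []
    smoothed = [list(frames[0])]
    for frame in frames[1:]:
        prev = smoothed[-1]
        smoothed.append([alpha * x + (1.0 - alpha) * p for x, p in zip(frame, prev)])
    return smoothed

# A single pixel that flips between dark and bright on every frame.
frames = [[0.0], [1.0], [0.0], [1.0]]
print("flicker before:", flicker(frames))
print("flicker after: ", flicker(smooth(frames)))
```

Averaging over time lowers the flicker score here, but on real video it can also blur or ghost genuine motion, which is one reason more careful approaches are needed.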

All of the previous use cases for still imagery can be adapted to animation in some way. For example, style transfer can take a video of a skateboarder doing a trick and turn it into an anime-style video. Or you can use a model trained on speech patterns to animate the lips of a generated 3D character.

There are enormous possibilities to create stunning imagery with generative AI. In the next unit, you learn responsible ways to make use of generative AI’s capabilities.

Resources