Explain the Cost Structure of Amazon Bedrock

Learning Objectives

After completing this unit, you’ll be able to:

Describe Amazon Bedrock pricing.
Analyze a sample real-world scenario, and choose the right pricing model.

How Much Does Amazon Bedrock Cost?

Amazon Bedrock offers several pricing models designed to fit different business needs and usage patterns.

On-Demand

You pay for what you use, with no long-term commitments.

On-Demand pricing can vary depending on the model you choose.

Text models: You're charged for the number of input and output tokens processed.
Embedding models: You're charged for the number of input tokens processed.
Image models: You're charged for every image generated.
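
To see how these three billing units translate into a charge, here's a minimal sketch. It assumes token prices are quoted per 1,000 tokens, and every rate in it is a made-up placeholder, not an actual Amazon Bedrock price. The function names are hypothetical too.

```python
# Illustrative On-Demand cost helpers. All rates are placeholders,
# and the per-1,000-token pricing unit is an assumption.

def text_model_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Text models: charged for input and output tokens."""
    return (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k


def embedding_model_cost(input_tokens, input_price_per_1k):
    """Embedding models: charged for input tokens only."""
    return (input_tokens / 1000) * input_price_per_1k


def image_model_cost(images_generated, price_per_image):
    """Image models: charged per generated image."""
    return images_generated * price_per_image


# Example with placeholder rates
text_cost = text_model_cost(2_000, 500, input_price_per_1k=0.001, output_price_per_1k=0.002)
embedding_cost = embedding_model_cost(2_000, input_price_per_1k=0.0001)
image_cost = image_model_cost(3, price_per_image=0.04)
print(f"text: {text_cost:.4f}, embedding: {embedding_cost:.4f}, image: {image_cost:.2f}")
```

Swap in the published rates for the model and AWS Region you plan to use before relying on any of these numbers.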

On-Demand also supports cross-Region inference for some models. With this support, you can absorb traffic spikes by distributing requests across AWS Regions, while you're still charged at the price of the source Region, the Region where the request originated.

Batch

With Batch mode, you can submit multiple prompts at once as a single input file and receive the responses as a single output file. The responses are stored in your Amazon S3 bucket, where you can access them later. This approach offers a 50% lower price compared to On-Demand pricing. The Batch option is ideal for processing large amounts of data that don’t require immediate responses.
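
The 50% figure makes the comparison easy to work out. This small sketch applies the discount to an On-Demand estimate. The function name and the sample amount are hypothetical.

```python
BATCH_DISCOUNT = 0.50  # from the text: Batch is priced 50% lower than On-Demand


def batch_cost(on_demand_cost):
    """Return the Batch price for a job that would cost on_demand_cost On-Demand."""
    return on_demand_cost * (1 - BATCH_DISCOUNT)


# A job estimated at 4.00 On-Demand (placeholder amount)
estimated_batch_cost = batch_cost(4.00)
print(f"Batch estimate: {estimated_batch_cost:.2f}")
```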

Provisioned Throughput

The Provisioned Throughput pricing model is built for high-volume workloads that need guaranteed performance. You can purchase model units for a specific base or custom model. A model unit ensures a specific processing speed that’s measured by the maximum number of input or output tokens processed per minute. If you choose Provisioned Throughput, you're charged by the hour, with commitment options for either 1-month or 6-month terms.
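
Because Provisioned Throughput is billed by the hour, the cost of a term depends on how many model units you buy and how long you hold them, not on how many tokens you process. Here's a rough sketch of that arithmetic. The hourly rate is a placeholder, and treating a month as 30 days is an assumption made only for illustration.

```python
HOURS_PER_DAY = 24


def provisioned_throughput_cost(model_units, hourly_price_per_unit, days):
    """Cost of holding model_units for the given number of days, billed hourly."""
    return model_units * hourly_price_per_unit * HOURS_PER_DAY * days


# One model unit for an assumed 30-day month at a placeholder rate of 20.0 per hour
monthly_cost = provisioned_throughput_cost(model_units=1, hourly_price_per_unit=20.0, days=30)
print(f"Estimated monthly cost: {monthly_cost:,.2f}")
```

Notice that the charge stays the same whether the model unit is busy or idle, which is why this option suits steady, high-volume traffic.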

Model Customization

When you customize a model, you pay for two things: the training process (calculated as the number of tokens processed multiplied by the number of training rounds, known as epochs) and monthly model storage. Custom models require Provisioned Throughput, which starts with one model unit available without commitment. Additional throughput is available for either 1-month or 6-month terms.
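
The two customization charges can be sketched as a simple formula: tokens times epochs for training, plus a storage fee for each month you keep the model. The per-1,000-token unit and all rates here are placeholder assumptions, and the Provisioned Throughput needed to run the custom model is not included.

```python
def customization_cost(training_tokens, epochs, price_per_1k_tokens,
                       storage_price_per_month, months_stored):
    """Training charge (tokens x epochs) plus monthly storage. Rates are placeholders."""
    tokens_processed = training_tokens * epochs
    training_charge = (tokens_processed / 1000) * price_per_1k_tokens
    storage_charge = storage_price_per_month * months_stored
    return training_charge + storage_charge


# 2 million training tokens, 3 epochs, stored for 1 month (placeholder rates)
total = customization_cost(
    training_tokens=2_000_000,
    epochs=3,
    price_per_1k_tokens=0.01,
    storage_price_per_month=2.0,
    months_stored=1,
)
print(f"Estimated customization cost: {total:.2f}")
```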

Custom Model Import

You can import custom weights for supported model architectures in Amazon Bedrock. There's no charge to import a custom model. After you import the model, you run it in On-Demand mode and start incurring costs. You're charged for inference based on the number of model copies and the length of time each model copy is active (billed in 5-minute windows). Pricing might vary based on factors such as architecture, context length, AWS Region, hardware generation, and more.
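
Billing in 5-minute windows means short bursts of activity can cost more than the raw minutes suggest. The sketch below assumes active time is rounded up to the next full 5-minute window and that the rate is quoted per model copy per minute. Both are assumptions for illustration, and the rate is a placeholder.

```python
import math

WINDOW_MINUTES = 5  # from the text: billed in 5-minute windows


def billed_minutes(active_minutes):
    """Round active time up to whole 5-minute windows (assumed rounding rule)."""
    return math.ceil(active_minutes / WINDOW_MINUTES) * WINDOW_MINUTES


def imported_model_cost(model_copies, active_minutes, price_per_copy_minute):
    """Inference charge for an imported model, given a placeholder per-minute rate."""
    return model_copies * billed_minutes(active_minutes) * price_per_copy_minute


# Two model copies active for 12 minutes are billed as 15 minutes each
minutes = billed_minutes(12)
cost = imported_model_cost(model_copies=2, active_minutes=12, price_per_copy_minute=0.1)
print(f"Billed minutes per copy: {minutes}, estimated cost: {cost:.2f}")
```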

Choose the Right Pricing Model

Let’s look at the following scenario to better understand how costs add up under each Amazon Bedrock pricing option.

Suppose an ecommerce company wants to add an AI-based chatbot. The company processes up to 10,000 customer inquiries a day, each using 100 input tokens and 150 output tokens. Traffic peaks during business hours and holidays. The company also regularly updates product information and FAQs. Let’s take a look at how each pricing model would handle this workload and what to consider.

Pricing Model	Details
On-Demand	This pricing model works well for daily operations. It accommodates varying volumes of inquiries, supports unexpected traffic spikes, and only charges for actual usage.
Batch	This pricing model is ideal for predictable tasks that don’t need instant responses. The company can use this pricing model for bulk changes to product details and FAQs, securing a 50% lower price compared to On-Demand inference pricing.
Provisioned Throughput	To accommodate large, consistent inference workloads during high-volume periods, the company can consider adding Provisioned Throughput with a 1-month commitment term.
Model Customization	Model customization would be valuable for handling unique product names and industry-specific terminology. However, the setup costs and time-intensive training period might be excessive for simple FAQ responses and product updates.
Custom Model Import	This approach reduces initial development costs compared to full customization and allows for quicker deployment. However, instead of investing in custom model import, the company could develop a system of engineered prompts and response templates for its most common customer interactions. During peak holiday seasons, these templates can be quickly modified to include seasonal offerings and special promotions.

The company can use a mixed approach: run daily operations with On-Demand pricing, manage routine updates through Batch processing, and add Provisioned Throughput during high-volume periods. Additionally, the company can develop specialized prompts for common inquiries and escalate the ones the chatbot can't resolve to human agents. This approach lets the company minimize its AI implementation costs while still providing effective customer service.
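
To put rough numbers on the scenario, you can work out the daily token volume from the figures given: 10,000 inquiries at 100 input and 150 output tokens each. The sketch below does that and then prices the result On-Demand. The token prices are hypothetical placeholders quoted per 1,000 tokens, so treat the cost as an illustration only.

```python
# Figures from the scenario (10,000 is the upper bound on daily inquiries)
DAILY_INQUIRIES = 10_000
INPUT_TOKENS_PER_INQUIRY = 100
OUTPUT_TOKENS_PER_INQUIRY = 150

# Hypothetical prices per 1,000 tokens; replace with the rates for your model
INPUT_PRICE_PER_1K = 0.001
OUTPUT_PRICE_PER_1K = 0.002

daily_input_tokens = DAILY_INQUIRIES * INPUT_TOKENS_PER_INQUIRY
daily_output_tokens = DAILY_INQUIRIES * OUTPUT_TOKENS_PER_INQUIRY

daily_on_demand_cost = (
    (daily_input_tokens / 1000) * INPUT_PRICE_PER_1K
    + (daily_output_tokens / 1000) * OUTPUT_PRICE_PER_1K
)

print(f"Daily input tokens:  {daily_input_tokens:,}")
print(f"Daily output tokens: {daily_output_tokens:,}")
print(f"Daily On-Demand estimate: {daily_on_demand_cost:.2f}")
```

At peak volume, the chatbot processes 1,000,000 input tokens and 1,500,000 output tokens a day. Because output tokens outnumber input tokens, the output price has the larger effect on the bill in this scenario.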

Wrap Up

Selecting the right pricing model on Amazon Bedrock requires understanding your workload patterns and business needs. Remember that cost optimization often comes from a hybrid approach. Whichever combination you choose, such as On-Demand for flexibility, Batch for routine tasks, and Provisioned Throughput for consistent high-volume periods, be ready to adapt your strategy as your needs evolve.
