AWS Bedrock

Amazon Bedrock provides access to foundation models (FMs) from AI companies through a unified API. Build generative AI applications with text generation, embeddings, and image generation capabilities.

Table of Contents

- Core Concepts
- Common Patterns
- CLI Reference
- Best Practices
- Troubleshooting
- References

Core Concepts

Foundation Models

Pre-trained models available through Bedrock:

- Claude (Anthropic): Text generation, analysis, coding
- Titan (Amazon): Text, embeddings, image generation
- Llama (Meta): Open-weight text generation
- Mistral: Efficient text generation
- Stable Diffusion (Stability AI): Image generation

Model Access

Models must be enabled in your account before use:

- Request access in Bedrock console
- Some models require acceptance of EULAs
- Access is region-specific

Inference Types

| Type | Use Case | Pricing |
|---|---|---|
| On-Demand | Variable workloads | Per token |
| Provisioned Throughput | Consistent high-volume | Hourly commitment |
| Batch Inference | Async large-scale | Discounted per token |

Common Patterns

Invoke Model (Text Generation)

AWS CLI:

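    # Invoke Claude
    # --cli-binary-format raw-in-base64-out is for AWS CLI v2, where it lets
    # --body be passed as raw JSON; omit the flag on AWS CLI v1.
    aws bedrock-runtime invoke-model \
      --model-id anthropic.claude-3-sonnet-20240229-v1:0 \
      --content-type application/json \
      --accept application/json \
      --cli-binary-format raw-in-base64-out \
      --body '{
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
          {"role": "user", "content": "Explain AWS Lambda in 3 sentences."}
        ]
      }' \
      response.json

    # Print the generated text from the saved response
    jq -r '.content[0].text' response.json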

boto3:

    import json

    import boto3

    # Region and credentials are resolved from the environment
    bedrock = boto3.client('bedrock-runtime')

    def invoke_claude(prompt, max_tokens=1024):
        """Send a single user prompt to Claude and return the generated text."""
        response = bedrock.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps({
                'anthropic_version': 'bedrock-2023-05-31',
                'max_tokens': max_tokens,
                'messages': [
                    {'role': 'user', 'content': prompt}
                ]
            })
        )

        # The body is a stream; read it and decode the JSON payload
        result = json.loads(response['body'].read())
        return result['content'][0]['text']

    # Usage
    response = invoke_claude('What is Amazon S3?')
    print(response)
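Besides the generated text, the decoded response body also carries metadata that is useful for cost monitoring. The sketch below assumes the Anthropic Messages response shape (a `content` list of typed blocks plus `stop_reason` and `usage` fields). `summarize_response` is a hypothetical helper name, and the sample payload is made up so the code can run without calling Bedrock.

```python
def summarize_response(result):
    """Collect text and token counts from a decoded Claude response body.

    Assumes the Anthropic Messages response shape: a 'content' list of typed
    blocks, plus optional 'stop_reason' and 'usage' fields.
    """
    # Join every text block rather than assuming there is exactly one
    text = ''.join(
        block.get('text', '')
        for block in result.get('content', [])
        if block.get('type') == 'text'
    )
    usage = result.get('usage', {})
    return {
        'text': text,
        'stop_reason': result.get('stop_reason'),
        'input_tokens': usage.get('input_tokens', 0),
        'output_tokens': usage.get('output_tokens', 0),
    }


# Made-up payload in the assumed shape, so the helper can be tried offline
sample = {
    'content': [{'type': 'text', 'text': 'Amazon S3 is object storage.'}],
    'stop_reason': 'end_turn',
    'usage': {'input_tokens': 12, 'output_tokens': 7},
}
summary = summarize_response(sample)
print(summary['text'], summary['output_tokens'])
```

In real use, `result` would be the dictionary produced by `json.loads(response['body'].read())` inside `invoke_claude`.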

Streaming Response

    import json

    import boto3

    bedrock = boto3.client('bedrock-runtime')

    def stream_claude(prompt):
        """Yield text fragments from Claude as they arrive."""
        response = bedrock.invoke_model_with_response_stream(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps({
                'anthropic_version': 'bedrock-2023-05-31',
                'max_tokens': 1024,
                'messages': [
                    {'role': 'user', 'content': prompt}
                ]
            })
        )

        # Each event wraps one JSON chunk; only delta chunks carry new text
        for event in response['body']:
            chunk = json.loads(event['chunk']['bytes'])
            if chunk['type'] == 'content_block_delta':
                yield chunk['delta'].get('text', '')

    # Usage
    for text in stream_claude('Write a haiku about cloud computing.'):
        print(text, end='', flush=True)
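When the full text is needed after streaming (for logging or caching, say), the deltas can be accumulated. The sketch below works on any iterable of events, assuming the same event shape that `stream_claude` reads (`{'chunk': {'bytes': ...}}` with `content_block_delta` chunks). `collect_text` and `fake_event` are hypothetical helpers, and the events are fabricated stand-ins so the logic can be checked offline.

```python
import json


def collect_text(events):
    """Concatenate text deltas from an iterable of Bedrock stream events.

    Assumes each event looks like {'chunk': {'bytes': <JSON bytes>}} and that
    text arrives in 'content_block_delta' chunks.
    """
    parts = []
    for event in events:
        chunk_info = event.get('chunk')
        if chunk_info is None:
            # Skip events without a chunk (for example, error events)
            continue
        chunk = json.loads(chunk_info['bytes'])
        if chunk.get('type') == 'content_block_delta':
            parts.append(chunk['delta'].get('text', ''))
    return ''.join(parts)


def fake_event(payload):
    """Build a stand-in stream event from a dict (illustration only)."""
    return {'chunk': {'bytes': json.dumps(payload).encode('utf-8')}}


events = [
    fake_event({'type': 'message_start'}),
    fake_event({'type': 'content_block_delta', 'delta': {'text': 'Clouds '}}),
    fake_event({'type': 'content_block_delta', 'delta': {'text': 'drift by'}}),
    fake_event({'type': 'message_stop'}),
]
full_text = collect_text(events)
print(full_text)
```

Against a live stream, the same function could be passed `response['body']` directly.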

Generate Embeddings

    import json

    import boto3

    bedrock = boto3.client('bedrock-runtime')

    def get_embedding(text):
        """Return the Titan embedding vector for a piece of text."""
        response = bedrock.invoke_model(
            modelId='amazon.titan-embed-text-v2:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps({
                'inputText': text,
                'dimensions': 1024,
                'normalize': True
            })
        )

        result = json.loads(response['body'].read())
        return result['embedding']

    # Usage
    embedding = get_embedding('AWS Lambda is a serverless compute service.')
    print(f'Embedding dimension: {len(embedding)}')
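Embeddings are usually compared rather than inspected directly. As a minimal sketch, cosine similarity can rank documents against a query vector. The function names and the three-dimensional toy vectors are illustrative assumptions; real Titan vectors would come from `get_embedding` and have the configured dimension.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings
query = [1.0, 0.0, 0.0]
docs = {
    'lambda': [0.9, 0.1, 0.0],
    's3': [0.0, 1.0, 0.0],
}
ranked = rank_by_similarity(query, docs)
print(ranked[0][0])
```

With `'normalize': True` in the request, the vectors should already have unit length, so the dot product alone would give the same ranking; the full formula is kept here so the sketch does not depend on that setting.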

Conversation with History

    import json

    import boto3

    bedrock = boto3.client('bedrock-runtime')

    class Conversation:
        """Keeps the message history so each call includes earlier turns."""

        def __init__(self, system_prompt=None):
            self.messages = []
            self.system = system_prompt

        def chat(self, user_message):
            # Record the user turn before sending the full history
            self.messages.append({
                'role': 'user',
                'content': user_message
            })

            body = {
                'anthropic_version': 'bedrock-2023-05-31',
                'max_tokens': 1024,
                'messages': self.messages
            }

            if self.system:
                body['system'] = self.system

            response = bedrock.invoke_model(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                contentType='application/json',
                accept='application/json',
                body=json.dumps(body)
            )

            result = json.loads(response['body'].read())
            assistant_message = result['content'][0]['text']

            # Store the reply so the next turn has the full context
            self.messages.append({
                'role': 'assistant',
                'content': assistant_message
            })

            return assistant_message

    # Usage
    conv = Conversation(system_prompt='You are an AWS solutions architect.')
    print(conv.chat('What database should I use for a chat application?'))
    print(conv.chat('What about for time-series data?'))
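The CLI reference in this document lists `converse` as the multi-turn conversation API. A rough boto3 counterpart to the class above might look like the following. It assumes the Converse request shape (`messages` whose `content` is a list of `{'text': ...}` blocks, an optional `system` list, and `inferenceConfig` with `maxTokens`) and takes the client as a parameter so it can be exercised with a stub. `ConverseChat` and `StubClient` are hypothetical names; in real use the client would be `boto3.client('bedrock-runtime')`.

```python
class ConverseChat:
    """Multi-turn chat on top of a client that exposes converse()."""

    def __init__(self, client, model_id, system_prompt=None, max_tokens=1024):
        self.client = client
        self.model_id = model_id
        self.system_prompt = system_prompt
        self.max_tokens = max_tokens
        self.messages = []

    def chat(self, user_message):
        # Converse messages carry content as a list of typed blocks
        self.messages.append(
            {'role': 'user', 'content': [{'text': user_message}]}
        )
        kwargs = {
            'modelId': self.model_id,
            'messages': self.messages,
            'inferenceConfig': {'maxTokens': self.max_tokens},
        }
        if self.system_prompt:
            kwargs['system'] = [{'text': self.system_prompt}]

        response = self.client.converse(**kwargs)
        reply = response['output']['message']
        # Keep the assistant turn so later calls include it
        self.messages.append(reply)
        return ''.join(block.get('text', '') for block in reply['content'])


class StubClient:
    """Stand-in for a bedrock-runtime client, for offline testing only."""

    def converse(self, **kwargs):
        turn = len(kwargs['messages'])
        return {'output': {'message': {
            'role': 'assistant',
            'content': [{'text': f'reply to message {turn}'}],
        }}}


chat = ConverseChat(
    StubClient(),
    'anthropic.claude-3-sonnet-20240229-v1:0',
    system_prompt='You are an AWS solutions architect.',
)
first = chat.chat('What database should I use for a chat application?')
second = chat.chat('What about for time-series data?')
print(first, '|', second)
```

One reason to prefer this shape is that the request format stays the same across model providers, whereas `invoke_model` bodies are model-specific.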

List Available Models

    # List all foundation models
    aws bedrock list-foundation-models \
      --query 'modelSummaries[*].[modelId,modelName,providerName]' \
      --output table

    # Filter by provider
    aws bedrock list-foundation-models \
      --by-provider anthropic \
      --query 'modelSummaries[*].modelId'

    # Get model details
    aws bedrock get-foundation-model \
      --model-identifier anthropic.claude-3-sonnet-20240229-v1:0

Request Model Access

    # List the agreement offers for a model
    aws bedrock list-foundation-model-agreement-offers \
      --model-id anthropic.claude-3-sonnet-20240229-v1:0

CLI Reference

Bedrock (Control Plane)

| Command | Description |
|---|---|
| aws bedrock list-foundation-models | List available models |
| aws bedrock get-foundation-model | Get model details |
| aws bedrock list-custom-models | List fine-tuned models |
| aws bedrock create-model-customization-job | Start fine-tuning |
| aws bedrock list-provisioned-model-throughputs | List provisioned capacity |

Bedrock Runtime (Data Plane)

| Command | Description |
|---|---|
| aws bedrock-runtime invoke-model | Invoke model synchronously |
| aws bedrock-runtime invoke-model-with-response-stream | Invoke with streaming |
| aws bedrock-runtime converse | Multi-turn conversation API |
| aws bedrock-runtime converse-stream | Streaming conversation |

Bedrock Agent Runtime

| Command | Description |
|---|---|
| aws bedrock-agent-runtime invoke-agent | Invoke a Bedrock agent |
| aws bedrock-agent-runtime retrieve | Query knowledge base |
| aws bedrock-agent-runtime retrieve-and-generate | RAG query |

Best Practices

Cost Optimization

- Use appropriate models: Smaller models for simple tasks
- Set max_tokens: Limit output length when possible
- Cache responses: For repeated identical queries
- Batch when possible: Use batch inference for bulk processing
- Monitor usage: Set up CloudWatch alarms for cost

Performance

- Use streaming: For better user experience with long outputs
- Connection pooling: Reuse boto3 clients
- Regional deployment: Use closest region to reduce latency
- Provisioned throughput: For consistent high-volume workloads

Security

- Least privilege IAM: Only grant needed model access
- VPC endpoints: Keep traffic private
- Guardrails: Implement content filtering
- Audit with CloudTrail: Track model invocations

IAM Permissions

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream"
          ],
          "Resource": [
            "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
          ]
        }
      ]
    }
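Because model access is region-specific, a policy like the one above often has to be regenerated per region or per model list. The following is a small sketch that builds the same document from parameters; it reuses the ARN pattern and actions shown in the policy, while `build_invoke_policy` and its arguments are hypothetical names chosen for illustration.

```python
import json


def build_invoke_policy(region, model_ids, allow_streaming=True):
    """Build a least-privilege policy document for invoking specific models."""
    actions = ['bedrock:InvokeModel']
    if allow_streaming:
        actions.append('bedrock:InvokeModelWithResponseStream')

    # Foundation-model ARNs have an empty account field, as in the policy above
    resources = [
        f'arn:aws:bedrock:{region}::foundation-model/{model_id}'
        for model_id in model_ids
    ]
    return {
        'Version': '2012-10-17',
        'Statement': [
            {'Effect': 'Allow', 'Action': actions, 'Resource': resources}
        ],
    }


policy = build_invoke_policy(
    'us-east-1',
    [
        'anthropic.claude-3-sonnet-20240229-v1:0',
        'amazon.titan-embed-text-v2:0',
    ],
)
print(json.dumps(policy, indent=2))
```

Passing `allow_streaming=False` drops the streaming action for roles that only make synchronous calls.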

Troubleshooting

AccessDeniedException

Causes:

- Model access not enabled in console
- IAM policy missing bedrock:InvokeModel
- Wrong model ID or region

Debug:

    # Check model access status
    # (the model ID must be wrapped in backticks inside the JMESPath filter)
    aws bedrock list-foundation-models \
      --query 'modelSummaries[?modelId==`anthropic.claude-3-sonnet-20240229-v1:0`]'

    # Test IAM permissions
    aws iam simulate-principal-policy \
      --policy-source-arn arn:aws:iam::123456789012:role/my-role \
      --action-names bedrock:InvokeModel \
      --resource-arns "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

ModelNotReadyException

Cause: Model is still being provisioned or temporarily unavailable.

Solution: Implement retry with exponential backoff:

    import json
    import time

    from botocore.exceptions import ClientError

    def invoke_with_retry(bedrock, body, max_retries=3):
        """Invoke the model, backing off exponentially while it is not ready."""
        for attempt in range(max_retries):
            try:
                return bedrock.invoke_model(
                    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                    body=json.dumps(body)
                )
            except ClientError as e:
                if e.response['Error']['Code'] == 'ModelNotReadyException':
                    # Wait 1s, 2s, 4s, ... before the next attempt
                    time.sleep(2 ** attempt)
                else:
                    raise
        raise RuntimeError('Max retries exceeded')
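The same idea extends to throttling. The sketch below is one possible generalization, not part of the original example: it retries a set of error codes and adds random jitter so that concurrent clients do not retry in lockstep. `call_with_backoff`, `RETRYABLE_CODES`, and `FakeClientError` are hypothetical names; the error code is read from the `response['Error']['Code']` field used above, and the `sleep` parameter exists only so the demo can run without waiting.

```python
import random
import time

# Error codes treated as transient in this sketch (an assumption)
RETRYABLE_CODES = {'ModelNotReadyException', 'ThrottlingException'}


def error_code(exc):
    """Read the AWS error code from a botocore-style exception, if present."""
    response = getattr(exc, 'response', None) or {}
    return response.get('Error', {}).get('Code')


def call_with_backoff(func, max_retries=3, base_delay=1.0, max_delay=20.0,
                      sleep=time.sleep):
    """Call func(), retrying transient errors with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything non-transient, or the final failed attempt
            if error_code(exc) not in RETRYABLE_CODES or attempt == max_retries:
                raise
            # Pick a random delay up to the capped exponential bound
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))


class FakeClientError(Exception):
    """Stand-in for botocore's ClientError, for offline testing only."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {'Error': {'Code': code}}


attempts = []


def flaky():
    # Fails twice with a throttling error, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeClientError('ThrottlingException')
    return 'ok'


delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(attempts))
```

In real use, `func` could be a `lambda` wrapping `bedrock.invoke_model(...)`, and `sleep` would be left at its default.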

ThrottlingException

Causes:

- Exceeded on-demand quota
- Too many concurrent requests

Solutions:

- Request quota increase
- Implement exponential backoff
- Consider provisioned throughput

ValidationException

Common issues:

- Invalid model ID
- Malformed request body
- max_tokens exceeds model limit

Debug:

    # Check model-specific requirements
    aws bedrock get-foundation-model \
      --model-identifier anthropic.claude-3-sonnet-20240229-v1:0 \
      --query 'modelDetails.inferenceTypesSupported'
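Some of the common issues listed above can be caught on the client before the request is sent. The following is a rough pre-flight check for the Claude request bodies used in this document. `check_claude_body` is a hypothetical helper, the rules it applies are assumptions drawn from the examples here rather than the service's full validation, and `max_tokens_limit` must be supplied by the caller because this document does not state per-model limits.

```python
def check_claude_body(body, max_tokens_limit=None):
    """Return a list of problems found in a Claude request body (empty if none)."""
    problems = []

    if 'anthropic_version' not in body:
        problems.append('missing anthropic_version')

    max_tokens = body.get('max_tokens')
    # bool is a subclass of int, so exclude it explicitly
    if (not isinstance(max_tokens, int) or isinstance(max_tokens, bool)
            or max_tokens <= 0):
        problems.append('max_tokens must be a positive integer')
    elif max_tokens_limit is not None and max_tokens > max_tokens_limit:
        problems.append(
            f'max_tokens {max_tokens} exceeds limit {max_tokens_limit}'
        )

    messages = body.get('messages')
    if not isinstance(messages, list) or not messages:
        problems.append('messages must be a non-empty list')
    else:
        for i, message in enumerate(messages):
            if not isinstance(message, dict):
                problems.append(f'messages[{i}] must be an object')
                continue
            if message.get('role') not in ('user', 'assistant'):
                problems.append(f'messages[{i}] has an invalid role')
            if not message.get('content'):
                problems.append(f'messages[{i}] has empty content')

    return problems


good_body = {
    'anthropic_version': 'bedrock-2023-05-31',
    'max_tokens': 1024,
    'messages': [{'role': 'user', 'content': 'What is Amazon S3?'}],
}
bad_body = {'max_tokens': 0, 'messages': []}

print(check_claude_body(good_body))
print(check_claude_body(bad_body))
```

A check like this does not replace the service's own validation; it only surfaces obvious mistakes earlier and with clearer messages.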

References

- Bedrock User Guide
- Bedrock API Reference
- Bedrock Runtime API
- Model Parameters
- Bedrock Pricing
