Cloud API Integration Skill

File Organization

Split structure. Main SKILL.md for core patterns. See
references/
for complete implementations.
1. Overview
Risk Level: HIGH - Handles API credentials, processes untrusted prompts, network exposure, data privacy concerns You are an expert in cloud AI API integration with deep expertise in Anthropic Claude, OpenAI GPT-4, and Google Gemini APIs. Your mastery spans secure credential management, prompt security, rate limiting, error handling, and protection against LLM-specific vulnerabilities. You excel at: Secure API key management and rotation Prompt injection prevention for cloud LLMs Rate limiting and cost optimization Multi-provider fallback strategies Output sanitization and data privacy Primary Use Cases : JARVIS cloud AI integration for complex tasks Fallback when local models insufficient Multi-modal processing (vision, code) Enterprise-grade reliability with security 2. Core Principles TDD First - Write tests before implementation. Mock all external API calls. Performance Aware - Optimize for latency, cost, and reliability with caching and connection reuse. Security First - Never hardcode keys, sanitize all inputs, filter all outputs. Cost Conscious - Track usage, set limits, cache repeated queries. Reliability Focused - Multi-provider fallback with circuit breakers. 3. Implementation Workflow (TDD) Step 1: Write Failing Test First

tests/test_cloud_api.py

import pytest from unittest . mock import AsyncMock , patch , MagicMock from src . cloud_api import SecureClaudeClient , CloudAPIConfig class TestSecureClaudeClient : """Test cloud API client with mocked external calls.""" @pytest . fixture def mock_config ( self ) : return CloudAPIConfig ( anthropic_key = "test-key-12345" , timeout = 30.0 ) @pytest . fixture def mock_anthropic_response ( self ) : """Mock Anthropic API response.""" mock_response = MagicMock ( ) mock_response . content = [ MagicMock ( text = "Test response" ) ] mock_response . usage . input_tokens = 10 mock_response . usage . output_tokens = 20 return mock_response @pytest . mark . asyncio async def test_generate_sanitizes_input ( self , mock_config , mock_anthropic_response ) : """Test that prompts are sanitized before sending.""" with patch ( 'anthropic.Anthropic' ) as mock_client : mock_client . return_value . messages . create . return_value = mock_anthropic_response client = SecureClaudeClient ( mock_config ) result = await client . generate ( "Test " )

Verify sanitization was applied

call_args

mock_client . return_value . messages . create . call_args assert "<script>" not in str ( call_args ) assert result == "Test response" @pytest . mark . asyncio async def test_rate_limiter_blocks_excess_requests ( self ) : """Test rate limiting blocks requests over threshold.""" from src . cloud_api import RateLimiter limiter = RateLimiter ( rpm = 2 , daily_cost = 100 ) await limiter . acquire ( 100 ) await limiter . acquire ( 100 ) with pytest . raises ( Exception ) :

RateLimitError

await limiter . acquire ( 100 ) @pytest . mark . asyncio async def test_multi_provider_fallback ( self , mock_config ) : """Test fallback to secondary provider on failure.""" from src . cloud_api import MultiProviderClient with patch ( 'src.cloud_api.SecureClaudeClient' ) as mock_claude : with patch ( 'src.cloud_api.SecureOpenAIClient' ) as mock_openai : mock_claude . return_value . generate = AsyncMock ( side_effect = Exception ( "Rate limited" ) ) mock_openai . return_value . generate = AsyncMock ( return_value = "OpenAI response" ) client = MultiProviderClient ( mock_config ) result = await client . generate ( "test prompt" ) assert result == "OpenAI response" mock_openai . return_value . generate . assert_called_once ( ) Step 2: Implement Minimum to Pass

src/cloud_api.py

class SecureClaudeClient : def init ( self , config : CloudAPIConfig ) : self . client = Anthropic ( api_key = config . anthropic_key . get_secret_value ( ) ) self . sanitizer = PromptSanitizer ( ) async def generate ( self , prompt : str ) -

str : sanitized = self . sanitizer . sanitize ( prompt ) response = self . client . messages . create ( model = "claude-sonnet-4-20250514" , messages = [ { "role" : "user" , "content" : sanitized } ] ) return self . _filter_output ( response . content [ 0 ] . text ) Step 3: Refactor with Patterns Apply caching, connection pooling, and retry logic from Performance Patterns. Step 4: Run Full Verification

Run all tests with coverage

pytest tests/test_cloud_api.py -v --cov = src.cloud_api --cov-report = term-missing

Run security checks

bandit -r src/cloud_api.py

Type checking

mypy src/cloud_api.py --strict 4. Performance Patterns Pattern 1: Connection Pooling

Good: Reuse HTTP connections

import httpx class CloudAPIClient : def init ( self ) : self . _client = httpx . AsyncClient ( limits = httpx . Limits ( max_connections = 100 , max_keepalive_connections = 20 ) , timeout = httpx . Timeout ( 30.0 ) ) async def request ( self , endpoint : str , data : dict ) -

dict : response = await self . _client . post ( endpoint , json = data ) return response . json ( ) async def close ( self ) : await self . _client . aclose ( )

Bad: Create new connection per request

async def bad_request ( endpoint : str , data : dict ) : async with httpx . AsyncClient ( ) as client :

New connection each time!

return await client . post ( endpoint , json = data ) Pattern 2: Retry with Exponential Backoff

Good: Smart retry with backoff

from tenacity import retry , stop_after_attempt , wait_exponential , retry_if_exception_type class CloudAPIClient : @retry ( stop = stop_after_attempt ( 3 ) , wait = wait_exponential ( multiplier = 1 , min = 2 , max = 10 ) , retry = retry_if_exception_type ( ( RateLimitError , APIConnectionError ) ) ) async def generate ( self , prompt : str ) -

str : return await self . _make_request ( prompt )

Bad: No retry or fixed delay

async def bad_generate ( prompt : str ) : try : return await make_request ( prompt ) except Exception : await asyncio . sleep ( 1 )

Fixed delay, no backoff!

return await make_request ( prompt ) Pattern 3: Response Caching

Good: Cache repeated queries with TTL

from functools import lru_cache import hashlib from cachetools import TTLCache class CachedCloudClient : def init ( self ) : self . _cache = TTLCache ( maxsize = 1000 , ttl = 300 )

5 min TTL

async def generate ( self , prompt : str , ** kwargs ) -

str : cache_key = self . _make_key ( prompt , kwargs ) if cache_key in self . _cache : return self . _cache [ cache_key ] result = await self . _client . generate ( prompt , ** kwargs ) self . _cache [ cache_key ] = result return result def _make_key ( self , prompt : str , kwargs : dict ) -

str : content = f" { prompt } : { sorted ( kwargs . items ( ) ) } " return hashlib . sha256 ( content . encode ( ) ) . hexdigest ( )

Bad: No caching

async def bad_generate ( prompt : str ) : return await client . generate ( prompt )

Repeated identical calls!

Pattern 4: Batch API Calls

Good: Batch multiple requests

import asyncio class BatchCloudClient : async def generate_batch ( self , prompts : list [ str ] ) -

list [ str ] : """Process multiple prompts concurrently with rate limiting.""" semaphore = asyncio . Semaphore ( 5 )

Max 5 concurrent

async def limited_generate ( prompt : str ) -

str : async with semaphore : return await self . generate ( prompt ) tasks = [ limited_generate ( p ) for p in prompts ] return await asyncio . gather ( * tasks )

Bad: Sequential processing

async def bad_batch ( prompts : list [ str ] ) : results = [ ] for prompt in prompts : results . append ( await client . generate ( prompt ) )

One at a time!

return results Pattern 5: Async Request Handling

Good: Fully async with proper context management

class AsyncCloudClient : async def aenter ( self ) : self . _client = httpx . AsyncClient ( ) return self async def aexit ( self , * args ) : await self . _client . aclose ( ) async def generate ( self , prompt : str ) -

str : response = await self . _client . post ( self . endpoint , json = { "prompt" : prompt } , timeout = 30.0 ) return response . json ( ) [ "text" ]

Usage

async with AsyncCloudClient ( ) as client : result = await client . generate ( "Hello" )

Bad: Blocking calls in async context

def bad_generate ( prompt : str ) : response = requests . post ( endpoint , json = { "prompt" : prompt } )

Blocks!

return response . json ( ) 5. Core Responsibilities 5.1 Security-First API Integration When integrating cloud AI APIs, you will: Never hardcode API keys - Always use environment variables or secret managers Treat all prompts as untrusted - Sanitize user input before sending Filter all outputs - Prevent data exfiltration and injection Implement rate limiting - Protect against abuse and cost overruns Log securely - Never log API keys or sensitive prompts 5.2 Cost and Performance Optimization Select appropriate model tier based on task complexity Implement caching for repeated queries Use streaming for better user experience Monitor usage and set spending alerts Implement circuit breakers for failed APIs 5.3 Privacy and Compliance Minimize data sent to cloud APIs Never send PII without explicit consent Implement data retention policies Use API features that disable training on data Document data flows for compliance 6. Technical Foundation 6.1 Core SDKs & Versions Provider Production Minimum Notes Anthropic anthropic>=0.40.0

=0.25.0 Messages API support OpenAI openai>=1.50.0 =1.0.0 Structured outputs Gemini google-generativeai>=0.8.0 - Latest features 6.2 Security Dependencies

requirements.txt

anthropic

= 0.40 .0 openai = 1.50 .0 google - generativeai = 0.8 .0 pydantic = 2.0

Input validation

httpx

= 0.27 .0

HTTP client with timeouts

tenacity

= 8.0

Retry logic

structlog

= 23.0

Secure logging

cryptography

= 41.0

Key encryption

cachetools

= 5.0

Response caching

Implementation Patterns Pattern 1: Secure API Client Configuration from pydantic import BaseModel , SecretStr , Field , validator from anthropic import Anthropic import os , structlog logger = structlog . get_logger ( ) class CloudAPIConfig ( BaseModel ) : """Validated cloud API configuration.""" anthropic_key : SecretStr = Field ( default = None ) openai_key : SecretStr = Field ( default = None ) timeout : float = Field ( default = 30.0 , ge = 5 , le = 120 ) @validator ( 'anthropic_key' , 'openai_key' , pre = True ) def load_from_env ( cls , v , field ) : return v or os . environ . get ( field . name . upper ( ) ) class Config : json_encoders = { SecretStr : lambda v : '***' } See references/advanced-patterns.md for complete implementations.
Security Standards 8.1 Critical Vulnerabilities Vulnerability Severity Mitigation Prompt Injection HIGH Input sanitization, output filtering API Key Exposure CRITICAL Environment variables, secret managers Data Exfiltration HIGH Restrict network access 8.2 OWASP LLM Top 10 Mapping OWASP ID Category Mitigation LLM01 Prompt Injection Sanitize all inputs LLM02 Insecure Output Filter before use LLM06 Info Disclosure No secrets in prompts
Common Mistakes

NEVER: Hardcode API Keys

client

Anthropic ( api_key = "sk-ant-api03-xxxxx" )

DANGEROUS

client

Anthropic ( )

SECURE - uses env var

NEVER: Log API Keys

logger . info ( f"Using API key: { api_key } " )

DANGEROUS

logger . info ( "API client initialized" , provider = "anthropic" )

SECURE

NEVER: Trust External Content

content

fetch_url ( url ) response = claude . generate ( f"Summarize: { content } " )

INJECTION VECTOR!

Pre-Implementation Checklist Phase 1: Before Writing Code Write failing tests with mocked API responses Define rate limits and cost thresholds Set up secure credential loading (env vars or secrets manager) Plan caching strategy for repeated queries Phase 2: During Implementation API keys loaded from environment/secrets manager only Input sanitization active on all user content Output filtering before using responses Connection pooling configured Retry logic with exponential backoff Response caching for identical queries Phase 3: Before Committing All tests pass with >80% coverage No API keys in git history (use git-secrets) Security scan passes (bandit) Type checking passes (mypy) Daily spending limits configured Multi-provider fallback tested
Summary

Your goal is to create cloud API integrations that are:

Test-Driven

All functionality verified with mocked tests

Performant

Connection pooling, caching, async operations

Secure

Protected against prompt injection and data exfiltration

Reliable

Multi-provider fallback with proper error handling

Cost-effective

Rate limiting and usage monitoring For complete implementation details, see : references/advanced-patterns.md
Caching, streaming, optimization references/security-examples.md
Full vulnerability analysis references/threat-model.md
Attack scenarios and mitigations

安装

tests/test_cloud_api.py

Verify sanitization was applied

call_args

RateLimitError

src/cloud_api.py

Run all tests with coverage

Run security checks

Type checking

Good: Reuse HTTP connections

Bad: Create new connection per request

New connection each time!

Good: Smart retry with backoff

Bad: No retry or fixed delay

Fixed delay, no backoff!

Good: Cache repeated queries with TTL

5 min TTL

Bad: No caching

Repeated identical calls!

Good: Batch multiple requests

Max 5 concurrent

Bad: Sequential processing

One at a time!

Good: Fully async with proper context management

Usage

Bad: Blocking calls in async context

Blocks!

requirements.txt

Input validation

HTTP client with timeouts

Retry logic

Secure logging

Key encryption

Response caching

NEVER: Hardcode API Keys

client

DANGEROUS

client

SECURE - uses env var

NEVER: Log API Keys

DANGEROUS

SECURE

NEVER: Trust External Content

content

INJECTION VECTOR!