Modal Knowledge Skill
Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices. Activate this skill when users need detailed information about Modal's serverless cloud platform.
Activation Triggers
Activate this skill when users ask about:
Modal.com platform features and capabilities GPU-accelerated Python functions Serverless container configuration Modal pricing and billing Modal CLI commands Web endpoints and APIs on Modal Scheduled/cron jobs on Modal Modal volumes, secrets, and storage Parallel processing with Modal Modal deployment and CI/CD Platform Overview
Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads with:
Zero Configuration: Everything defined in Python code Fast GPU Startup: ~1 second container spin-up Automatic Scaling: Scale to zero, scale to thousands Per-Second Billing: Only pay for active compute Multi-Cloud: AWS, GCP, Oracle Cloud Infrastructure Core Components Reference Apps and Functions import modal
app = modal.App("app-name")
@app.function() def basic_function(arg: str) -> str: return f"Result: {arg}"
@app.local_entrypoint() def main(): result = basic_function.remote("test") print(result)
Function Decorator Parameters Parameter Type Description image Image Container image configuration gpu str/list GPU type(s): "T4", "A100", ["H100", "A100"] cpu float CPU cores (0.125 to 64) memory int Memory in MB (128 to 262144) timeout int Max execution seconds retries int Retry attempts on failure secrets list Secrets to inject volumes dict Volume mount points schedule Cron/Period Scheduled execution concurrency_limit int Max concurrent executions container_idle_timeout int Seconds to keep warm include_source bool Auto-sync source code GPU Reference Available GPUs GPU Memory Use Case ~Cost/hr T4 16 GB Small inference $0.59 L4 24 GB Medium inference $0.80 A10G 24 GB Inference/fine-tuning $1.10 L40S 48 GB Heavy inference $1.50 A100-40GB 40 GB Training $2.00 A100-80GB 80 GB Large models $3.00 H100 80 GB Cutting-edge $5.00 H200 141 GB Largest models $5.00 B200 180+ GB Latest gen $6.25 GPU Configuration
Single GPU
@app.function(gpu="A100")
Specific memory variant
@app.function(gpu="A100-80GB")
Multi-GPU
@app.function(gpu="H100:4")
Fallbacks (tries in order)
@app.function(gpu=["H100", "A100", "any"])
"any" = L4, A10G, or T4
@app.function(gpu="any")
Image Building Base Images
Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")
From Dockerfile
modal.Image.from_dockerfile("./Dockerfile")
From Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")
Package Installation
pip (standard)
image.pip_install("torch", "transformers")
uv (FASTER - 10-100x)
image.uv_pip_install("torch", "transformers")
System packages
image.apt_install("ffmpeg", "libsm6")
Shell commands
image.run_commands("apt-get update", "make install")
Adding Files
Single file
image.add_local_file("./config.json", "/app/config.json")
Directory
image.add_local_dir("./models", "/app/models")
Python source
image.add_local_python_source("my_module")
Environment variables
image.env({"VAR": "value"})
Build-Time Function def download_model(): from huggingface_hub import snapshot_download snapshot_download("model-name")
image.run_function(download_model, secrets=[...])
Storage Volumes
Create/reference volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)
Mount in function
@app.function(volumes={"/data": vol}) def func(): # Read/write to /data vol.commit() # Persist changes
Secrets
From dashboard (recommended)
modal.Secret.from_name("secret-name")
From dictionary
modal.Secret.from_dict({"KEY": "value"})
From local env
modal.Secret.from_local_environ(["KEY1", "KEY2"])
From .env file
modal.Secret.from_dotenv()
Usage
@app.function(secrets=[modal.Secret.from_name("api-keys")]) def func(): import os key = os.environ["API_KEY"]
Dict and Queue
Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True) d["key"] = "value" d.put("key", "value", ttl=3600)
Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True) q.put("task") item = q.get()
Web Endpoints FastAPI Endpoint (Simple) @app.function() @modal.fastapi_endpoint() def hello(name: str = "World"): return {"message": f"Hello, {name}!"}
ASGI App (Full FastAPI) from fastapi import FastAPI web_app = FastAPI()
@web_app.post("/predict") def predict(text: str): return {"result": process(text)}
@app.function() @modal.asgi_app() def fastapi_app(): return web_app
WSGI App (Flask) from flask import Flask flask_app = Flask(name)
@app.function() @modal.wsgi_app() def flask_endpoint(): return flask_app
Custom Web Server @app.function() @modal.web_server(port=8000) def custom_server(): subprocess.run(["python", "-m", "http.server", "8000"])
Custom Domains @modal.asgi_app(custom_domains=["api.example.com"])
Scheduling Cron
Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))
With timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
Period @app.function(schedule=modal.Period(hours=5)) @app.function(schedule=modal.Period(days=1))
Note: Scheduled functions only run with modal deploy, not modal run.
Parallel Processing Map
Parallel execution (up to 1000 concurrent)
results = list(func.map(items))
Unordered (faster)
results = list(func.map(items, order_outputs=False))
Starmap
Spread args
pairs = [(1, 2), (3, 4)] results = list(add.starmap(pairs))
Spawn
Async job (returns immediately)
call = func.spawn(data) result = call.get() # Get result later
Spawn many
calls = [func.spawn(item) for item in items] results = [call.get() for call in calls]
Container Lifecycle (Classes) @app.cls(gpu="A100", container_idle_timeout=300) class Server:
@modal.enter()
def load(self):
self.model = load_model()
@modal.method()
def predict(self, text):
return self.model(text)
@modal.exit()
def cleanup(self):
del self.model
Concurrency @modal.concurrent(max_inputs=100, target_inputs=80) @modal.method() def batched(self, item): pass
CLI Commands Development modal run app.py # Run function modal serve app.py # Hot-reload dev server modal shell app.py # Interactive shell modal shell app.py --gpu A100 # Shell with GPU
Deployment modal deploy app.py # Deploy modal app list # List apps modal app logs app-name # View logs modal app stop app-name # Stop app
Resources
Volumes
modal volume create name modal volume list modal volume put name local remote modal volume get name remote local
Secrets
modal secret create name KEY=value modal secret list
Environments
modal environment create staging
Pricing (2025) Plans Plan Price Containers GPU Concurrency Starter Free ($30 credits) 100 10 Team $250/month 1000 50 Enterprise Custom Unlimited Custom Compute CPU: $0.0000131/core/sec Memory: $0.00000222/GiB/sec GPUs: See GPU table above Special Programs Startups: Up to $25k credits Researchers: Up to $10k credits Best Practices Use @modal.enter() for model loading Use uv_pip_install for faster builds Use GPU fallbacks for availability Set appropriate timeouts and retries Use environments (dev/staging/prod) Download models during build, not runtime Use order_outputs=False when order doesn't matter Set container_idle_timeout to balance cost/latency Monitor costs in Modal dashboard Test with modal run before modal deploy Common Patterns LLM Inference @app.cls(gpu="A100", container_idle_timeout=300) class LLM: @modal.enter() def load(self): from vllm import LLM self.llm = LLM(model="...")
@modal.method()
def generate(self, prompt):
return self.llm.generate([prompt])
Batch Processing @app.function(volumes={"/data": vol}) def process(file): # Process file vol.commit()
Parallel
results = list(process.map(files))
Scheduled ETL @app.function( schedule=modal.Cron("0 6 * * *"), secrets=[modal.Secret.from_name("db")] ) def daily_etl(): extract() transform() load()
Quick Reference Task Code Create app app = modal.App("name") Basic function @app.function() With GPU @app.function(gpu="A100") With image @app.function(image=img) Web endpoint @modal.asgi_app() Scheduled schedule=modal.Cron("...") Mount volume volumes={"/path": vol} Use secret secrets=[modal.Secret.from_name("x")] Parallel map func.map(items) Async spawn func.spawn(arg) Class pattern @app.cls() with @modal.enter()