ADK Deployment Guide
Requires: agents-cli (uv tool install google-agents-cli); install uv first if needed.
Prefer the agents-cli commands throughout this guide: they wrap Terraform, Docker, and deployment into a tested pipeline. If your project isn't scaffolded yet, see /google-agents-cli-scaffold to add deployment support first.
Reference Files
For deeper details, consult these reference files in references/:
- cloud-run.md: Scaling defaults, Dockerfile, session types, networking
- agent-runtime.md: deploy.py CLI, AdkApp pattern, Terraform resource, deployment metadata, CI/CD differences
- gke.md: GKE Autopilot cluster, Kubernetes manifests, Workload Identity, session types, networking
- terraform-patterns.md: Custom infrastructure, IAM, state management, importing resources
- batch-inference.md: BigQuery Remote Function trigger; for Pub/Sub / Eventarc see /google-agents-cli-adk-code
- cicd-pipeline.md: Full CI/CD pipeline setup, infra cicd flags, runner comparison, WIF auth, pipeline stages
- testing-deployed-agents.md: Testing instructions per deployment target, curl examples, load tests
Observability: See the /google-agents-cli-observability skill for Cloud Trace, prompt-response logging, BigQuery Analytics, and third-party integrations.
Deployment Target Decision Matrix
Choose the right deployment target based on your requirements:

| Criteria | Agent Runtime | Cloud Run | GKE |
| --- | --- | --- | --- |
| Languages | Python | Python | Python (+ others via custom containers) |
| Scaling | Managed auto-scaling (configurable min/max, concurrency) | Fully configurable (min/max instances, concurrency, CPU allocation) | Full Kubernetes scaling (HPA, VPA, node auto-provisioning) |
| Networking | VPC-SC and PSC supported | Full VPC support, direct VPC egress, IAP, ingress rules | Full Kubernetes networking |
| Session state | Native VertexAiSessionService (persistent, managed) | In-memory (dev), Cloud SQL, or Agent Platform Sessions backend | In-memory (dev), Cloud SQL, or Agent Platform Sessions backend |
| Batch/event processing | Not supported | Native trigger endpoints (Pub/Sub, Eventarc); see /google-agents-cli-adk-code | Custom (Kubernetes Jobs, Pub/Sub) |
| Cost model | vCPU-hours + memory-hours (not billed when idle) | Per-instance-second + min instance costs | Node pool costs (always-on or auto-provisioned) |
| Setup complexity | Lower (managed, purpose-built for agents) | Medium (Dockerfile, Terraform, networking) | Higher (Kubernetes expertise required) |
| Best for | Managed infrastructure, minimal ops | Custom infra, event-driven workloads | Full Kubernetes control |
Ask the user which deployment target fits their needs. Each is a valid production choice with different trade-offs.
Product name mapping: "Agent Engine" / "Vertex AI Agent Engine" is now Agent Runtime. Use --deployment-target agent_runtime.
Ambient / scheduled / event-driven agents: Agent Runtime does not support Pub/Sub, Eventarc, or Cloud Scheduler triggers. Use Cloud Run (recommended) or GKE for these workloads. See /google-agents-cli-adk-code Section 12 for the trigger_sources pattern.
OAuth / user consent agents: Use Agent Runtime with Gemini Enterprise for agents that need OAuth 2.0 user consent (e.g., accessing Google Drive, Calendar, or other user-scoped APIs). Cloud Run does not currently support managed OAuth flows. See the adk-ae-oauth sample in /google-agents-cli-workflow Phase 2.
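The two exclusion rules above (no event triggers on Agent Runtime, managed OAuth consent only on Agent Runtime) can be turned into a shortlist to present to the user before asking them to choose. This is a minimal sketch, not part of agents-cli: the function name is hypothetical, and the target strings cloud_run and gke are assumed by analogy with the documented agent_runtime value.

```python
from typing import List

# "agent_runtime" is the documented --deployment-target value;
# "cloud_run" and "gke" are assumed spellings for this sketch.
ALL_TARGETS = ["agent_runtime", "cloud_run", "gke"]


def shortlist_targets(needs_event_triggers: bool = False,
                      needs_oauth_consent: bool = False) -> List[str]:
    """Return the targets not ruled out by the notes in this guide."""
    candidates = list(ALL_TARGETS)
    if needs_event_triggers:
        # Agent Runtime has no Pub/Sub, Eventarc, or Cloud Scheduler triggers.
        candidates.remove("agent_runtime")
    if needs_oauth_consent:
        # The guide points OAuth user-consent agents at Agent Runtime only.
        candidates = [t for t in candidates if t == "agent_runtime"]
    return candidates


if __name__ == "__main__":
    print(shortlist_targets(needs_event_triggers=True))
```

An empty result means the requirements conflict (event triggers plus managed OAuth consent), which is a cue to discuss trade-offs with the user instead of picking a target automatically.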
Deploying to Dev
Deploy Workflow
Task tracking: Deployment involves multiple sequential steps (infra setup, CI/CD configuration, deploy, verification). Use a task list to track progress through these steps; skipping one often causes failures in later steps that are hard to trace back.
- If prototype (no deployment target), first enhance: agents-cli scaffold enhance . --deployment-target
- Notify the human: "Eval scores meet thresholds and tests pass. Ready to deploy to dev?"
- Wait for explicit approval
- Once approved: agents-cli deploy
Agent Runtime timeout recovery: Agent Runtime deploys can take 5-10 minutes and may exceed command timeouts. If the deploy command is cancelled or times out, the deployment continues server-side. Run agents-cli deploy --status to check progress; poll every 60 seconds until it reports completion or failure.
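The polling advice above could be scripted roughly as follows. This is a sketch under stated assumptions: the output format of agents-cli deploy --status is not described in this guide, so the caller supplies a hypothetical is_done predicate, and the function names are illustrative only.

```python
import subprocess
import time
from typing import Callable, Optional


def run_status_command() -> str:
    """Run `agents-cli deploy --status` and return its combined output."""
    proc = subprocess.run(
        ["agents-cli", "deploy", "--status"],
        capture_output=True, text=True, check=False,
    )
    return proc.stdout + proc.stderr


def poll_until_done(
    is_done: Callable[[str], bool],
    get_status: Callable[[], str] = run_status_command,
    interval_s: float = 60.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until is_done(output) is true.

    Returns the final status output, or None if max_polls is exhausted.
    is_done is caller-supplied because the status text is an unknown here.
    """
    for attempt in range(max_polls):
        output = get_status()
        if is_done(output):
            return output
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait 60 seconds between checks by default
    return None
```

Injecting get_status and sleep keeps the loop testable without a real deployment; in practice only is_done needs to be written once the actual status output is known.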
IMPORTANT
Never run agents-cli deploy without explicit human approval.
Do NOT run agents-cli infra single-project before deploying. It is not a prerequisite: agents-cli deploy works on its own. Run it separately if the user needs observability features (prompt-response logging, BigQuery analytics); see /google-agents-cli-observability.
Single-Project Infrastructure Setup (Optional, Advanced)
agents-cli infra single-project runs terraform apply in deployment/terraform/single-project/. Use this to provision single-project GCP infrastructure without CI/CD (service accounts, IAM bindings, telemetry resources, Artifact Registry). It is also useful for testing in a single project before going to production. It is NOT required for deploying.
# Optional: provision infrastructure in a single GCP project
agents-cli infra single-project
Note: agents-cli deploy doesn't automatically use the Terraform-created app_sa. Pass the service account via agents-cli deploy --service-account SA_EMAIL, or uv run -m app.app_utils.deploy --service-account SA_EMAIL for Agent Runtime targets.
Deploy Flag Reference

| Flag | Description | Targets |
| --- | --- | --- |
| --project | GCP project ID | All |
| --region | GCP region | All |
| --service-account | Service account email for the deployed agent | All |
| --secrets | Comma-separated ENV=SECRET or ENV=SECRET:VERSION pairs | Agent Runtime |
| --update-env-vars | Comma-separated KEY=VALUE environment variables | Agent Runtime, Cloud Run |
| --agent-identity | Enable agent identity (Preview) | Agent Runtime |
| --memory | Memory limit (default: 4Gi) | Cloud Run |
| --port | Container port | Cloud Run |
| --iap | Enable Identity-Aware Proxy | Cloud Run |
| --image | Container image URI (skips source build) | Cloud Run, GKE |
| --no-wait | Start deployment and return immediately | Agent Runtime, Cloud Run |
| --status | Check the status of a pending --no-wait deployment | Agent Runtime, Cloud Run |
| --list | List existing deployments and exit | All |
| --dry-run / -n | Print what would be executed without running it | All |
| --no-confirm-project | Skip project confirmation prompt | All |

Run agents-cli deploy --help for the full flag reference. Cloud Run also accepts extra gcloud flags after -- (e.g., -- --timeout=600).
Project Confirmation: If the project is resolved automatically (not passed via --project), the command prompts for confirmation in interactive mode. Since agents typically run in non-interactive mode, you MUST pass --no-confirm-project when relying on automatic project resolution.
Production Deployment: CI/CD Pipeline
For the full CI/CD pipeline setup guide (prerequisites, infra cicd flags, runner comparison, WIF authentication, pipeline stages, and production approval), see references/cicd-pipeline.md.
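To make the Deploy Flag Reference and the Project Confirmation rule concrete, here is a small sketch that assembles an agents-cli deploy argument list for non-interactive use. It is illustrative only: the function and table names are hypothetical, only a subset of the value-taking flags is covered, and the target strings cloud_run and gke are assumed spellings (agent_runtime is the documented one).

```python
from typing import Dict, List, Optional, Set

# Target restrictions copied from the flag table (value-taking flags only).
FLAG_TARGETS: Dict[str, Set[str]] = {
    "--secrets": {"agent_runtime"},
    "--update-env-vars": {"agent_runtime", "cloud_run"},
    "--memory": {"cloud_run"},
    "--port": {"cloud_run"},
    "--image": {"cloud_run", "gke"},
}


def build_deploy_argv(target: str,
                      project: Optional[str] = None,
                      service_account: Optional[str] = None,
                      extra: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an argv list for `agents-cli deploy` without running it."""
    argv = ["agents-cli", "deploy"]
    if project:
        argv += ["--project", project]
    else:
        # Project is auto-resolved, so skip the interactive confirmation.
        argv.append("--no-confirm-project")
    if service_account:
        # deploy does not pick up the Terraform-created app_sa by itself.
        argv += ["--service-account", service_account]
    for flag, value in (extra or {}).items():
        allowed = FLAG_TARGETS.get(flag)
        if allowed is None:
            raise ValueError(f"flag not covered by this sketch: {flag}")
        if target not in allowed:
            raise ValueError(f"{flag} is not listed for target {target}")
        argv += [flag, value]
    return argv
```

The list is only built, never executed, so running the real command still waits on the explicit human approval this guide requires.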
Cloud Run Specifics
For detailed infrastructure configuration (scaling defaults, Dockerfile, FastAPI endpoints, session types, networking), see references/cloud-run.md. For ADK docs on Cloud Run deployment, fetch https://adk.dev/deploy/cloud-run/index.md.
For event-driven / ambient agent deployment on Cloud Run, see the ambient-expense-agent sample and /google-agents-cli-adk-code for the trigger_sources pattern.
Agent Runtime Specifics
Agent Runtime is a managed Vertex AI service for deploying Python ADK agents. It uses source-based deployment (no Dockerfile) via deploy.py and the AdkApp class.
No gcloud CLI exists for Agent Runtime. Deploy via agents-cli deploy or deploy.py. Query via the Python vertexai.Client SDK.
Deployments can take 5-10 minutes. Use --no-wait to start a deployment and return immediately, then check on it later with --status:
# Start deployment without blocking
agents-cli deploy --no-wait
# Check on progress later
agents-cli deploy --status
When --status detects the operation has completed, it writes deployment_metadata.json and prints the same success output as a normal deploy.
For detailed infrastructure configuration (deploy.py flags, AdkApp pattern, Terraform resource, deployment metadata, session/artifact services, CI/CD differences), see references/agent-runtime.md. For ADK docs on Agent Runtime deployment, fetch https://adk.dev/deploy/agent-runtime/index.md.
GKE Specifics
For detailed infrastructure configuration (Kubernetes manifests, Terraform resources, Workload Identity, session types, networking), see references/gke.md. For ADK docs on GKE deployment, fetch https://adk.dev/deploy/gke/index.md.
Service Account Architecture
Scaffolded projects use two service accounts:
- app_sa (per environment): Runtime identity for the deployed agent. Roles defined in deployment/terraform/iam.tf.
- cicd_runner_sa (CI/CD project): CI/CD pipeline identity (GitHub Actions / Cloud Build). Lives in the CI/CD project (defaults to prod project) and needs permissions in both staging and prod projects.
Check deployment/terraform/iam.tf for exact role bindings. Cross-project permissions (Cloud Run service agents, Artifact Registry access) are also configured there.
Common 403 errors:
- "Permission denied on Cloud Run" → cicd_runner_sa missing deployment role in the target project
- "Cannot act as service account" → Missing iam.serviceAccountUser binding on app_sa
- "Secret access denied" → app_sa missing secretmanager.secretAccessor
- "Artifact Registry read denied" → Cloud Run service agent missing read access in CI/CD project
Required Permissions for CI/CD Setup
roles/secretmanager.admin granted to the Cloud Build service account (service-
# Create a secret
echo -n "YOUR_API_KEY" | gcloud secrets create MY_SECRET_NAME --data-file=-
# Update an existing secret
echo -n "NEW_API_KEY" | gcloud secrets versions add MY_SECRET_NAME --data-file=-
Grant access: For Cloud Run, grant secretmanager.secretAccessor to app_sa. For Agent Runtime, grant it to the platform-managed SA (service-PROJECT_NUMBER@gcp-sa-aiplatform-re.iam.gserviceaccount.com). For GKE, grant secretmanager.secretAccessor to app_sa, then access secrets via Kubernetes Secrets or directly via the Secret Manager API with Workload Identity.
Pass secrets at deploy time (Agent Runtime):
agents-cli deploy --secrets "API_KEY=my-api-key,DB_PASS=db-password:2"
Format: ENV_VAR=SECRET_ID or ENV_VAR=SECRET_ID:VERSION (defaults to latest). Access in code via os.environ.get("API_KEY").
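The --secrets format can be checked before deploying, and the resulting environment variable read defensively in agent code. The sketch below is not how agents-cli parses the flag internally; both function names are hypothetical helpers that follow the ENV_VAR=SECRET_ID[:VERSION] format described above.

```python
import os
from typing import Dict, Tuple


def parse_secrets_flag(spec: str) -> Dict[str, Tuple[str, str]]:
    """Map each ENV_VAR to (SECRET_ID, VERSION); VERSION defaults to latest."""
    parsed: Dict[str, Tuple[str, str]] = {}
    for pair in spec.split(","):
        env_var, sep, ref = pair.strip().partition("=")
        secret_id, _, version = ref.partition(":")
        if not sep or not env_var or not secret_id:
            raise ValueError(
                f"expected ENV_VAR=SECRET_ID[:VERSION], got {pair!r}")
        parsed[env_var] = (secret_id, version or "latest")
    return parsed


def require_secret(env_var: str) -> str:
    """Read a secret injected as an environment variable, failing loudly."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; check --secrets and IAM")
    return value


if __name__ == "__main__":
    print(parse_secrets_flag("API_KEY=my-api-key,DB_PASS=db-password:2"))
```

Failing at startup when a secret is missing tends to surface a misconfigured secretAccessor grant earlier than a None value propagating into a request.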
Observability
See the /google-agents-cli-observability skill for observability configuration (Cloud Trace, prompt-response logging, BigQuery Analytics, third-party integrations).
Testing Your Deployed Agent
The quickest way to test a deployed agent is agents-cli run --url --mode "your prompt"; it handles auth, sessions, and streaming automatically (supports Agent Runtime and Cloud Run).
For advanced testing (custom headers, session reuse, scripting, load tests), see references/testing-deployed-agents.md.
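For scripted tests against Cloud Run, note that services deploy with --no-allow-unauthenticated by default, so requests need an Authorization: Bearer header carrying the output of gcloud auth print-identity-token (see this guide's troubleshooting notes). A minimal standard-library sketch follows; the function names are hypothetical, and the service URL and endpoint path are left to the caller because they are not specified here.

```python
import subprocess
import urllib.request
from typing import Callable


def fetch_identity_token() -> str:
    """Return the identity token printed by the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def authorized_request(
    url: str,
    get_token: Callable[[], str] = fetch_identity_token,
) -> urllib.request.Request:
    """Build a request with the bearer header Cloud Run expects."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {get_token()}"})
```

Passing the request to urllib.request.urlopen sends it; a 403 at that point usually indicates a missing token or an account without invoker access, not a failed deployment.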
Deploying with a UI (IAP)
IAP (Identity-Aware Proxy) secures a Cloud Run service so only authorized Google accounts can access it. Support for IAP deployment via agents-cli deploy is planned for a future release.
For Agent Runtime with a custom frontend, use a decoupled deployment: deploy the frontend separately to Cloud Run or Cloud Storage, connecting to the Agent Runtime backend API.
For more information on IAP with Cloud Run, see the Cloud Console IAP settings.
Rollback & Recovery
The primary rollback mechanism is git-based: fix the issue, commit, and push to main. The CI/CD pipeline will automatically build and deploy the new version through staging → production.
For immediate Cloud Run rollback without a new commit, use revision traffic shifting:
gcloud run revisions list --service=SERVICE_NAME --region=REGION
gcloud run services update-traffic SERVICE_NAME \
  --to-revisions=REVISION_NAME=100 --region=REGION
Agent Runtime doesn't support revision-based rollback; fix and redeploy via agents-cli deploy.
For GKE rollback, use kubectl rollout undo:
kubectl rollout undo deployment/DEPLOYMENT_NAME -n NAMESPACE
kubectl rollout status deployment/DEPLOYMENT_NAME -n NAMESPACE
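The per-target rollback commands above can be collected in one helper, which keeps a runbook script from applying the wrong mechanism to a target. This is a sketch only: the function name and the target strings are assumptions, and the commands are returned as argument lists without being executed.

```python
from typing import List


def rollback_commands(target: str, name: str, *, region: str = "",
                      revision: str = "",
                      namespace: str = "") -> List[List[str]]:
    """Return the rollback commands from this guide for a given target."""
    if target == "cloud_run":
        return [
            # List revisions first to find the one to roll back to.
            ["gcloud", "run", "revisions", "list",
             f"--service={name}", f"--region={region}"],
            # Then shift 100% of traffic to the chosen revision.
            ["gcloud", "run", "services", "update-traffic", name,
             f"--to-revisions={revision}=100", f"--region={region}"],
        ]
    if target == "gke":
        return [
            ["kubectl", "rollout", "undo", f"deployment/{name}",
             "-n", namespace],
            ["kubectl", "rollout", "status", f"deployment/{name}",
             "-n", namespace],
        ]
    if target == "agent_runtime":
        raise ValueError("no revision-based rollback on Agent Runtime; "
                         "fix and redeploy with agents-cli deploy")
    raise ValueError(f"unknown target: {target}")
```

Each inner list can be passed to subprocess.run once a human has confirmed the revision or deployment to roll back.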
Custom Infrastructure (Terraform)
CRITICAL
When your agent requires custom infrastructure (Cloud SQL, Pub/Sub, Eventarc, BigQuery, etc.), you MUST define it in Terraform; never create resources manually via gcloud commands. Exception: quick experimentation is fine with gcloud or console, but production infrastructure must be in Terraform.
For custom infrastructure patterns, consult references/terraform-patterns.md for:
- Where to put custom Terraform files (single-project vs CI/CD)
- Resource examples (Pub/Sub, BigQuery, Eventarc triggers)
- IAM bindings for custom resources
- Terraform state management (remote vs local, importing resources)
- Common infrastructure patterns
Troubleshooting

| Issue | Solution |
| --- | --- |
| Terraform state locked | terraform force-unlock -force LOCK_ID in deployment/terraform/ |
| GitHub Actions auth failed | Re-run terraform apply in CI/CD terraform dir; verify WIF pool/provider |
| Cloud Build authorization pending | Use github_actions runner instead |
| Resource already exists | terraform import (see references/terraform-patterns.md) |
| Agent Runtime deploy timeout / hangs | Deployments take 5-10 min; check if engine was created (see Agent Runtime Specifics) |
| Secret not available | Verify secretAccessor granted to app_sa (not the default compute SA) |
| 403 on deploy | Check deployment/terraform/iam.tf; cicd_runner_sa needs deployment + SA impersonation roles in the target project |
| 403 when testing Cloud Run | Default is --no-allow-unauthenticated; include Authorization: Bearer $(gcloud auth print-identity-token) header |
| Cold starts too slow | Set min_instance_count > 0 in Cloud Run Terraform config |
| Cloud Run 503 errors | Check resource limits (memory/CPU), increase max_instance_count, or check container crash logs |
| 403 right after granting IAM role | IAM propagation is not instant; wait a couple of minutes before retrying. Don't keep re-granting the same role |
| Resource seems missing but Terraform created it | Run terraform state list to check what Terraform actually manages. Resources created via null_resource + local-exec (e.g., BQ linked datasets) won't appear in gcloud CLI output |
| Deployment failed or agent not responding | Check Cloud Logging: gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=SERVICE" --project=PROJECT --limit=50 --format="table(timestamp,severity,textPayload)" for Cloud Run, or gcloud logging read "resource.type=aiplatform.googleapis.com/ReasoningEngine" --project=PROJECT --limit=50 for Agent Runtime |
| Agent returns errors after deploy | Open Cloud Logging in Console → filter by service name (Cloud Run) or reasoning engine resource (Agent Runtime) → look for Python tracebacks or permission errors in recent log entries |

Platform Registration
For registering deployed agents with Gemini Enterprise, see /google-agents-cli-publish.