Axiom Cost Control
Dashboards, monitors, and waste identification for Axiom usage optimization.
Before You Start
Load required skills:
skill: axiom-sre
skill: building-dashboards
building-dashboards provides: dashboard-list, dashboard-get, dashboard-create, dashboard-update, dashboard-delete
Find the audit dataset. Try axiom-audit first:
['axiom-audit'] | where _time > ago(1h) | summarize count() by action | where action in ('usageCalculated', 'runAPLQueryCost')
If not found → ask user. Common names: axiom-audit-logs-view, audit-logs
If found but no usageCalculated events → wrong dataset, ask user.
Verify axiom-history access (required for Phase 4):
['axiom-history'] | where _time > ago(1h) | take 1
If not found, Phase 4 optimization will not work.
Confirm with user:
Deployment name?
Audit dataset name?
Contract limit in TB/day? (required for Phase 3 monitors)
Replace
Tips:
Run any script with -h for full usage.
Do NOT pipe script output to head or tail; this causes SIGPIPE errors.
Requires jq for JSON parsing.
Use axiom-sre's axiom-query for ad-hoc APL, not the direct CLI.

Which Phases to Run

| User request | Run these phases |
|---|---|
| "reduce costs" / "find waste" | 0 → 1 → 4 |
| "set up cost control" | 0 → 1 → 2 → 3 |
| "deploy dashboard" | 0 → 2 |
| "create monitors" | 0 → 3 |
| "check for drift" | 0 only |

Phase 0: Check Existing Setup
Existing dashboard?
dashboard-list
Existing monitors?
axiom-api
If found, fetch with dashboard-get and compare to templates/dashboard.json for drift.
Phase 1: Discovery
scripts/baseline-stats -d
Captures daily ingest stats and produces the Analysis Queue (needed for Phase 4).
Phase 2: Dashboard
scripts/deploy-dashboard -d
Creates dashboard with: ingest trends, burn rate, projections, waste candidates, top users. See reference/dashboard-panels.md for details.
Phase 3: Monitors
Contract is required. You must have the contract limit (TB/day) from preflight step 4, "Confirm with user" under Before You Start.
scripts/create-monitors -d
Creates 5 monitors (use -n to attach notifier):
Last 24h Ingest vs Contract: threshold @ 1.5x contract
Per-Dataset Spike Detection: anomaly, grouped by dataset
Top Dataset Dominance: threshold @ 40% of hourly contract
Query Cost Spike: anomaly on GB·ms
Reduction Glidepath: threshold, update weekly
See reference/monitor-strategy.md for threshold derivation.
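As a rough illustration of how the two contract-based thresholds scale with the contract limit, the sketch below derives them from a TB/day figure. The function name, the decimal TB-to-bytes conversion, and the reading of "hourly contract" as the daily limit divided by 24 are assumptions made for this example; reference/monitor-strategy.md remains the authority on the actual derivation, and scripts/create-monitors may compute these values differently.

```python
# Illustrative only: names and unit conventions are assumptions,
# not taken from scripts/create-monitors.
BYTES_PER_TB = 10**12  # assumes decimal terabytes


def threshold_sketch(contract_tb_per_day: float) -> dict:
    """Derive the two contract-based monitor thresholds, in bytes."""
    daily_bytes = contract_tb_per_day * BYTES_PER_TB
    hourly_bytes = daily_bytes / 24  # assumed meaning of "hourly contract"
    return {
        # Last 24h Ingest vs Contract: 1.5x the daily contract
        "last_24h_ingest_bytes": 1.5 * daily_bytes,
        # Top Dataset Dominance: 40% of the hourly contract
        "top_dataset_hourly_bytes": 0.40 * hourly_bytes,
    }


if __name__ == "__main__":
    for name, value in threshold_sketch(167).items():
        print(f"{name}: {value:,.0f}")
```

The two anomaly monitors and the glidepath monitor are left out because their thresholds do not follow directly from the contract figure.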
Phase 4: Optimization

Get the Analysis Queue
Run scripts/baseline-stats if not already done. It outputs a prioritized list:
| Priority | Meaning |
|---|---|
| P0⛔ | Top 3 by ingest OR >10% of total (MANDATORY) |
| P1 | Never queried: strong drop candidate |
| P2 | Rarely queried (Work/GB < 100): likely waste |
Work/GB = query cost (GB·ms) / ingest (GB). Lower = less value from data.
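To make the priority rules and the Work/GB ratio concrete, here is a minimal sketch that applies them to a handful of made-up datasets. The DatasetStats type, its field names, and the sample figures are hypothetical. It also assumes that "never queried" means a query cost of zero and that the P0 rule takes precedence over P1 and P2. scripts/baseline-stats is what actually produces the Analysis Queue, and its logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DatasetStats:  # hypothetical shape, not the script's real output
    name: str
    ingest_gb: float
    query_gbms: float  # query cost in GB·ms


def work_per_gb(d: DatasetStats) -> float:
    """Work/GB = query cost (GB·ms) / ingest (GB)."""
    return d.query_gbms / d.ingest_gb if d.ingest_gb > 0 else 0.0


def prioritize(datasets: List[DatasetStats]) -> Dict[str, Optional[str]]:
    """Assign P0/P1/P2 (or None) following the table above."""
    total = sum(d.ingest_gb for d in datasets)
    by_ingest = sorted(datasets, key=lambda d: d.ingest_gb, reverse=True)
    top3 = {d.name for d in by_ingest[:3]}
    result: Dict[str, Optional[str]] = {}
    for d in datasets:
        if d.name in top3 or (total > 0 and d.ingest_gb / total > 0.10):
            result[d.name] = "P0"  # top 3 by ingest OR >10% of total
        elif d.query_gbms == 0:
            result[d.name] = "P1"  # assumed: zero query cost = never queried
        elif work_per_gb(d) < 100:
            result[d.name] = "P2"  # rarely queried
        else:
            result[d.name] = None  # not in the queue
    return result


SAMPLE = [  # made-up numbers for illustration
    DatasetStats("k8s-logs", 5000, 200_000),
    DatasetStats("traces", 3000, 900_000),
    DatasetStats("app-logs", 1500, 300_000),
    DatasetStats("debug-dump", 300, 0),
    DatasetStats("audit-archive", 200, 10_000),
    DatasetStats("metrics", 100, 50_000),
]

if __name__ == "__main__":
    for name, prio in prioritize(SAMPLE).items():
        print(name, prio)
```

Note that a dataset can land in P0 on volume alone even when it is heavily queried, which is why P0 analysis is mandatory rather than an automatic drop recommendation.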
Analyze datasets in order
Work top-to-bottom. For each dataset:
Step 1: Column analysis
scripts/analyze-query-coverage -d
If 0 queries → recommend DROP, move to next.
Step 2: Field value analysis
Pick a field from suggested list (usually app, service, or kubernetes.labels.app):
scripts/analyze-query-coverage -d
Note values with high volume but never queried (⚠️ markers).
Step 3: Handle empty values
If (empty) accounts for >5% of volume, you MUST drill down with an alternative field (e.g., kubernetes.namespace_name).
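The 5% rule can be checked mechanically once per-value volumes are known. The sketch below assumes a plain mapping of field value to volume; that input shape and the function name are hypothetical and do not reflect the actual output format of scripts/analyze-query-coverage.

```python
from typing import Dict

EMPTY_KEY = "(empty)"


def needs_drilldown(volume_by_value: Dict[str, float],
                    threshold: float = 0.05) -> bool:
    """True when the (empty) bucket exceeds the threshold share of volume."""
    total = sum(volume_by_value.values())
    if total <= 0:
        return False  # nothing ingested, nothing to drill into
    return volume_by_value.get(EMPTY_KEY, 0.0) / total > threshold


if __name__ == "__main__":
    # Made-up volumes: (empty) holds 8% of the total here.
    print(needs_drilldown({"checkout": 600, "search": 320, "(empty)": 80}))
```

When the check returns True, rerun the value analysis on an alternative field rather than recording a recommendation from the partial picture.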
Step 4: Record recommendation
For each dataset, note: name, ingest volume, Work/GB, top unqueried values, action (DROP/SAMPLE/KEEP), estimated savings.
Done when
All P0⛔ and P1 datasets analyzed. Then compile report using reference/analysis-report-template.md.
Phase 5: Glidepath
Update threshold weekly as reductions take effect:
scripts/update-glidepath -d
| Week | Target |
|---|---|
| 1 | Current p95 |
| 2 | -25% |
| 3 | -50% |
| 4 | Contract |

Cleanup
Delete monitors
axiom-api
Delete dashboard
dashboard-list
Note: Running create-monitors twice creates duplicates. Delete existing monitors first if re-deploying.
Reference

Audit Dataset Fields

| Field | Description |
|---|---|
| action | usageCalculated or runAPLQueryCost |
| properties.hourly_ingest_bytes | Hourly ingest in bytes |
| properties.hourly_billable_query_gbms | Hourly query cost |
| properties.dataset | Dataset name |
| resource.id | Org ID |
| actor.email | User email |

Common Fields for Value Analysis

| Dataset type | Primary field | Alternatives |
|---|---|---|
| Kubernetes logs | kubernetes.labels.app | kubernetes.namespace_name, kubernetes.container_name |
| Application logs | app or service | level, logger, component |
| Infrastructure | host | region, instance |
| Traces | service.name | span.kind, http.route |

Units & Conversions

Scripts use TB/day
Dashboard filter uses GB/month

| Contract | TB/day | GB/month |
|---|---|---|
| 5 PB/month | 167 | 5,000,000 |
| 10 PB/month | 333 | 10,000,000 |
| 15 PB/month | 500 | 15,000,000 |

Optimization Actions

| Signal | Action |
|---|---|
| Work/GB = 0 | Drop or stop ingesting |
| High-volume unqueried values | Sample or reduce log level |
| Empty values from system namespaces | Filter at ingest or accept |
| WoW spike | Check recent deploys |
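For contract sizes not listed under Units & Conversions, the same figures can be derived with a short helper. This is a sketch that assumes decimal units (1 PB = 1,000 TB = 1,000,000 GB) and a 30-day month, which is the combination that reproduces the listed rows; the function names are made up for this example.

```python
GB_PER_TB = 1_000        # decimal units, consistent with the table
GB_PER_PB = 1_000_000
DAYS_PER_MONTH = 30      # assumption that reproduces the table's rounding


def pb_month_to_gb_month(pb_per_month: float) -> float:
    """Contract in PB/month -> GB/month (dashboard filter unit)."""
    return pb_per_month * GB_PER_PB


def pb_month_to_tb_day(pb_per_month: float) -> int:
    """Contract in PB/month -> TB/day (script unit), rounded to whole TB."""
    return round(pb_per_month * GB_PER_PB / GB_PER_TB / DAYS_PER_MONTH)


if __name__ == "__main__":
    for pb in (5, 10, 15):
        tb_day = pb_month_to_tb_day(pb)
        gb_month = pb_month_to_gb_month(pb)
        print(f"{pb} PB/month -> {tb_day} TB/day, {gb_month:,.0f} GB/month")
```

Because TB/day is rounded, converting it back to GB/month will not always return the exact contract figure (167 TB/day over 30 days is 5,010,000 GB), so derive both units from the PB/month contract rather than from each other.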