Axiom Cost Control

Dashboards, monitors, and waste identification for Axiom usage optimization.

Before You Start

Load required skills:

skill: axiom-sre
skill: building-dashboards

Building-dashboards provides: dashboard-list, dashboard-get, dashboard-create, dashboard-update, dashboard-delete

Find the audit dataset. Try axiom-audit first:

['axiom-audit'] | where _time > ago(1h) | summarize count() by action | where action in ('usageCalculated', 'runAPLQueryCost')

- If not found → ask user. Common names: axiom-audit-logs-view, audit-logs
- If found but no usageCalculated events → wrong dataset, ask user

Verify axiom-history access (required for Phase 4):

['axiom-history'] | where _time > ago(1h) | take 1

If not found, Phase 4 optimization will not work.

Confirm with user:

- Deployment name?
- Audit dataset name?
- Contract limit in TB/day? (required for Phase 3 monitors)

Replace the deployment and audit dataset placeholders in all commands below.

Tips:

- Run any script with -h for full usage
- Do NOT pipe script output to head or tail; this causes SIGPIPE errors
- Requires jq for JSON parsing
- Use axiom-sre's axiom-query for ad-hoc APL, not the direct CLI

Which Phases to Run

User request                      Run these phases
"reduce costs" / "find waste"     0 → 1 → 4
"set up cost control"             0 → 1 → 2 → 3
"deploy dashboard"                0 → 2
"create monitors"                 0 → 3
"check for drift"                 0 only

Phase 0: Check Existing Setup

Existing dashboard?

dashboard-list | grep -i cost

Existing monitors?

axiom-api GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'

If found, fetch with dashboard-get and compare to templates/dashboard.json for drift.

Phase 1: Discovery

scripts/baseline-stats -d -a

Captures daily ingest stats and produces the Analysis Queue (needed for Phase 4).

Phase 2: Dashboard

scripts/deploy-dashboard -d -a

Creates dashboard with: ingest trends, burn rate, projections, waste candidates, top users. See reference/dashboard-panels.md for details.

Phase 3: Monitors

The contract limit is required. Use the value confirmed with the user in Before You Start.

scripts/create-monitors -d -a -c [-n ]

Creates 5 monitors (use -n to attach notifier):

- Last 24h Ingest vs Contract: threshold @ 1.5x contract
- Per-Dataset Spike Detection: anomaly, grouped by dataset
- Top Dataset Dominance: threshold @ 40% of hourly contract
- Query Cost Spike: anomaly on GB·ms
- Reduction Glidepath: threshold, update weekly

See reference/monitor-strategy.md for threshold derivation.
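As a rough illustration of how the two contract-based thresholds relate to the contract limit, the sketch below derives them from a TB/day figure. The helper name `contract_thresholds` is hypothetical, and treating the hourly contract as the daily limit divided by 24 is an assumption, not something taken from scripts/create-monitors; reference/monitor-strategy.md remains the authority.

```python
def contract_thresholds(contract_tb_per_day: float) -> dict[str, float]:
    """Derive the two contract-based thresholds (hypothetical helper).

    Assumption: hourly contract = daily contract limit / 24.
    """
    if contract_tb_per_day <= 0:
        raise ValueError("contract limit must be positive")
    hourly_contract_tb = contract_tb_per_day / 24
    return {
        # Last 24h Ingest vs Contract: threshold at 1.5x the daily contract.
        "last_24h_ingest_tb": 1.5 * contract_tb_per_day,
        # Top Dataset Dominance: threshold at 40% of the hourly contract.
        "top_dataset_hourly_tb": 0.4 * hourly_contract_tb,
    }


thresholds = contract_thresholds(167)
print(thresholds)
```

The anomaly-based monitors have no fixed threshold, so they are not covered here.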

Phase 4: Optimization

Get the Analysis Queue

Run scripts/baseline-stats if not already done. It outputs a prioritized list:

Priority   Meaning
P0⛔       Top 3 by ingest OR >10% of total (MANDATORY)
P1         Never queried; strong drop candidate
P2         Rarely queried (Work/GB < 100); likely waste

Work/GB = query cost (GB·ms) / ingest (GB). Lower = less value from data.
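To make the priority rules concrete, here is a minimal sketch that applies them to a handful of made-up datasets. It is not the logic of scripts/baseline-stats: the function names, the sample figures, the reading of "never queried" as zero query cost, and the precedence P0 over P1 over P2 are all assumptions.

```python
from typing import Optional


def work_per_gb(query_gbms: float, ingest_gb: float) -> float:
    """Work/GB = query cost (GB·ms) divided by ingest (GB)."""
    if ingest_gb <= 0:
        raise ValueError("ingest must be positive")
    return query_gbms / ingest_gb


def classify(datasets: dict[str, tuple[float, float]]) -> dict[str, Optional[str]]:
    """Map dataset name -> priority, given (ingest_gb, query_gbms) pairs."""
    total = sum(ingest for ingest, _ in datasets.values())
    by_ingest = sorted(datasets, key=lambda n: datasets[n][0], reverse=True)
    top3 = set(by_ingest[:3])
    result: dict[str, Optional[str]] = {}
    for name, (ingest, gbms) in datasets.items():
        if name in top3 or ingest > 0.10 * total:
            result[name] = "P0"  # top 3 by ingest OR >10% of total
        elif gbms == 0:
            result[name] = "P1"  # assumed meaning of "never queried"
        elif work_per_gb(gbms, ingest) < 100:
            result[name] = "P2"  # rarely queried
        else:
            result[name] = None  # not in the analysis queue
    return result


# Made-up figures: (ingest in GB, query cost in GB·ms).
sample = {
    "k8s-logs": (5000.0, 900000.0),
    "traces": (3000.0, 600000.0),
    "app-logs": (1500.0, 0.0),
    "debug": (300.0, 0.0),
    "audit": (150.0, 3000.0),
    "metrics": (50.0, 50000.0),
}
priorities = classify(sample)
print(priorities)
```

Note that app-logs is never queried but still lands in P0 because it is one of the three largest datasets by ingest.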

Analyze datasets in order

Work top-to-bottom. For each dataset:

Step 1: Column analysis

scripts/analyze-query-coverage -d -D -a

If 0 queries → recommend DROP, move to next.

Step 2: Field value analysis

Pick a field from suggested list (usually app, service, or kubernetes.labels.app):

scripts/analyze-query-coverage -d -D -a -f

Note values with high volume but never queried (⚠️ markers).

Step 3: Handle empty values

If (empty) has >5% volume, you MUST drill down with alternative field (e.g., kubernetes.namespace_name).
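The 5% rule can be checked mechanically once the per-value volumes are available. The sketch below assumes they have already been parsed into a mapping from field value to volume; the output format of scripts/analyze-query-coverage is not shown here, so that parsing step and the helper name `needs_drilldown` are hypothetical.

```python
def needs_drilldown(
    volume_by_value: dict[str, float],
    empty_key: str = "(empty)",
    limit: float = 0.05,
) -> bool:
    """Return True when the empty bucket exceeds `limit` of total volume."""
    total = sum(volume_by_value.values())
    if total <= 0:
        return False  # nothing ingested, nothing to drill into
    return volume_by_value.get(empty_key, 0.0) / total > limit


# Made-up volumes per field value (any consistent unit works).
print(needs_drilldown({"api": 700.0, "web": 220.0, "(empty)": 80.0}))
print(needs_drilldown({"api": 960.0, "(empty)": 40.0}))
```

In the first call the empty bucket holds 8% of the volume, so a drill-down with an alternative field is required; in the second it holds 4% and is below the limit.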

Step 4: Record recommendation

For each dataset, note: name, ingest volume, Work/GB, top unqueried values, action (DROP/SAMPLE/KEEP), estimated savings.
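One way to keep these notes consistent across datasets is a small record type. This is only a sketch: the class name, field names, and the choice of GB as the unit are assumptions, and reference/analysis-report-template.md defines the actual report layout.

```python
from dataclasses import dataclass, field

ACTIONS = {"DROP", "SAMPLE", "KEEP"}


@dataclass
class Recommendation:
    """Per-dataset note from Step 4 (hypothetical structure)."""

    name: str
    ingest_gb: float
    work_per_gb: float
    action: str
    estimated_savings_gb: float = 0.0
    top_unqueried_values: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything other than the three actions named in Step 4.
        if self.action not in ACTIONS:
            raise ValueError(f"action must be one of {sorted(ACTIONS)}")


rec = Recommendation(
    name="debug",
    ingest_gb=300.0,
    work_per_gb=0.0,
    action="DROP",
    estimated_savings_gb=300.0,
)
print(rec)
```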

Done when

All P0⛔ and P1 datasets analyzed. Then compile report using reference/analysis-report-template.md.

Phase 5: Glidepath

Update threshold weekly as reductions take effect:

scripts/update-glidepath -d -t

Week   Target
1      Current p95
2      -25%
3      -50%
4      Contract

Cleanup

Delete monitors

axiom-api GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'
axiom-api DELETE "/v2/monitors/"

Delete dashboard

dashboard-list | grep -i cost
dashboard-delete

Note: Running create-monitors twice creates duplicates. Delete existing monitors first if re-deploying.

Reference

Audit Dataset Fields

Field                                   Description
action                                  usageCalculated or runAPLQueryCost
properties.hourly_ingest_bytes          Hourly ingest in bytes
properties.hourly_billable_query_gbms   Hourly query cost
properties.dataset                      Dataset name
resource.id                             Org ID
actor.email                             User email

Common Fields for Value Analysis

Dataset type       Primary field           Alternatives
Kubernetes logs    kubernetes.labels.app   kubernetes.namespace_name, kubernetes.container_name
Application logs   app or service          level, logger, component
Infrastructure     host                    region, instance
Traces             service.name            span.kind, http.route

Units & Conversions

- Scripts use TB/day
- Dashboard filter uses GB/month

Contract      TB/day   GB/month
5 PB/month    167      5,000,000
10 PB/month   333      10,000,000
15 PB/month   500      15,000,000

Optimization Actions

Signal                                Action
Work/GB = 0                           Drop or stop ingesting
High-volume unqueried values          Sample or reduce log level
Empty values from system namespaces   Filter at ingest or accept
WoW spike                             Check recent deploys
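The rows of the Units & Conversions table can be reproduced with a short calculation. The sketch below assumes decimal units (1 PB = 1,000 TB = 1,000,000 GB) and a 30-day month, which is what the table's figures appear to imply; the function names are hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption: consistent with 5 PB/month ≈ 167 TB/day


def pb_month_to_tb_day(pb_per_month: float) -> float:
    """Contract in PB/month -> TB/day, the unit the scripts use."""
    return pb_per_month * 1000 / DAYS_PER_MONTH


def pb_month_to_gb_month(pb_per_month: float) -> float:
    """Contract in PB/month -> GB/month, the unit the dashboard filter uses."""
    return pb_per_month * 1_000_000


rows = [
    (pb, round(pb_month_to_tb_day(pb)), int(pb_month_to_gb_month(pb)))
    for pb in (5, 10, 15)
]
for pb, tb_day, gb_month in rows:
    print(f"{pb} PB/month  {tb_day} TB/day  {gb_month:,} GB/month")
```

For contracts not listed in the table, the same two functions give the values to pass to the scripts and the dashboard filter, provided the 30-day assumption matches the actual contract terms.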
