# LLM Usage Monitoring Dashboard

Tracks LLM API costs, tokens, and latency using the Tokuin CLI, and auto-generates a data-driven admin dashboard with PM insights.

## When to use this skill

- **LLM cost visibility**: when you want to monitor API usage costs per team or individual in real time
- **PM reporting dashboard**: when you need weekly reports on who uses AI, how much, and how
- **User adoption management**: when you want to track inactive users and increase AI adoption rates
- **Model optimization evidence**: when you need data-driven decisions for model switching or cost reduction
- **Admin dashboard monitoring tab**: when adding an LLM monitoring section to an existing Admin page

## Prerequisites

### 1. Verify Tokuin CLI installation
```bash
# Check if installed
which tokuin && tokuin --version || echo "Not installed — run Step 1 first"
```

### 2. Environment variables (only needed for live API calls)
Store keys in a `.env` file (never hardcode them directly in source):

| Variable | Example | Description |
|---|---|---|
| `OPENAI_API_KEY` | `sk-...` | OpenAI |
| `ANTHROPIC_API_KEY` | `sk-ant-...` | Anthropic |
| `OPENROUTER_API_KEY` | `sk-or-...` | OpenRouter (400+ models) |

LLM monitoring settings:

| Variable | Example | Description |
|---|---|---|
| `LLM_USER_ID` | `dev-alice` | User identifier |
| `LLM_USER_ALIAS` | `Alice` | Display name |
| `COST_THRESHOLD_USD` | `10.00` | Cost threshold (alert when exceeded) |
| `DASHBOARD_PORT` | `3000` | Dashboard port |
| `MAX_COST_USD` | `5.00` | Max cost per single run |
| `SLACK_WEBHOOK_URL` | `https://...` | For alerts (optional) |
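Collected in a `.env` file, the variables above might look like this (all values are placeholders; substitute your own keys):

```shell
# .env (keep out of version control: echo '.env' >> .gitignore)
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
OPENROUTER_API_KEY=sk-or-your-openrouter-key

# LLM monitoring settings
LLM_USER_ID=dev-alice
LLM_USER_ALIAS=Alice
COST_THRESHOLD_USD=10.00
DASHBOARD_PORT=3000
MAX_COST_USD=5.00
# SLACK_WEBHOOK_URL=https://...
```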
### 3. Project stack requirements

- **Option A (recommended):** Next.js 15+ + React 18 + TypeScript
- **Option B (lightweight):** Python 3.8+ + HTML/JavaScript (minimal dependencies)
## Instructions

### Step 0: Safety check (always run this first)

⚠️ Run this script before executing the skill. Any FAIL items will halt execution.
```bash
cat > safety-guard.sh << 'SAFETY_EOF'
#!/usr/bin/env bash
# safety-guard.sh — Safety gate before running the LLM monitoring dashboard
set -euo pipefail

RED='\033[0;31m'; YELLOW='\033[1;33m'; GREEN='\033[0;32m'; NC='\033[0m'
ALLOW_LIVE="${1:-}"; PASS=0; WARN=0; FAIL=0

# Note: plain ((PASS++)) returns non-zero when the counter is 0,
# which would abort the script under `set -e`; hence the explicit form.
log_pass() { echo -e "${GREEN}✅ PASS${NC} $1"; PASS=$((PASS + 1)); }
log_warn() { echo -e "${YELLOW}⚠️ WARN${NC} $1"; WARN=$((WARN + 1)); }
log_fail() { echo -e "${RED}❌ FAIL${NC} $1"; FAIL=$((FAIL + 1)); }

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🛡 LLM Monitoring Dashboard — Safety Guard v1.0"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# ── 1. Check Tokuin CLI installation ────────────────────────────────
if command -v tokuin &>/dev/null; then
  log_pass "Tokuin CLI installed: $(tokuin --version 2>&1 | head -1)"
else
  log_fail "Tokuin not installed → install with the command below and re-run:"
  echo "  curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash"
fi

# ── 2. Detect hardcoded API keys ────────────────────────────────────
HARDCODED=$(grep -rE "(sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-or-[a-zA-Z0-9]{20,})" \
  . --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" \
  --include="*.html" --include="*.sh" --include="*.py" --include="*.json" \
  --exclude-dir=node_modules --exclude-dir=.git 2>/dev/null \
  | grep -v ".env" | grep -v "example" | wc -l || echo 0)
if [ "$HARDCODED" -eq 0 ]; then
  log_pass "No hardcoded API keys found"
else
  log_fail "⚠️ ${HARDCODED} hardcoded API key(s) detected! → Move to environment variables (.env) immediately"
  grep -rE "(sk-[a-zA-Z0-9]{20,})" . \
    --include="*.ts" --include="*.js" --include="*.html" \
    --exclude-dir=node_modules 2>/dev/null | head -5 || true
fi

# ── 3. Check .env is in .gitignore ──────────────────────────────────
if [ -f .env ]; then
  if [ -f .gitignore ] && grep -q ".env" .gitignore; then
    log_pass ".env is listed in .gitignore"
  else
    log_fail ".env exists but is not in .gitignore! → echo '.env' >> .gitignore"
  fi
else
  log_warn ".env file not found — create one before making live API calls"
fi

# ── 4. Check live API call mode ─────────────────────────────────────
if [ "$ALLOW_LIVE" = "--allow-live" ]; then
  log_warn "Live API call mode enabled! Costs will be incurred."
  log_warn "Max cost threshold: \$${MAX_COST_USD:-5.00} (adjust via MAX_COST_USD env var)"
  read -p "  Allow live API calls? [y/N] " -r
  echo
  [[ $REPLY =~ ^[Yy]$ ]] || { echo "Cancelled. Re-run in dry-run mode."; exit 1; }
else
  log_pass "dry-run mode (default) — no API costs incurred"
fi

# ── 5. Check port conflicts ─────────────────────────────────────────
PORT="${DASHBOARD_PORT:-3000}"
if lsof -i ":${PORT}" &>/dev/null; then
  ALT_PORT=$((PORT + 1))
  log_warn "Port ${PORT} is in use → use ${ALT_PORT} instead: export DASHBOARD_PORT=${ALT_PORT}"
else
  log_pass "Port ${PORT} is available"
fi

# ── 6. Initialize data/ directory ───────────────────────────────────
mkdir -p ./data
if [ -f ./data/metrics.jsonl ]; then
  BYTES=$(wc -c < ./data/metrics.jsonl || echo 0)
  if [ "$BYTES" -gt 10485760 ]; then
    log_warn "metrics.jsonl exceeds 10MB (${BYTES}B) → consider applying a rolling policy"
    echo "  cp data/metrics.jsonl data/metrics-$(date +%Y%m%d).jsonl.bak && > data/metrics.jsonl"
  else
    log_pass "data/ ready (metrics.jsonl: ${BYTES}B)"
  fi
else
  log_pass "data/ ready (new)"
fi

# ── Summary ─────────────────────────────────────────────────────────
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo -e "Result: ${GREEN}PASS $PASS${NC} / ${YELLOW}WARN $WARN${NC} / ${RED}FAIL $FAIL${NC}"
if [ "$FAIL" -gt 0 ]; then
  echo -e "${RED}❌ Safety check failed. Resolve the FAIL items above and re-run.${NC}"
  exit 1
else
  echo -e "${GREEN}✅ Safety check passed. Continuing skill execution.${NC}"
  exit 0
fi
SAFETY_EOF
chmod +x safety-guard.sh
```
Run it (halts immediately if any FAIL):

```bash
bash safety-guard.sh
```

### Step 1: Install Tokuin CLI and verify with dry-run
#### 1-1. Install (macOS / Linux)

```bash
curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash
```
Windows PowerShell:

```powershell
irm https://raw.githubusercontent.com/nooscraft/tokuin/main/install.ps1 | iex
```
#### 1-2. Verify installation

```bash
tokuin --version
which tokuin   # expected: /usr/local/bin/tokuin or ~/.local/bin/tokuin
```
#### 1-3. Basic token count test

```bash
echo "Hello, world!" | tokuin --model gpt-4
```
#### 1-4. Dry-run cost estimate (no API key needed ✅)

```bash
echo "Analyze user behavior patterns from the following data" | \
  tokuin load-test \
    --model gpt-4 \
    --runs 50 \
    --concurrency 5 \
    --dry-run \
    --estimate-cost \
    --output-format json | python3 -m json.tool
```
Expected output structure:

```json
{
  "total_requests": 50,
  "successful": 50,
  "failed": 0,
  "latency_ms": { "average": ..., "p50": ..., "p95": ... },
  "cost": { "input_tokens": ..., "output_tokens": ..., "total_cost": ... }
}
```
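If you post-process these results, the relevant fields can be pulled out with a few lines of Python. A minimal sketch (the payload below is a fabricated sample shaped like the structure above, not real Tokuin output):

```python
import json

# Illustrative dry-run payload shaped like the expected output structure
raw = """
{
  "total_requests": 50,
  "successful": 50,
  "failed": 0,
  "latency_ms": {"average": 812.4, "p50": 790.0, "p95": 1103.2},
  "cost": {"input_tokens": 550, "output_tokens": 2100, "total_cost": 0.1425}
}
"""

result = json.loads(raw)
cost = result.get("cost", {})
latency = result.get("latency_ms", {})

summary = {
    "success_rate": result["successful"] / result["total_requests"] * 100,
    "total_tokens": cost.get("input_tokens", 0) + cost.get("output_tokens", 0),
    "cost_usd": cost.get("total_cost", 0.0),
    "p95_ms": latency.get("p95", 0.0),
}
print(summary)
```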
#### 1-5. Multi-model comparison (dry-run)

```bash
echo "Translate this to Korean" | tokuin --compare gpt-4 gpt-3.5-turbo claude-3-haiku --price
```
#### 1-6. Verify Prometheus format output

```bash
echo "Benchmark" | tokuin load-test --model gpt-4 --runs 10 --dry-run --output-format prometheus
```

Expected: `# HELP` and `# TYPE` lines, with metrics prefixed `tokuin_`.
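If you scrape this output yourself rather than pointing Prometheus at it, the text exposition format is easy to parse. A minimal sketch that ignores labels; the metric names in the sample are invented for illustration (only the `tokuin_` prefix and the `# HELP` / `# TYPE` lines come from the expected output above):

```python
# Illustrative sample in Prometheus text exposition format
sample = """\
# HELP tokuin_requests_total Total requests issued
# TYPE tokuin_requests_total counter
tokuin_requests_total 10
# HELP tokuin_latency_ms_avg Average latency
# TYPE tokuin_latency_ms_avg gauge
tokuin_latency_ms_avg 812.4
"""

def parse_metrics(text: str) -> dict:
    """Return {metric_name: float_value}, skipping # HELP / # TYPE comment lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Simplified: assumes "name value" lines without label sets
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

print(parse_metrics(sample))
```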
### Step 2: Data collection pipeline with user context

#### 2-1. Create prompt auto-categorization module

```bash
cat > categorize_prompt.py << 'PYEOF'
#!/usr/bin/env python3
"""Auto-categorize prompts based on keywords."""
import hashlib

CATEGORIES = {
    "coding": ["code", "function", "class", "implement", "debug", "fix", "refactor"],
    "analysis": ["analyze", "compare", "evaluate", "assess"],
    "translation": ["translate", "translation"],
    "summary": ["summarize", "summary", "tldr", "brief"],
    "writing": ["write", "draft", "create", "generate"],
    "question": ["what is", "how to", "explain", "why"],
    "data": ["data", "table", "csv", "json", "sql"],
}

def categorize(prompt: str) -> str:
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(k in p for k in keywords):
            return cat
    return "other"

def hash_prompt(prompt: str) -> str:
    """First 16 chars of SHA-256 (stored instead of raw text — privacy protection)."""
    return hashlib.sha256(prompt.encode()).hexdigest()[:16]

def truncate_preview(prompt: str, limit: int = 100) -> str:
    return prompt[:limit] + ("…" if len(prompt) > limit else "")

if __name__ == "__main__":
    import sys
    prompt = sys.argv[1] if len(sys.argv) > 1 else ""
    print(categorize(prompt))
PYEOF
```
#### 2-2. Create metrics collection script with user context

```bash
cat > collect-metrics.sh << 'COLLECT_EOF'
#!/usr/bin/env bash
# collect-metrics.sh — Run Tokuin and save with user context (dry-run by default)
set -euo pipefail

# User info
USER_ID="${LLM_USER_ID:-$(whoami)}"
USER_ALIAS="${LLM_USER_ALIAS:-$USER_ID}"
SESSION_ID="${LLM_SESSION_ID:-$(date +%Y%m%d-%H%M%S)-$$}"
PROMPT="${1:-Benchmark prompt}"
MODEL="${MODEL:-gpt-4}"
PROVIDER="${PROVIDER:-openai}"
RUNS="${RUNS:-50}"
CONCURRENCY="${CONCURRENCY:-5}"
TAGS="${LLM_TAGS:-[]}"
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
CATEGORY=$(python3 categorize_prompt.py "$PROMPT" 2>/dev/null || echo "other")
PROMPT_HASH=$(echo -n "$PROMPT" | sha256sum 2>/dev/null | cut -c1-16 || echo "unknown")
PROMPT_LEN=${#PROMPT}

# Run Tokuin (dry-run by default; set ALLOW_LIVE to drop the dry-run flags)
EXTRA_FLAGS="--dry-run --estimate-cost"
[ -n "${ALLOW_LIVE:-}" ] && EXTRA_FLAGS=""
RESULT=$(echo "$PROMPT" | tokuin load-test \
  --model "$MODEL" \
  --provider "$PROVIDER" \
  --runs "$RUNS" \
  --concurrency "$CONCURRENCY" \
  --output-format json \
  $EXTRA_FLAGS 2>/dev/null)

# Save to JSONL with user context
python3 - << PYEOF
import json
result = json.loads('''${RESULT}''')
latency = result.get("latency_ms", {})
cost = result.get("cost", {})
record = {
    "id": "${PROMPT_HASH}-${SESSION_ID}",
    "timestamp": "${TIMESTAMP}",
    "model": "${MODEL}",
    "provider": "${PROVIDER}",
    "user_id": "${USER_ID}",
    "user_alias": "${USER_ALIAS}",
    "session_id": "${SESSION_ID}",
    "prompt_hash": "${PROMPT_HASH}",
    "prompt_category": "${CATEGORY}",
    "prompt_length": ${PROMPT_LEN},
    "tags": json.loads('${TAGS}'),
    "is_dry_run": True,
    "total_requests": result.get("total_requests", 0),
    "successful": result.get("successful", 0),
    "failed": result.get("failed", 0),
    "input_tokens": cost.get("input_tokens", 0),
    "output_tokens": cost.get("output_tokens", 0),
    "cost_usd": cost.get("total_cost", 0),
    "latency_avg_ms": latency.get("average", 0),
    "latency_p50_ms": latency.get("p50", 0),
    "latency_p95_ms": latency.get("p95", 0),
    "status_code": 200 if result.get("successful", 0) > 0 else 500,
}
with open("./data/metrics.jsonl", "a") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
print(f"✅ Saved: [{record['user_alias']}] {record['prompt_category']} | \${record['cost_usd']:.4f} | {record['latency_avg_ms']:.0f}ms")
PYEOF
COLLECT_EOF
chmod +x collect-metrics.sh
```
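Once a few records exist, `metrics.jsonl` can be aggregated directly. A minimal sketch of a per-user cost ranking (field names follow the record written above; the inline sample records stand in for the real file):

```python
import json
from collections import defaultdict

# Illustrative records shaped like those written by collect-metrics.sh
lines = [
    '{"user_alias": "Alice", "cost_usd": 0.12, "prompt_category": "coding"}',
    '{"user_alias": "Alice", "cost_usd": 0.08, "prompt_category": "analysis"}',
    '{"user_alias": "Bob",   "cost_usd": 0.30, "prompt_category": "coding"}',
]
# In practice: lines = open("./data/metrics.jsonl")

per_user = defaultdict(float)
for line in lines:
    record = json.loads(line)
    per_user[record["user_alias"]] += record.get("cost_usd", 0)

# Rank users by spend, highest first
ranking = sorted(per_user.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```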
#### 2-3. Set up cron (auto-collect every 5 minutes)

```bash
( crontab -l 2>/dev/null ; echo "*/5 * * * * cd $(pwd) && bash collect-metrics.sh 'Scheduled benchmark' >> ./data/collect.log 2>&1" ) | crontab -
echo "✅ Cron registered (every 5 minutes)"
```
#### 2-4. First collection test (dry-run)

```bash
bash collect-metrics.sh "Analyze user behavior patterns"
# json.tool cannot parse multi-line JSONL, so inspect the latest record only
tail -1 ./data/metrics.jsonl | python3 -m json.tool
```

### Step 3: Routing structure and dashboard frame

**Option A — Next.js (recommended)**
#### 3-1. Initialize Next.js project (skip this if adding to an existing project)

```bash
npx create-next-app@latest llm-dashboard \
  --typescript \
  --tailwind \
  --app \
  --no-src-dir
cd llm-dashboard
```
#### 3-2. Install dependencies

```bash
npm install recharts better-sqlite3 @types/better-sqlite3
```
#### 3-3. Set design tokens (consistent tone and style)

```bash
cat >> app/globals.css << 'CSS_EOF'
:root {
  /* Background layers */
  --bg-base: #0f1117;
  --bg-surface: #1a1d27;
  --bg-elevated: #21253a;
  --border: rgba(255, 255, 255, 0.06);

  /* Text layers */
  --text-primary: #f1f5f9;
  --text-secondary: #94a3b8;
  --text-muted: #475569;

  /* 3-level traffic light system (use consistently across all components) */
  --color-ok: #22c55e;      /* Normal — Green 500 */
  --color-warn: #f59e0b;    /* Warning — Amber 500 */
  --color-danger: #ef4444;  /* Danger — Red 500 */
  --color-neutral: #60a5fa; /* Neutral — Blue 400 */

  /* Data series colors (colorblind-friendly palette) */
  --series-1: #818cf8; /* Indigo — System/GPT-4 */
  --series-2: #38bdf8; /* Sky — User/Claude */
  --series-3: #34d399; /* Emerald — Assistant/Gemini */
  --series-4: #fb923c; /* Orange — 4th series */

  /* Cost-specific */
  --cost-input: #a78bfa;
  --cost-output: #f472b6;

  /* Ranking colors */
  --rank-gold: #fbbf24;
  --rank-silver: #94a3b8;
  --rank-bronze: #b45309;
  --rank-inactive: #374151;

  /* Typography */
  --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
  --font-ui: 'Geist', 'Plus Jakarta Sans', system-ui, sans-serif;
}

body {
  background: var(--bg-base);
  color: var(--text-primary);
  font-family: var(--font-ui);
}

/* Numbers: alignment stability */
.metric-value {
  font-family: var(--font-mono);
  font-variant-numeric: tabular-nums;
  font-feature-settings: 'tnum';
}

/* KPI card accent-bar */
.status-ok { border-left-color: var(--color-ok); }
.status-warn { border-left-color: var(--color-warn); }
.status-danger { border-left-color: var(--color-danger); }
CSS_EOF
```
#### 3-4. Create routing structure

```bash
mkdir -p app/admin/llm-monitoring
mkdir -p app/admin/llm-monitoring/users
mkdir -p "app/admin/llm-monitoring/users/[userId]"
mkdir -p "app/admin/llm-monitoring/runs/[runId]"
mkdir -p components/llm-monitoring
mkdir -p lib/llm-monitoring
```
#### 3-5. Initialize SQLite DB

```bash
cat > lib/llm-monitoring/db.ts << 'TS_EOF'
import Database from 'better-sqlite3'
import path from 'path'

const DB_PATH = path.join(process.cwd(), 'data', 'monitoring.db')
const db = new Database(DB_PATH)

db.exec(`
  CREATE TABLE IF NOT EXISTS runs (
    id TEXT PRIMARY KEY,
    timestamp DATETIME NOT NULL DEFAULT (datetime('now')),
    model TEXT NOT NULL,
    provider TEXT NOT NULL,
    user_id TEXT DEFAULT 'anonymous',
    user_alias TEXT DEFAULT 'anonymous',
    session_id TEXT,
    prompt_hash TEXT,
    prompt_category TEXT DEFAULT 'other',
    prompt_length INTEGER DEFAULT 0,
    tags TEXT DEFAULT '[]',
    is_dry_run INTEGER DEFAULT 1,
    total_requests INTEGER DEFAULT 0,
    successful INTEGER DEFAULT 0,
    failed INTEGER DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0,
    latency_avg_ms REAL DEFAULT 0,
    latency_p50_ms REAL DEFAULT 0,
    latency_p95_ms REAL DEFAULT 0,
    status_code INTEGER DEFAULT 200
  );

  CREATE TABLE IF NOT EXISTS user_profiles (
    user_id TEXT PRIMARY KEY,
    user_alias TEXT NOT NULL,
    team TEXT DEFAULT '',
    role TEXT DEFAULT 'user',
    created_at DATETIME DEFAULT (datetime('now')),
    last_seen DATETIME,
    notes TEXT DEFAULT ''
  );

  CREATE INDEX IF NOT EXISTS idx_runs_timestamp ON runs(timestamp DESC);
  CREATE INDEX IF NOT EXISTS idx_runs_user_id ON runs(user_id);
  CREATE INDEX IF NOT EXISTS idx_runs_model ON runs(model);

  CREATE VIEW IF NOT EXISTS user_stats AS
  SELECT
    user_id,
    user_alias,
    COUNT(*) AS total_runs,
    SUM(input_tokens + output_tokens) AS total_tokens,
    ROUND(SUM(cost_usd), 4) AS total_cost,
    ROUND(AVG(latency_avg_ms), 1) AS avg_latency,
    ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0) * 100), 1) AS success_rate,
    COUNT(DISTINCT model) AS models_used,
    MAX(timestamp) AS last_seen
  FROM runs
  GROUP BY user_id;
`)

export default db
TS_EOF
```

**Option B — Lightweight HTML (minimal dependencies)**
Use this when there's no existing project or you need a quick prototype. The heredoc below is a minimal page skeleton; wire up chart rendering and `data/metrics.jsonl` loading to suit your setup:

```bash
mkdir -p llm-monitoring/data
cat > llm-monitoring/index.html << 'HTML_EOF'
<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>🧮 LLM Usage Monitoring</title></head>
<body>
  <h1>🧮 LLM Usage Monitoring</h1>
  <p>Powered by Tokuin CLI</p>
  <section><h2>Cost Trend Over Time</h2><canvas id="cost-trend"></canvas></section>
  <section><h2>Category Distribution</h2><canvas id="category-dist"></canvas></section>
  <section>
    <h2>🏆 User Ranking</h2>
    <p>Ranked by Cost</p>
    <table id="ranking">
      <thead><tr><th>Rank</th><th>User</th><th>Cost</th><th>Requests</th><th>Top Model</th><th>Success Rate</th><th>Last Active</th></tr></thead>
      <tbody><tr><td colspan="7">Loading data...</td></tr></tbody>
    </table>
  </section>
  <section>
    <h2>💤 Inactive Users</h2>
    <table id="inactive">
      <thead><tr><th>User</th><th>Inactive For</th><th>Last Active</th><th>Status</th></tr></thead>
      <tbody><tr><td colspan="4">No tracking data</td></tr></tbody>
    </table>
  </section>
  <section><h2>📊 PM Auto Insights</h2><p id="insights">💡 Analyzing automatically...</p></section>
  <!-- Add a script here to load data/metrics.jsonl and render the charts/tables -->
</body>
</html>
HTML_EOF
echo "✅ Lightweight HTML dashboard created: llm-monitoring/index.html"
```
Start a local server:

```bash
cd llm-monitoring && python3 -m http.server "${DASHBOARD_PORT:-3000}" &
echo "✅ Dashboard running: http://localhost:${DASHBOARD_PORT:-3000}"
```

### Step 4: PM insights tab and ranking system (for Option A / Next.js)
Create the PM dashboard API route:

```bash
cat > app/api/ranking/route.ts << 'TS_EOF'
import { NextRequest, NextResponse } from 'next/server'
import db from '@/lib/llm-monitoring/db'

export async function GET(req: NextRequest) {
  const period = req.nextUrl.searchParams.get('period') || '30d'
  const days = period === '7d' ? 7 : period === '90d' ? 90 : 30

  // Cost-based ranking
  const costRanking = db.prepare(`
    SELECT
      user_id,
      user_alias,
      ROUND(SUM(cost_usd), 4) AS total_cost,
      COUNT(*) AS total_runs,
      GROUP_CONCAT(DISTINCT model) AS models_used,
      ROUND(AVG(latency_avg_ms), 0) AS avg_latency,
      ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0)) * 100, 1) AS success_rate,
      MAX(timestamp) AS last_seen
    FROM runs
    WHERE timestamp >= datetime('now', '-' || ? || ' days')
    GROUP BY user_id
    ORDER BY total_cost DESC
    LIMIT 20
  `).all(days)

  // Inactive user tracking (registered users with no activity in the selected period)
  const inactiveUsers = db.prepare(`
    SELECT
      p.user_id,
      p.user_alias,
      p.team,
      MAX(r.timestamp) AS last_seen,
      CAST((julianday('now') - julianday(MAX(r.timestamp))) AS INTEGER) AS days_inactive
    FROM user_profiles p
    LEFT JOIN runs r ON p.user_id = r.user_id
    GROUP BY p.user_id
    HAVING last_seen IS NULL OR days_inactive >= 7
    ORDER BY days_inactive DESC
  `).all()

  // PM summary
  const summary = db.prepare(`
    SELECT
      COUNT(DISTINCT user_id) AS total_users,
      COUNT(DISTINCT CASE WHEN timestamp >= datetime('now', '-7 days') THEN user_id END) AS active_7d,
      ROUND(SUM(cost_usd), 2) AS total_cost,
      COUNT(*) AS total_runs
    FROM runs
    WHERE timestamp >= datetime('now', '-' || ? || ' days')
  `).get(days) as Record<string, number>

  return NextResponse.json({ costRanking, inactiveUsers, summary })
}
TS_EOF
```

### Step 5: Auto-generate weekly PM report

```bash
cat > generate-pm-report.sh << 'REPORT_EOF'
#!/usr/bin/env bash
# generate-pm-report.sh — Auto-generate weekly PM report (Markdown)
set -euo pipefail

REPORT_DATE=$(date +"%Y-%m-%d")
REPORT_WEEK=$(date +"%Y-W%V")
OUTPUT_DIR="./reports"
OUTPUT="${OUTPUT_DIR}/pm-weekly-${REPORT_DATE}.md"
mkdir -p "$OUTPUT_DIR"

# The heredoc is unquoted: ${REPORT_DATE}/${REPORT_WEEK} are expanded by bash,
# and literal dollar signs in the Python code are escaped as \$.
python3 << PYEOF > "$OUTPUT"
import json
from datetime import datetime, timedelta
from collections import defaultdict

# Load data from the last 7 days
try:
    records = [json.loads(l) for l in open('./data/metrics.jsonl') if l.strip()]
except FileNotFoundError:
    records = []
week_ago = (datetime.now() - timedelta(days=7)).isoformat()
week_data = [r for r in records if r.get('timestamp', '') >= week_ago]

# Aggregate
total_cost = sum(r.get('cost_usd', 0) for r in week_data)
total_runs = len(week_data)
active_users = set(r['user_id'] for r in week_data)
all_users = set(r['user_id'] for r in records)
inactive_users = all_users - active_users

# Per-user cost ranking
user_costs = defaultdict(lambda: {'cost': 0, 'runs': 0, 'alias': '', 'categories': defaultdict(int)})
for r in week_data:
    uid = r.get('user_id', 'unknown')
    user_costs[uid]['cost'] += r.get('cost_usd', 0)
    user_costs[uid]['runs'] += 1
    user_costs[uid]['alias'] = r.get('user_alias', uid)
    user_costs[uid]['categories'][r.get('prompt_category', 'other')] += 1
top_users = sorted(user_costs.items(), key=lambda x: x[1]['cost'], reverse=True)[:5]

# Model usage
model_usage = defaultdict(int)
for r in week_data:
    model_usage[r.get('model', 'unknown')] += 1
top_model = max(model_usage, key=model_usage.get) if model_usage else '-'

# Success rate and adoption
success_count = sum(1 for r in week_data if r.get('status_code', 200) == 200)
success_rate = (success_count / total_runs * 100) if total_runs else 0
adoption = f"{len(active_users)}/{len(all_users)} ({len(active_users)/len(all_users)*100:.0f}%)" if all_users else "N/A"

user_rows = chr(10).join(
    f"| {'🥇🥈🥉'[i] if i < 3 else i+1} | {u['alias']} | \${u['cost']:.4f} | {u['runs']} |"
    for i, (uid, u) in enumerate(top_users)
)

print(f"""# 📊 LLM Usage Weekly Report — ${REPORT_DATE} (${REPORT_WEEK})

## Executive Summary

| Metric | Value |
|---|---|
| Total Cost | \${total_cost:.2f} |
| Total Runs | {total_runs:,} |
| Active Users | {len(active_users)} |
| Adoption Rate | {adoption} |
| Success Rate | {success_rate:.1f}% |
| Top Model | {top_model} |

## 🏆 Top 5 Users (by Cost)

| Rank | User | Cost | Runs |
|---|---|---|---|
{user_rows}

## 💤 Inactive Users ({len(inactive_users)})

{"None — all users active within 7 days" if not inactive_users else chr(10).join(f"- {uid}" for uid in inactive_users)}

## 💡 PM Recommended Actions

{"- " + str(len(inactive_users)) + " inactive user(s) — consider onboarding/support" if inactive_users else ""}
{"- Success rate " + f"{success_rate:.1f}%" + " — SLA 95% " + ("achieved ✅" if success_rate >= 95 else "not met ⚠️ investigate error causes")}
{"- Total cost \$" + f"{total_cost:.2f}" + " — review model optimization opportunities vs. prior week"}

---
*Auto-generated by generate-pm-report.sh | Powered by Tokuin CLI*
""")
PYEOF

echo "✅ PM report generated: $OUTPUT"
cat "$OUTPUT"

# Slack notification (if configured)
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  SUMMARY=$(grep -A5 "## Executive Summary" "$OUTPUT" | tail -5)
  curl -s -X POST "$SLACK_WEBHOOK_URL" \
    -H 'Content-type: application/json' \
    -d "{\"text\":\"📊 Weekly LLM Report ($REPORT_DATE)\n$SUMMARY\"}" > /dev/null
  echo "✅ Slack notification sent"
fi
REPORT_EOF
chmod +x generate-pm-report.sh
```
Schedule it to run every Monday at 9am, then run once to test:

```bash
( crontab -l 2>/dev/null ; echo "0 9 * * 1 cd $(pwd) && bash generate-pm-report.sh >> ./data/report.log 2>&1" ) | crontab -
echo "✅ Weekly report cron registered (every Monday 09:00)"

# Run immediately for testing
bash generate-pm-report.sh
```
### Step 6: Cost alert setup

```bash
cat > check-alerts.sh << 'ALERT_EOF'
#!/usr/bin/env bash
# check-alerts.sh — Detect cost threshold breaches and send Slack alerts
set -euo pipefail

THRESHOLD="${COST_THRESHOLD_USD:-10.00}"
CURRENT_COST=$(python3 << PYEOF
import json
from datetime import datetime
today = datetime.now().date().isoformat()
try:
    records = [json.loads(l) for l in open('./data/metrics.jsonl') if l.strip()]
    today_cost = sum(r.get('cost_usd', 0) for r in records if r.get('timestamp', '')[:10] == today)
    print(f"{today_cost:.4f}")
except Exception:
    print("0.0000")
PYEOF
)

# `set -e` would abort on a non-zero exit status, so capture it with `if !`
if ! python3 - << PYEOF
import sys
cost, threshold = float('$CURRENT_COST'), float('$THRESHOLD')
if cost > threshold:
    print(f"ALERT: Today's cost \${cost:.4f} has exceeded the threshold \${threshold:.2f}!")
    sys.exit(1)
print(f"OK: Today's cost \${cost:.4f} / threshold \${threshold:.2f}")
PYEOF
then
  # Send Slack alert on threshold breach
  if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
    curl -s -X POST "$SLACK_WEBHOOK_URL" \
      -H 'Content-type: application/json' \
      -d "{\"text\":\"⚠️ LLM cost threshold exceeded!\nToday's cost: \$$CURRENT_COST / Threshold: \$$THRESHOLD\"}" > /dev/null
  fi
fi
ALERT_EOF
chmod +x check-alerts.sh

# Check cost every hour
( crontab -l 2>/dev/null ; echo "0 * * * * cd $(pwd) && bash check-alerts.sh >> ./data/alerts.log 2>&1" ) | crontab -
echo "✅ Cost alert cron registered (every hour)"
```
## Privacy Policy

```yaml
# Privacy policy (must be followed)
prompt_storage:
  store_full_prompt: false  # Default: do not store raw prompt text
  store_preview: false      # Storing first 100 chars also disabled by default (requires explicit admin config)
  store_hash: true          # Store SHA-256 hash only (for pattern analysis)
user_data:
  anonymize_by_default: true  # user_id can be stored as a hash (controlled via LLM_USER_ID env var)
  retention_days: 90          # Recommend purging data older than 90 days
compliance:
  # Never log API keys in code, HTML, scripts, or log files.
  # Always add .env to .gitignore.
  # Restrict prompt preview access to admins only.
```
### ⚠️ Required steps when enabling `store_preview: true`

Prompt preview storage can only be enabled after an admin **explicitly** completes the following steps:

1. Set `STORE_PREVIEW=true` in the `.env` file (do not modify code directly)
2. Obtain team consent for personal data processing (notify users that previews will be stored)
3. Restrict access to the admin role only (regular users must not be able to view previews)
4. Set `retention_days` explicitly to define the retention period

Enabling `store_preview: true` without completing these steps is a **MUST NOT** violation.
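The `retention_days: 90` recommendation can be enforced with a small purge pass over `metrics.jsonl`. A minimal sketch, assuming the ISO-8601 UTC `timestamp` field written by `collect-metrics.sh` (the sample records are illustrative):

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90

def purge_old_records(lines, now=None):
    """Keep only JSONL records newer than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=RETENTION_DAYS)).strftime("%Y-%m-%dT%H:%M:%SZ")
    kept = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # ISO-8601 UTC timestamps compare correctly as plain strings
        if record.get("timestamp", "") >= cutoff:
            kept.append(record)
    return kept

# Illustrative records: one in the far future, one far past the retention window
sample = [
    '{"timestamp": "2099-01-01T00:00:00Z", "cost_usd": 0.1}',
    '{"timestamp": "2000-01-01T00:00:00Z", "cost_usd": 0.2}',
]
print(len(purge_old_records(sample)))
```

In practice, read `./data/metrics.jsonl`, pass its lines through `purge_old_records`, and rewrite the file with the kept records (after taking a dated backup, as the rolling-policy hint in `safety-guard.sh` suggests).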
## Output Format

Files generated after running the skill:

```text
./
├── safety-guard.sh          # Safety gate (Step 0)
├── categorize_prompt.py     # Prompt auto-categorization
├── collect-metrics.sh       # Metrics collection (Step 2)
├── generate-pm-report.sh    # PM weekly report (Step 5)
├── check-alerts.sh          # Cost alerts (Step 6)
│
├── data/
│   ├── metrics.jsonl        # Time-series metrics (JSONL format)
│   ├── collect.log          # Collection log
│   └── alerts.log           # Alert log
├── reports/
│   └── pm-weekly-YYYY-MM-DD.md  # Auto-generated PM report
│
├── [If Next.js selected]
│   ├── app/admin/llm-monitoring/page.tsx
│   ├── app/admin/llm-monitoring/users/[userId]/page.tsx
│   ├── app/api/runs/route.ts
│   ├── app/api/ranking/route.ts
│   ├── app/api/metrics/route.ts  # Prometheus endpoint
│   ├── components/llm-monitoring/
│   │   ├── KPICard.tsx
│   │   ├── TrendChart.tsx
│   │   ├── ModelCostBar.tsx
│   │   ├── LatencyGauge.tsx
│   │   ├── TokenDonut.tsx
│   │   ├── RankingTable.tsx
│   │   ├── InactiveUsers.tsx
│   │   ├── PMInsights.tsx
│   │   └── UserDetailPage.tsx
│   └── lib/llm-monitoring/db.ts
│
└── [If lightweight HTML selected]
    └── llm-monitoring/
        ├── index.html       # Single-file dashboard (charts + ranking + user detail)
        └── data/
            └── metrics.jsonl
```
## Constraints

### MUST

- Always run Step 0 (`safety-guard.sh`) first
- Use `--dry-run` as the default; explicitly pass `--allow-live` for live API calls
- Manage API keys via environment variables or `.env` files
- Add `.env` to `.gitignore`: `echo '.env' >> .gitignore`
- Use the 3-level color system (`--color-ok`, `--color-warn`, `--color-danger`) consistently across all status indicators
- Implement drilldown navigation so clicking a user link opens their personal detail page
- Generate PM insights automatically from data (no hardcoding)

### MUST NOT

- Never hardcode API keys in source code, HTML, scripts, or log files
- Never set live API calls (`--allow-live`) as the default in automated scripts
- Never use arbitrary colors — always use design token CSS variables
- Never show status as text only — always pair it with color and a badge
- Never store raw prompt text in the database (hashes only)
## Examples

### Example 1: Quick start (dry-run, no API key needed)

```bash
# 1. Safety check
bash safety-guard.sh

# 2. Install Tokuin
curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash

# 3. Collect sample data (dry-run)
export LLM_USER_ID="dev-alice"
export LLM_USER_ALIAS="Alice"
bash collect-metrics.sh "Analyze user behavior patterns"
bash collect-metrics.sh "Write a Python function to parse JSON"
bash collect-metrics.sh "Translate this document to English"

# 4. Run lightweight dashboard
cd llm-monitoring && python3 -m http.server 3000 &
open http://localhost:3000
```
### Example 2: Multi-user simulation (team test)

```bash
# Simulate multiple users with dry-run
for user in "alice" "backend" "analyst" "pm-charlie"; do
  export LLM_USER_ID="$user"
  export LLM_USER_ALIAS="$user"
  for category in "coding" "analysis" "translation"; do
    bash collect-metrics.sh "${category} related prompt example"
  done
done

# Check results
wc -l data/metrics.jsonl
```
### Example 3: Generate PM weekly report immediately

```bash
bash generate-pm-report.sh
cat reports/pm-weekly-$(date +%Y-%m-%d).md
```
### Example 4: Test cost alert

```bash
export COST_THRESHOLD_USD=0.01  # Low threshold for testing
bash check-alerts.sh
# Expected: ALERT message if cost exceeds threshold, otherwise "OK"
```
## References

- Tokuin GitHub: https://github.com/nooscraft/tokuin
- Tokuin install script: https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh
- Adding models guide: https://github.com/nooscraft/tokuin/blob/main/ADDING_MODELS_GUIDE.md
- Provider roadmap: https://github.com/nooscraft/tokuin/blob/main/PROVIDERS_PLAN.md
- Contributing guide: https://github.com/nooscraft/tokuin/blob/main/CONTRIBUTING.md
- OpenRouter model catalog: https://openrouter.ai/models
- Korean blog guide: https://digitalbourgeois.tistory.com/m/2658