# Data Analysis
Answer business questions by querying the data warehouse. The kernel starts automatically on first use.
## Prerequisites

uv must be installed:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Scripts are located relative to this skill file.
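To confirm uv is available before proceeding:

```bash
uv --version
```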
## MANDATORY FIRST STEP

Before any other action, check for cached patterns:

```bash
uv run scripts/cli.py pattern lookup "<the user's question>"
```

This is NON-NEGOTIABLE. Patterns contain proven strategies that save time and avoid failed queries.
## Workflow

Analysis Progress:

- [ ] Step 1: pattern lookup (check for cached strategy)
- [ ] Step 2: concept lookup (check for known tables)
- [ ] Step 3: Search codebase for table definitions (Grep)
- [ ] Step 4: Read SQL file to get table/column names
- [ ] Step 5: Execute query via kernel (run_sql)
- [ ] Step 6: learn_concept (ALWAYS before presenting results)
- [ ] Step 7: learn_pattern (ALWAYS if discovery required)
- [ ] Step 8: record_pattern_outcome (if you used a pattern in Step 1)
- [ ] Step 9: Present findings to user
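Steps 6-8 map to the `concept learn`, `pattern learn`, and `pattern record` commands documented below.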
## CLI Commands

### Kernel Management

```bash
uv run scripts/cli.py start             # Start kernel with Snowflake
uv run scripts/cli.py exec "..."        # Execute Python code
uv run scripts/cli.py status            # Check kernel status
uv run scripts/cli.py restart           # Restart kernel
uv run scripts/cli.py stop              # Stop kernel
uv run scripts/cli.py install plotly    # Install additional packages
```
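A minimal smoke test using only the commands above:

```bash
uv run scripts/cli.py start
uv run scripts/cli.py status
uv run scripts/cli.py exec "print('kernel ok')"
```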
### Concept Cache (concept -> table mappings)

```bash
# Look up a concept
uv run scripts/cli.py concept lookup customers

# Learn a new concept
uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID

# List all concepts
uv run scripts/cli.py concept list

# Import concepts from warehouse.md
uv run scripts/cli.py concept import -p /path/to/warehouse.md
```
### Pattern Cache (query strategies)

```bash
# Look up patterns for a question
uv run scripts/cli.py pattern lookup "who uses operator X"

# Learn a new pattern
uv run scripts/cli.py pattern learn operator_usage \
  -q "who uses X operator" \
  -q "which customers use X" \
  -s "1. Query TASK_RUNS for operator_class" \
  -s "2. Join with ORGS on org_id" \
  -t "HQ.MODEL_ASTRO.TASK_RUNS" \
  -t "HQ.MODEL_ASTRO.ORGANIZATIONS" \
  -g "TASK_RUNS is huge - always filter by date"
```
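As the example suggests, `-q` adds an example question phrasing, `-s` a numbered strategy step, `-t` a table the pattern touches, and `-g` a gotcha worth remembering.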
```bash
# Record pattern outcome
uv run scripts/cli.py pattern record operator_usage --success

# List all patterns
uv run scripts/cli.py pattern list

# Delete a pattern
uv run scripts/cli.py pattern delete operator_usage
```
### Table Schema Cache

```bash
# Look up cached table schema
uv run scripts/cli.py table lookup HQ.MART_CUST.CURRENT_ASTRO_CUSTS

# Cache a table schema
uv run scripts/cli.py table cache DB.SCHEMA.TABLE -c '[{"name":"id","type":"INT"}]'

# List all cached tables
uv run scripts/cli.py table list

# Delete from cache
uv run scripts/cli.py table delete DB.SCHEMA.TABLE
```
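A fuller sketch of caching a known table's schema. The column list is illustrative: ACCT_ID is the documented key column, but ACCT_NAME and both types are assumptions, not the table's actual schema.

```bash
uv run scripts/cli.py table cache HQ.MART_CUST.CURRENT_ASTRO_CUSTS \
  -c '[{"name":"ACCT_ID","type":"VARCHAR"},{"name":"ACCT_NAME","type":"VARCHAR"}]'  # ACCT_NAME is hypothetical
```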
### Cache Management

```bash
# View cache statistics
uv run scripts/cli.py cache status

# Clear all caches
uv run scripts/cli.py cache clear

# Clear only stale entries (older than 90 days)
uv run scripts/cli.py cache clear --stale-only
```
## Quick Start Example

```bash
# 1. Check for existing patterns
uv run scripts/cli.py pattern lookup "how many customers"

# 2. Check for known concepts
uv run scripts/cli.py concept lookup customers

# 3. Execute query
uv run scripts/cli.py exec "df = run_sql('SELECT COUNT(*) FROM HQ.MART_CUST.CURRENT_ASTRO_CUSTS')"
uv run scripts/cli.py exec "print(df)"

# 4. Cache what we learned
uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID
```
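If discovery was required, finish the workflow by caching the strategy and recording the outcome (steps 7-8). A sketch using the pattern commands above; the pattern name and details are illustrative:

```bash
# 5. Cache the query strategy (illustrative name and steps)
uv run scripts/cli.py pattern learn customer_count \
  -q "how many customers" \
  -s "1. COUNT(*) on HQ.MART_CUST.CURRENT_ASTRO_CUSTS" \
  -t "HQ.MART_CUST.CURRENT_ASTRO_CUSTS"

# 6. If step 1 returned a pattern you used, record how it went
uv run scripts/cli.py pattern record customer_count --success
```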
## Available Functions in Kernel

Once the kernel starts, these are available:

| Function | Description |
|----------|-------------|
| `run_sql(query, limit=100)` | Execute SQL, return Polars DataFrame |
| `run_sql_pandas(query, limit=100)` | Execute SQL, return Pandas DataFrame |
| `pl` | Polars library (imported) |
| `pd` | Pandas library (imported) |
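A sketch of a round trip that exercises the `limit` parameter, assuming the kernel is running; the table comes from the quick-reference list below:

```bash
# run_sql returns a Polars DataFrame; cap the result at 50 rows
uv run scripts/cli.py exec "df = run_sql('SELECT ORG_ID, CREATED_TS FROM HQ.MODEL_ASTRO.ORGANIZATIONS', limit=50)"
uv run scripts/cli.py exec "print(df.head())"
```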
## Table Discovery via Codebase

On a concept/pattern cache miss, search the codebase:

```
Grep pattern="<table or concept name>"
```
| Repo Type | Where to Look |
|-----------|---------------|
| Gusty | dags/declarative/04_metric/, 06_reporting/, 05_mart/ |
| dbt | models/marts/, models/staging/ |

## Known Tables Quick Reference

| Concept | Table | Key Column | Date Column |
|---------|-------|------------|-------------|
| customers | HQ.MART_CUST.CURRENT_ASTRO_CUSTS | ACCT_ID | - |
| organizations | HQ.MODEL_ASTRO.ORGANIZATIONS | ORG_ID | CREATED_TS |
| deployments | HQ.MODEL_ASTRO.DEPLOYMENTS | DEPLOYMENT_ID | CREATED_TS |
| task_runs | HQ.MODEL_ASTRO.TASK_RUNS | - | START_TS |
| dag_runs | HQ.MODEL_ASTRO.DAG_RUNS | - | START_TS |
| users | HQ.MODEL_ASTRO.USERS | USER_ID | - |
| accounts | HQ.MODEL_CRM.SF_ACCOUNTS | ACCT_ID | - |
Large tables (always filter by date): TASK_RUNS (6B rows), DAG_RUNS (500M rows)
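A sketch of a safe first query against one of these, following the query tips below (date filter applied up front, no JOINs or GROUP BY). `DATEADD(day, -7, CURRENT_DATE)` assumes Snowflake SQL, which is what the kernel connects to:

```bash
# Bound the 6B-row scan with a date filter before anything else
uv run scripts/cli.py exec "df = run_sql('SELECT COUNT(*) FROM HQ.MODEL_ASTRO.TASK_RUNS WHERE START_TS >= DATEADD(day, -7, CURRENT_DATE)')"
uv run scripts/cli.py exec "print(df)"
```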
## Query Tips

- Use LIMIT during exploration
- Filter early with WHERE clauses
- Prefer pre-aggregated tables (METRICS_*, MART_*, AGG_*)
- For 100M+ row tables: no JOINs or GROUP BY on the first query

## Reference

- reference/discovery-warehouse.md - large table handling, warehouse discovery