analyzing-data

安装量: 468
排名: #2213

安装

npx skills add https://github.com/astronomer/agents --skill analyzing-data

Data Analysis

Answer business questions by querying the data warehouse. The kernel starts automatically on first use.

Prerequisites

uv must be installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

Scripts are located relative to this skill file.

MANDATORY FIRST STEP

Before any other action, check for cached patterns:

uv run scripts/cli.py pattern lookup ""

This is NON-NEGOTIABLE. Patterns contain proven strategies that save time and avoid failed queries.

Workflow Analysis Progress: - [ ] Step 1: pattern lookup (check for cached strategy) - [ ] Step 2: concept lookup (check for known tables) - [ ] Step 3: Search codebase for table definitions (Grep) - [ ] Step 4: Read SQL file to get table/column names - [ ] Step 5: Execute query via kernel (run_sql) - [ ] Step 6: learn_concept (ALWAYS before presenting results) - [ ] Step 7: learn_pattern (ALWAYS if discovery required) - [ ] Step 8: record_pattern_outcome (if you used a pattern in Step 1) - [ ] Step 9: Present findings to user

CLI Commands Kernel Management uv run scripts/cli.py start # Start kernel with Snowflake uv run scripts/cli.py exec "..." # Execute Python code uv run scripts/cli.py status # Check kernel status uv run scripts/cli.py restart # Restart kernel uv run scripts/cli.py stop # Stop kernel uv run scripts/cli.py install plotly # Install additional packages

Concept Cache (concept -> table mappings)

Look up a concept

uv run scripts/cli.py concept lookup customers

Learn a new concept

uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID

List all concepts

uv run scripts/cli.py concept list

Import concepts from warehouse.md

uv run scripts/cli.py concept import -p /path/to/warehouse.md

Pattern Cache (query strategies)

Look up patterns for a question

uv run scripts/cli.py pattern lookup "who uses operator X"

Learn a new pattern

uv run scripts/cli.py pattern learn operator_usage \ -q "who uses X operator" \ -q "which customers use X" \ -s "1. Query TASK_RUNS for operator_class" \ -s "2. Join with ORGS on org_id" \ -t "HQ.MODEL_ASTRO.TASK_RUNS" \ -t "HQ.MODEL_ASTRO.ORGANIZATIONS" \ -g "TASK_RUNS is huge - always filter by date"

Record pattern outcome

uv run scripts/cli.py pattern record operator_usage --success

List all patterns

uv run scripts/cli.py pattern list

Delete a pattern

uv run scripts/cli.py pattern delete operator_usage

Table Schema Cache

Look up cached table schema

uv run scripts/cli.py table lookup HQ.MART_CUST.CURRENT_ASTRO_CUSTS

Cache a table schema

uv run scripts/cli.py table cache DB.SCHEMA.TABLE -c '[{"name":"id","type":"INT"}]'

List all cached tables

uv run scripts/cli.py table list

Delete from cache

uv run scripts/cli.py table delete DB.SCHEMA.TABLE

Cache Management

View cache statistics

uv run scripts/cli.py cache status

Clear all caches

uv run scripts/cli.py cache clear

Clear only stale entries (older than 90 days)

uv run scripts/cli.py cache clear --stale-only

Quick Start Example

1. Check for existing patterns

uv run scripts/cli.py pattern lookup "how many customers"

2. Check for known concepts

uv run scripts/cli.py concept lookup customers

3. Execute query

uv run scripts/cli.py exec "df = run_sql('SELECT COUNT(*) FROM HQ.MART_CUST.CURRENT_ASTRO_CUSTS')" uv run scripts/cli.py exec "print(df)"

4. Cache what we learned

uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID

Available Functions in Kernel

Once kernel starts, these are available:

Function Description run_sql(query, limit=100) Execute SQL, return Polars DataFrame run_sql_pandas(query, limit=100) Execute SQL, return Pandas DataFrame pl Polars library (imported) pd Pandas library (imported) Table Discovery via Codebase

If concept/pattern cache miss, search the codebase:

Grep pattern="" glob="*/.sql"

Repo Type Where to Look Gusty dags/declarative/04_metric/, 06_reporting/, 05_mart/ dbt models/marts/, models/staging/ Known Tables Quick Reference Concept Table Key Column Date Column customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS ACCT_ID - organizations HQ.MODEL_ASTRO.ORGANIZATIONS ORG_ID CREATED_TS deployments HQ.MODEL_ASTRO.DEPLOYMENTS DEPLOYMENT_ID CREATED_TS task_runs HQ.MODEL_ASTRO.TASK_RUNS - START_TS dag_runs HQ.MODEL_ASTRO.DAG_RUNS - START_TS users HQ.MODEL_ASTRO.USERS USER_ID - accounts HQ.MODEL_CRM.SF_ACCOUNTS ACCT_ID -

Large tables (always filter by date): TASK_RUNS (6B rows), DAG_RUNS (500M rows)

Query Tips Use LIMIT during exploration Filter early with WHERE clauses Prefer pre-aggregated tables (METRICS_, MART_, AGG_*) For 100M+ row tables: no JOINs or GROUP BY on first query Reference reference/discovery-warehouse.md - Large table handling, warehouse discovery

返回排行榜