tracing-downstream-lineage

安装量: 427
排名: #2334

安装

npx skills add https://github.com/astronomer/agents --skill tracing-downstream-lineage

Downstream Lineage: Impacts

Answer the critical question: "What breaks if I change this?"

Use this BEFORE making changes to understand the blast radius.

Impact Analysis Step 1: Identify Direct Consumers

Find everything that reads from this target:

For Tables:

Search DAG source code: Look for DAGs that SELECT from this table

Use list_dags to get all DAGs Use get_dag_source to search for table references Look for: FROM target_table, JOIN target_table

Check for dependent views:

-- Snowflake SELECT * FROM information_schema.view_table_usage WHERE table_name = ''

-- Or check SHOW VIEWS and search definitions

Look for BI tool connections:

Dashboards often query tables directly Check for common BI patterns in table naming (rpt_, dashboard_)

For DAGs:

Check what the DAG produces: Use get_dag_source to find output tables Then trace those tables' consumers (recursive) Step 2: Build Dependency Tree

Map the full downstream impact:

SOURCE: fct.orders | +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs | | | +-- TABLE: rpt.monthly_summary --> Email: Monthly Report | +-- TABLE: ml.order_features --> Model: Demand Forecasting | +-- DIRECT: Looker Dashboard "Sales Overview"

Step 3: Categorize by Criticality

Critical (breaks production):

Production dashboards Customer-facing applications Automated reports to executives ML models in production Regulatory/compliance reports

High (causes significant issues):

Internal operational dashboards Analyst workflows Data science experiments Downstream ETL jobs

Medium (inconvenient):

Ad-hoc analysis tables Development/staging copies Historical archives

Low (minimal impact):

Deprecated tables Unused datasets Test data Step 4: Assess Change Risk

For the proposed change, evaluate:

Schema Changes (adding/removing/renaming columns):

Which downstream queries will break? Are there SELECT * patterns that will pick up new columns? Which transformations reference the changing columns?

Data Changes (values, volumes, timing):

Will downstream aggregations still be valid? Are there NULL handling assumptions that will break? Will timing changes affect SLAs?

Deletion/Deprecation:

Full dependency tree must be migrated first Communication needed for all stakeholders Step 5: Find Stakeholders

Identify who owns downstream assets:

DAG owners: Check owners field in DAG definitions Dashboard owners: Usually in BI tool metadata Team ownership: Look for team naming patterns or documentation Output: Impact Report Summary

"Changing fct.orders will impact X tables, Y DAGs, and Z dashboards"

Impact Diagram +--> [agg.daily_sales] --> [Executive Dashboard] | [fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email] | +--> [ml.features] --> [Demand Model]

Detailed Impacts Downstream Type Criticality Owner Notes agg.daily_sales Table Critical data-eng Updated hourly Executive Dashboard Dashboard Critical analytics CEO views daily ml.order_features Table High ml-team Retraining weekly Risk Assessment Change Type Risk Level Mitigation Add column Low No action needed Rename column High Update 3 DAGs, 2 dashboards Delete column Critical Full migration plan required Change data type Medium Test downstream aggregations Recommended Actions

Before making changes:

Notify owners: @data-eng, @analytics, @ml-team Update downstream DAG: transform_daily_sales Test dashboard: Executive KPIs Schedule change during low-impact window Related Skills Trace where data comes from: tracing-upstream-lineage skill Check downstream freshness: checking-freshness skill Debug any broken DAGs: debugging-dags skill

返回排行榜