# adf-master

Installs: 53
Rank: #14116

## Install

```shell
npx skills add https://github.com/josiahsiegel/claude-plugin-marketplace --skill adf-master
```

Azure Data Factory Master Knowledge Base

## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths

When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

Examples:

- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:

- Edit tool `file_path` parameter
- Write tool `file_path` parameter
- All file operations on Windows systems

### Documentation Guidelines

NEVER create new documentation files unless explicitly requested by the user.

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep the repository root clean - only README.md unless the user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid an AI-generated tone
- **User preference**: Only create additional .md files when the user specifically asks for documentation

This skill provides comprehensive reference information about Azure Data Factory, including official documentation sources, CI/CD deployment methods, and troubleshooting resources. Use this to access detailed ADF knowledge on-demand.

## 🚨 CRITICAL 2025 UPDATE: Deprecated Features

### Apache Airflow Workflow Orchestration Manager - DEPRECATED

- **Status**: Available only for existing customers as of early 2025
- **Retirement Date**: Not yet announced, but the feature is officially deprecated
- **Impact**: New customers cannot provision Apache Airflow in Azure Data Factory

Official Deprecation Notice:

- Apache Airflow Workflow Orchestration Manager is deprecated with no retirement date set
- Only existing deployments can continue using this feature
- No new Airflow integrations can be created in ADF

Migration Path:

- **Recommended**: Migrate to Fabric Data Factory with native Airflow support
- **Alternative**: Use standalone Apache Airflow deployments (Azure Container Instances, AKS, or VM-based)
- **Alternative**: Migrate orchestration logic to native ADF pipelines with control flow activities
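To illustrate the last option, orchestration logic in native ADF pipelines is expressed as control flow activity JSON. A minimal sketch of a `ForEach` loop over a pipeline parameter — the activity name, parameter name, and inner Copy activity are hypothetical, and the Copy source/sink are omitted for brevity:

```json
{
  "name": "ForEachFile",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@pipeline().parameters.inputFiles",
      "type": "Expression"
    },
    "isSequential": false,
    "batchCount": 10,
    "activities": [
      {
        "name": "CopyOneFile",
        "type": "Copy",
        "typeProperties": {}
      }
    ]
  }
}
```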

Why Deprecated:

- Microsoft's focus shifted to Fabric Data Factory as the unified data integration platform
- Fabric provides modern orchestration capabilities superseding the Airflow integration
- Limited adoption and maintenance burden for the standalone Airflow feature in ADF

Action Required:

- **If using Airflow in ADF**: Plan migration within 12-18 months
- **For new projects**: Do NOT use Airflow in ADF - use Fabric or native ADF patterns
- Monitor Microsoft announcements for the official retirement timeline

Reference:

- Microsoft Roadmap: https://www.directionsonmicrosoft.com/roadmaps/ref/azure-data-factory-roadmap/

## 🆕 2025 Feature Updates

### Microsoft Fabric Integration (GA June 2025)

ADF Mounting in Fabric:

- Bring existing ADF pipelines into Fabric workspaces without rebuilding
- General Availability as of June 2025
- Seamless integration enables hybrid ADF + Fabric workflows

Cross-Workspace Pipeline Orchestration:

- New Invoke Pipeline activity supports cross-platform calls
- Invoke pipelines across Fabric, Azure Data Factory, and Synapse
- Managed VNet support for secure cross-workspace communication
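For comparison, pipeline-to-pipeline calls within a single factory use the long-standing `ExecutePipeline` activity. A minimal sketch — the activity, pipeline, and parameter names here are hypothetical:

```json
{
  "name": "CallChildPipeline",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": {
      "referenceName": "ChildPipeline",
      "type": "PipelineReference"
    },
    "waitOnCompletion": true,
    "parameters": {
      "runDate": "@pipeline().parameters.runDate"
    }
  }
}
```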

Variable Libraries:

- Environment-specific variables for CI/CD automation
- Automatic value substitution during workspace promotion
- Eliminates separate parameter files per environment

Connector Enhancements:

- ServiceNow V2 (V1 End of Support)
- Enhanced PostgreSQL and Snowflake connectors
- Native OneLake connectivity for zero-copy integration

### Node.js 20.x Requirement for CI/CD

**CRITICAL**: As of 2025, the npm package `@microsoft/azure-data-factory-utilities` requires Node.js 20.x

Breaking Change:

- Older Node.js versions (14.x, 16.x, 18.x) may cause package incompatibility errors
- Update CI/CD pipelines to use Node.js 20.x or a compatible version

GitHub Actions:

```yaml
- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: '20.x'
```

Azure DevOps:

```yaml
- task: UseNode@1
  inputs:
    version: '20.x'
```

## Official Documentation Sources

### Primary Microsoft Learn Resources

Main Documentation Hub:

- URL: https://learn.microsoft.com/en-us/azure/data-factory/
- Last Updated: February 2025
- Coverage: Complete ADF documentation including tutorials, concepts, how-to guides, and reference materials
- Key Topics: Pipelines, datasets, triggers, linked services, data flows, integration runtimes, monitoring

Introduction to Azure Data Factory:

- URL: https://learn.microsoft.com/en-us/azure/data-factory/introduction
- Summary: Managed cloud service for complex hybrid ETL, ELT, and data integration projects
- Key Features: 90+ built-in connectors, serverless architecture, code-free UI, single-pane monitoring

### Context7 Library Documentation

Library ID: `/websites/learn_microsoft_en-us_azure_data-factory`

- Trust Score: 7.5
- Code Snippets: 10,839
- Topics: CI/CD, ARM templates, pipeline patterns, data flows, monitoring, troubleshooting

How to Access:

Use the Context7 MCP tool to fetch the latest documentation:

```
mcp__context7__get-library-docs:
  context7CompatibleLibraryID: /websites/learn_microsoft_en-us_azure_data-factory
  topic: "CI/CD continuous integration deployment pipelines ARM templates"
  tokens: 8000
```

## CI/CD Deployment Methods

### Modern Automated Approach (Recommended)

npm Package: @microsoft/azure-data-factory-utilities

- Latest Version: 1.0.3+ (check npm for the current version)
- npm URL: https://www.npmjs.com/package/@microsoft/azure-data-factory-utilities
- Node.js Requirement: Version 20.x or compatible

Key Features:

- Validates ADF resources independently of the service
- Generates ARM templates programmatically
- Enables true CI/CD without the manual publish button
- Supports preview mode for selective trigger management

package.json Configuration:

```json
{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index",
    "build-preview": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index --preview"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^1.0.3"
  }
}
```

Commands:

```shell
# Validate resources
npm run build validate

# Generate ARM templates
npm run build export [outputFolder]

# Preview mode (only stop/start modified triggers)
npm run build-preview export [outputFolder]
```

Official Documentation:

- URL: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-improvements
- Last Updated: January 2025
- Topics: Setup, configuration, build commands, CI/CD integration

### Traditional Manual Approach (Legacy)

Method: Git integration + Publish button

Process:

1. Configure Git integration in the ADF UI (Dev environment only)
2. Make changes in ADF Studio
3. Click the "Publish" button to generate ARM templates
4. Templates are saved to the `adf_publish` branch
5. Release pipelines deploy from the `adf_publish` branch

When to Use:

- Migrating from an existing setup
- No build pipeline infrastructure
- Simple deployments without validation

Limitations:

- Requires a manual publish action
- No validation until publish
- Not true CI/CD (manual step required)
- Can't validate on pull requests

Migration Path: Modern approach recommended for new implementations

## ARM Template Deployment

### PowerShell Deployment

Primary Command: New-AzResourceGroupDeployment

Syntax:

```powershell
New-AzResourceGroupDeployment `
  -ResourceGroupName "<resource-group-name>" `
  -TemplateFile "ARMTemplateForFactory.json" `
  -TemplateParameterFile "ARMTemplateParametersForFactory.<environment>.json" `
  -factoryName "<factory-name>" `
  -Mode Incremental `
  -Verbose
```

Validation:

```powershell
Test-AzResourceGroupDeployment `
  -ResourceGroupName "<resource-group-name>" `
  -TemplateFile "ARMTemplateForFactory.json" `
  -TemplateParameterFile "ARMTemplateParametersForFactory.<environment>.json" `
  -factoryName "<factory-name>"
```

What-If Analysis:

```powershell
New-AzResourceGroupDeployment `
  -ResourceGroupName "<resource-group-name>" `
  -TemplateFile "ARMTemplateForFactory.json" `
  -TemplateParameterFile "ARMTemplateParametersForFactory.<environment>.json" `
  -factoryName "<factory-name>" `
  -WhatIf
```

### Azure CLI Deployment

Primary Command: az deployment group create

Syntax:

```shell
az deployment group create \
  --resource-group <resource-group-name> \
  --template-file ARMTemplateForFactory.json \
  --parameters ARMTemplateParametersForFactory.<environment>.json \
  --parameters factoryName=<factory-name> \
  --mode Incremental
```

Validation:

```shell
az deployment group validate \
  --resource-group <resource-group-name> \
  --template-file ARMTemplateForFactory.json \
  --parameters ARMTemplateParametersForFactory.<environment>.json \
  --parameters factoryName=<factory-name>
```

What-If Analysis:

```shell
az deployment group what-if \
  --resource-group <resource-group-name> \
  --template-file ARMTemplateForFactory.json \
  --parameters ARMTemplateParametersForFactory.<environment>.json \
  --parameters factoryName=<factory-name>
```

## PrePostDeploymentScript

Current Version: Ver2

Location: https://github.com/Azure/Azure-DataFactory/blob/main/SamplesV2/ContinuousIntegrationAndDelivery/PrePostDeploymentScript.Ver2.ps1

Key Improvement in Ver2:

- Turns off/on ONLY triggers that have been modified
- Ver1 stopped/started ALL triggers (slower, more disruptive)
- Compares trigger payloads to determine changes

Download Command:

```shell
# Linux/macOS/Git Bash
curl -o PrePostDeploymentScript.Ver2.ps1 https://raw.githubusercontent.com/Azure/Azure-DataFactory/main/SamplesV2/ContinuousIntegrationAndDelivery/PrePostDeploymentScript.Ver2.ps1
```

```powershell
# PowerShell
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Azure/Azure-DataFactory/main/SamplesV2/ContinuousIntegrationAndDelivery/PrePostDeploymentScript.Ver2.ps1" -OutFile "PrePostDeploymentScript.Ver2.ps1"
```

### Parameters

Pre-Deployment (Stop Triggers):

```powershell
./PrePostDeploymentScript.Ver2.ps1 `
  -armTemplate "<path-to-ARMTemplateForFactory.json>" `
  -ResourceGroupName "<resource-group-name>" `
  -DataFactoryName "<factory-name>" `
  -predeployment $true `
  -deleteDeployment $false
```

Post-Deployment (Start Triggers & Cleanup):

```powershell
./PrePostDeploymentScript.Ver2.ps1 `
  -armTemplate "<path-to-ARMTemplateForFactory.json>" `
  -ResourceGroupName "<resource-group-name>" `
  -DataFactoryName "<factory-name>" `
  -predeployment $false `
  -deleteDeployment $true
```

### PowerShell Requirements

Version: PowerShell Core (7.0+) recommended

- Azure DevOps: Use `pwsh: true` in the AzurePowerShell@5 task
- Locally: Use the `pwsh` command, not `powershell`
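A sketch of the corresponding Azure DevOps task invoking the script with PowerShell Core — the service connection name, script path, and argument values are placeholders to adapt:

```yaml
- task: AzurePowerShell@5
  displayName: 'Stop modified triggers'
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder: your ARM service connection
    scriptType: 'FilePath'
    scriptPath: '$(Pipeline.Workspace)/PrePostDeploymentScript.Ver2.ps1'
    scriptArguments: '-armTemplate "ARMTemplateForFactory.json" -ResourceGroupName "<resource-group-name>" -DataFactoryName "<factory-name>" -predeployment $true -deleteDeployment $false'
    azurePowerShellVersion: 'LatestVersion'
    pwsh: true
```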

Modules Required:

- Az.DataFactory
- Az.Resources

Official Documentation:

- URL: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-sample-script
- Last Updated: January 2025

## GitHub Actions CI/CD

### Official Resources

Medium Article (Recent 2025):

- URL: https://medium.com/microsoftazure/azure-data-factory-build-and-deploy-with-new-ci-cd-flow-using-github-actions-cd46c95054e0
- Author: Jared Zagelbaum (Microsoft Azure)
- Topics: Modern CI/CD flow, npm package usage, GitHub Actions setup

Microsoft Community Hub:

- URL: https://techcommunity.microsoft.com/blog/fasttrackforazureblog/azure-data-factory-cicd-with-github-actions/3768493
- Topics: End-to-end GitHub Actions setup, workload identity federation

Community Blog (February 2025):

- URL: https://linusdata.blog/2025/03/14/automating-azure-data-factory-deployments-with-github-actions/
- Topics: Practical implementation guide, troubleshooting tips

### Key GitHub Actions

Essential Actions:

- actions/checkout@v4 - Check out the repository
- actions/setup-node@v4 - Set up Node.js
- actions/upload-artifact@v4 - Publish ARM templates
- actions/download-artifact@v4 - Download ARM templates in the deploy workflow
- azure/login@v2 - Authenticate to Azure
- azure/arm-deploy@v2 - Deploy ARM templates
- azure/powershell@v2 - Run the PrePostDeploymentScript

### Authentication Methods

Service Principal (JSON credentials):

```json
{
  "clientId": "<client-id>",
  "clientSecret": "<client-secret>",
  "subscriptionId": "<subscription-id>",
  "tenantId": "<tenant-id>"
}
```

Store this in the GitHub secret `AZURE_CREDENTIALS`.
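A minimal build workflow wiring these pieces together might look like the following sketch — the workflow name, output folder, and artifact name are assumptions to adapt, and the `npm run build export` invocation follows the utilities package pattern shown earlier:

```yaml
name: adf-build
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20.x'

      # Validate ADF resources and generate ARM templates
      - run: npm ci
      - run: npm run build export armTemplates

      - uses: actions/upload-artifact@v4
        with:
          name: adf-arm-templates
          path: armTemplates

      # Authenticate with the service principal stored in AZURE_CREDENTIALS
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
```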

Workload Identity Federation (More secure):

- No secrets stored
- Uses OIDC (OpenID Connect)
- Recommended for production
- Setup: https://learn.microsoft.com/en-us/azure/developer/github/connect-from-azure

## Azure DevOps CI/CD

### Official Resources

Microsoft Learn:

- URL: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-automate-azure-pipelines
- Topics: Build pipeline, release pipeline, service connections, variable groups

Community Guides:

- Adam Marczak Blog: https://marczak.io/posts/2023/02/quick-cicd-for-data-factory/
- Topics: Quick setup, best practices, folder structure

Towards Data Science:

- URL: https://towardsdatascience.com/azure-data-factory-ci-cd-made-simple-building-and-deploying-your-arm-templates-with-azure-devops-30c30595afa5
- Topics: ARM template build and deployment workflow

### Key Azure DevOps Tasks

Build Pipeline Tasks:

- UseNode@1 - Install Node.js
- Npm@1 - Install packages, run build commands
- PublishPipelineArtifact@1 - Publish ARM templates
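These build tasks combine into a pipeline along these lines — a sketch, not a drop-in file; the working directory, output folder, and artifact name are assumptions:

```yaml
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: UseNode@1
    inputs:
      version: '20.x'

  - task: Npm@1
    displayName: 'Install npm packages'
    inputs:
      command: 'install'
      workingDir: '$(Build.SourcesDirectory)'

  - task: Npm@1
    displayName: 'Generate ARM templates'
    inputs:
      command: 'custom'
      customCommand: 'run build export armTemplates'
      workingDir: '$(Build.SourcesDirectory)'

  - task: PublishPipelineArtifact@1
    inputs:
      targetPath: '$(Build.SourcesDirectory)/armTemplates'
      artifact: 'adf-arm-templates'
```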

Release Pipeline Tasks:

- DownloadPipelineArtifact@2 - Download ARM templates
- AzurePowerShell@5 - Run the PrePostDeploymentScript
- AzureResourceManagerTemplateDeployment@3 - Deploy the ARM template

### Service Connection Requirements

Permissions Needed:

- Data Factory Contributor (on all Data Factories)
- Contributor (on Resource Groups)
- Key Vault access policies (if using secrets)

Configuration:

- Project Settings → Service connections → New service connection
- Type: Azure Resource Manager
- Authentication: Service Principal (recommended) or Managed Identity

## Troubleshooting Resources

### Official Troubleshooting Guide

- URL: https://learn.microsoft.com/en-us/azure/data-factory/ci-cd-github-troubleshoot-guide
- Last Updated: January 2025

Common Issues Covered:

- Template parameter validation errors
- Integration Runtime type cannot be changed
- ARM template size exceeds the 4 MB limit
- Git connection problems
- Authentication failures
- Deployment errors

### Diagnostic Logs

Enable Diagnostic Settings:

- Azure Portal → Data Factory → Diagnostic settings → Add diagnostic setting
- Send to: Log Analytics workspace

Logs to Enable:

- PipelineRuns
- TriggerRuns
- ActivityRuns
- SandboxPipelineRuns
- SandboxActivityRuns
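When configuring diagnostic settings through an ARM template rather than the portal, those categories appear as a `logs` array; a sketch of just that fragment, assuming the standard diagnostic-settings payload shape:

```json
{
  "logs": [
    { "category": "PipelineRuns", "enabled": true },
    { "category": "TriggerRuns", "enabled": true },
    { "category": "ActivityRuns", "enabled": true },
    { "category": "SandboxPipelineRuns", "enabled": true },
    { "category": "SandboxActivityRuns", "enabled": true }
  ]
}
```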

Kusto Queries for Troubleshooting:

```kusto
// Failed pipeline runs in the last 24 hours
ADFPipelineRun
| where Status == "Failed"
| where TimeGenerated > ago(24h)
| project TimeGenerated, PipelineName, RunId, Status, ErrorMessage, Parameters
| order by TimeGenerated desc
```

```kusto
// Failed CI/CD deployments
ADFActivityRun
| where ActivityType == "ExecutePipeline"
| where Status == "Failed"
| where TimeGenerated > ago(7d)
| project TimeGenerated, PipelineName, ActivityName, ErrorCode, ErrorMessage
| order by TimeGenerated desc
```

```kusto
// Performance analysis
ADFActivityRun
| where TimeGenerated > ago(7d)
| extend DurationMinutes = datetime_diff('minute', End, Start)
| summarize AvgDuration = avg(DurationMinutes) by ActivityType, ActivityName
| where AvgDuration > 10
| order by AvgDuration desc
```

### Common Error Patterns

Error: "Template parameters are not valid"

- Cause: Deleted triggers still referenced in parameters
- Solution: Regenerate the ARM template or use the PrePostDeploymentScript cleanup

Error: "Updating property type is not supported"

- Cause: Trying to change the Integration Runtime type
- Solution: Delete and recreate the IR (not an in-place update)

Error: "Operation timed out"

- Cause: Network connectivity, large data volume, insufficient compute
- Solution: Increase the timeout, optimize the query, increase DIUs
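For the timeout case, both knobs live on the Copy activity: the timeout on the activity `policy`, the DIUs in `typeProperties`. A sketch with illustrative values (the activity name is hypothetical, source/sink omitted for brevity):

```json
{
  "name": "CopyLargeTable",
  "type": "Copy",
  "policy": {
    "timeout": "0.12:00:00",
    "retry": 1
  },
  "typeProperties": {
    "dataIntegrationUnits": 32
  }
}
```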

Error: "Authentication failed"

- Cause: Service principal expired, missing permissions, wrong credentials
- Solution: Verify credentials, check role assignments, renew if expired

## Best Practices

### Repository Structure

Recommended Folder Layout:

```
repository-root/
├── adf-resources/                 # ADF JSON files (if using the npm approach)
│   ├── dataset/
│   ├── pipeline/
│   ├── trigger/
│   ├── linkedService/
│   └── integrationRuntime/
├── .github/
│   └── workflows/                 # GitHub Actions workflows
│       ├── adf-build.yml
│       └── adf-deploy.yml
├── azure-pipelines/               # Azure DevOps pipelines
│   ├── build.yml
│   └── release.yml
├── parameters/                    # Environment-specific parameters
│   ├── ARMTemplateParametersForFactory.dev.json
│   ├── ARMTemplateParametersForFactory.test.json
│   └── ARMTemplateParametersForFactory.prod.json
├── package.json                   # npm configuration
└── README.md
```

### Git Configuration

Only Configure Git on Development ADF:

- Development: Git-integrated for source control
- Test: CI/CD deployment only (no Git)
- Production: CI/CD deployment only (no Git)

Rationale: Prevents accidental manual changes in higher environments

### Multi-Environment Strategy

Environment Flow:

```
Dev (Git) → Build → Test → Approval → Production
              ↓
        ARM Templates
```

Parameter Management:

- Separate parameter file per environment
- Store secrets in Azure Key Vault
- Reference Key Vault in parameter files
- Never commit secrets to source control

### Monitoring and Alerting

Set up alerts for:

- Build pipeline failures
- Deployment failures
- Pipeline run failures
- Performance degradation
- Cost anomalies

Recommended Tools:

- Azure Monitor (metrics and alerts)
- Log Analytics (Kusto queries)
- Application Insights (for custom logging)
- Azure Advisor (optimization recommendations)

## Additional Resources

### GitHub Repositories

Official Azure Data Factory Samples:

- URL: https://github.com/Azure/Azure-DataFactory
- Path: SamplesV2/ContinuousIntegrationAndDelivery/
- Contents: PrePostDeploymentScript.Ver2.ps1, example pipelines, documentation

Community Examples:

- Search GitHub for "azure-data-factory-cicd" for real-world examples
- Many organizations publish their CI/CD patterns as references

### Community Support

Microsoft Q&A:

- URL: https://learn.microsoft.com/en-us/answers/tags/130/azure-data-factory
- Active community; Microsoft employees respond

Stack Overflow:

- Tag: azure-data-factory
- Large knowledge base of resolved issues

Azure Status:

- URL: https://status.azure.com
- Check for service outages and incidents

## When to Fetch Latest Information

Situations requiring current documentation:

- npm package version updates
- New ADF features or activities
- Changes to the ARM template schema
- Updates to the PrePostDeploymentScript
- New GitHub Actions or Azure DevOps tasks
- Breaking changes or deprecations

How to Fetch:

- Use WebFetch for Microsoft Learn articles
- Check npm for the latest package version
- Use Context7 for comprehensive topic coverage
- Review the Azure Data Factory GitHub repository for script updates

This knowledge base should be your starting point for all Azure Data Factory questions. Always verify critical information with the latest official documentation when making production decisions.

## Progressive Disclosure References

For detailed JSON schemas and complete reference materials, see:

- **Activity Types**: references/activity-types.md - Complete JSON schemas for all activity types (Copy, ForEach, IfCondition, Switch, Until, Lookup, ExecutePipeline, WebActivity, DatabricksJob, SetVariable, AppendVariable, Wait, Fail, GetMetadata)
- **Expression Functions**: references/expression-functions.md - Complete reference for all ADF expression functions (string, collection, logical, conversion, math, date/time, pipeline/activity references)
- **Linked Services**: references/linked-services.md - Complete JSON configurations for all connector types (Blob Storage, ADLS Gen2, Azure SQL, Synapse, Fabric Lakehouse/Warehouse, Databricks, Key Vault, REST, SFTP, Snowflake, PostgreSQL)
- **Triggers**: references/triggers.md - Complete JSON schemas for schedule, tumbling window, and event triggers
- **Datasets**: references/datasets.md - Complete JSON schemas for all dataset types with parameterization patterns
