sap-hana-cloud-data-intelligence

Installs: 35
Rank: #19555

Installation

npx skills add https://github.com/secondsky/sap-skills --skill sap-hana-cloud-data-intelligence
SAP HANA Cloud Data Intelligence Skill
This skill provides comprehensive guidance for developing with SAP Data Intelligence Cloud, including pipeline creation, operator development, data integration, and machine learning scenarios.
Table of Contents
When to Use This Skill
Core Concepts
Quick Start Patterns
Common Tasks
Bundled Resources
When to Use This Skill
Use this skill when:
Creating or modifying data processing graphs/pipelines
Developing custom operators (Gen1 or Gen2)
Integrating ABAP-based SAP systems (S/4HANA, BW)
Building replication flows for data movement
Developing ML scenarios with ML Scenario Manager
Working with JupyterLab in Data Intelligence
Using Data Transformation Language (DTL) functions
Configuring subengines (Python, Node.js, C++)
Working with structured data operators
Core Concepts
Graphs (Pipelines)
Graphs are networks of operators connected via typed input/output ports for data transfer.
Two Generations:
Gen1 Operators: Legacy operators, broad compatibility
Gen2 Operators: Enhanced error recovery, state management, snapshots
Critical Rule: Graphs cannot mix Gen1 and Gen2 operators. Choose one generation per graph.
Gen2 Advantages:
Automatic error recovery with snapshots
State management with periodic checkpoints
Native multiplexing (one-to-many, many-to-one)
Improved Python3 operator
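The single-generation rule can be checked mechanically before a graph is built out. The following is a minimal sketch, assuming a hypothetical in-memory representation of a graph's operators; the `Operator` class and `check_single_generation` function are illustrative names, not part of any SAP Data Intelligence API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for an operator entry in a graph definition."""
    name: str
    generation: int  # 1 for Gen1, 2 for Gen2


def check_single_generation(operators):
    """Return the graph's generation, or raise ValueError if generations are mixed."""
    if not operators:
        raise ValueError("graph has no operators")
    # Group operator names by generation so the error message names the offenders.
    names_by_generation = {}
    for op in operators:
        names_by_generation.setdefault(op.generation, []).append(op.name)
    if len(names_by_generation) > 1:
        raise ValueError(f"graph mixes operator generations: {names_by_generation}")
    return next(iter(names_by_generation))
```

Calling `check_single_generation([Operator("reader", 2), Operator("writer", 2)])` returns `2`, while a list that contains both a Gen1 and a Gen2 operator raises `ValueError`.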
Operators
Building blocks that process data within graphs. Each operator has:
Ports: Typed input/output connections for data flow
Configuration: Parameters that control behavior
Runtime: Engine that executes the operator
Operator Categories:
Messaging (Kafka, MQTT, NATS)
Storage (Files, HDFS, S3, Azure, GCS)
Database (HANA, SAP BW, SQL)
Script (Python, JavaScript, R, Go)
Data Processing (Transform, Anonymize, Validate)
Machine Learning (TensorFlow, PyTorch, HANA ML)
Integration (OData, REST, SAP CPI)
Workflow (Pipeline, Data Workflow)
Subengines
Subengines enable operators to run on different runtimes within the same graph.
Supported Subengines:
ABAP: For ABAP Pipeline Engine operators
Python 3.9: For Python-based operators
Node.js: For JavaScript-based operators
C++: For high-performance native operators
Key Benefit: Connected operators on the same subengine run in a single OS process for optimal performance.
Trade-off: Cross-engine communication incurs serialization/deserialization overhead.
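To make the trade-off concrete, the connections that cross a subengine boundary can be counted from a graph description. This is a rough sketch over a hypothetical dictionary-based graph model; the operator names and engine labels are made up for illustration.

```python
def cross_engine_connections(engine_of, connections):
    """List the connections whose two endpoints run on different subengines.

    engine_of   -- dict mapping operator name to subengine name (hypothetical labels)
    connections -- iterable of (source_operator, target_operator) pairs
    """
    return [(src, dst) for src, dst in connections if engine_of[src] != engine_of[dst]]


# Illustrative graph: three Python operators with one C++ operator in the middle.
engine_of = {"read": "python", "clean": "python", "score": "cpp", "write": "python"}
connections = [("read", "clean"), ("clean", "score"), ("score", "write")]

crossings = cross_engine_connections(engine_of, connections)
print(crossings)
```

In this example two of the three connections cross a boundary, so each message is serialized and deserialized twice on its way through the graph. Grouping operators by subengine reduces that count.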
Quick Start Patterns
Basic Graph Creation
1. Open SAP Data Intelligence Modeler
2. Create new graph
3. Add operators from repository
4. Connect operator ports (matching types)
5. Configure operator parameters
6. Validate graph
7. Execute and monitor
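Step 4 depends on connected ports having matching types. A minimal sketch of that check follows, assuming a hypothetical `Port` record and strict type equality; the Modeler's own validation may apply additional compatibility rules that this sketch ignores.

```python
from typing import NamedTuple


class Port(NamedTuple):
    """Hypothetical description of one operator port."""
    operator: str
    name: str
    data_type: str  # illustrative type label, e.g. "string" or "message"


def mismatched_connections(connections):
    """Return the (source_port, target_port) pairs whose data types differ."""
    return [(src, dst) for src, dst in connections if src.data_type != dst.data_type]


# Illustrative connections: the second one joins a "message" output to a "string" input.
connections = [
    (Port("reader", "out", "message"), Port("transform", "in", "message")),
    (Port("transform", "out", "message"), Port("terminal", "in", "string")),
]
problems = mismatched_connections(connections)
for src, dst in problems:
    print(f"{src.operator}.{src.name} ({src.data_type}) -> {dst.operator}.{dst.name} ({dst.data_type})")
```

A mismatch reported here corresponds to the case where a converter operator, or a port of the matching type, is needed.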
Replication Flow Pattern
1. Create replication flow in Modeler
2. Configure source connection (ABAP, HANA, etc.)
3. Configure target (HANA Cloud, S3, Kafka, etc.)
4. Add tasks with source objects
5. Define filters and mappings
6. Validate flow
7. Deploy to tenant repository
8. Run and monitor
Delivery Guarantees:
Default: At-least-once (may have duplicates)
With UPSERT to databases: Exactly-once
For cloud storage: Use "Suppress Duplicates" option
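The difference between these guarantees comes down to whether redelivered records are idempotent at the target. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a database target (it is not HANA, and the table names are hypothetical): the same batch is delivered twice, as at-least-once delivery allows, once into a keyed table written with an upsert-style statement and once into an append-only table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Keyed target: a redelivered row overwrites the existing row with the same key.
conn.execute("CREATE TABLE target_upsert (id INTEGER PRIMARY KEY, amount REAL)")
# Append-only target: a redelivered row becomes a duplicate.
conn.execute("CREATE TABLE target_append (id INTEGER, amount REAL)")


def deliver(conn, batch):
    """Write one batch to both targets."""
    conn.executemany(
        "INSERT OR REPLACE INTO target_upsert (id, amount) VALUES (?, ?)", batch
    )
    conn.executemany(
        "INSERT INTO target_append (id, amount) VALUES (?, ?)", batch
    )
    conn.commit()


batch = [(1, 10.0), (2, 25.5)]
deliver(conn, batch)
deliver(conn, batch)  # simulated redelivery under at-least-once semantics

upsert_count = conn.execute("SELECT COUNT(*) FROM target_upsert").fetchone()[0]
append_count = conn.execute("SELECT COUNT(*) FROM target_append").fetchone()[0]
print(upsert_count, append_count)  # 2 4
```

The keyed table ends up with two rows and the append-only table with four. That is why upsert targets behave as exactly-once, while file-like targets need a duplicate-suppression option.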
ML Scenario Pattern
1. Open ML Scenario Manager from launchpad
2. Create new scenario
3. Add datasets (register data sources)
4. Create Jupyter notebooks for experiments
5. Build training pipelines
6. Track metrics with Metrics Explorer
7. Version scenario for reproducibility
8. Deploy model pipeline
Common Tasks
ABAP System Integration
For integrating ABAP-based SAP systems:
Prerequisites: Configure Cloud Connector for on-premise systems
Connection Setup: Create ABAP connection in Connection Management
Metadata Access: Use Metadata Explorer for object discovery
Data Sources: CDS Views, ODP (Operational Data Provisioning), Tables
Reference: See references/abap-integration.md for detailed setup.
Structured Data Processing
Use structured data operators for SQL-like transformations:
Data Transform: Visual SQL editor for complex transformations
Aggregation Node: GROUP BY with aggregation functions
Join Node: INNER, LEFT, RIGHT, FULL joins
Projection Node: Column selection and renaming
Union Node: Combine multiple datasets
Case Node: Conditional logic
Reference: See references/structured-data-operators.md for configuration.
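Because these nodes mirror familiar SQL clauses, their combined effect can be previewed with ordinary SQL. The sketch below uses Python's built-in `sqlite3` module as a stand-in, with made-up `orders` and `customers` tables; it shows roughly what a Join, Case, Projection, and Aggregation node chain would compute, not how the Data Transform editor is configured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 75.0), (4, 30, 20.0);
    INSERT INTO customers VALUES (10, 'EMEA'), (20, 'APJ');
    """
)

query = """
SELECT
    CASE WHEN c.region IS NULL THEN 'UNKNOWN' ELSE c.region END AS region_name, -- Case + Projection
    SUM(o.amount) AS total_amount                                               -- Aggregation
FROM orders AS o
LEFT JOIN customers AS c ON c.customer_id = o.customer_id                       -- Join (LEFT)
GROUP BY region_name
ORDER BY region_name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('APJ', 75.0), ('EMEA', 150.0), ('UNKNOWN', 20.0)]
```

The order for customer 30 has no matching customer row, so the LEFT join keeps it and the conditional expression labels it `UNKNOWN`.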
Data Transformation Language
DTL provides SQL-like functions for data processing:
Function Categories:
String: CONCAT, SUBSTRING, UPPER, LOWER, TRIM, REPLACE
Numeric: ABS, CEIL, FLOOR, ROUND, MOD, POWER
Date/Time: ADD_DAYS, MONTHS_BETWEEN, EXTRACT, CURRENT_UTCTIMESTAMP
Conversion: TO_DATE, TO_STRING, TO_INTEGER, TO_DECIMAL
Miscellaneous: CASE, COALESCE, IFNULL, NULLIF
Reference: See references/dtl-functions.md for the complete reference.
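The null-handling functions in the Miscellaneous category are the ones most often confused. The Python sketch below models them under the assumption that DTL follows the usual SQL meaning of these names, with Python's `None` standing in for NULL; consult references/dtl-functions.md for the authoritative signatures.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    return next((v for v in values if v is not None), None)


def ifnull(value, fallback):
    """Return fallback when value is None, otherwise value."""
    return fallback if value is None else value


def nullif(a, b):
    """Return None when a equals b, otherwise a."""
    return None if a == b else a


# Typical use: treat an empty string as missing, then fall back to a default.
raw_city = ""
city = coalesce(nullif(raw_city, ""), "N/A")
print(city)  # N/A
```

Here `nullif` and `coalesce` are composed to normalize a sentinel value, which is a common pattern when cleansing source data.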
Best Practices
Graph Design
Choose Generation Early: Decide Gen1 vs Gen2 before building
Minimize Cross-Engine Communication: Group operators by subengine
Use Appropriate Port Types: Match data types for efficient transfer
Enable Snapshots: For Gen2 graphs, enable auto-recovery
Validate Before Execution: Always validate graphs
Operator Development
Start with Built-in Operators: Use predefined operators first
Extend When Needed: Create custom operators for specific needs
Use Script Operators: For quick prototyping with Python/JS
Version Your Operators: Track changes with operator versions
Document Configuration: Describe all parameters
Replication Flows
Plan Target Schema: Understand target structure requirements
Use Filters: Reduce data volume with source filters
Handle Duplicates: Configure for exactly-once when possible
Monitor Execution: Track progress and errors
Clean Up Artifacts: Remove source artifacts after completion
ML Scenarios
Version Early: Create versions before major changes
Track All Metrics: Use SDK for comprehensive tracking
Use Notebooks for Exploration: JupyterLab for experimentation
Productionize with Pipelines: Convert notebooks to pipelines
Export/Import for Migration: Use ZIP export for transfers
Error Handling
Common Graph Errors
Error | Cause | Solution
Port type mismatch | Incompatible data types | Use converter operator or matching types
Gen1/Gen2 mixing | Combined operator generations | Use single generation per graph
Resource exhaustion | Insufficient memory/CPU | Adjust resource requirements
Connection failure | Network or credentials | Verify connection settings
Validation errors | Invalid configuration | Review error messages, fix config
Recovery Strategies
Gen2 Graphs:
Enable automatic recovery in graph settings
Configure snapshot intervals
Monitor recovery status
Gen1 Graphs:
Implement manual error handling in operators
Use try/catch blocks (try/except in Python) in script operators
Configure retry logic
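For Gen1 script operators, manual error handling and retry logic can be combined in one small helper. The following is a generic Python sketch, not an SAP-provided utility; the function name, the default retry count, and the choice of retryable exception types are assumptions to adapt per operator.

```python
import time


def with_retries(func, attempts=3, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Re-raises the last exception once all attempts are used up.
    The sleep parameter is injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Illustrative use: a call that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "written"


delays = []
result = with_retries(flaky_write, sleep=delays.append)
print(result, delays)  # written [0.5, 1.0]
```

Exceptions outside `retry_on` propagate immediately, which keeps configuration and validation errors from being retried pointlessly.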
Reference Files
For detailed information, see:
references/operators-reference.md - Complete operator catalog (266 operators)
references/abap-integration.md - ABAP/S4HANA/BW integration with SAP Notes
references/structured-data-operators.md - Structured data processing
references/dtl-functions.md - Data Transformation Language (79 functions)
references/ml-scenario-manager.md - ML Scenario Manager, SDK, artifacts
references/subengines.md - Python, Node.js, C++ subengine development
references/graphs-pipelines.md - Graph execution, snapshots, recovery
references/replication-flows.md - Replication flows, cloud storage, Kafka
references/data-workflow.md - Data workflow operators, orchestration
references/security-cdc.md - Security, data protection, CDC methods
references/additional-features.md - Monitoring, cloud storage services, scenario templates, data types, Git terminal
references/modeling-advanced.md - Graph snippets, SAP cloud apps, configuration types, 141 graph templates
Templates
Starter templates are available in templates/:
templates/basic-graph.json - Simple data processing graph
templates/replication-flow.json - Data replication pattern
templates/ml-training-pipeline.json - ML training workflow
Documentation Links
Primary Sources:
GitHub Docs: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs
SAP Help Portal: https://help.sap.com/docs/SAP_DATA_INTELLIGENCE
SAP Developer Center: https://developers.sap.com/topics/data-intelligence.html
Section-Specific:
Modeling Guide: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide
Bundled Resources
Reference Documentation
references/abap-integration.md - ABAP system integration guide
references/ml-scenario-manager.md - Machine Learning scenario manager
references/replication-flows.md - Data replication flow configuration
references/operators-reference.md - Complete operators reference
references/dtl-functions.md - Data Transformation Language functions
references/modeling-advanced.md - Advanced modeling techniques
references/structured-data-operators.md - Structured data operators guide
Documentation Links
ABAP Integration: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/abapintegration
Machine Learning: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/machinelearning
Function Reference: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/functionreference
Repository Objects: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects
Version Information
Skill Version: 1.0.0
Last Updated: 2025-11-27
Documentation Source: SAP-docs/sap-hana-cloud-data-intelligence (GitHub)