This skill provides comprehensive guidance for developing with SAP Data Intelligence Cloud, including pipeline creation, operator development, data integration, and machine learning scenarios.
## Table of Contents

- When to Use This Skill
- Core Concepts
- Quick Start Patterns
- Common Tasks
- Bundled Resources
## When to Use This Skill

Use this skill when:

- Creating or modifying data processing graphs/pipelines
- Developing custom operators (Gen1 or Gen2)
- Integrating ABAP-based SAP systems (S/4HANA, BW)
- Building replication flows for data movement
- Developing ML scenarios with ML Scenario Manager
- Working with JupyterLab in Data Intelligence
- Using Data Transformation Language (DTL) functions
- Configuring subengines (Python, Node.js, C++)
- Working with structured data operators
## Core Concepts

### Graphs (Pipelines)

Graphs are networks of operators connected via typed input/output ports for data transfer.

Two generations:

- **Gen1 operators**: legacy operators with broad compatibility
- **Gen2 operators**: enhanced error recovery, state management, and snapshots

**Critical rule**: a graph cannot mix Gen1 and Gen2 operators; choose one generation per graph.

Gen2 advantages:

- Automatic error recovery with snapshots
- State management with periodic checkpoints
- Native multiplexing (one-to-many, many-to-one)
- Improved Python3 operator
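The Python operator script is built around port callbacks: register a handler for an input port, transform the data, and send the result to an output port. Inside the Modeler the `api` object is injected by the runtime; the stub below is an assumption added here purely so the sketch runs standalone, and the real API surface differs between generations.

```python
# Sketch of the Script (Python) operator callback pattern. Inside the
# Modeler the `api` object is injected by the runtime; this stub is an
# assumption added so the snippet runs outside the Modeler.

class _ApiStub:
    """Minimal stand-in for the Modeler-injected `api` object."""
    def __init__(self):
        self._callbacks = {}
        self.sent = []  # (port, data) pairs captured for inspection

    def set_port_callback(self, port, callback):
        self._callbacks[port] = callback

    def send(self, port, data):
        self.sent.append((port, data))

    def feed(self, port, data):  # test helper, not part of the real API
        self._callbacks[port](data)

api = _ApiStub()

def on_input(data):
    # Transform each incoming message and forward it downstream.
    api.send("output", data.upper())

# In the Modeler, this registration is the whole operator script.
api.set_port_callback("input", on_input)

api.feed("input", "hello")
print(api.sent)  # [('output', 'HELLO')]
```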
### Operators

Operators are the building blocks that process data within graphs. Each operator has:

- **Ports**: typed input/output connections for data flow
- **Configuration**: parameters that control behavior
- **Runtime**: the engine that executes the operator

Operator categories:

- Messaging (Kafka, MQTT, NATS)
- Storage (Files, HDFS, S3, Azure, GCS)
- Database (HANA, SAP BW, SQL)
- Script (Python, JavaScript, R, Go)
- Data Processing (Transform, Anonymize, Validate)
- Machine Learning (TensorFlow, PyTorch, HANA ML)
- Integration (OData, REST, SAP CPI)
- Workflow (Pipeline, Data Workflow)
### Subengines

Subengines enable operators to run on different runtimes within the same graph.

Supported subengines:

- **ABAP**: for ABAP Pipeline Engine operators
- **Python 3.9**: for Python-based operators
- **Node.js**: for JavaScript-based operators
- **C++**: for high-performance native operators

**Key benefit**: connected operators on the same subengine run in a single OS process for optimal performance.

**Trade-off**: communication across subengine boundaries incurs serialization/deserialization overhead.
## Quick Start Patterns

### Basic Graph Creation

1. Open the SAP Data Intelligence Modeler
2. Create a new graph
3. Add operators from the repository
4. Connect operator ports (types must match)
5. Configure operator parameters
6. Validate the graph
7. Execute and monitor
### Replication Flow Pattern

1. Create a replication flow in the Modeler
2. Configure the source connection (ABAP, HANA, etc.)
3. Configure the target (HANA Cloud, S3, Kafka, etc.)
4. Add tasks with source objects
5. Define filters and mappings
6. Validate the flow
7. Deploy to the tenant repository
8. Run and monitor

Delivery guarantees:

- Default: at-least-once (duplicates are possible)
- With UPSERT to database targets: exactly-once
- For cloud storage targets: use the "Suppress Duplicates" option
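The reason UPSERT yields exactly-once semantics is idempotence: at-least-once delivery may replay a change, but applying the same keyed UPSERT twice leaves the target in the same state. A sketch of that effect on a simulated target table:

```python
# At-least-once delivery can replay a change. A keyed UPSERT makes the
# replay idempotent: applying the same change twice leaves the target
# in the same state, which is the effective exactly-once guarantee.
target = {}  # simulated target table keyed on primary key

def upsert(row):
    target[row["id"]] = {k: v for k, v in row.items() if k != "id"}

changes = [
    {"id": 1, "qty": 10},
    {"id": 2, "qty": 5},
    {"id": 1, "qty": 10},  # duplicate delivery of the first change
]
for row in changes:
    upsert(row)

print(target)  # {1: {'qty': 10}, 2: {'qty': 5}} — the replay was absorbed
```

Append-only targets such as cloud storage files have no key to collapse on, which is why they need the separate "Suppress Duplicates" option instead.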
### ML Scenario Pattern

1. Open ML Scenario Manager from the launchpad
2. Create a new scenario
3. Add datasets (register data sources)
4. Create Jupyter notebooks for experiments
5. Build training pipelines
6. Track metrics with the Metrics Explorer
7. Version the scenario for reproducibility
8. Deploy the model pipeline
## Common Tasks

### ABAP System Integration

To integrate ABAP-based SAP systems:

- **Prerequisites**: configure the Cloud Connector for on-premise systems
- **Connection setup**: create an ABAP connection in Connection Management
- **Metadata access**: use the Metadata Explorer for object discovery
- **Data sources**: CDS Views, ODP (Operational Data Provisioning), and tables

See `references/abap-integration.md` for detailed setup.
### Structured Data Processing

Use structured data operators for SQL-like transformations:

- **Data Transform**: visual SQL editor for complex transformations
- **Aggregation node**: GROUP BY with aggregation functions
- **Join node**: INNER, LEFT, RIGHT, and FULL joins
- **Projection node**: column selection and renaming
- **Union node**: combines multiple datasets
- **Case node**: conditional logic
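These nodes correspond to familiar relational operations. The sketch below mimics projection, inner join, aggregation, and case logic on plain Python lists of dicts, purely to illustrate the semantics (the data and column names are invented for the example):

```python
# The structured-data nodes map onto familiar relational operations.
# Sketched on lists of dicts purely to illustrate the semantics.
orders = [
    {"order_id": 1, "cust": "A", "amount": 100},
    {"order_id": 2, "cust": "B", "amount": 50},
    {"order_id": 3, "cust": "A", "amount": 25},
]
customers = [{"cust": "A", "region": "EU"}, {"cust": "B", "region": "US"}]

# Projection node: column selection
projected = [{"cust": o["cust"], "amount": o["amount"]} for o in orders]

# Join node: INNER join on cust
joined = [
    {**o, **c}
    for o in orders
    for c in customers
    if o["cust"] == c["cust"]
]

# Aggregation node: GROUP BY cust with SUM(amount)
totals = {}
for o in orders:
    totals[o["cust"]] = totals.get(o["cust"], 0) + o["amount"]

# Case node: conditional logic
flagged = [
    {**o, "size": "large" if o["amount"] >= 100 else "small"} for o in orders
]

print(totals)  # {'A': 125, 'B': 50}
```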
See `references/structured-data-operators.md` for configuration details.
### Data Transformation Language

DTL provides SQL-like functions for data processing: