Machine Learning Domain
Layer 3: Domain Constraints
Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |

Critical Constraints

Memory Efficiency
RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops (see the sketch after these constraints)
GPU Utilization
RULE: Batch operations for GPU efficiency
WHY: GPU overhead per kernel launch
RUST: Batch sizes, async data loading
Model Portability
RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle
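A minimal sketch of the memory-efficiency rule using ndarray views and in-place ops; the shapes and the max-normalization step are illustrative assumptions, not part of the skill:

```rust
use ndarray::{s, Array2, ArrayView2};

// Borrow the first `n` rows as a view: no data is copied.
fn first_rows(data: &Array2<f32>, n: usize) -> ArrayView2<'_, f32> {
    data.slice(s![..n, ..])
}

// Scale in place instead of allocating a second full-size tensor.
fn normalize_in_place(data: &mut Array2<f32>) {
    let max = data.fold(f32::MIN, |m, &v| m.max(v));
    if max != 0.0 {
        data.mapv_inplace(|v| v / max);
    }
}
```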
Trace Down ↓
From constraints to design (Layer 2):
"Need efficient data pipelines" ↓ m10-performance: Streaming, batching ↓ polars: Lazy evaluation
"Need GPU inference" ↓ m07-concurrency: Async data loading ↓ candle/tch-rs: CUDA backend
"Need model loading" ↓ m12-lifecycle: Lazy init, caching ↓ tract: ONNX runtime
Use Case → Framework
| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU (sketch below) |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |
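For the "Training + inference" row, a minimal sketch of candle device selection and a basic tensor op; the shapes are arbitrary, and the code assumes candle-core with CUDA used only when available:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Use the first CUDA device if the build has GPU support, else CPU.
    let device = Device::cuda_if_available(0)?;
    let a = Tensor::randn(0f32, 1.0, (64, 128), &device)?;
    let b = Tensor::randn(0f32, 1.0, (128, 32), &device)?;
    let logits = a.matmul(&b)?;
    println!("output shape: {:?}", logits.shape());
    Ok(())
}
```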
Key Crates
| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |
Design Patterns
| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | OnceLock |
```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Load the ONNX model once and reuse the compiled plan for every request.
static MODEL: OnceLock<TypedRunnableModel<TypedModel>> = OnceLock::new();

fn get_model() -> &'static TypedRunnableModel<TypedModel> {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .unwrap()
            .into_optimized()
            .unwrap()
            .into_runnable()
            .unwrap()
    })
}

// Caller sketch: the input shape and the Vec -> Tensor conversion are
// assumptions; exact tensor/value types vary across tract versions.
async fn predict(input: Vec<f32>) -> TractResult<TVec<TValue>> {
    let tensor = Tensor::from_shape(&[1, input.len()], &input)?;
    get_model().run(tvec![tensor.into()])
}
```
Code Pattern: Batched Inference
```rust
/// Pattern sketch: `Model`, `Input`, `Output`, `stack_inputs`, and
/// `unstack_outputs` stand in for your framework's types and helpers.
async fn batch_predict(model: &Model, inputs: Vec<Input>, batch_size: usize) -> Vec<Output> {
    let mut results = Vec::with_capacity(inputs.len());
    for batch in inputs.chunks(batch_size) {
        // Stack inputs into a single batch tensor
        let batch_tensor = stack_inputs(batch);
        // One inference call per batch amortizes kernel-launch overhead
        let batch_output = model.run(batch_tensor).await;
        // Unstack the batched output into per-item results
        results.extend(unstack_outputs(batch_output));
    }
    results
}
```
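To keep the GPU busy while the next batch loads (the async data-loading rule above), a hedged sketch using a bounded tokio channel; `Batch`, `Output`, and the two helper functions are hypothetical placeholders:

```rust
use tokio::sync::mpsc;
use tokio::task;

// Hypothetical types: replace with your framework's batch and output types.
struct Batch(Vec<f32>);
struct Output(Vec<f32>);

// Overlap data loading with inference: the loader task keeps the channel
// filled while the consumer runs the model on each batch.
async fn pipeline(paths: Vec<String>) -> Vec<Output> {
    let (tx, mut rx) = mpsc::channel::<Batch>(4); // small buffer keeps the GPU fed

    // Producer: load and preprocess batches off the inference path.
    let loader = task::spawn(async move {
        for path in paths {
            let batch = load_batch(&path).await;
            if tx.send(batch).await.is_err() {
                break; // consumer dropped
            }
        }
    });

    // Consumer: run inference while the next batch is being loaded.
    let mut outputs = Vec::new();
    while let Some(batch) = rx.recv().await {
        outputs.push(run_inference(batch).await);
    }
    let _ = loader.await;
    outputs
}

// Placeholder helpers standing in for real I/O and model calls.
async fn load_batch(_path: &str) -> Batch { Batch(vec![0.0; 1024]) }
async fn run_inference(batch: Batch) -> Output { Output(batch.0) }
```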
Common Mistakes

| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |

Trace to Layer 1

| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | OnceLock |
| Batch processing | Chunked iteration | chunks() + parallel |
| GPU async | Concurrent loading | tokio::spawn + GPU |

Related Skills

| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |