# domain-ml


## Install

```bash
npx skills add https://github.com/zhanghandong/rust-skills --skill domain-ml
```

## Machine Learning Domain

### Layer 3: Domain Constraints

### Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |

### Critical Constraints

**Memory Efficiency**
- RULE: Avoid copying large tensors
- WHY: Memory bandwidth is the bottleneck
- RUST: References, views, in-place ops
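A minimal ndarray sketch of this rule, contrasting a copying slice with a zero-copy view (shapes are illustrative):

```rust
use ndarray::{s, Array2};

fn main() {
    let batch: Array2<f32> = Array2::zeros((64, 512));

    // Anti-pattern: to_owned() allocates and copies the row's data
    let row_cloned = batch.slice(s![0, ..]).to_owned();

    // Zero-copy: a view borrows the same buffer, no allocation
    let row_view = batch.slice(s![0, ..]);

    assert_eq!(row_cloned.sum(), row_view.sum());
}
```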

**GPU Utilization**
- RULE: Batch operations for GPU efficiency
- WHY: GPU overhead per kernel launch
- RUST: Batch sizes, async data loading

**Model Portability**
- RULE: Use standard model formats
- WHY: Train in Python, deploy in Rust
- RUST: ONNX via tract or candle

### Trace Down ↓

From constraints to design (Layer 2):

"Need efficient data pipelines" ↓ m10-performance: Streaming, batching ↓ polars: Lazy evaluation

"Need GPU inference" ↓ m07-concurrency: Async data loading ↓ candle/tch-rs: CUDA backend

"Need model loading" ↓ m12-lifecycle: Lazy init, caching ↓ tract: ONNX runtime

### Use Case → Framework

| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

### Key Crates

| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

### Design Patterns

| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | OnceLock |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |

### Code Pattern: Inference Server

```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// tract's runnable typed model; this Graph type is what TypedModel aliases
type OnnxModel = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<OnnxModel> = OnceLock::new();

fn get_model() -> &'static OnnxModel {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .unwrap()
            .into_optimized()
            .unwrap()
            .into_runnable()
            .unwrap()
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Shape the flat input as a (1, n) batch; note that tract's run() is
    // synchronous, so consider spawn_blocking for heavy models
    let tensor: Tensor = tract_ndarray::arr1(&input)
        .into_shape((1, input.len()))?
        .into();
    let result = model.run(tvec!(tensor.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```
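A minimal sketch of exposing `predict` (from the block above) over HTTP, assuming an axum 0.7 server; the `/predict` route and handler name are illustrative, not part of the skill:

```rust
use axum::{routing::post, Json, Router};

// Hypothetical handler: POST /predict takes a JSON array of f32 features
async fn predict_handler(Json(input): Json<Vec<f32>>) -> Json<Vec<f32>> {
    Json(predict(input).await.unwrap_or_default())
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/predict", post(predict_handler));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```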

### Code Pattern: Batched Inference

```rust
// Sketch of the batching pattern; `model`, stack_inputs, and unstack_outputs
// stand in for a concrete inference handle and framework-specific helpers.
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> Vec<Vec<f32>> {
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack inputs into one batch tensor
        let batch_tensor = stack_inputs(batch);

        // Run inference on the whole batch at once
        let batch_output = model.run(batch_tensor).await;

        // Unstack the batched output into per-input results
        results.extend(unstack_outputs(batch_output));
    }

    results
}
```
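One hypothetical shape for the stacking helper, using tract's re-exported ndarray (assumes every input vector has the same length):

```rust
use tract_onnx::prelude::*;

// Hypothetical stack_inputs: copy a batch of equal-length feature vectors
// into a single (batch, features) array ready for batched inference.
fn stack_inputs(batch: &[Vec<f32>]) -> tract_ndarray::Array2<f32> {
    let features = batch.first().map_or(0, |v| v.len());
    let flat: Vec<f32> = batch.iter().flatten().copied().collect();
    tract_ndarray::Array2::from_shape_vec((batch.len(), features), flat)
        .expect("all inputs must have the same length")
}
```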

### Common Mistakes

| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline (see the sketch after these tables) |

### Trace to Layer 1

| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | OnceLock |
| Batch processing | Chunked iteration | chunks() + parallel |
| GPU async | Concurrent loading | tokio::spawn + GPU |

### Related Skills

| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |
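To make the async-pipeline fix concrete, a minimal tokio sketch where a spawned producer loads batches ahead of the consumer so the GPU never idles on I/O (`load_batch` and `run_inference` are stubs standing in for real loading and compute):

```rust
use tokio::sync::mpsc;

// Stub loaders standing in for real I/O and GPU work.
async fn load_batch(i: usize) -> Vec<f32> {
    vec![i as f32; 512]
}

async fn run_inference(_batch: &[f32]) {
    // GPU compute would happen here
}

// Producer/consumer pipeline: up to 4 batches are prefetched in a spawned
// task while the consumer keeps the device busy.
async fn pipeline(num_batches: usize) {
    let (tx, mut rx) = mpsc::channel::<Vec<f32>>(4);

    tokio::spawn(async move {
        for i in 0..num_batches {
            if tx.send(load_batch(i).await).await.is_err() {
                break;
            }
        }
    });

    while let Some(batch) = rx.recv().await {
        run_inference(&batch).await;
    }
}
```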
