phoenix-evals

Installs: 105
Rank: #8038

Install

npx skills add https://github.com/arize-ai/phoenix --skill phoenix-evals

Phoenix Evals

Build evaluators for AI/LLM applications. Code first, LLM for nuance, validate against humans.
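
To make "code first" concrete, here is a minimal sketch of a deterministic code evaluator. The function name and the score/label/explanation dict convention are illustrative, not part of the Phoenix API:

```python
import json

def json_validity_evaluator(output: str) -> dict:
    """Deterministic code evaluator: does the model output parse as JSON?

    Returns a binary pass/fail score plus an explanation, so results can
    be aggregated and audited without spending an LLM call.
    """
    try:
        json.loads(output)
        return {"score": 1, "label": "pass", "explanation": "valid JSON"}
    except json.JSONDecodeError as exc:
        return {"score": 0, "label": "fail", "explanation": f"invalid JSON: {exc}"}

# Example: run against a batch of model outputs
outputs = ['{"answer": 42}', "not json at all"]
print([json_validity_evaluator(o)["label"] for o in outputs])  # ['pass', 'fail']
```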

Quick Reference

| Task | Files |
| --- | --- |
| Setup | setup-python, setup-typescript |
| Build code evaluator | evaluators-code-{python\|typescript} |
| Build LLM evaluator | evaluators-llm-{python\|typescript}, evaluators-custom-templates |
| Run experiment | experiments-running-{python\|typescript} |
| Create dataset | experiments-datasets-{python\|typescript} |
| Validate evaluator | validation, validation-calibration-{python\|typescript} |
| Analyze errors | error-analysis, axial-coding |
| RAG evals | evaluators-rag |
| Production | production-overview, production-guardrails |

Workflows

Starting Fresh: observe-tracing-setup → error-analysis → axial-coding → evaluators-overview

Building Evaluator: fundamentals → evaluators-{code|llm}-{python|typescript} → validation-calibration-{python|typescript}
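
Once the deterministic checks are in place, nuanced criteria can go to an LLM judge. A minimal sketch, assuming the phoenix.evals Python API (llm_classify with binary rails) and a hypothetical helpfulness template; see the evaluators-llm-python task file for the exact, current signature:

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Hypothetical binary template; the rails constrain the judge to pass/fail.
HELPFULNESS_TEMPLATE = """You are grading an AI answer.
Question: {input}
Answer: {output}
Respond with exactly one word: "pass" if the answer is helpful, "fail" otherwise."""

df = pd.DataFrame({"input": ["What is 2 + 2?"], "output": ["2 + 2 = 4."]})

results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=HELPFULNESS_TEMPLATE,
    rails=["pass", "fail"],       # binary, not a 1-5 Likert scale
    provide_explanation=True,     # keep the judge's reasoning for audits
)
print(results[["label", "explanation"]])
```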

RAG Systems: evaluators-rag → evaluators-code- (retrieval) → evaluators-llm- (faithfulness)
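
For the retrieval half, a deterministic code evaluator is often enough. A minimal sketch (the function and field names are illustrative) that scores whether any retrieved document contains the gold answer, leaving faithfulness to an LLM judge like the one above:

```python
def retrieval_hit(retrieved_docs: list[str], gold_answer: str) -> dict:
    """Code evaluator for retrieval: pass if any retrieved document
    contains the gold answer string (a simple hit-rate criterion)."""
    hit = any(gold_answer.lower() in doc.lower() for doc in retrieved_docs)
    return {"score": int(hit), "label": "pass" if hit else "fail"}

docs = ["Phoenix traces LLM apps.", "Evaluators can be code or LLM judges."]
print(retrieval_hit(docs, "code or LLM judges"))  # {'score': 1, 'label': 'pass'}
```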

Production: production-overview → production-guardrails → production-continuous
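
As a flavor of what the guardrails files cover, here is a minimal sketch of an inline guardrail: run a cheap deterministic check on each model output and retry on failure, falling back to a safe default. Names are illustrative, not Phoenix API:

```python
import json

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def guarded_respond(generate, prompt: str, max_retries: int = 2) -> str:
    """Inline guardrail: gate each response on a deterministic evaluator,
    retrying the model call before falling back to a safe default."""
    for _ in range(max_retries + 1):
        output = generate(prompt)
        if is_valid_json(output):
            return output
    return '{"error": "could not produce valid JSON"}'
```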

Rule Categories

| Prefix | Description |
| --- | --- |
| fundamentals- | Types, scores, anti-patterns |
| observe- | Tracing, sampling |
| error-analysis- | Finding failures |
| axial-coding- | Categorizing failures |
| evaluators- | Code, LLM, RAG evaluators |
| experiments- | Datasets, running experiments |
| validation- | Calibrating judges |
| production- | CI/CD, monitoring |

Key Principles

| Principle | Action |
| --- | --- |
| Error analysis first | Can't automate what you haven't observed |
| Custom > generic | Build from your failures |
| Code first | Deterministic before LLM |
| Validate judges | >80% TPR/TNR (see the sketch below) |
| Binary > Likert | Pass/fail, not 1-5 |
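
To unpack the >80% TPR/TNR bar: on a human-labeled sample, true positive rate is the fraction of human "pass" items the judge also passes, and true negative rate is the fraction of human "fail" items it also fails. A minimal sketch of that calibration arithmetic (the pass/fail label convention is an assumption; the validation-calibration task files cover the full workflow):

```python
def judge_agreement(human: list[str], judge: list[str]) -> dict:
    """Calibrate an LLM judge against human labels ("pass"/"fail").

    TPR: of the items humans marked "pass", how many did the judge pass?
    TNR: of the items humans marked "fail", how many did the judge fail?
    Both should clear ~0.8 before the judge is trusted unsupervised.
    """
    tp = sum(h == "pass" and j == "pass" for h, j in zip(human, judge))
    tn = sum(h == "fail" and j == "fail" for h, j in zip(human, judge))
    pos = sum(h == "pass" for h in human)
    neg = sum(h == "fail" for h in human)
    return {"tpr": tp / pos if pos else 0.0, "tnr": tn / neg if neg else 0.0}

human = ["pass", "pass", "fail", "fail", "pass"]
judge = ["pass", "fail", "fail", "fail", "pass"]
print(judge_agreement(human, judge))  # {'tpr': 0.666..., 'tnr': 1.0}
```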
