Agent Skills Leaderboard · Keyword + Semantic Search

24,920
Total Skills
102.2M
Total Installs
2,598
Contributors
# Skill Repository Description Installs
13301 pr-review-loop xpepper/pr-review-agent-skill
PR Review Loop Purpose Address all open PR review comments one at a time using an opinionated, resumable workflow. Works with comments from any reviewer (human or bot). Typical invocations Users trigger this skill with prompts like: "Address all open review comments on this PR" "Work through the code review feedback on PR 42" "Fix the review comments left by @alice on this pull request" "Use pr-review-loop on PR 123" Prerequisites gh CLI (preferred). If unavailable, fall back to any tool availab...
203
13302 historian-analyst rysweet/amplihack
Analyze events through the disciplinary lens of history, applying rigorous historical methods (source criticism, comparative analysis, periodization), temporal frameworks (continuity/change, causation), and historiographical perspectives to understand how the past shapes the present, identify historical patterns and precedents, and contextualize contemporary events within long-term trajectories. When to Use This Skill - Historical Contextualization: Understanding how past events shape current...
203
13303 ui/ux design review rknall/claude-skills
203
13304 duckdb silvainfm/claude-skills
DuckDB Overview DuckDB is a high-performance, in-process analytical database management system (often called "SQLite for analytics"). Execute complex SQL queries directly on CSV, Parquet, JSON files, and Python DataFrames (pandas, Polars) without importing data or running a separate database server. When to Use This Skill Activate when the user: Wants to run SQL queries on data files (CSV, Parquet, JSON) Needs to perform complex analytical queries (aggregations, joins, window functions) Asks...
203
13305 pm-architect rysweet/amplihack
You are the project manager orchestrating four specialized sub-skills to coordinate software development projects. You delegate to specialists and synthesize their insights for comprehensive project management. When to Activate Activate when the user: - Mentions managing projects or coordinating work - Asks about project status or progress - Wants to organize multiple projects or features - Needs help with project planning or execution - Says "I'm losing track" or "What should I work on?...
203
13306 encore-getting-started encoredev/skills
Getting Started with Encore.ts Instructions Install Encore CLI macOS brew install encoredev/tap/encore Linux/WSL curl -L https://encore.dev/install.sh | bash Windows (PowerShell) iwr https://encore.dev/install.ps1 | iex Create a New App Interactive - choose from templates encore app create my-app Or start with a blank app encore app create my-app --example=ts/hello-world Project Structure A minimal Encore.ts app: my-app/ ├── encore.app App configuration ├── package.json ...
203
13307 aws penetration testing sickn33/antigravity-awesome-skills
AWS Penetration Testing Purpose Provide comprehensive techniques for penetration testing AWS cloud environments. Covers IAM enumeration, privilege escalation, SSRF to metadata endpoint, S3 bucket exploitation, Lambda code extraction, and persistence techniques for red team operations. Inputs/Prerequisites AWS CLI configured with credentials Valid AWS credentials (even low-privilege) Understanding of AWS IAM model Python 3, boto3 library Tools: Pacu, Prowler, ScoutSuite, SkyArk Outputs/Deliverabl...
202
13308 axiom-swift-performance charleswiltgen/axiom
Swift Performance Optimization Purpose Core Principle: Optimize Swift code by understanding language-level performance characteristics—value semantics, ARC behavior, generic specialization, and memory layout—to write fast, efficient code without premature micro-optimization. Swift Version: Swift 6.2+ (for InlineArray, Span, @concurrent) Xcode: 16+ Platforms: iOS 18+, macOS 15+ Related Skills: axiom-performance-profiling — Use Instruments to measure (do this first!) axiom-swiftui-performance ...
202
13309 axiom-ios-data charleswiltgen/axiom
iOS Data & Persistence Router You MUST use this skill for ANY data persistence, database, axiom-storage, CloudKit, or serialization work. When to Use Use this router when working with: Databases (SwiftData, Core Data, GRDB, SQLiteData) Schema migrations CloudKit sync File storage (iCloud Drive, local storage) Data serialization (Codable, JSON) Storage strategy decisions Routing Logic SwiftData Working with SwiftData → /skill axiom-swiftdata Schema migration → /skill axiom-swiftdata-migratio...
202
13310 typescript-core bobmatnyc/claude-mpm-skills
TypeScript Core Patterns Modern TypeScript development patterns for type safety, runtime validation, and optimal configuration. Quick Start New Project: Use 2025 tsconfig → Enable strict + noUncheckedIndexedAccess → Choose Zod for validation Existing Project: Enable strict: false initially → Fix any with unknown → Add noUncheckedIndexedAccess API Development: Zod schemas at boundaries → z.infer<typeof Schema> for types → satisfies for routes Library Development: Enable declaration: true → ...
202
13311 secondme mindverse/second-me-skills
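SecondMe One-Stop Project Creation Combines the three steps secondme-init → secondme-prd → secondme-nextjs into one complete flow. Tool usage: Use the AskUserQuestion tool when collecting user input. Parameters Parameter Description (no parameter) Full flow: initialize → PRD → generate project --quick Quick flow: initialize → skip PRD → generate project Execution Flow Environment check (before the first run) Important: The current directory will be used as the project root; the Next.js project is initialized directly in this directory. Display the current working directory path and ask the user to confirm: 📂 Current working directory: /path/to/current/dir ⚠️ The Next.js project will be initialized directly in this directory; make sure you are running in a newly created, empty folder. Check the contents of the current directory (excluding configuration files such as .secondme/, .git/, CLAUDE.md, .claude/): If the directory is empty or contains only configuration files: continue If other files exist: issue a warning and use AskUserQuestion to ask the user whether to continue Phase 0: Detect current state Check .secondme/state.js...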
202
13312 wix-cli-dashboard-menu-plugin wix/skills
Wix Dashboard Menu Plugin Builder Creates dashboard menu plugin extensions for Wix CLI applications. Dashboard menu plugins are menu items that integrate into predefined menu slots on dashboard pages managed by Wix first-party business apps (Wix Stores, Wix Bookings, Wix Blog, Wix eCommerce, Wix Events, Wix CRM, Wix Restaurants). When clicked, a dashboard menu plugin either navigates to a dashboard page or opens a dashboard modal . Dashboard menu plugins are configuration-only extensions — they ...
202
13313 urban-planner-analyst rysweet/amplihack
Analyze urban development and spatial organization through the disciplinary lens of urban planning, applying established frameworks (comprehensive planning, zoning, transit-oriented development), multiple theoretical approaches (modernist, new urbanist, smart growth, equity planning), and evidence-based practices to understand how cities function, grow, and can be shaped to meet community needs for sustainability, livability, and equity. When to Use This Skill - Development Project Evaluation...
202
13314 nsfc-justification-writer huangwb8/chineseresearchlatex
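Recommended: use the script to quickly generate the information form (which can also be filled in interactively); see `skills/nsfc-justification-writer/scripts/run.py init`. Workflow (execute in order) - Locate the project and target file: confirm `project_root`, then read and edit only `extraTex/1.1.立项依据.tex`. - Extract the existing skeleton: if the file already has subheadings such as `\subsubsection`, keep the skeleton and replace only the body paragraphs (unless the user asks to restructure the hierarchy). By default, exact title matching is not enforced (`strict_title_match=false`); the focus is on whether the content dimensions are covered. - Progressive writing guidance (recommended): skeleton → paragraphs → revision → polish → acceptance check (avoids the pressure of getting everything done in one pass). Use `scripts/run.py coach --stage auto` to detect the current stage automatically and output "only three things to do this round + questions you need to answer + copyable prompts" - Change the body of only one `\subsubsection` per round, using `apply-section` to write safely with automatic backup - Generate the main narrative of the "Project Justification" (a 4-paragraph closed loop is suggested; the AI checks coverage of content dimensions rather than...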
202
13315 odoo-upgrade ahmed-lakosha/odoo-upgrade-skill
Odoo Upgrade Assistant A comprehensive skill for upgrading Odoo modules between versions, with extensive pattern recognition and automated fixes for common migration issues. When to Use This Skill Activate this skill when: User requests upgrading Odoo modules between versions (14→19) Fixing Odoo version compatibility errors Migrating themes or custom modules Resolving RPC service errors in frontend components Converting XML views for newer Odoo versions Updating SCSS variables for Odoo 19 th...
202
13316 twilio-communications davila7/claude-code-templates
Twilio Communications Patterns SMS Sending Pattern Basic pattern for sending SMS messages with Twilio. Handles the fundamentals: phone number formatting, message delivery, and delivery status callbacks. Key considerations: Phone numbers must be in E.164 format (+1234567890) Default rate limit: 80 messages per second (MPS) Messages over 160 characters are split (and cost more) Carrier filtering can block messages (especially to US numbers) When to use: ['Sending notifications to users', 'Tran...
202
13317 slack-bot-builder davila7/claude-code-templates
Slack Bot Builder Patterns Bolt App Foundation Pattern The Bolt framework is Slack's recommended approach for building apps. It handles authentication, event routing, request verification, and HTTP request processing so you can focus on app logic. Key benefits: Event handling in a few lines of code Security checks and payload validation built-in Organized, consistent patterns Works for experiments and production Available in: Python, JavaScript (Node.js), Java When to use: ['Starting any ne...
202
13318 exa-search davila7/claude-code-templates
Exa Search Neural search for web content, code, companies, and people via the Exa MCP server. When to Activate User needs current web information or news Searching for code examples, API docs, or technical references Researching companies, competitors, or market players Finding professional profiles or people in a domain Running background research for any development task User says "search for", "look up", "find", or "what's the latest on" MCP Requirement Exa MCP server must be configured. Add ...
202
13319 framework-migration-legacy-modernize sickn33/antigravity-awesome-skills
Legacy Code Modernization Workflow Orchestrate a comprehensive legacy system modernization using the strangler fig pattern, enabling gradual replacement of outdated components while maintaining continuous business operations through expert agent coordination. [Extended thinking: The strangler fig pattern, named after the tropical fig tree that gradually envelops and replaces its host, represents the gold standard for risk-managed legacy modernization. This workflow implements a systematic approa...
202
13320 event-store-design sickn33/antigravity-awesome-skills
Event Store Design Comprehensive guide to designing event stores for event-sourced applications. When to Use This Skill Designing event sourcing infrastructure Choosing between event store technologies Implementing custom event stores Optimizing event storage and retrieval Setting up event store schemas Planning for event store scaling Core Concepts 1. Event Store Architecture ┌─────────────────────────────────────────────────────┐ │ Event Store │ ├──────...
202
13321 solidity-security sickn33/antigravity-awesome-skills
Solidity Security Master smart contract security best practices, vulnerability prevention, and secure Solidity development patterns. When to Use This Skill Writing secure smart contracts Auditing existing contracts for vulnerabilities Implementing secure DeFi protocols Preventing reentrancy, overflow, and access control issues Optimizing gas usage while maintaining security Preparing contracts for professional audits Understanding common attack vectors Critical Vulnerabilities 1. Reentrancy Atta...
202
13322 pentest commands sickn33/antigravity-awesome-skills
Pentest Commands Purpose Provide a comprehensive command reference for penetration testing tools including network scanning, exploitation, password cracking, and web application testing. Enable quick command lookup during security assessments. Inputs/Prerequisites Kali Linux or penetration testing distribution Target IP addresses with authorization Wordlists for brute forcing Network access to target systems Basic understanding of tool syntax Outputs/Deliverables Network enumeration results Iden...
201
13323 investigation-workflow rysweet/amplihack
This skill provides a systematic 6-phase workflow for investigating and understanding existing systems, codebases, and architectures. Unlike development workflows optimized for implementation, this workflow is optimized for exploration, understanding, and knowledge capture. When to Use This Skill Investigation Tasks (use this workflow): - "Investigate how the authentication system works" - "Explain the neo4j memory integration" - "Understand why CI is failing consistently" - "Analyze the ...
201
13324 feishu-im-read larksuite/openclaw-lark
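Feishu IM Message Reading Read Before Running All message-reading tools in this Skill are called as the user and can only read conversations the user has access to In feishu_im_user_get_messages, exactly one of open_id and chat_id must be provided When a thread_id appears in a message, decide based on the user's intent whether to read the thread replies with feishu_im_user_get_thread_messages After reading as the user, if resource markers appear in the message content, download them with feishu_im_user_fetch_resource, which requires message_id + file_key + type Quick Index: Intent → Tool User intent Tool Required parameters Common optional parameters Get group/direct chat message history feishu_im_user_get_messages chat_id or open_id (one of the two) relative_time, start_time/end_time, page_size, sort_rule Get reply messages in a thread feishu_im_user_get_thread_messages thread_id (omt_xxx) page_size,...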
201
13325 alphaear-logic-visualizer rkiding/awesome-finance-skills
AlphaEar Logic Visualizer Skill Overview This skill specializes in creating visual representations of logic flows, specifically generating Draw.io XML compatible diagrams. It is useful for visualizing investment theses or signal transmission chains. Capabilities 1. Generate Draw.io Diagrams (Agentic Workflow) YOU (the Agent) are the Visualizer. Use the prompts in references/PROMPTS.md to generate the XML. Workflow: Generate XML : Use the Draw.io XML Generation Prompt...
201
13326 skill-pdf-to-pptx-tool dnvriend/pdf-to-pptx-tool
This skill provides comprehensive guidance for using `pdf-to-pptx-tool`, a professional CLI tool that converts PDF documents into PowerPoint presentations. Each PDF page becomes a high-quality slide with customizable resolution. When to Use This Skill Use this skill when: - You need to convert PDF documents to PowerPoint format - You want to customize conversion quality (DPI settings) - You need to debug conversion issues with verbose logging - You're working with multi-page PDF documents...
201
13327 html injection testing davila7/claude-code-templates
HTML Injection Testing Purpose Identify and exploit HTML injection vulnerabilities that allow attackers to inject malicious HTML content into web applications. This vulnerability enables attackers to modify page appearance, create phishing pages, and steal user credentials through injected forms. Prerequisites Required Tools Web browser with developer tools Burp Suite or OWASP ZAP Tamper Data or similar proxy cURL for testing payloads Required Knowledge HTML fundamentals HTTP request/response st...
201
13328 cloud penetration testing davila7/claude-code-templates
Cloud Penetration Testing Purpose Conduct comprehensive security assessments of cloud infrastructure across Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP). This skill covers reconnaissance, authentication testing, resource enumeration, privilege escalation, data extraction, and persistence techniques for authorized cloud security engagements. Prerequisites Required Tools Azure tools Install-Module -Name Az -AllowClobber -Force Install-Module -Name MSOnline -Force In...
201
13329 gpui-style-guide longbridge/gpui-component
Overview Code style guide derived from gpui-component implementation patterns. Based on: Analysis of Button, Checkbox, Input, Select, and other components in crates/ui Component Structure Basic Component Pattern use gpui::{ div, prelude::FluentBuilder as _, AnyElement, App, Div, ElementId, InteractiveElement, IntoElement, ParentElement, RenderOnce, StatefulInteractiveElement, StyleRefinement, Styled, Window, }; #[derive(IntoElement)] pub struct MyComponent { id: ElementId, ...
201
13330 tooluniverse-pharmacovigilance mims-harvard/tooluniverse
Pharmacovigilance Safety Analyzer Systematic drug safety analysis using FAERS adverse event data, FDA labeling, PharmGKB pharmacogenomics, and clinical trial safety signals. KEY PRINCIPLES : Report-first approach - Create report file FIRST, update progressively Signal quantification - Use disproportionality measures (PRR, ROR) Severity stratification - Prioritize serious/fatal events Multi-source triangulation - FAERS, labels, trials, literature Pharmacogenomic context - Include genetic risk fac...
201
13331 axiom-ui-testing charleswiltgen/axiom
UI Testing Overview Wait for conditions, not arbitrary timeouts. Core principle Flaky tests come from guessing how long operations take. Condition-based waiting eliminates race conditions. NEW in WWDC 2025: Recording UI Automation allows you to record interactions, replay across devices/languages, and review video recordings of test runs. Example Prompts These are real questions developers ask that this skill is designed to answer: 1. "My UI tests pass locally on my Mac but fail in CI. How ...
200
13332 axiom-accessibility-diag charleswiltgen/axiom
Accessibility Diagnostics Overview Systematic accessibility diagnosis and remediation for iOS/macOS apps. Covers the 7 most common accessibility issues that cause App Store rejections and user complaints. Core principle Accessibility is not optional. iOS apps must support VoiceOver, Dynamic Type, and sufficient color contrast to pass App Store Review. Users with disabilities depend on these features. When to Use This Skill Fixing VoiceOver navigation issues (missing labels, wrong element orde...
200
13333 ship-learn-next davila7/claude-code-templates
Ship-Learn-Next Action Planner This skill helps transform passive learning content into actionable Ship-Learn-Next cycles - turning advice and lessons into concrete, shippable iterations. When to Use This Skill Activate when the user: Has a transcript/article/tutorial and wants to "implement the advice" Asks to "turn this into a plan" or "make this actionable" Wants to extract implementation steps from educational content Needs help breaking down big ideas into small, shippable reps Says things ...
200
13334 tailwindcss-framework-integration josiahsiegel/claude-plugin-marketplace
Tailwind CSS Framework Integration React with Vite Setup Create React + Vite project npm create vite@latest my-app -- --template react-ts cd my-app Install Tailwind CSS npm install -D tailwindcss @tailwindcss/vite Configuration // vite.config.ts import { defineConfig } from 'vite' import react from '@vitejs/plugin-react' import tailwindcss from '@tailwindcss/vite' export default defineConfig({ plugins: [react(), tailwindcss()] }) /* src/index.css */ @import "tailwindcss"; // src/main.ts...
200
13335 hono-validation bobmatnyc/claude-mpm-skills
Hono Validation Patterns Overview Hono provides a lightweight built-in validator and integrates seamlessly with popular validation libraries like Zod, TypeBox, and Valibot. Validation happens as middleware, providing type-safe access to validated data in handlers. Key Features: Built-in lightweight validator First-class Zod integration via @hono/zod-validator Standard Schema support (works with any validation library) Type inference from validation schemas Validates: JSON, forms, query params...
200
13336 hono-testing bobmatnyc/claude-mpm-skills
Hono Testing Patterns Overview Hono provides a simple testing approach: create a Request, pass it to your app, and validate the Response. The framework includes a typed test client for even better DX. Key Features: Simple app.request() API Typed test client with full inference Environment mocking for Workers Works with Vitest, Jest, or any test runner When to Use This Skill Use Hono testing when: Writing unit tests for route handlers Integration testing API endpoints Testing middleware beha...
200
13337 security-best-practices davila7/claude-code-templates
Security Best Practices When to use this skill New project: consider security from the start Security audit: inspect and fix vulnerabilities Public API: harden APIs accessible externally Compliance: comply with GDPR, PCI-DSS, etc. Instructions Step 1: Enforce HTTPS and security headers Express.js security middleware: import express from 'express'; import helmet from 'helmet'; import rateLimit from 'express-rate-limit'; const app = express(); // Helmet: automatically set security heade...
200
13338 local-llm-ops bobmatnyc/claude-mpm-skills
Local LLM Ops (Ollama) Overview Your localLLM repo provides a full local LLM toolchain on Apple Silicon: setup scripts, a rich CLI chat launcher, benchmarks, and diagnostics. The operational path is: install Ollama, ensure the service is running, initialize the venv, pull models, then launch chat or benchmarks. Quick Start ./setup_chatbot.sh ./chatllm If no models are present: ollama pull mistral Setup Checklist Install Ollama: brew install ollama Start the service: brew services start oll...
200
13339 video-transcript zeropointrepo/youtube-skills
Video Transcript Extract transcripts from videos via TranscriptAPI.com . Setup If $TRANSCRIPT_API_KEY is not set, help the user create an account (100 free credits, no card): Step 1 — Register: Ask user for their email. node ./scripts/tapi-auth.js register --email USER_EMAIL → OTP sent to email. Ask user: "Check your email for a 6-digit verification code." Step 2 — Verify: Once user provides the OTP: node ./scripts/tapi-auth.js verify --token TOKEN_FROM_STEP_1 --otp CODE API key saved to your sh...
200
13340 gemini-image-gen jezweb/claude-skills
Gemini Image Generator Generate contextual images for web projects using the Gemini API. Produces hero backgrounds, OG cards, placeholder photos, textures, and style-matched variants. Setup API Key: Set GEMINI_API_KEY as an environment variable. Get a key from https://aistudio.google.com/apikey if you don't have one. export GEMINI_API_KEY="your-key-here" Workflow Step 1: Understand What's Needed Gather from the user or project context: What: hero background, product photo, texture, OG image,...
200
13341 code-smell-detector rysweet/amplihack
Code Smell Detector Skill Purpose This skill identifies anti-patterns that violate amplihack's development philosophy and provides constructive, specific fixes. It ensures code maintains ruthless simplicity, modular design, and zero-BS implementations. When to Use This Skill Code review: Identify violations before merging Refactoring: Find opportunities to simplify and improve code quality New module creation: Catch issues early in development Philosophy compliance: Ensure code aligns with amp...
200
13342 mobile-design vudovn/antigravity-kit
Mobile Design System (Mobile-First · Touch-First · Platform-Respectful) Philosophy: Touch-first. Battery-conscious. Platform-respectful. Offline-capable. Core Law: Mobile is NOT a small desktop. Operating Rule: Think constraints first, aesthetics second. This skill exists to prevent desktop-thinking, AI-defaults, and unsafe assumptions when designing or building mobile applications. 1. Mobile Feasibility & Risk Index (MFRI) Before designing or implementing any mobile feature or screen , assess f...
200
13343 wcag-audit-patterns sickn33/antigravity-awesome-skills
WCAG Audit Patterns Comprehensive guide to auditing web content against WCAG 2.2 guidelines with actionable remediation strategies. When to Use This Skill Conducting accessibility audits Fixing WCAG violations Implementing accessible components Preparing for accessibility lawsuits Meeting ADA/Section 508 requirements Achieving VPAT compliance Core Concepts 1. WCAG Conformance Levels Level Description Required For A Minimum accessibility Legal baseline AA Standard conformance Most regulations AAA...
200
13344 cost-optimization sickn33/antigravity-awesome-skills
Cloud Cost Optimization Strategies and patterns for optimizing cloud costs across AWS, Azure, and GCP. Purpose Implement systematic cost optimization strategies to reduce cloud spending while maintaining performance and reliability. When to Use Reduce cloud spending Right-size resources Implement cost governance Optimize multi-cloud costs Meet budget constraints Cost Optimization Framework 1. Visibility Implement cost allocation tags Use cloud cost management tools Set up budget alerts Create co...
200
13345 sociologist-analyst rysweet/amplihack
Analyze events through the disciplinary lens of sociology, applying rigorous sociological frameworks (structural-functionalism, conflict theory, symbolic interactionism, social constructionism), methodological approaches (quantitative surveys, qualitative ethnography, comparative-historical analysis), and core concepts (social structure, institutions, stratification, culture, socialization, deviance, collective behavior) to understand social patterns, group dynamics, power relations, inequality,...
200
13346 skill-rails-upgrade sickn33/antigravity-awesome-skills
When to Use This Skill Analyze Rails apps and provide upgrade assessments Use this skill when analyzing Rails apps and providing upgrade assessments. Rails Upgrade Analyzer Analyze the current Rails application and provide a comprehensive upgrade assessment with selective file merging. Step 1: Verify Rails Application Check that we're in a Rails application by looking for these files: Gemfile (must exist and contain 'rails') config/application.rb (Rails application config) config/enviro...
200
13347 terraform-module-library sickn33/antigravity-awesome-skills
Terraform Module Library Production-ready Terraform module patterns for AWS, Azure, GCP, and OCI infrastructure. Purpose Create reusable, well-tested Terraform modules for common cloud infrastructure patterns across multiple cloud providers. When to Use Build reusable infrastructure components Standardize cloud resource provisioning Implement infrastructure as code best practices Create multi-cloud compatible modules Establish organizational Terraform standards Module Structure
200
13348 omarchy basecamp/omarchy
Manage [Omarchy](https://omarchy.org/) Linux systems - a beautiful, modern, opinionated Arch Linux distribution with Hyprland. When This Skill MUST Be Used ALWAYS invoke this skill when the user's request involves ANY of these: - Editing ANY file in `~/.config/hypr/` (window rules, animations, keybindings, monitors, etc.) - Editing ANY file in `~/.config/waybar/`, `~/.config/walker/`, `~/.config/mako/` - Editing terminal configs (alacritty, kitty, ghostty) - Editing ANY file in `~/.config...
200
13349 validate-evaluator hamelsmu/evals-skills
Validate Evaluator Calibrate an LLM judge against human judgment. Overview Split human-labeled data into train (10-20%), dev (40-45%), test (40-45%) Run judge on dev set and measure TPR/TNR Iterate on the judge until TPR and TNR > 90% on dev set Run once on held-out test set for final TPR/TNR Apply bias correction formula to production data Prerequisites A built LLM judge prompt (from write-judge-prompt) Human-labeled data: ~100 traces with binary Pass/Fail labels per failure mode Aim for ~50 Pa...
200
13350 generate-synthetic-data hamelsmu/evals-skills
Generate Synthetic Data Generate diverse, realistic test inputs that cover the failure space of an LLM pipeline. Prerequisites Before generating synthetic data, identify where the pipeline is likely to fail. Ask the user about known failure-prone areas, review existing user feedback, or form hypotheses from available traces. Dimensions (Step 1) must target anticipated failures, not arbitrary variation. Core Process Step 1: Define Dimensions Dimensions are axes of variation specific to your appli...
200