# outlines


## Install the Skill

```bash
npx skills add https://github.com/davila7/claude-code-templates --skill outlines
```

## Outlines: Structured Text Generation

### When to Use This Skill

Use Outlines when you need to:

- Guarantee valid JSON/XML/code structure during generation
- Use Pydantic models for type-safe outputs
- Run local models (Transformers, llama.cpp, vLLM)
- Maximize inference speed with zero-overhead structured generation
- Generate against JSON schemas automatically
- Control token sampling at the grammar level

GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)

## Installation

```bash
# Base installation
pip install outlines

# With specific backends
pip install outlines transformers      # Hugging Face models
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm              # vLLM for high throughput
```

## Quick Start

### Basic Example: Classification

```python
import outlines

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with a type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive" (guaranteed to be one of the three choices)
```

### With Pydantic Models

```python
from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```

## Core Concepts

### 1. Constrained Token Sampling

Outlines uses finite state machines (FSMs) to constrain token generation at the logit level.

How it works (see the sketch after this list):

1. Convert the schema (JSON/Pydantic/regex) into a context-free grammar (CFG)
2. Transform the CFG into a finite state machine (FSM)
3. Filter invalid tokens at each step during generation
4. Fast-forward when only one valid token exists
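To make step 3 concrete, here is a schematic sketch of logit masking, the general mechanism behind FSM-constrained decoding. This is illustrative code, not Outlines' actual internals; `mask_logits`, `allowed`, and the toy vocabulary are hypothetical.

```python
import torch

def mask_logits(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    """Set every token the FSM disallows to -inf so it can never be sampled.

    Schematic sketch of constrained decoding in general; in a real system
    `allowed_token_ids` would come from the current FSM state.
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0  # allowed tokens keep their original scores
    return logits + mask

# One decoding step (illustrative):
vocab_size = 8
logits = torch.randn(vocab_size)          # stand-in for the model's output
allowed = [2, 5]                          # tokens the FSM permits in this state
probs = torch.softmax(mask_logits(logits, allowed), dim=-1)
next_token = torch.multinomial(probs, 1)  # can only ever be token 2 or 5
# If len(allowed) == 1, the token can be appended without sampling at all --
# this is the "fast-forward" optimization in step 4.
```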

Benefits:

- Zero overhead: filtering happens at the token level
- Speed improvement: fast-forwards through deterministic paths
- Guaranteed validity: invalid outputs are impossible

```python
import outlines
from pydantic import BaseModel

# Pydantic model -> JSON schema -> CFG -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> CFG
# 3. CFG -> FSM
# 4. FSM filters tokens during generation
generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
```

### 2. Structured Generators

Outlines provides specialized generators for different output types.

#### Choice Generator

```python
# Multiple-choice selection
generator = outlines.generate.choice(
    model, ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: one of the three choices
```

#### JSON Generator

```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching the schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed to be a valid Product instance
print(type(product))  # <class '__main__.Product'>
```

#### Regex Generator

```python
# Generate text matching a regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: e.g. "555-123-4567" (guaranteed to match the pattern)
```

#### Integer/Float Generators

```python
# Generate specific numeric types
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.float(model)
price = float_generator("Product price:")  # Guaranteed float
```

### 3. Model Backends

Outlines supports multiple local and API-based backends.

#### Transformers (Hugging Face)

```python
import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)
```

#### llama.cpp

```python
# Load a GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)
```

#### vLLM (High Throughput)

```python
# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)
```

#### OpenAI (Limited Support)

```python
# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: some features are limited with API models
generator = outlines.generate.json(model, YourModel)
```

### 4. Pydantic Integration

Outlines has first-class Pydantic support with automatic schema translation.

#### Basic Models

```python
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Guaranteed > 0
```

#### Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"
```

#### Enums and Literals

```python
from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of the enum values
    priority: Literal["low", "medium", "high"]  # Must be one of the literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
```

## Common Patterns

### Pattern 1: Data Extraction

```python
from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")
```

### Pattern 2: Classification

```python
from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With confidence
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")
```

### Pattern 3: Structured Forms

```python
from pydantic import BaseModel
import outlines

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]
```

### Pattern 4: Multi-Entity Extraction

```python
from typing import Literal
from pydantic import BaseModel
import outlines

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")
```

### Pattern 5: Code Generation

```python
from pydantic import BaseModel
import outlines

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")
```

### Pattern 6: Batch Processing

```python
from pydantic import BaseModel
import outlines

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    # Build the generator once so the schema's FSM is compiled a single time
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")
```

## Backend Configuration

### Transformers

```python
import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
```

### llama.cpp

```python
# Load a GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8       # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)
```

### vLLM (Production)

```python
# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)
```

## Best Practices

### 1. Use Specific Types

```python
# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float    # Not str
    quantity: int   # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as a string
class Product(BaseModel):
    name: str
    price: str     # Should be float
    quantity: str  # Should be int
```

### 2. Add Constraints

```python
from pydantic import Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w.-]+@[\w.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str
```

### 3. Use Enums for Categories

```python
from enum import Enum

# ✅ Good: Enum for a fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything
```

### 4. Provide Context in Prompts

```python
# ✅ Good: Clear context
prompt = """
Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
```

### 5. Handle Optional Fields

```python
from typing import Optional

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str                    # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None    # Optional
    tags: list[str] = []          # Default empty list

# Generation can succeed even if author/date are missing
```
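A minimal usage sketch, assuming the `Article` model above and a `model` loaded as in the earlier examples; the input text deliberately omits the author and date:

```python
generator = outlines.generate.json(model, Article)

article = generator("Extract article: 'Structured Decoding in Practice', tags: llm, parsing")
print(article.title)
print(article.author)  # may be None when the text provides no author
print(article.tags)
```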

## Comparison to Alternatives

| Feature | Outlines | Instructor | Guidance | LMQL |
|---|---|---|---|---|
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Learning Curve | Low | Low | Low | High |

When to choose Outlines:

- Using local models (Transformers, llama.cpp, vLLM)
- Need maximum inference speed
- Want Pydantic model support
- Require zero-overhead structured generation
- Want to control the token sampling process

When to choose alternatives:

- Instructor: need API models with automatic retrying
- Guidance: need token healing and complex workflows
- LMQL: prefer a declarative query syntax

## Performance Characteristics

Speed:

- Zero overhead: structured generation runs as fast as unconstrained generation
- Fast-forward optimization: skips deterministic tokens
- 1.2-2x faster than post-generation validation approaches

Memory:

- FSM compiled once per schema and cached; reuse a generator to amortize this, as sketched below
- Minimal runtime overhead
- Efficient with vLLM for high throughput
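A minimal sketch of the compile-once, reuse-everywhere pattern, assuming the same v0-style `outlines.generate.json` API used throughout this document:

```python
from pydantic import BaseModel
import outlines

class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# The schema's FSM is compiled when the generator is built...
generator = outlines.generate.json(model, Person)

# ...so reusing the generator amortizes that cost across many calls
for text in ["John is 30", "Alice is 25", "Bob is 40"]:
    person = generator(f"Extract from: {text}")
    print(person.name, person.age)
```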

Accuracy:

- 100% valid outputs (guaranteed by the FSM)
- No retry loops needed
- Deterministic token filtering

## Resources

- Documentation: https://outlines-dev.github.io/outlines
- GitHub: https://github.com/outlines-dev/outlines (8k+ stars)
- Discord: https://discord.gg/R9DSu34mGd
- Blog: https://blog.dottxt.co

## See Also

- references/json_generation.md - Comprehensive JSON and Pydantic patterns
- references/backends.md - Backend-specific configuration
- references/examples.md - Production-ready examples
