FoundationModels: On-Device LLM (iOS 26) Patterns for integrating Apple's on-device language model into apps using the FoundationModels framework. Covers text generation, structured output with @Generable , custom tool calling, and snapshot streaming — all running on-device for privacy and offline support. When to Activate Building AI-powered features using Apple Intelligence on-device Generating or summarizing text without cloud dependency Extracting structured data from natural language input Implementing custom tool calling for domain-specific AI actions Streaming structured responses for real-time UI updates Need privacy-preserving AI (no data leaves the device) Core Pattern — Availability Check Always check model availability before creating a session: struct GenerativeView : View { private var model = SystemLanguageModel . default var body : some View { switch model . availability { case . available : ContentView ( ) case . unavailable ( . deviceNotEligible ) : Text ( "Device not eligible for Apple Intelligence" ) case . unavailable ( . appleIntelligenceNotEnabled ) : Text ( "Please enable Apple Intelligence in Settings" ) case . unavailable ( . modelNotReady ) : Text ( "Model is downloading or not ready" ) case . unavailable ( let other ) : Text ( "Model unavailable: ( other ) " ) } } } Core Pattern — Basic Session // Single-turn: create a new session each time let session = LanguageModelSession ( ) let response = try await session . respond ( to : "What's a good month to visit Paris?" ) print ( response . content ) // Multi-turn: reuse session for conversation context let session = LanguageModelSession ( instructions : """ You are a cooking assistant. Provide recipe suggestions based on ingredients. Keep suggestions brief and practical. """ ) let first = try await session . respond ( to : "I have chicken and rice" ) let followUp = try await session . respond ( to : "What about a vegetarian option?" ) Key points for instructions: Define the model's role ("You are a mentor") Specify what to do ("Help extract calendar events") Set style preferences ("Respond as briefly as possible") Add safety measures ("Respond with 'I can't help with that' for dangerous requests") Core Pattern — Guided Generation with @Generable Generate structured Swift types instead of raw strings: 1. Define a Generable Type @Generable ( description : "Basic profile information about a cat" ) struct CatProfile { var name : String @Guide ( description : "The age of the cat" , . range ( 0 ... 20 ) ) var age : Int @Guide ( description : "A one sentence profile about the cat's personality" ) var profile : String } 2. Request Structured Output let response = try await session . respond ( to : "Generate a cute rescue cat" , generating : CatProfile . self ) // Access structured fields directly print ( "Name: ( response . content . name ) " ) print ( "Age: ( response . content . age ) " ) print ( "Profile: ( response . content . profile ) " ) Supported @Guide Constraints .range(0...20) — numeric range .count(3) — array element count description: — semantic guidance for generation Core Pattern — Tool Calling Let the model invoke custom code for domain-specific tasks: 1. Define a Tool struct RecipeSearchTool : Tool { let name = "recipe_search" let description = "Search for recipes matching a given term and return a list of results." @Generable struct Arguments { var searchTerm : String var numberOfResults : Int } func call ( arguments : Arguments ) async throws -> ToolOutput { let recipes = await searchRecipes ( term : arguments . searchTerm , limit : arguments . numberOfResults ) return . string ( recipes . map { "- ( $0 . name ) : ( $0 . description ) " } . joined ( separator : "\n" ) ) } } 2. Create Session with Tools let session = LanguageModelSession ( tools : [ RecipeSearchTool ( ) ] ) let response = try await session . respond ( to : "Find me some pasta recipes" ) 3. Handle Tool Errors do { let answer = try await session . respond ( to : "Find a recipe for tomato soup." ) } catch let error as LanguageModelSession . ToolCallError { print ( error . tool . name ) if case . databaseIsEmpty = error . underlyingError as ? RecipeSearchToolError { // Handle specific tool error } } Core Pattern — Snapshot Streaming Stream structured responses for real-time UI with PartiallyGenerated types: @Generable struct TripIdeas { @Guide ( description : "Ideas for upcoming trips" ) var ideas : [ String ] } let stream = session . streamResponse ( to : "What are some exciting trip ideas?" , generating : TripIdeas . self ) for try await partial in stream { // partial: TripIdeas.PartiallyGenerated (all properties Optional) print ( partial ) } SwiftUI Integration @State private var partialResult : TripIdeas . PartiallyGenerated ? @State private var errorMessage : String ? var body : some View { List { ForEach ( partialResult ? . ideas ?? [ ] , id : \ . self ) { idea in Text ( idea ) } } . overlay { if let errorMessage { Text ( errorMessage ) . foregroundStyle ( . red ) } } . task { do { let stream = session . streamResponse ( to : prompt , generating : TripIdeas . self ) for try await partial in stream { partialResult = partial } } catch { errorMessage = error . localizedDescription } } } Key Design Decisions Decision Rationale On-device execution Privacy — no data leaves the device; works offline 4,096 token limit On-device model constraint; chunk large data across sessions Snapshot streaming (not deltas) Structured output friendly; each snapshot is a complete partial state @Generable macro Compile-time safety for structured generation; auto-generates PartiallyGenerated type Single request per session isResponding prevents concurrent requests; create multiple sessions if needed response.content (not .output ) Correct API — always access results via .content property Best Practices Always check model.availability before creating a session — handle all unavailability cases Use instructions to guide model behavior — they take priority over prompts Check isResponding before sending a new request — sessions handle one request at a time Access response.content for results — not .output Break large inputs into chunks — 4,096 token limit applies to instructions + prompt + output combined Use @Generable for structured output — stronger guarantees than parsing raw strings Use GenerationOptions(temperature:) to tune creativity (higher = more creative) Monitor with Instruments — use Xcode Instruments to profile request performance Anti-Patterns to Avoid Creating sessions without checking model.availability first Sending inputs exceeding the 4,096 token context window Attempting concurrent requests on a single session Using .output instead of .content to access response data Parsing raw string responses when @Generable structured output would work Building complex multi-step logic in a single prompt — break into multiple focused prompts Assuming the model is always available — device eligibility and settings vary When to Use On-device text generation for privacy-sensitive apps Structured data extraction from user input (forms, natural language commands) AI-assisted features that must work offline Streaming UI that progressively shows generated content Domain-specific AI actions via tool calling (search, compute, lookup)

foundation-models-on-device

安装