# axiom-vision-ref


## Install

```
npx skills add https://github.com/charleswiltgen/axiom --skill axiom-vision-ref
```

## Vision Framework API Reference

Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.

### When to Use This Reference

- Implementing subject lifting using VisionKit or Vision
- Detecting hand/body poses for gesture recognition or fitness apps
- Segmenting people from backgrounds or separating multiple individuals
- Face detection and landmarks for AR effects or authentication
- Combining Vision APIs to solve complex computer vision problems
- Looking up specific API signatures and parameter meanings
- Recognizing text in images (OCR) with VNRecognizeTextRequest
- Detecting barcodes and QR codes with VNDetectBarcodesRequest
- Building live scanners with DataScannerViewController
- Scanning documents with VNDocumentCameraViewController
- Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)

**Related skills**: See axiom-vision for decision trees and patterns, and axiom-vision-diag for troubleshooting

## Vision Framework Overview

Vision provides computer vision algorithms for still images and video:

**Core workflow:**

1. Create a request (e.g., `VNDetectHumanHandPoseRequest()`)
2. Create a handler with the image (`VNImageRequestHandler(cgImage: image)`)
3. Perform the request (`try handler.perform([request])`)
4. Access observations from `request.results`

**Coordinate system**: Lower-left origin, normalized (0.0-1.0) coordinates

**Performance**: Run requests on a background queue; they are resource-intensive and will block the UI if performed on the main thread.
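A minimal sketch of both points, assuming a `CGImage` input and a UIKit caller (the `detectFaces` helper and its completion handler are illustrative, not part of Vision): the request runs off the main thread, and `VNImageRectForNormalizedRect` maps Vision's normalized, lower-left-origin boxes into pixel space; UIKit's top-left origin still requires flipping the y-axis.

```swift
import UIKit
import Vision

// Sketch: detect faces off the main thread, then hop back for UI updates
func detectFaces(in cgImage: CGImage, completion: @escaping ([CGRect]) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNDetectFaceRectanglesRequest()
        let handler = VNImageRequestHandler(cgImage: cgImage)
        try? handler.perform([request])

        let observations = request.results as? [VNFaceObservation] ?? []
        let rects: [CGRect] = observations.map { observation in
            // Normalized (0-1), lower-left origin -> pixel coordinates
            var rect = VNImageRectForNormalizedRect(
                observation.boundingBox, cgImage.width, cgImage.height
            )
            // Flip y for UIKit's top-left origin
            rect.origin.y = CGFloat(cgImage.height) - rect.origin.y - rect.height
            return rect
        }
        DispatchQueue.main.async { completion(rects) }
    }
}
```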

## Subject Segmentation APIs

### VNGenerateForegroundInstanceMaskRequest

**Availability**: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+

Generates class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)

#### Basic Usage

```swift
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)

try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }
```

#### VNInstanceMaskObservation

- `allInstances`: IndexSet containing all foreground instance indices (excludes background 0)
- `instanceMask`: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
- `instanceAtPoint(_:)`: Returns the instance index at a normalized point

```swift
let point = CGPoint(x: 0.5, y: 0.5) // Center of image
let instance = observation.instanceAtPoint(point)

if instance == 0 {
    print("Background tapped")
} else {
    print("Instance \(instance) tapped")
}
```

#### Generating Masks

`createScaledMask(for:croppedToInstancesContent:)`

**Parameters:**

- `for`: IndexSet of instances to include
- `croppedToInstancesContent`:
  - `false`: Output matches input resolution (for compositing)
  - `true`: Tight crop around selected instances

**Returns**: Single-channel floating-point CVPixelBuffer (soft segmentation mask)

```swift
// All instances, full resolution
let mask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
    for: instances,
    croppedToInstancesContent: true
)
```

#### Instance Mask Hit Testing

Access raw pixel buffer to map tap coordinates to instance labels:

```swift
let instanceMask = observation.instanceMask

CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }

let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let height = CVPixelBufferGetHeight(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)

// Convert the normalized tap into the mask's pixel coordinates
// (Vision's normalized points are lower-left origin; flip y first
// if your tap point uses a top-left origin)
let pixelPoint = VNImagePointForNormalizedPoint(
    CGPoint(x: normalizedX, y: normalizedY),
    width,
    height
)

// Calculate byte offset
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)

// Read instance label
let label = UnsafeRawPointer(baseAddress!).load(
    fromByteOffset: offset,
    as: UInt8.self
)

let instances = label == 0
    ? observation.allInstances
    : IndexSet(integer: Int(label))
```

## VisionKit Subject Lifting

### ImageAnalysisInteraction (iOS)

**Availability**: iOS 16+, iPadOS 16+

Adds system-like subject lifting UI to views:

```swift
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject // Or .automatic
imageView.addInteraction(interaction)
```

**Interaction types:**

- `.automatic`: Subject lifting + Live Text + data detectors
- `.imageSubject`: Subject lifting only (no interactive text)

### ImageAnalysisOverlayView (macOS)

**Availability**: macOS 13+

```swift
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)
```

### Programmatic Access

#### ImageAnalyzer

```swift
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])

let analysis = try await analyzer.analyze(image, configuration: configuration)
```

#### ImageAnalysis

- `subjects: [Subject]`: All subjects in the image
- `highlightedSubjects: Set<Subject>`: Currently highlighted (user long-pressed)
- `subject(at:)`: Async lookup of the subject at a normalized point (returns nil if none)

```swift
// Get all subjects
let subjects = analysis.subjects

// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
    // Process subject
}

// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])
```

#### Subject Struct

- `image: UIImage/NSImage`: Extracted subject with transparency
- `bounds: CGRect`: Subject boundaries in image coordinates

```swift
// Single subject image (async throwing property)
let subjectImage = try await subject.image

// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
```

**Out-of-process**: VisionKit analysis happens out of process (performance benefit; image size is limited)

## Person Segmentation APIs

### VNGeneratePersonSegmentationRequest

**Availability**: iOS 15+, macOS 12+

Returns single mask containing all people in image:

```swift
let request = VNGeneratePersonSegmentationRequest()
// Configure quality level if needed
try handler.perform([request])

guard let observation = request.results?.first as? VNPixelBufferObservation else { return }

let personMask = observation.pixelBuffer // CVPixelBuffer
```
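The "quality level" comment above refers to the request's `qualityLevel` property; a short sketch of the configuration options:

```swift
import CoreVideo
import Vision

let request = VNGeneratePersonSegmentationRequest()

// .accurate for photo editing, .balanced for general use, .fast for video
request.qualityLevel = .balanced

// 8-bit single-channel mask; a 32-bit float format is also supported
request.outputPixelFormat = kCVPixelFormatType_OneComponent8
```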

### VNGeneratePersonInstanceMaskRequest

**Availability**: iOS 17+, macOS 14+

Returns separate masks for up to 4 people:

```swift
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }

// Same VNInstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances // Up to 4 people (1-4)

// Get mask for person 1
let person1Mask = try observation.createScaledMask(
    for: IndexSet(integer: 1),
    croppedToInstancesContent: false
)
```

**Limitations:**

- Segments up to 4 people
- With more than 4 people: may miss people or combine them (typically background people)
- Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes (see the sketch below)
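A minimal sketch of that crowd check, assuming a `CGImage` input (the `segmentPeople` helper is illustrative): count faces first, and fall back to the single combined mask when more than four people are likely present.

```swift
import Vision

func segmentPeople(in cgImage: CGImage) throws {
    let handler = VNImageRequestHandler(cgImage: cgImage)

    // Count faces to estimate how many people are in the scene
    let faceRequest = VNDetectFaceRectanglesRequest()
    try handler.perform([faceRequest])
    let faceCount = faceRequest.results?.count ?? 0

    if faceCount > 4 {
        // Too many people for instance masks; use the single combined mask
        let request = VNGeneratePersonSegmentationRequest()
        try handler.perform([request])
        // request.results?.first is a VNPixelBufferObservation
    } else {
        // Up to 4 people: separate instance masks
        let request = VNGeneratePersonInstanceMaskRequest()
        try handler.perform([request])
        // request.results?.first is a VNInstanceMaskObservation
    }
}
```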

## Hand Pose Detection

### VNDetectHumanHandPoseRequest

**Availability**: iOS 14+, macOS 11+

Detects 21 hand landmarks per hand:

```swift
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2 // Default: 2, increase if needed

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
    // Process each hand
}
```

**Performance note**: `maximumHandCount` affects latency. Pose is computed only for hands ≤ maximum; set it to the lowest acceptable value.

#### Hand Landmarks (21 points)

**Wrist**: 1 landmark

**Thumb** (4 landmarks):

- `.thumbTip`
- `.thumbIP` (interphalangeal joint)
- `.thumbMP` (metacarpophalangeal joint)
- `.thumbCMC` (carpometacarpal joint)

**Fingers** (4 landmarks each):

- Tip (`.indexTip`, `.middleTip`, `.ringTip`, `.littleTip`)
- DIP (distal interphalangeal joint)
- PIP (proximal interphalangeal joint)
- MCP (metacarpophalangeal joint)

#### Group Keys

Access landmark groups:

| Group Key | Points |
|---|---|
| `.all` | All 21 landmarks |
| `.thumb` | 4 thumb joints |
| `.indexFinger` | 4 index finger joints |
| `.middleFinger` | 4 middle finger joints |
| `.ringFinger` | 4 ring finger joints |
| `.littleFinger` | 4 little finger joints |

```swift
// Get all points
let allPoints = try observation.recognizedPoints(.all)

// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)

// Get specific point
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

// Check confidence
guard thumbTip.confidence > 0.5 else { return }

// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location // CGPoint
```

#### Gesture Recognition Example (Pinch)

```swift
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else { return }

let distance = hypot(
    thumbTip.location.x - indexTip.location.x,
    thumbTip.location.y - indexTip.location.y
)

let isPinching = distance < 0.05 // Normalized threshold
```

#### Chirality (Handedness)

```swift
let chirality = observation.chirality // .left, .right, or .unknown
```

## Body Pose Detection

### VNDetectHumanBodyPoseRequest (2D)

**Availability**: iOS 14+, macOS 11+

Detects 19 body landmarks (2D normalized coordinates):

```swift
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
    // Process each person
}
```

#### Body Landmarks (19 points)

**Face** (5 landmarks):

- `.nose`, `.leftEye`, `.rightEye`, `.leftEar`, `.rightEar`

**Arms** (6 landmarks):

- Left: `.leftShoulder`, `.leftElbow`, `.leftWrist`
- Right: `.rightShoulder`, `.rightElbow`, `.rightWrist`

**Torso** (6 landmarks):

- `.neck` (between shoulders)
- `.leftShoulder`, `.rightShoulder` (also in arm groups)
- `.leftHip`, `.rightHip`
- `.root` (between hips)

**Legs** (6 landmarks):

- Left: `.leftHip`, `.leftKnee`, `.leftAnkle`
- Right: `.rightHip`, `.rightKnee`, `.rightAnkle`

Note: Shoulders and hips appear in multiple groups; the unique landmark count is 19.

#### Group Keys (Body)

| Group Key | Points |
|---|---|
| `.all` | All 19 landmarks |
| `.face` | 5 face landmarks |
| `.leftArm` | shoulder, elbow, wrist |
| `.rightArm` | shoulder, elbow, wrist |
| `.torso` | neck, shoulders, hips, root |
| `.leftLeg` | hip, knee, ankle |
| `.rightLeg` | hip, knee, ankle |

```swift
// Get all body points
let allPoints = try observation.recognizedPoints(.all)

// Get left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)

// Get specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
```
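For the fitness use cases mentioned earlier, recognized points are typically combined into joint angles. A sketch of one way to do this (the `jointAngle` helper is plain trigonometry, not a Vision API; note that angles in normalized coordinates are distorted unless the image is square, so convert to pixel coordinates first when precision matters):

```swift
import CoreGraphics
import Vision

// Angle at a vertex joint (e.g., the elbow), in degrees
func jointAngle(_ a: VNRecognizedPoint,
                vertex b: VNRecognizedPoint,
                _ c: VNRecognizedPoint) -> CGFloat? {
    // Skip low-confidence detections
    guard a.confidence > 0.3, b.confidence > 0.3, c.confidence > 0.3 else { return nil }

    let angle = atan2(a.location.y - b.location.y, a.location.x - b.location.x)
              - atan2(c.location.y - b.location.y, c.location.x - b.location.x)
    var degrees = abs(angle * 180 / .pi)
    if degrees > 180 { degrees = 360 - degrees }
    return degrees
}

// Usage: left elbow angle from shoulder-elbow-wrist
let shoulder = try observation.recognizedPoint(.leftShoulder)
let elbow = try observation.recognizedPoint(.leftElbow)
let wrist = try observation.recognizedPoint(.leftWrist)
if let angle = jointAngle(shoulder, vertex: elbow, wrist) {
    print("Left elbow angle: \(angle)°")
}
```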

### VNDetectHumanBodyPose3DRequest (3D)

**Availability**: iOS 17+, macOS 14+

Returns 3D skeleton with 17 joints in meters (real-world coordinates):

```swift
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else { return }

// Get 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position // simd_float4x4 matrix
let localPosition = leftWrist.localPosition // Relative to parent joint
```

**3D Body Landmarks (17 points)**: Similar to the 2D set, but the individual face landmarks are replaced by head and spine joints (e.g., `.topHead`, `.centerHead`, `.spine`)

#### 3D Observation Properties

- `bodyHeight`: Estimated height in meters
  - With depth data: measured height
  - Without depth data: reference height (1.8 m)
- `heightEstimation`: `.measured` or `.reference`
- `cameraOriginMatrix`: simd_float4x4 camera position/orientation relative to the subject
- `pointInImage(_:)`: Projects a 3D joint back to 2D image coordinates

```swift
let wrist2D = try observation.pointInImage(leftWrist)
```

#### 3D Point Classes

- `VNPoint3D`: Base class with simd_float4x4 position matrix
- `VNRecognizedPoint3D`: Adds identifier (joint name)
- `VNHumanBodyRecognizedPoint3D`: Adds localPosition and parentJoint

```swift
// Position relative to skeleton root (center of hip)
let modelPosition = leftWrist.position

// Position relative to parent joint (left elbow)
let relativePosition = leftWrist.localPosition
```

#### Depth Input

Vision accepts depth data alongside images:

```swift
// From AVDepthData
let handler = VNImageRequestHandler(
    cvPixelBuffer: imageBuffer,
    depthData: depthData,
    orientation: orientation
)

// From file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL) // Depth auto-fetched
```

**Depth formats**: Disparity or depth (interchangeable via AVFoundation)

**LiDAR**: Use in live capture sessions for accurate scale/measurement

## Face Detection & Landmarks

### VNDetectFaceRectanglesRequest

**Availability**: iOS 11+

Detects face bounding boxes:

```swift
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    let faceBounds = observation.boundingBox // Normalized rect
}
```

### VNDetectFaceLandmarksRequest

**Availability**: iOS 11+

Detects face with detailed landmarks:

```swift
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    if let landmarks = observation.landmarks {
        let leftEye = landmarks.leftEye
        let nose = landmarks.nose
        let leftPupil = landmarks.leftPupil // Revision 3+
    }
}
```

**Revisions:**

- Revision 1: Basic landmarks
- Revision 2: Detects upside-down faces
- Revision 3+: Pupil locations (see the sketch below)
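A sketch of opting into a specific revision before performing the request, continuing the handler pattern used above (`VNDetectFaceLandmarksRequestRevision3` is Vision's constant for revision 3):

```swift
let request = VNDetectFaceLandmarksRequest()
request.revision = VNDetectFaceLandmarksRequestRevision3

try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    // Pupil regions are available with revision 3
    if let pupil = observation.landmarks?.leftPupil {
        let points = pupil.normalizedPoints // [CGPoint], relative to the face bounding box
    }
}
```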

## Person Detection

### VNDetectHumanRectanglesRequest

**Availability**: iOS 13+

Detects human bounding boxes (torso detection):

```swift
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanObservation] ?? [] {
    let humanBounds = observation.boundingBox // Normalized rect
}
```

**Use case**: Faster than pose detection when you only need location

## CoreImage Integration

### CIBlendWithMask Filter

Composite subject on new background using Vision mask:

```swift
// 1. Get mask from Vision
guard let observation = request.results?.first as? VNInstanceMaskObservation else { return }
let visionMask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)

// 3. Apply filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)

let output = filter.outputImage // Composited result
```

**Parameters:**

- Input image: Original image to mask
- Mask image: Vision's soft segmentation mask
- Background image: New background (or an empty image for transparency)

**HDR preservation**: CoreImage preserves high dynamic range from the input (Vision/VisionKit output is SDR)
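`outputImage` is a lazy recipe rather than rendered pixels; a minimal sketch of rendering it with a `CIContext`, continuing from the `filter` above (contexts are expensive, so create one and reuse it):

```swift
import CoreImage

// Create once and reuse across renders
let context = CIContext()

if let output = filter.outputImage,
   let cgImage = context.createCGImage(output, from: output.extent) {
    // Use cgImage (e.g., wrap in UIImage/NSImage for display)
}
```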

## Text Recognition APIs

### VNRecognizeTextRequest

**Availability**: iOS 13+, macOS 10.15+

Recognizes text in images with configurable accuracy/speed trade-off.

#### Basic Usage

```swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"] // Order matters
request.usesLanguageCorrection = true

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
    // Get top candidates
    let candidates = observation.topCandidates(3)
    let bestText = candidates.first?.string ?? ""
}
```

#### Recognition Levels

| Level | Performance | Accuracy | Best For |
|---|---|---|---|
| `.fast` | Real-time | Good | Camera feed, large text, signs |
| `.accurate` | Slower | Excellent | Documents, receipts, handwriting |

**Fast path**: Character-by-character recognition (Neural Network → Character Detection)

**Accurate path**: Full-line ML recognition (Neural Network → Line/Word Recognition)

#### Properties

| Property | Type | Description |
|---|---|---|
| `recognitionLevel` | VNRequestTextRecognitionLevel | `.fast` or `.accurate` |
| `recognitionLanguages` | [String] | BCP 47 language codes, order = priority |
| `usesLanguageCorrection` | Bool | Use language model for correction |
| `customWords` | [String] | Domain-specific vocabulary |
| `automaticallyDetectsLanguage` | Bool | Auto-detect language (iOS 16+) |
| `minimumTextHeight` | Float | Min text height as fraction of image (0-1) |
| `revision` | Int | API version (affects supported languages) |

#### Language Support

```swift
// Check supported languages for current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
    for: .accurate,
    revision: VNRecognizeTextRequestRevision3
)
```

**Language correction**: Improves accuracy but takes processing time. Disable for codes/serial numbers.

**Custom words**: Add domain-specific vocabulary for better recognition (medical terms, product codes). See the sketch below.
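A sketch of both notes applied, with illustrative values; note that `customWords` supplements the vocabulary used during language correction, so it only takes effect while `usesLanguageCorrection` is true:

```swift
import Vision

// Serial numbers / codes: exact characters matter, so disable correction
let serialRequest = VNRecognizeTextRequest()
serialRequest.recognitionLevel = .accurate
serialRequest.usesLanguageCorrection = false
serialRequest.minimumTextHeight = 0.05 // Ignore tiny background text (illustrative)

// Domain documents: keep correction on and supplement its vocabulary
let documentRequest = VNRecognizeTextRequest()
documentRequest.recognitionLevel = .accurate
documentRequest.usesLanguageCorrection = true
documentRequest.customWords = ["ibuprofen", "acetaminophen"] // Illustrative
```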

#### VNRecognizedTextObservation

- `boundingBox`: Normalized rect containing recognized text
- `topCandidates(_:)`: Returns [VNRecognizedText] ordered by confidence

#### VNRecognizedText

| Property | Type | Description |
|---|---|---|
| `string` | String | Recognized text |
| `confidence` | VNConfidence | 0.0-1.0 |
| `boundingBox(for:)` | VNRectangleObservation? | Box for substring range |

```swift
// Get bounding box for substring
let text = candidate.string
if let range = text.range(of: "invoice") {
    let box = try candidate.boundingBox(for: range)
}
```

## Barcode Detection APIs

### VNDetectBarcodesRequest

**Availability**: iOS 11+, macOS 10.13+

Detects and decodes barcodes and QR codes.

#### Basic Usage

```swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128] // Specific codes

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for barcode in request.results as? [VNBarcodeObservation] ?? [] {
    let payload = barcode.payloadStringValue
    let type = barcode.symbology
    let bounds = barcode.boundingBox
}
```

#### Symbologies

**1D Barcodes:**

- `.codabar` (iOS 15+)
- `.code39`, `.code39Checksum`, `.code39FullASCII`, `.code39FullASCIIChecksum`
- `.code93`, `.code93i`
- `.code128`
- `.ean8`, `.ean13`
- `.gs1DataBar`, `.gs1DataBarExpanded`, `.gs1DataBarLimited` (iOS 15+)
- `.i2of5`, `.i2of5Checksum`
- `.itf14`
- `.upce`

**2D Codes:**

- `.aztec`
- `.dataMatrix`
- `.microPDF417` (iOS 15+)
- `.microQR` (iOS 15+)
- `.pdf417`
- `.qr`

**Performance**: Specifying fewer symbologies = faster detection

#### Revisions

| Revision | iOS | Features |
|---|---|---|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1, MicroPDF, MicroQR, better ROI |
| 3 | 16+ | ML-based, multiple codes, better bounding boxes |

#### VNBarcodeObservation

| Property | Type | Description |
|---|---|---|
| `payloadStringValue` | String? | Decoded content |
| `symbology` | VNBarcodeSymbology | Barcode type |
| `boundingBox` | CGRect | Normalized bounds |
| `topLeft`/`topRight`/`bottomLeft`/`bottomRight` | CGPoint | Corner points |

## VisionKit Scanner APIs

### DataScannerViewController

**Availability**: iOS 16+

Camera-based live scanner with built-in UI for text and barcodes.

#### Check Availability

```swift
// Hardware support
DataScannerViewController.isSupported

// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
```

#### Configuration

```swift
import VisionKit

let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
    .barcode(symbologies: [.qr, .ean13]),
    .text(textContentType: .URL), // Or nil for all text
    // .text(languages: ["ja"]) // Filter by language
]

let scanner = DataScannerViewController(
    recognizedDataTypes: dataTypes,
    qualityLevel: .balanced, // .fast, .balanced, .accurate
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isPinchToZoomEnabled: true,
    isGuidanceEnabled: true,
    isHighlightingEnabled: true
)

scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()
}
```

#### RecognizedDataType

| Type | Description |
|---|---|
| `.barcode(symbologies:)` | Specific barcode types |
| `.text()` | All text |
| `.text(languages:)` | Text filtered by language |
| `.text(textContentType:)` | Text filtered by type (URL, phone, email) |

#### Delegate Protocol

```swift
protocol DataScannerViewControllerDelegate {
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didTapOn item: RecognizedItem)

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didUpdate updatedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didRemove removedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
```

#### RecognizedItem

```swift
enum RecognizedItem {
    case text(RecognizedItem.Text)
    case barcode(RecognizedItem.Barcode)

    var id: UUID { get }
    var bounds: RecognizedItem.Bounds { get }
}

// Text item
struct Text {
    let transcript: String
}

// Barcode item
struct Barcode {
    let payloadStringValue: String?
    let observation: VNBarcodeObservation
}
```
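A sketch of handling the enum in the `didTapOn` delegate callback:

```swift
func dataScanner(_ dataScanner: DataScannerViewController,
                 didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print("Tapped text: \(text.transcript)")
    case .barcode(let barcode):
        print("Tapped barcode: \(barcode.payloadStringValue ?? "<no payload>")")
    @unknown default:
        break
    }
}
```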

#### Async Stream

```swift
// Alternative to delegate
for await items in scanner.recognizedItems {
    // Current recognized items
}
```

#### Custom Highlights

```swift
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)

// Capture still photo
let photo = try await scanner.capturePhoto()
```

### VNDocumentCameraViewController

**Availability**: iOS 13+

Document scanning with automatic edge detection, perspective correction, and lighting adjustment.

#### Basic Usage

```swift
import VisionKit

let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
```

#### Delegate Protocol

```swift
protocol VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan)

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error)
}
```

#### VNDocumentCameraScan

| Property | Type | Description |
|---|---|---|
| `pageCount` | Int | Number of scanned pages |
| `imageOfPage(at:)` | UIImage | Get page image at index |
| `title` | String | User-editable title |

```swift
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                  didFinishWith scan: VNDocumentCameraScan) {
    controller.dismiss(animated: true)

    for i in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: i)
        // Process with VNRecognizeTextRequest (see the sketch below)
    }
}
```
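A sketch of that per-page OCR step, feeding each scanned page through VNRecognizeTextRequest (the `recognizeText` helper is illustrative):

```swift
import UIKit
import Vision

func recognizeText(in pageImage: UIImage) throws -> [String] {
    guard let cgImage = pageImage.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate // Scanned documents favor accuracy

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])

    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```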

## Document Analysis APIs

### VNDetectDocumentSegmentationRequest

**Availability**: iOS 15+, macOS 12+

Detects document boundaries for custom camera UIs or post-processing.

```swift
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNRectangleObservation else {
    return // No document found
}

// Get corner points (normalized)
let corners = [
    observation.topLeft,
    observation.topRight,
    observation.bottomLeft,
    observation.bottomRight
]
```
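A common follow-up is rectifying the detected document with CoreImage; a sketch using the CIPerspectiveCorrection filter, continuing from the `image` and `observation` above. The normalized corners are scaled to pixel coordinates; Vision and CoreImage share a lower-left origin, so no y-flip is needed:

```swift
import CoreImage

let size = image.extent.size

// Scale a normalized corner into pixel coordinates
func scaled(_ point: CGPoint) -> CIVector {
    CIVector(x: point.x * size.width, y: point.y * size.height)
}

let filter = CIFilter(name: "CIPerspectiveCorrection")!
filter.setValue(image, forKey: kCIInputImageKey)
filter.setValue(scaled(observation.topLeft), forKey: "inputTopLeft")
filter.setValue(scaled(observation.topRight), forKey: "inputTopRight")
filter.setValue(scaled(observation.bottomLeft), forKey: "inputBottomLeft")
filter.setValue(scaled(observation.bottomRight), forKey: "inputBottomRight")

let rectified = filter.outputImage // Deskewed, cropped document
```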

**vs VNDetectRectanglesRequest:**

- Document: ML-based, trained specifically on documents
- Rectangle: Edge-based, finds any quadrilateral

### RecognizeDocumentsRequest (iOS 26+)

**Availability**: iOS 26+, macOS 26+

Structured document understanding with semantic parsing.

#### Basic Usage

```swift
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)

guard let document = observations.first?.document else { return }
```

#### DocumentObservation Hierarchy

```
DocumentObservation
└── document: DocumentObservation.Document
    ├── text: TextObservation
    ├── tables: [Container.Table]
    ├── lists: [Container.List]
    └── barcodes: [Container.Barcode]
```

#### Table Extraction

```swift
for table in document.tables {
    for row in table.rows {
        for cell in row {
            let text = cell.content.text.transcript
            let detectedData = cell.content.text.detectedData
        }
    }
}
```

#### Detected Data Types

```swift
for data in document.text.detectedData {
    switch data.match.details {
    case .emailAddress(let email):
        let address = email.emailAddress
    case .phoneNumber(let phone):
        let number = phone.phoneNumber
    case .link(let url):
        let link = url
    case .address(let address):
        let components = address
    case .date(let date):
        let dateValue = date
    default:
        break
    }
}
```

#### TextObservation Hierarchy

```
TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]
```

## API Quick Reference

### Subject Segmentation

| API | Platform | Purpose |
|---|---|---|
| VNGenerateForegroundInstanceMaskRequest | iOS 17+ | Class-agnostic subject instances |
| VNGeneratePersonInstanceMaskRequest | iOS 17+ | Up to 4 people separately |
| VNGeneratePersonSegmentationRequest | iOS 15+ | All people (single mask) |
| ImageAnalysisInteraction (VisionKit) | iOS 16+ | UI for subject lifting |

### Pose Detection

| API | Platform | Landmarks | Coordinates |
|---|---|---|---|
| VNDetectHumanHandPoseRequest | iOS 14+ | 21 per hand | 2D normalized |
| VNDetectHumanBodyPoseRequest | iOS 14+ | 19 body joints | 2D normalized |
| VNDetectHumanBodyPose3DRequest | iOS 17+ | 17 body joints | 3D meters |

### Face & Person Detection

| API | Platform | Purpose |
|---|---|---|
| VNDetectFaceRectanglesRequest | iOS 11+ | Face bounding boxes |
| VNDetectFaceLandmarksRequest | iOS 11+ | Face with detailed landmarks |
| VNDetectHumanRectanglesRequest | iOS 13+ | Human torso bounding boxes |

### Text & Barcode

| API | Platform | Purpose |
|---|---|---|
| VNRecognizeTextRequest | iOS 13+ | Text recognition (OCR) |
| VNDetectBarcodesRequest | iOS 11+ | Barcode/QR detection |
| DataScannerViewController | iOS 16+ | Live camera scanner (text + barcodes) |
| VNDocumentCameraViewController | iOS 13+ | Document scanning with perspective correction |
| VNDetectDocumentSegmentationRequest | iOS 15+ | Programmatic document edge detection |
| RecognizeDocumentsRequest | iOS 26+ | Structured document extraction |

### Observation Types

| Observation | Returned By |
|---|---|
| VNInstanceMaskObservation | Foreground/person instance masks |
| VNPixelBufferObservation | Person segmentation (single mask) |
| VNHumanHandPoseObservation | Hand pose |
| VNHumanBodyPoseObservation | Body pose (2D) |
| VNHumanBodyPose3DObservation | Body pose (3D) |
| VNFaceObservation | Face detection/landmarks |
| VNHumanObservation | Human rectangles |
| VNRecognizedTextObservation | Text recognition |
| VNBarcodeObservation | Barcode detection |
| VNRectangleObservation | Document segmentation |
| DocumentObservation | Structured document (iOS 26+) |

## Resources

**WWDC**: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099

**Docs**: /vision, /visionkit, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest

**Skills**: axiom-vision, axiom-vision-diag
