Type4Me macOS Voice Input Skill, by ara.so (Daily 2026 Skills collection)

Type4Me is a macOS voice input tool. It captures audio via a global hotkey and transcribes it using local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines. It can optionally post-process the text with an LLM, then injects the result into any app. All credentials and history are stored locally: no telemetry, no cloud sync.

Architecture Overview

    Type4Me/
    ├── ASR/                              # ASR engine abstraction
    │   ├── ASRProvider.swift             # Provider enum + protocols
    │   ├── ASRProviderRegistry.swift     # Plugin registry
    │   ├── Providers/                    # Per-vendor config files
    │   ├── SherpaASRClient.swift         # Local streaming ASR
    │   ├── SherpaOfflineASRClient.swift
    │   ├── VolcASRClient.swift           # Volcengine streaming ASR
    │   └── DeepgramASRClient.swift       # Deepgram streaming ASR
    ├── Bridge/                           # SherpaOnnx C API Swift bridge
    ├── Audio/                            # Audio capture
    ├── Session/                          # Core state machine: record→ASR→inject
    ├── Input/                            # Global hotkey management
    ├── Services/                         # Credentials, hotwords, model manager
    ├── Protocol/                         # Volcengine WebSocket codec
    └── UI/                               # SwiftUI (FloatingBar + Settings)

Installation

Prerequisites

Xcode Command Line Tools

    xcode-select --install

CMake (for the local ASR engine)

    brew install cmake

Build & Deploy from Source

    git clone https://github.com/joewongjc/type4me.git
    cd type4me

Step 1: Compile the SherpaOnnx local engine (~5 min, one-time)

    bash scripts/build-sherpa.sh

Step 2: Build, bundle, sign, install to /Applications, and launch

    bash scripts/deploy.sh

Download Pre-built App

Download Type4Me-v1.2.3.dmg from the releases page (cloud ASR only, no local engine): https://github.com/joewongjc/type4me/releases/tag/v1.2.3

If macOS blocks the app:

    xattr -d com.apple.quarantine /Applications/Type4Me.app

Download Local ASR Models

    mkdir -p ~/Library/Application\ Support/Type4Me/Models

Option A: Lightweight (~20 MB)

    tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \
      -C ~/Library/Application\ Support/Type4Me/Models/

Option B: Balanced (~236 MB, recommended)

    tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \
      -C ~/Library/Application\ Support/Type4Me/Models/

Option C: Bilingual Chinese + English (~1 GB)

    tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \
      -C ~/Library/Application\ Support/Type4Me/Models/

Expected structure for the Paraformer model:

    ~/Library/Application Support/Type4Me/Models/
    └── sherpa-onnx-streaming-paraformer-bilingual-zh-en/
        ├── encoder.int8.onnx
        ├── decoder.int8.onnx
        └── tokens.txt

Key Protocols

SpeechRecognizer Protocol

Every ASR client must implement this protocol:

    import AVFoundation

    protocol SpeechRecognizer: AnyObject {
        /// Start a new recognition session
        func startRecognition() async throws
        /// Feed raw PCM audio data
        func appendAudio(_ buffer: AVAudioPCMBuffer) async
        /// Stop and get the final result
        func stopRecognition() async throws -> String
        /// Cancel without a result
        func cancelRecognition() async
        /// Streaming partial results (optional)
        var partialResultHandler: ((String) -> Void)? { get set }
    }

ASRProviderConfig Protocol

Each vendor's credential definition:

    protocol ASRProviderConfig {
        /// Unique identifier string
        static var providerID: String { get }
        /// Display name in the Settings UI
        static var displayName: String { get }
        /// Credential fields shown in Settings
        static var credentialFields: [CredentialField] { get }
        /// Validate credentials before use
        static func validate(_ credentials: [String: String]) -> Bool
        /// Create the recognizer instance
        static func createClient(credentials: [String: String],
                                 config: RecognitionConfig) throws -> SpeechRecognizer
    }

Adding a New ASR Provider

Step 1: Create the Provider Config

Create Type4Me/ASR/Providers/OpenAIWhisperProvider.swift:

    import Foundation

    struct OpenAIWhisperProvider: ASRProviderConfig {
        static let providerID = "openai_whisper"
        static let displayName = "OpenAI Whisper"
        static let credentialFields: [CredentialField] = [
            CredentialField(key: "api_key", label: "API Key", placeholder: "sk-...", isSecret: true),
            CredentialField(key: "model", label: "Model", placeholder: "whisper-1", isSecret: false),
        ]

        static func validate(_ credentials: [String: String]) -> Bool {
            guard let apiKey = credentials["api_key"], !apiKey.isEmpty else { return false }
            return apiKey.hasPrefix("sk-")
        }

        static func createClient(credentials: [String: String],
                                 config: RecognitionConfig) throws -> SpeechRecognizer {
            guard let apiKey = credentials["api_key"] else {
                throw ASRError.missingCredential("api_key")
            }
            // Fall back to whisper-1 when the Model field is missing or left blank
            let model = credentials["model"].flatMap { $0.isEmpty ? nil : $0 } ?? "whisper-1"
            return OpenAIWhisperASRClient(apiKey: apiKey, model: model, config: config)
        }
    }

Step 2: Implement the ASR Client

Create Type4Me/ASR/OpenAIWhisperASRClient.swift. The transcription endpoint expects an audio container (WAV, MP3, ...), not headerless PCM, so the client wraps the accumulated samples in a WAV header before uploading:

    import Foundation
    import AVFoundation

    final class OpenAIWhisperASRClient: SpeechRecognizer {
        // Whisper's REST endpoint is not streaming, so this handler is never called
        var partialResultHandler: ((String) -> Void)?

        private let apiKey: String
        private let model: String
        private let config: RecognitionConfig
        private var pcm16 = Data()        // accumulated 16-bit mono samples (little-endian on Apple hardware)
        private var sampleRate = 16_000   // updated from each buffer's format

        init(apiKey: String, model: String, config: RecognitionConfig) {
            self.apiKey = apiKey
            self.model = model
            self.config = config
        }

        func startRecognition() async throws {
            pcm16 = Data()
        }

        func appendAudio(_ buffer: AVAudioPCMBuffer) async {
            // Take the first channel of Float32 PCM and convert it to clamped Int16
            guard let channelData = buffer.floatChannelData?[0] else { return }
            sampleRate = Int(buffer.format.sampleRate)
            let samples = UnsafeBufferPointer(start: channelData, count: Int(buffer.frameLength))
            let int16Samples = samples.map { sample -> Int16 in
                Int16(max(-1, min(1, sample)) * Float(Int16.max))
            }
            int16Samples.withUnsafeBytes { pcm16.append(contentsOf: $0) }
        }

        func stopRecognition() async throws -> String {
            let wav = Self.wavFile(pcm16: pcm16, sampleRate: sampleRate)

            // Build a multipart form request to the Whisper API
            var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
            request.httpMethod = "POST"
            request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
            let boundary = UUID().uuidString
            request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

            var body = Data()
            // Audio file part
            body.append("--\(boundary)\r\n".data(using: .utf8)!)
            body.append("Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n".data(using: .utf8)!)
            body.append("Content-Type: audio/wav\r\n\r\n".data(using: .utf8)!)
            body.append(wav)
            body.append("\r\n".data(using: .utf8)!)
            // Model part
            body.append("--\(boundary)\r\n".data(using: .utf8)!)
            body.append("Content-Disposition: form-data; name=\"model\"\r\n\r\n".data(using: .utf8)!)
            body.append("\(model)\r\n".data(using: .utf8)!)
            body.append("--\(boundary)--\r\n".data(using: .utf8)!)
            request.httpBody = body

            let (data, response) = try await URLSession.shared.data(for: request)
            guard let httpResponse = response as? HTTPURLResponse, httpResponse.statusCode == 200 else {
                throw ASRError.networkError("Whisper API returned error")
            }
            return try JSONDecoder().decode(WhisperResponse.self, from: data).text
        }

        func cancelRecognition() async {
            pcm16 = Data()
        }

        /// Prepends a 44-byte RIFF/WAVE header for 16-bit mono PCM
        private static func wavFile(pcm16: Data, sampleRate: Int) -> Data {
            func le<T: FixedWidthInteger>(_ value: T) -> Data {
                withUnsafeBytes(of: value.littleEndian) { Data($0) }
            }
            var header = Data("RIFF".utf8) + le(UInt32(36 + pcm16.count)) + Data("WAVEfmt ".utf8)
            header += le(UInt32(16)) + le(UInt16(1)) + le(UInt16(1))        // fmt size, PCM, mono
            header += le(UInt32(sampleRate)) + le(UInt32(sampleRate * 2))    // sample rate, byte rate
            header += le(UInt16(2)) + le(UInt16(16))                         // block align, bits per sample
            header += Data("data".utf8) + le(UInt32(pcm16.count))
            return header + pcm16
        }
    }

    private struct WhisperResponse: Codable {
        let text: String
    }

Step 3: Register the Provider

In Type4Me/ASR/ASRProviderRegistry.swift, add it to the all array:

    struct ASRProviderRegistry {
        static let all: [any ASRProviderConfig.Type] = [
            SherpaParaformerProvider.self,
            VolcengineProvider.self,
            DeepgramProvider.self,
            OpenAIWhisperProvider.self,   // ← Add your provider here
        ]
    }

Credentials Storage

Credentials are stored at ~/Library/Application Support/Type4Me/credentials.json with permissions 0600. Never hardcode secrets; always load them via CredentialStore:

    // Reading credentials
    let store = CredentialStore.shared
    let apiKey = store.get(providerID: "openai_whisper", key: "api_key")

    // Writing credentials
    store.set(providerID: "openai_whisper", key: "api_key", value: userInputKey)

    // Checking whether a provider is configured
    let isConfigured = store.isConfigured(providerID: "openai_whisper",
                                          fields: OpenAIWhisperProvider.credentialFields)

Custom Processing Modes with Prompt Variables

Processing modes use LLM post-processing with three context variables:

| Variable | Value |
|---|---|
| {text} | Recognized speech text |
| {selected} | Text selected in the active app at record start |
| {clipboard} | Clipboard content at record start |

Example custom mode prompts:

    // Translate the selection using a voice command
    let translatePrompt = """
    The user selected this text: {selected}
    Voice command: {text}
    Execute the command on the selected text. Output only the result.
    """

    // Code review via voice
    let codeReviewPrompt = """
    Code to review: {clipboard}
    Review instruction: {text}
    Provide focused feedback addressing the instruction.
    """

    // Email reply drafting
    let emailPrompt = """
    Original email: {selected}
    My reply intent (spoken): {text}
    Write a professional email reply. Output only the email body.
    """

Built-in Processing Modes

    enum ProcessingMode {
        case fast                     // Direct ASR output, no LLM post-processing delay
        case performance              // Dual-channel: streaming + offline refinement
        case englishTranslation       // Chinese speech → English text
        case promptOptimize           // Raw prompt → optimized prompt via LLM
        case command                  // Voice command + selected/clipboard context → LLM action
        case custom(prompt: String)   // User-defined prompt template
    }

Session State Machine

The core recording flow in Session/:

    [Idle]
      → hotkey pressed → [Recording]
      → audio streams to the ASR client
      → hotkey released / pressed again → [Processing]
      → ASR returns text → [LLM Post-processing] (if the mode requires it)
      → [Injecting] → text injected into the active app
      → [Idle]

Updating After Source Changes

    cd type4me
    git pull
    bash scripts/deploy.sh

SherpaOnnx does not need to be recompiled unless the engine version has changed.
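The Custom Processing Modes section lists three prompt variables: {text}, {selected} and {clipboard}. A template could be filled in before it is sent to the LLM roughly as follows. This is a minimal sketch that assumes plain placeholder substitution. PromptContext and renderPrompt are hypothetical names, not Type4Me's actual API.

```swift
/// Hypothetical snapshot of the three values captured for a recording.
struct PromptContext {
    var text: String       // recognized speech, fills {text}
    var selected: String   // selection in the active app at record start, fills {selected}
    var clipboard: String  // clipboard content at record start, fills {clipboard}
}

/// Replaces the documented placeholders in a custom-mode prompt template.
/// The template is scanned once, left to right, so a placeholder that happens to
/// appear inside the spoken, selected, or copied text is not expanded again.
func renderPrompt(_ template: String, with context: PromptContext) -> String {
    let values: [String: String] = [
        "{text}": context.text,
        "{selected}": context.selected,
        "{clipboard}": context.clipboard,
    ]
    var output = ""
    var rest = Substring(template)
    while !rest.isEmpty {
        // The placeholders differ after "{", so at most one can match here
        if let match = values.first(where: { rest.hasPrefix($0.key) }) {
            output += match.value
            rest = rest.dropFirst(match.key.count)
        } else {
            // Unknown text (including unrecognized {braces}) is copied through unchanged
            output.append(rest.removeFirst())
        }
    }
    return output
}
```

A chain of replacingOccurrences calls would be shorter. However, it would expand a literal "{clipboard}" that the user happened to dictate into {text}. The single pass avoids that.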

Troubleshooting

App won't open (security warning)

    xattr -d com.apple.quarantine /Applications/Type4Me.app

Local model not recognized in Settings

Verify that the directory structure exactly matches:

    ls ~/Library/Application\ Support/Type4Me/Models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/

The listing must show encoder.int8.onnx, decoder.int8.onnx and tokens.txt.
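The same layout check can be scripted with FileManager, for example in a preflight step. The sketch below assumes that Settings only looks for these three file names in the model folder. missingModelFiles is a hypothetical helper. It checks names only and does not validate the ONNX files.

```swift
import Foundation

/// Returns the required file names that are missing from `modelDir`
/// (a directory with a required name also counts as missing).
/// An empty array means the folder has the expected Paraformer layout.
func missingModelFiles(in modelDir: URL,
                       required: [String] = ["encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt"]) -> [String] {
    let fileManager = FileManager.default
    return required.filter { name in
        var isDirectory: ObjCBool = false
        let path = modelDir.appendingPathComponent(name).path
        let exists = fileManager.fileExists(atPath: path, isDirectory: &isDirectory)
        return !exists || isDirectory.boolValue
    }
}

// Usage: check the bilingual Paraformer folder in the default Type4Me location
let paraformerDir = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Application Support/Type4Me/Models")
    .appendingPathComponent("sherpa-onnx-streaming-paraformer-bilingual-zh-en")
print("Missing files:", missingModelFiles(in: paraformerDir))
```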

SherpaOnnx build fails

Ensure CMake is installed:

    brew install cmake

Clean and retry:

    rm -rf Frameworks/
    bash scripts/build-sherpa.sh

New ASR provider not appearing in Settings

Confirm that the provider type is added to ASRProviderRegistry.all.

Ensure that providerID is unique across all providers.

Do a clean build: swift package clean && bash scripts/deploy.sh

Audio not captured / no floating bar

Grant microphone permission: System Settings → Privacy & Security → Microphone → Type4Me ✓

Grant Accessibility permission for text injection: System Settings → Privacy & Security → Accessibility → Type4Me ✓

Credentials not saving

Check that the file exists and has the correct permissions:

    ls -la ~/Library/Application\ Support/Type4Me/credentials.json

The mode should be -rw------- (0600).
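The same permission check can also be done from Swift through FileManager's .posixPermissions attribute. The helpers below are a sketch. permissionBits and restrictToOwner are hypothetical names and are not part of Type4Me's CredentialStore.

```swift
import Foundation

/// Returns the permission bits of the file at `path` (0o600 means owner read/write only).
func permissionBits(atPath path: String) throws -> Int {
    let attributes = try FileManager.default.attributesOfItem(atPath: path)
    // FileManager reports .posixPermissions as an NSNumber
    guard let mode = attributes[.posixPermissions] as? NSNumber else {
        throw CocoaError(.fileReadUnknown)
    }
    return mode.intValue & 0o777
}

/// Restricts the file to owner read/write, the equivalent of `chmod 0600`.
func restrictToOwner(atPath path: String) throws {
    try FileManager.default.setAttributes([.posixPermissions: NSNumber(value: 0o600)],
                                          ofItemAtPath: path)
}

// Usage: report the current mode of credentials.json (prints nothing if the file is absent)
let credentialsPath = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Application Support/Type4Me/credentials.json").path
if let bits = try? permissionBits(atPath: credentialsPath) {
    print("credentials.json mode:", String(bits, radix: 8))
}
```

If the reported mode is anything other than 600, calling restrictToOwner has the same effect as the chmod command.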

Fix permissions if needed:

    chmod 0600 ~/Library/Application\ Support/Type4Me/credentials.json

Export history to CSV

Open Settings → History → select a date range → Export CSV. The SQLite database is at:

    ~/Library/Application Support/Type4Me/history.db

Direct query:

    sqlite3 ~/Library/Application\ Support/Type4Me/history.db \
      "SELECT datetime(timestamp,'unixepoch'), text FROM records ORDER BY timestamp DESC LIMIT 20;"

System Requirements

macOS 14.0 (Sonoma) or later

Apple Silicon (M1/M2/M3/M4) recommended for local ASR inference

Xcode Command Line Tools + CMake for source builds

An Internet connection is needed only for cloud ASR providers
