zoom-rtms

Install

npx skills add https://github.com/anthropics/knowledge-work-plugins --skill zoom-rtms
Zoom Realtime Media Streams (RTMS)
Background reference for live Zoom media pipelines. Prefer build-zoom-bot first, then use this skill for stream types, capabilities, and RTMS-specific implementation constraints.
Expert guidance for accessing live audio, video, transcript, chat, and screen share data from Zoom meetings, webinars, Video SDK sessions, and Zoom Contact Center Voice in real time. RTMS uses a WebSocket-based protocol built on open standards and does not require a meeting bot to capture the media plane.
Read This First (Critical)
- RTMS is primarily a backend media ingestion service.
- Your backend receives and processes live media: audio, video, screen share, chat, transcript.
- RTMS is not a frontend UI SDK by itself.
- Processing is event-triggered: the backend waits for RTMS start webhook events before stream handling begins.
Optional architecture (common):
- Add a Zoom App SDK frontend for in-client UI/controls.
- Stream backend RTMS outputs to frontend via WebSocket (or SSE, gRPC, queue workers, etc.).
Use RTMS for media/data plane, and use frontend frameworks/Zoom Apps for presentation + user interactions.
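As a rough illustration of the optional architecture above, the sketch below formats one processed RTMS item (for example a transcript line) as a Server-Sent Events frame that a frontend could consume. The helper name `formatSseEvent`, the `transcript` event name, and the payload shape are hypothetical and not part of RTMS or any Zoom SDK; the same payload could just as well be pushed over a WebSocket.

```javascript
// Hypothetical helper: serialize one processed RTMS item as an SSE frame.
// An SSE frame is an "event:" line and a "data:" line, ended by a blank line.
function formatSseEvent(eventName, payload) {
  // JSON.stringify emits a single line, so one "data:" field is enough.
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Example payload shape (an assumption for illustration only).
const frame = formatSseEvent('transcript', { userName: 'Ada', text: 'Hello' });
process.stdout.write(frame);
```

In a real backend this string would be written to an HTTP response opened with the `text/event-stream` content type, from inside the RTMS transcript callback.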
Official Documentation: https://developers.zoom.us/docs/rtms/
SDK Reference (JS): https://zoom.github.io/rtms/js/
SDK Reference (Python): https://zoom.github.io/rtms/py/
Sample Repository: https://github.com/zoom/rtms-samples
Quick Links
New to RTMS? Follow this path:
- Connection Architecture - Two-phase WebSocket design
- SDK Quickstart - Fastest way to receive media (recommended)
- Manual WebSocket - Full protocol control without SDK
- Media Types - Audio, video, transcript, chat, screen share
Complete Implementation:
- RTMS Bot - End-to-end bot implementation guide
Reference:
- Lifecycle Flow - Complete webhook-to-streaming flow
- Data Types - All enums and constants
- Webhooks - Event subscription details
- Environment Variables - Credential modes and runtime knobs
- Quickstart Notes - Secondary quickstart guide
- Integrated Index - see the section below in this file
Having issues?
- Connection fails -> Common Issues
- Duplicate connections -> Webhook Gotchas
- No audio/video -> Media Configuration
- Start with preflight checks -> 5-Minute Runbook
Supported Products

| Product | Webhook Event | Payload ID | App Type |
| --- | --- | --- | --- |
| Meetings | meeting.rtms_started / meeting.rtms_stopped | meeting_uuid | General App |
| Webinars | webinar.rtms_started / webinar.rtms_stopped | meeting_uuid (same!) | General App |
| Video SDK | session.rtms_started / session.rtms_stopped | session_id | Video SDK App |
| Zoom Contact Center Voice | Product-specific RTMS/ZCC Voice events | Product-specific stream/session identifiers | Contact Center / approved RTMS integration |

Once connected, the core signaling/media socket model is shared across products. Meetings, webinars, and Video SDK sessions use the familiar start/stop webhooks. Zoom Contact Center Voice adds its own RTMS/ZCC Voice event family and should be treated as the same transport model with product-specific event payloads.
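The mapping in the table above can be captured in a small lookup, so the rest of a backend never has to care which product sent the webhook. This is a minimal sketch: `resolveStreamIdentity` is a hypothetical helper, and it deliberately returns `null` for Zoom Contact Center Voice events because their identifiers are product-specific and not listed here. The `rtms_stream_id` field name is taken from the manual WebSocket example later in this document.

```javascript
// Payload ID field per start event, copied from the Supported Products table.
const ID_FIELD_BY_EVENT = Object.freeze({
  'meeting.rtms_started': 'meeting_uuid',
  'webinar.rtms_started': 'meeting_uuid', // webinars reuse meeting_uuid
  'session.rtms_started': 'session_id',
});

// Hypothetical helper: normalizes a start webhook into one identity object.
// Returns null for events this sketch does not cover.
function resolveStreamIdentity(event, payload) {
  if (!Object.hasOwn(ID_FIELD_BY_EVENT, event)) return null;
  const idField = ID_FIELD_BY_EVENT[event];
  if (payload[idField] === undefined) return null;
  return { idField, idValue: payload[idField], streamId: payload.rtms_stream_id };
}

console.log(resolveStreamIdentity('webinar.rtms_started', {
  meeting_uuid: 'abc', rtms_stream_id: 's1',
}));
```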
RTMS Overview
RTMS is a data pipeline that gives your app access to live media from Zoom meetings, webinars, and Video SDK sessions without participant bots. Instead of having automated clients join meetings, use RTMS to collect media data directly from Zoom's infrastructure.
What RTMS Provides

| Media Type | Format | Use Cases |
| --- | --- | --- |
| Audio | PCM (L16), G.711, G.722, Opus | Transcription, voice analysis, recording |
| Video | H.264, JPG, PNG | Recording, AI vision, thumbnails, active participant selection |
| Screen Share | H.264, JPG, PNG | Content capture, slide extraction |
| Transcript | JSON text | Meeting notes, search, compliance |
| Chat | JSON text | Archive, sentiment analysis |

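When audio arrives as PCM (L16), each sample is 2 bytes, so the duration of a chunk follows directly from its byte length. The sketch below shows that arithmetic; `pcm16DurationMs` is a hypothetical helper, and the 16 kHz mono values in the example are assumptions, since the actual sample rate and channel count depend on the media parameters negotiated for your stream.

```javascript
// Hypothetical helper: duration of a 16-bit linear PCM chunk in milliseconds.
// bytes per frame = 2 bytes per sample * number of channels.
function pcm16DurationMs(byteLength, sampleRateHz, channels = 1) {
  const bytesPerFrame = 2 * channels;
  return (byteLength * 1000) / (bytesPerFrame * sampleRateHz);
}

// Assumed parameters for illustration: 16 kHz, mono, 640-byte chunk.
console.log(pcm16DurationMs(640, 16000, 1));
```

This kind of calculation is useful when stitching chunks into a recording or detecting gaps between consecutive audio callbacks.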
March 2026 Protocol Changes
- Zoom Contact Center Voice support: RTMS now covers Contact Center Voice audio and transcript scenarios.
- Transcript Language Identification control: transcript media handshakes now support src_language and enable_lid. Default behavior is LID enabled. Set enable_lid: false to force a fixed language.
- Single individual video stream subscription: RTMS can now stream one participant's camera feed at a time when data_opt is set to VIDEO_SINGLE_INDIVIDUAL_STREAM.
- Graceful client-initiated shutdown: backends can send STREAM_CLOSE_REQ over the signaling socket and wait for STREAM_CLOSE_RESP.
- Media keep-alive tolerance increased: media socket keep-alive timeout is now 65 seconds, not 35.
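One way a manual implementation might use the 65-second figure is a small watchdog that records when keep-alive traffic was last seen on the media socket and reports when the tolerance has been exceeded. This is only a sketch: the class name `KeepAliveWatchdog` is hypothetical, and reading the tolerance as "no keep-alive traffic for more than 65 seconds means the socket should be considered dead" is an assumption to check against the official protocol documentation.

```javascript
// Tolerance taken from the protocol change listed above.
const MEDIA_KEEP_ALIVE_TOLERANCE_MS = 65_000;

// Hypothetical watchdog; `now` is injectable so the logic can be tested
// without waiting for real time to pass.
class KeepAliveWatchdog {
  constructor(toleranceMs = MEDIA_KEEP_ALIVE_TOLERANCE_MS, now = Date.now) {
    this.toleranceMs = toleranceMs;
    this.now = now;
    this.lastSeen = now();
  }

  // Call whenever a keep-alive message is received on the socket.
  touch() {
    this.lastSeen = this.now();
  }

  // True once more than toleranceMs has elapsed since the last keep-alive.
  isStale() {
    return this.now() - this.lastSeen > this.toleranceMs;
  }
}

const watchdog = new KeepAliveWatchdog();
console.log(watchdog.isStale()); // a freshly created watchdog is not stale
```

A caller could poll `isStale()` on a timer and trigger its own reconnection logic when it returns true.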
Two Approaches

| Approach | Best For | Complexity |
| --- | --- | --- |
| SDK (@zoom/rtms) | Most use cases | Low - handles WebSocket complexity |
| Manual WebSocket | Custom protocols, other languages | High - full protocol implementation |

Prerequisites
- Node.js 20.3.0+ (24 LTS recommended) for JavaScript SDK
- Python 3.10+ for Python SDK
- Zoom General App (for meetings/webinars) or Video SDK App (for Video SDK) with RTMS feature enabled
- Webhook endpoint for RTMS events
- Server to receive WebSocket streams
Need RTMS access? Post in Zoom Developer Forum requesting RTMS access with your use case.
Quick Start (SDK - Recommended)

    import rtms from "@zoom/rtms";

    // RTMS start events across products
    const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

    // Handle webhook events
    rtms.onWebhookEvent(({ event, payload }) => {
      // Ignore anything that is not an RTMS start event
      if (!RTMS_EVENTS.includes(event)) return;

      const client = new rtms.Client();

      client.onAudioData((data, timestamp, metadata) => {
        console.log(`Audio from ${metadata.userName}: ${data.length} bytes`);
      });

      client.onTranscriptData((data, timestamp, metadata) => {
        const text = data.toString('utf8');
        console.log(`${metadata.userName}: ${text}`);
      });

      client.onJoinConfirm((reason) => {
        console.log(`Joined session: ${reason}`);
      });

      // SDK handles all WebSocket connections automatically
      // Accepts both meeting_uuid and session_id transparently
      client.join(payload);
    });

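The quick start above creates a new client for every start webhook. Because only one connection per stream is allowed and webhook retries can deliver the same start event twice (see Critical Gotchas), a backend will usually want to track which streams it has already joined. The sketch below is one possible shape for that bookkeeping; `ActiveStreams` is a hypothetical class, and keying on `rtms_stream_id` is an assumption based on the payload field used in the manual WebSocket example.

```javascript
// Hypothetical registry of streams this process has already joined.
class ActiveStreams {
  constructor() {
    this.clients = new Map(); // rtms_stream_id -> client object
  }

  // Registers the client and returns true, or returns false if the
  // stream is already being handled (duplicate webhook delivery).
  tryStart(streamId, client) {
    if (this.clients.has(streamId)) return false;
    this.clients.set(streamId, client);
    return true;
  }

  // Removes and returns the client for a stopped stream (or undefined).
  stop(streamId) {
    const client = this.clients.get(streamId);
    this.clients.delete(streamId);
    return client;
  }
}

const active = new ActiveStreams();
console.log(active.tryStart('stream-1', { name: 'first' }));  // first delivery
console.log(active.tryStart('stream-1', { name: 'retry' }));  // duplicate delivery
```

In the webhook handler, a client would only be created and joined when `tryStart` returns true, and `stop` would be called from the matching `*.rtms_stopped` event.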
Quick Start (Manual WebSocket)
For full control or non-SDK languages, implement the two-phase WebSocket protocol:

    // Assumes Express for the webhook server (the original snippet uses app.post)
    const express = require('express');
    const WebSocket = require('ws');
    const crypto = require('crypto');

    // Credentials, read from the manual implementation variables described in this file
    const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
    const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;

    const app = express();
    app.use(express.json()); // parse webhook JSON bodies into req.body

    const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

    // 1. Generate signature
    // For meetings/webinars: uses meeting_uuid. For Video SDK: uses session_id.
    function generateSignature(clientId, idValue, streamId, clientSecret) {
      const message = `${clientId},${idValue},${streamId}`;
      return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
    }

    // 2. Handle webhook
    app.post('/webhook', (req, res) => {
      res.status(200).send(); // CRITICAL: Respond immediately!
      const { event, payload } = req.body;
      if (RTMS_EVENTS.includes(event)) {
        connectToRTMS(payload);
      }
    });

    // 3. Connect to signaling WebSocket
    function connectToRTMS(payload) {
      const { server_urls, rtms_stream_id } = payload;
      // meeting_uuid for meetings/webinars, session_id for Video SDK
      const idValue = payload.meeting_uuid || payload.session_id;
      const signature = generateSignature(CLIENT_ID, idValue, rtms_stream_id, CLIENT_SECRET);

      const signalingWs = new WebSocket(server_urls);

      signalingWs.on('open', () => {
        signalingWs.send(JSON.stringify({
          msg_type: 1, // Handshake request
          protocol_version: 1,
          meeting_uuid: idValue,
          rtms_stream_id,
          signature,
          media_type: 9 // AUDIO(1) | TRANSCRIPT(8)
        }));
      });

      // ... handle responses, connect to media WebSocket
    }

    app.listen(3000); // port is an arbitrary choice for this example

See: Manual WebSocket Guide for complete implementation.
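The elided response handling in the snippet above has to answer keep-alive requests, since this document notes that msg_type 12 must be answered with msg_type 13 or the connection is closed. A minimal sketch of that reply logic follows. The helper name `buildKeepAliveReply` is hypothetical, and echoing the request's `timestamp` field back in the response is an assumption about the message shape; confirm the exact fields in the Manual WebSocket Guide and the data types reference.

```javascript
const KEEP_ALIVE_REQ = 12;  // msg_type sent by Zoom
const KEEP_ALIVE_RESP = 13; // msg_type the backend must answer with

// Hypothetical helper: returns the JSON reply for a keep-alive request,
// or null when the incoming message is something else.
function buildKeepAliveReply(rawMessage) {
  const msg = JSON.parse(rawMessage.toString());
  if (msg.msg_type !== KEEP_ALIVE_REQ) return null;
  // Assumption: the response echoes the timestamp from the request.
  return JSON.stringify({ msg_type: KEEP_ALIVE_RESP, timestamp: msg.timestamp });
}

// Usage with a `ws` socket (shown as a comment so this block has no side effects):
// signalingWs.on('message', (data) => {
//   const reply = buildKeepAliveReply(data);
//   if (reply) signalingWs.send(reply);
// });
console.log(buildKeepAliveReply('{"msg_type":12,"timestamp":5}'));
```

The same handler can be attached to the media socket, which this document describes as following the same pattern.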
Media Type Bitmask
Combine types with bitwise OR:

| Type | Value | Description |
| --- | --- | --- |
| Audio | 1 | PCM audio samples |
| Video | 2 | H.264/JPG video frames |
| Screen Share | 4 | Separate from video! |
| Transcript | 8 | Real-time speech-to-text |
| Chat | 16 | In-meeting chat messages |
| All | 32 | All media types |

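The flag values in the table above can be wrapped in a small helper so handshake code does not carry magic numbers. This is a sketch only: the constant and function names are hypothetical, not SDK exports, and `ALL` is treated as its own value of 32 as listed in the table rather than as the OR of the other flags.

```javascript
// Flag values copied from the Media Type Bitmask table.
const MEDIA_TYPE = Object.freeze({
  AUDIO: 1,
  VIDEO: 2,
  SCREEN_SHARE: 4,
  TRANSCRIPT: 8,
  CHAT: 16,
  ALL: 32,
});

// Combine named media types with bitwise OR.
function combineMediaTypes(...names) {
  return names.reduce((mask, name) => {
    if (!Object.hasOwn(MEDIA_TYPE, name)) {
      throw new Error(`Unknown media type: ${name}`);
    }
    return mask | MEDIA_TYPE[name];
  }, 0);
}

// Check whether a mask has a given flag set.
function includesMediaType(mask, name) {
  return (mask & MEDIA_TYPE[name]) !== 0;
}

const audioAndTranscript = combineMediaTypes('AUDIO', 'TRANSCRIPT');
console.log(audioAndTranscript);                             // 9
console.log(includesMediaType(audioAndTranscript, 'VIDEO')); // false
```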
Example: Audio + Transcript = 1 | 8 = 9
Critical Gotchas

| Issue | Solution |
| --- | --- |
| Only 1 connection allowed | New connections kick out existing ones. Track active sessions! |
| Respond 200 immediately | If the webhook response is delayed, Zoom retries, creating duplicate connections |
| Heartbeat mandatory | Respond to msg_type 12 with msg_type 13, or connection dies |
| Reconnection is YOUR job | RTMS doesn't auto-reconnect. Media keep-alive tolerance is now about 65s; signaling remains around 60s |
| Transcript language drift | Use src_language plus enable_lid: false when you want fixed-language transcription instead of automatic language switching |
| Single participant video only | VIDEO_SINGLE_INDIVIDUAL_STREAM supports one participant at a time. A new VIDEO_SUBSCRIPTION_REQ overrides the previous selection |
| Graceful close is explicit now | Use STREAM_CLOSE_REQ / STREAM_CLOSE_RESP when your backend wants to terminate the stream cleanly |

Environment Variables
SDK Environment Variables

    # Required - Authentication
    ZM_RTMS_CLIENT=your_client_id        # Zoom OAuth Client ID
    ZM_RTMS_SECRET=your_client_secret    # Zoom OAuth Client Secret

    # Optional - Webhook server
    ZM_RTMS_PORT=8080                    # Default: 8080
    ZM_RTMS_PATH=/webhook                # Default: /

    # Optional - Logging
    ZM_RTMS_LOG_LEVEL=info               # error, warn, info, debug, trace
    ZM_RTMS_LOG_FORMAT=progressive       # progressive or json
    ZM_RTMS_LOG_ENABLED=true

Manual Implementation Variables

    ZOOM_CLIENT_ID=your_client_id
    ZOOM_CLIENT_SECRET=your_client_secret
    ZOOM_SECRET_TOKEN=your_webhook_token # For webhook validation
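The webhook secret token is typically used to check that an incoming request really came from Zoom. The sketch below follows Zoom's general webhook verification scheme as commonly documented: an HMAC-SHA256 over `v0:{timestamp}:{raw body}` compared with the `x-zm-signature` header, with the timestamp taken from `x-zm-request-timestamp`. Treat the header names and message format as assumptions to confirm against the Zoom webhook documentation; `verifyZoomWebhook` is a hypothetical helper name.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns true when signatureHeader matches the
// HMAC computed from the secret token, request timestamp, and raw body.
function verifyZoomWebhook(secretToken, timestamp, rawBody, signatureHeader) {
  const expected = 'v0=' + crypto
    .createHmac('sha256', secretToken)
    .update(`v0:${timestamp}:${rawBody}`)
    .digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(String(signatureHeader || ''));
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Self-check with made-up values (not real credentials).
const sampleBody = '{"event":"meeting.rtms_started"}';
const sampleSignature = 'v0=' + crypto
  .createHmac('sha256', 'example_token')
  .update(`v0:1700000000:${sampleBody}`)
  .digest('hex');
console.log(verifyZoomWebhook('example_token', '1700000000', sampleBody, sampleSignature));
```

Verification needs the raw request body exactly as received, so a server that parses JSON automatically has to keep a copy of the unparsed bytes.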

Zoom App Setup
For Meetings and Webinars (General App)
1. Go to marketplace.zoom.us -> Develop -> Build App
2. Choose General App -> User-Managed
3. Features -> Access -> Enable Event Subscription
4. Add Events -> Search "rtms" -> Select:
   - meeting.rtms_started
   - meeting.rtms_stopped
   - webinar.rtms_started (if using webinars)
   - webinar.rtms_stopped (if using webinars)
5. Scopes -> Add Scopes -> Search "rtms" -> Add:
   - meeting:read:meeting_audio
   - meeting:read:meeting_video
   - meeting:read:meeting_transcript
   - meeting:read:meeting_chat
   - webinar:read:webinar_audio (if using webinars)
   - webinar:read:webinar_video (if using webinars)
   - webinar:read:webinar_transcript (if using webinars)
   - webinar:read:webinar_chat (if using webinars)
For Video SDK (Video SDK App)
1. Go to marketplace.zoom.us -> Develop -> Build App
2. Choose Video SDK App
3. Use your SDK Key and SDK Secret (not OAuth Client ID/Secret)
4. Add Events:
   - session.rtms_started
   - session.rtms_stopped
Sample Repositories
Official Samples

| Repository | Description |
| --- | --- |
| rtms-samples | RTMSManager, boilerplates, AI samples |
| rtms-quickstart-js | JavaScript SDK quickstart |
| rtms-quickstart-py | Python SDK quickstart |
| rtms-sdk-cpp | C++ SDK |
| zoom-rtms | Main SDK repository |

AI Integration Samples

| Sample | Description |
| --- | --- |
| rtms-meeting-assistant-starter-kit | AI meeting assistant with summaries |
| arlo-meeting-assistant | Production meeting assistant with DB |
| videosdk-rtms-transcribe-audio | Whisper transcription |

Complete Documentation
Concepts
- Connection Architecture - Two-phase WebSocket design
- Lifecycle Flow - Webhook to streaming flow
Examples
- SDK Quickstart - Using @zoom/rtms SDK
- Manual WebSocket - Raw protocol implementation
- RTMS Bot - Complete bot implementation guide
- AI Integration - Transcription and analysis patterns
References
- Media Types - Audio, video, transcript, chat, screen share
- Data Types - All enums and constants
- Connection - WebSocket protocol details
- Webhooks - Event subscription
Troubleshooting
- Common Issues - FAQ and solutions
Resources
- Official docs: https://developers.zoom.us/docs/rtms/
- Data types: https://developers.zoom.us/docs/rtms/data-types/
- Media params: https://developers.zoom.us/docs/rtms/media-parameter-definition/
- Developer forum: https://devforum.zoom.us/
Need help? Start with the Integrated Index section below for complete navigation.
Integrated Index
This section was migrated from SKILL.md.
RTMS provides real-time access to live audio, video, transcript, chat, and screen share from Zoom meetings, webinars, and Video SDK sessions.
Critical Positioning
Treat RTMS as a backend service for receiving and processing media streams.
Backend role: ingest audio/video/share/chat/transcript, run AI/analytics, persist/forward data.
Optional frontend role: Zoom App SDK or web dashboard that consumes processed stream data from backend transport (WebSocket/SSE/other).
Kickoff model: backend waits for RTMS start webhook events, then starts stream processing.
Do not model RTMS as a frontend-only SDK.
Quick Start Path
If you're new to RTMS, follow this order:
1. Run preflight checks first -> RUNBOOK.md
2. Understand the architecture -> concepts/connection-architecture.md
   - Two-phase WebSocket: Signaling + Media
   - Why RTMS doesn't use bots
3. Choose your approach -> SDK or Manual
   - SDK (recommended): examples/sdk-quickstart.md
   - Manual WebSocket: examples/manual-websocket.md
4. Understand the lifecycle -> concepts/lifecycle-flow.md
   - Webhook -> Signaling -> Media -> Streaming
5. Configure media types -> references/media-types.md
   - Audio, video, transcript, chat, screen share
6. Troubleshoot issues -> troubleshooting/common-issues.md
   - Connection problems, duplicate webhooks, missing data
Documentation Structure
rtms/
├── SKILL.md                          # Main skill overview
├── SKILL.md                          # This file - navigation guide
├── concepts/                         # Core architectural patterns
│   ├── connection-architecture.md    # Two-phase WebSocket design
│   └── lifecycle-flow.md             # Webhook to streaming flow
├── examples/                         # Complete working code
│   ├── sdk-quickstart.md             # Using @zoom/rtms SDK
│   ├── manual-websocket.md           # Raw protocol implementation
│   ├── rtms-bot.md                   # Complete RTMS bot implementation
│   └── ai-integration.md             # Transcription and analysis
├── references/                       # Reference documentation
│   ├── media-types.md                # Audio, video, transcript, chat, share
│   ├── data-types.md                 # All enums and constants
│   ├── connection.md                 # WebSocket protocol details
│   └── webhooks.md                   # Event subscription
└── troubleshooting/                  # Problem solving guides
    └── common-issues.md              # FAQ and solutions
By Use Case
I want to get meeting transcripts
1. SDK Quickstart - Fastest approach
2. Media Types - Transcript configuration
3. AI Integration - Whisper, Deepgram, AssemblyAI
I want to record meetings
1. Media Types - Audio + Video configuration
2. SDK Quickstart - Receiving media
3. AI Integration - Gap-filled recording
I want to build an AI meeting assistant
1. AI Integration - Complete patterns
2. SDK Quickstart - Media ingestion
3. Lifecycle Flow - Event handling
I want to build a complete RTMS bot
1. RTMS Bot - Complete implementation guide
2. Lifecycle Flow - Webhook to streaming flow
3. Connection Architecture - Two-phase design
I need full protocol control
1. Manual WebSocket - START HERE
2. Connection Architecture - Two-phase design
3. Data Types - All message types and enums
4. Connection - Protocol details
I'm getting connection errors
1. Common Issues - Diagnostic checklist
2. Connection Architecture - Verify flow
3. Webhooks - Validation and timing
I want to understand the architecture
1. Connection Architecture - Two-phase WebSocket
2. Lifecycle Flow - Complete flow diagram
3. Data Types - Protocol constants
By Product
I'm building for Zoom Meetings
- Standard RTMS setup. Webhook event: meeting.rtms_started. Uses General App with OAuth.
- Start with SDK Quickstart or Manual WebSocket.
I'm building for Zoom Webinars
- Same as meetings, but webhook event is webinar.rtms_started. Payload still uses meeting_uuid (NOT webinar_uuid).
- Add webinar scopes and event subscriptions. See Webhooks.
- Only panelist streams are confirmed available. Attendee streams may not be individual.
I'm building for Zoom Video SDK
- Webhook event: session.rtms_started. Payload uses session_id (NOT meeting_uuid).
- Requires a Video SDK App with SDK Key/Secret (not OAuth Client ID/Secret).
- Once connected, the protocol is identical to meetings.
- See Webhooks for payload details.
Key Documents
1. Connection Architecture (CRITICAL)
concepts/connection-architecture.md
RTMS uses two separate WebSocket connections:
- Signaling WebSocket: Authentication, control, heartbeats
- Media WebSocket: Actual audio/video/transcript data
2. SDK vs Manual (DECISION POINT)
examples/sdk-quickstart.md vs examples/manual-websocket.md

| SDK | Manual |
| --- | --- |
| Handles WebSocket complexity | Full protocol control |
| Automatic reconnection | DIY reconnection |
| Less code | More code |
| Best for most use cases | Best for custom requirements |

3. Critical Gotchas (MOST COMMON ISSUES)
troubleshooting/common-issues.md
- Respond 200 immediately - Delayed webhook responses cause duplicates
- Only 1 connection per stream - New connections kick out existing
- Heartbeat required - Must respond to keep-alive or connection dies
- Track active sessions - Prevent duplicate join attempts
Key Learnings
Critical Discoveries:
1. Two-Phase WebSocket Design
   - Signaling: Control plane (handshake, heartbeat, start/stop)
   - Media: Data plane (audio, video, transcript, chat, share)
   - See: Connection Architecture
2. Webhook Response Timing
   - MUST respond 200 BEFORE any processing
   - Delayed response -> Zoom retries -> duplicate connections
   - See: Common Issues
3. Heartbeat is Mandatory
   - Signaling: Receive msg_type 12, respond with msg_type 13
   - Media: Same pattern
   - Failure to respond = connection closed
   - See: Connection
4. Signature Generation
   - Format: HMAC-SHA256(clientSecret, "clientId,meetingUuid,streamId")
   - For Video SDK, use session_id in place of meetingUuid
   - Webinars still use meeting_uuid (not webinar_uuid)
   - Required for both signaling and media handshakes
   - See: Manual WebSocket
5. Media Types are Bitmasks
   - Audio=1, Video=2, Share=4, Transcript=8, Chat=16, All=32
   - Combine with OR: Audio+Transcript = 1|8 = 9
   - See: Media Types
6. Screen Share is SEPARATE from Video
   - Different msg_type (16 vs 15)
   - Different media flag (4 vs 2)
   - Must subscribe separately
   - See: Media Types
Quick Reference
- "Connection fails" -> Common Issues
- "Duplicate connections" -> Webhook timing
- "No audio/video data" -> Media Types - Check configuration
- "How do I implement manually?" -> Manual WebSocket
- "What message types exist?" -> Data Types
- "How do I integrate AI?" -> AI Integration
Document Version
Based on Zoom RTMS SDK v1.x and official documentation as of 2026.
Happy coding!
Remember: Start with SDK Quickstart for the fastest path, or Manual WebSocket if you need full control.