# LiveKit Agents Python SDK

Build voice AI agents with LiveKit's Python Agents SDK.
## LiveKit MCP server tools

This skill works alongside the LiveKit MCP server, which provides direct access to the latest LiveKit documentation, code examples, and changelogs. Use these tools when you need up-to-date information that may have changed since this skill was created.

Available MCP tools:

- `docs_search` - Search the LiveKit docs site
- `get_pages` - Fetch specific documentation pages by path
- `get_changelog` - Get recent releases and updates for LiveKit packages
- `code_search` - Search LiveKit repositories for code examples
- `get_python_agent_example` - Browse 100+ Python agent examples

When to use MCP tools:

- You need the latest API documentation or feature updates
- You're looking for recent examples or code patterns
- You want to check if a feature has been added in recent releases
- The local references don't cover a specific topic

When to use local references:

- You need quick access to core concepts covered in this skill
- You're working offline or want faster access to common patterns
- The information in the references is sufficient for your needs

Use MCP tools and local references together for the best experience.
## References

Consult these resources as needed:

- `./references/livekit-overview.md` -- LiveKit ecosystem overview and how these skills work together
- `./references/agent-session.md` -- AgentSession lifecycle, events, and configuration
- `./references/tools.md` -- Function tools, RunContext, and tool results
- `./references/models.md` -- STT, LLM, TTS model strings and plugin configuration
- `./references/workflows.md` -- Multi-agent handoffs, Tasks, TaskGroups, and pipeline nodes
## Installation

```shell
uv add "livekit-agents[silero,turn-detector]~=1.3" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"
```
## Environment variables

Use the LiveKit CLI to load your credentials into a `.env.local` file:

```shell
lk app env -w
```

Or manually create a `.env.local` file:

```shell
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
```
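As a quick sanity check before starting an agent, a short sketch can verify that all three variables are set. The helper below is illustrative and not part of the LiveKit SDK; only the variable names come from the `.env.local` example above.

```python
import os

# Variable names come from the .env.local example above; this helper
# is an illustrative sketch, not part of the LiveKit SDK.
REQUIRED_VARS = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL")


def missing_credentials(env=os.environ):
    """Return the names of any required LiveKit variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print(f"Missing LiveKit credentials: {', '.join(missing)}")
```

Run it after `load_dotenv(".env.local")` (or `lk app env -w`) to fail fast with a clear message instead of a connection error at session start.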
## Quick start

### Basic agent with STT-LLM-TTS pipeline

```python
from dotenv import load_dotenv

from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
Keep responses concise, 1-3 sentences. No markdown or emojis.""",
        )


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt="assemblyai/universal-streaming:en",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
```
### Basic agent with realtime model

```python
from dotenv import load_dotenv

from livekit import agents, rtc
from livekit.agents import AgentSession, Agent, AgentServer, room_io
from livekit.plugins import openai, noise_cancellation

load_dotenv(".env.local")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(voice="coral")
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
```
## Core concepts

### Agent class

Define agent behavior by subclassing `Agent`:

```python
from livekit.agents import Agent, function_tool


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Your system prompt here",
        )

    async def on_enter(self) -> None:
        """Called when agent becomes active."""
        await self.session.generate_reply(instructions="Greet the user")

    async def on_exit(self) -> None:
        """Called before agent hands off to another agent."""
        pass

    @function_tool()
    async def my_tool(self, param: str) -> str:
        """Tool description for the LLM."""
        return f"Result: {param}"
```
### AgentSession

The session orchestrates the voice pipeline:

```python
session = AgentSession(
    stt="assemblyai/universal-streaming:en",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-3:voice_id",
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)
```

Key methods:

- `session.start(room, agent)` - Start the session
- `session.say(text)` - Speak text directly
- `session.generate_reply(instructions)` - Generate LLM response
- `session.interrupt()` - Stop current speech
- `session.update_agent(new_agent)` - Switch to different agent
### Function tools

Use the `@function_tool` decorator:

```python
from livekit.agents import function_tool, RunContext


@function_tool()
async def get_weather(context: RunContext, location: str) -> str:
    """Get the current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"
```

## Running the agent
Development mode with auto-reload:

```shell
uv run agent.py dev
```

Console mode (local testing):

```shell
uv run agent.py console
```

Production mode:

```shell
uv run agent.py start
```

Download required model files:

```shell
uv run agent.py download-files
```

## LiveKit Inference model strings

Use model strings for simple configuration without API keys:

STT (Speech-to-Text):

- `"assemblyai/universal-streaming:en"` - AssemblyAI streaming
- `"deepgram/nova-3:en"` - Deepgram Nova
- `"cartesia/ink"` - Cartesia STT

LLM (Large Language Model):

- `"openai/gpt-4.1-mini"` - GPT-4.1 mini (recommended)
- `"openai/gpt-4.1"` - GPT-4.1
- `"openai/gpt-5"` - GPT-5
- `"gemini/gemini-3-flash"` - Gemini 3 Flash
- `"gemini/gemini-2.5-flash"` - Gemini 2.5 Flash

TTS (Text-to-Speech):

- `"cartesia/sonic-3:{voice_id}"` - Cartesia Sonic 3
- `"elevenlabs/eleven_turbo_v2_5:{voice_id}"` - ElevenLabs
- `"deepgram/aura:{voice}"` - Deepgram Aura

## Best practices

- Always use LiveKit Inference model strings as the default for STT, LLM, and TTS. This eliminates the need to manage individual provider API keys. Only use plugins when you specifically need custom models, voice cloning, Anthropic Claude, or self-hosted models.
- Use adaptive noise cancellation with a lambda that detects SIP participants and applies the appropriate model (BVCTelephony for phone calls, BVC for standard participants).
- Use MultilingualModel turn detection for natural conversation flow.
- Structure prompts with Identity, Output rules, Tools, Goals, and Guardrails sections.
- Test with console mode before deploying to LiveKit Cloud.
- Use `lk app env -w` to load LiveKit Cloud credentials into your environment.
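The prompt-structure practice above can be sketched as a plain instructions string. All of the wording below is a made-up example for an imaginary clinic assistant, not text from the LiveKit docs; only the five section names come from the guideline.

```python
# Illustrative system prompt following the Identity / Output rules /
# Tools / Goals / Guardrails structure recommended above. The content
# is a hypothetical example, not from LiveKit.
INSTRUCTIONS = """\
# Identity
You are a friendly voice assistant for a dental clinic's front desk.

# Output rules
Keep responses concise, 1-3 sentences. No markdown or emojis.

# Tools
Use the booking tool to schedule appointments; confirm details first.

# Goals
Help callers book, reschedule, or cancel appointments.

# Guardrails
Never give medical advice; refer clinical questions to a dentist.
"""
```

Pass the string to your agent as usual, e.g. `super().__init__(instructions=INSTRUCTIONS)` in your `Agent` subclass.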