# Chat with Anyone

Chat with any real person or fictional character in their own voice by automatically finding their speech online, extracting a clean reference sample, and using it to generate replies.

## Triggers

- 我想跟xxx聊天 (I want to chat with xxx)
- 你来扮演xxx跟我说话 (Play the role of xxx and talk to me)
- 让xxx给我讲讲这篇文章 (Let xxx explain this article to me)
- 用xxx的声音说 (Say this in xxx's voice)
- Talk to me like xxx
- Roleplay as xxx

## Workflow

When the user asks you to roleplay or chat as a specific character, follow these steps exactly:

### 1. Character Disambiguation

If the user's description is ambiguous (e.g., "US President", "Spider-Man actor"), ask for clarification first to determine the exact person or specific portrayal they want.

### 2. Find a Reference Video

Use your web search capabilities to find a YouTube, Bilibili, or TikTok video of the character speaking clearly. Prefer interviews, speeches, or monologues with little or no background music. Grab the URL of the best candidate video.

### 3. Download Video and Subtitles

Use the youtube-downloader skill to download the video's audio and its auto-generated subtitles. Wait for the download to complete before proceeding.

Example using youtube-downloader

```bash
python skills/youtube-downloader/scripts/download_video.py "VIDEO_URL" -o "tmp/character_audio" --audio-only --subtitles
```

### 4. Extract Audio Segment

Read the downloaded subtitle file (e.g., `.vtt` or `.srt`) to find a continuous 10-30 second segment where the character speaks clearly without long pauses. Note the start and end timestamps. Then use ffmpeg to extract that segment as a `.wav` file to serve as the reference audio.
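The subtitle scan in this step can be partly automated. The Python sketch below shows one possible heuristic. It is not part of the youtube-downloader or tts skills.

- It reads cue timings from `.vtt` or `.srt` text and merges cues separated by at most `max_gap` seconds. The default of 1.0 s as the threshold for a "long pause" is an assumption.
- It keeps the longest run and trims it to 30 s.
- It formats an argument list that mirrors the ffmpeg command shown for this step.

Auto-generated YouTube captions often overlap, so the end of a run is tracked as a running maximum. `parse_cues`, `find_segment`, `to_timestamp`, and `ffmpeg_args` are hypothetical helper names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Cue timing line, e.g. "00:01:15.000 --> 00:01:17.500" (WebVTT) or
# "00:01:15,000 --> 00:01:17,500" (SRT). The hours field is optional,
# since WebVTT also allows "MM:SS.mmm".
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*"
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float  # seconds


def _seconds(h: str | None, m: str, s: str, ms: str) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(text: str) -> list[Cue]:
    """Collect cue start/end times from .vtt or .srt contents; cue text is ignored."""
    cues = [
        Cue(_seconds(*m.groups()[:4]), _seconds(*m.groups()[4:]))
        for m in CUE_TIMING.finditer(text)
    ]
    return sorted(cues, key=lambda c: c.start)


def find_segment(cues: list[Cue], min_len: float = 10.0, max_len: float = 30.0,
                 max_gap: float = 1.0) -> tuple[float, float] | None:
    """Return (start, end) of the longest run of cues with no gap > max_gap,
    trimmed to max_len seconds, or None if no run reaches min_len."""
    best = None
    run_start = run_end = 0.0
    for i, cue in enumerate(cues):
        if i > 0 and cue.start - run_end <= max_gap:
            run_end = max(run_end, cue.end)  # extend the run (cues may overlap)
        else:
            run_start, run_end = cue.start, cue.end  # pause too long: new run
        if best is None or run_end - run_start > best[1] - best[0]:
            best = (run_start, run_end)
    if best is None or best[1] - best[0] < min_len:
        return None
    return best[0], min(best[1], best[0] + max_len)


def to_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, a time format ffmpeg accepts for -ss/-to."""
    h, rem = divmod(round(seconds * 1000), 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def ffmpeg_args(src: str, dst: str, start: float, end: float) -> list[str]:
    """Argument list mirroring the step 4 ffmpeg command (24 kHz, mono, 16-bit PCM)."""
    return ["ffmpeg", "-y", "-i", src, "-ss", to_timestamp(start), "-to", to_timestamp(end),
            "-c:a", "pcm_s16le", "-ar", "24000", "-ac", "1", dst]


sample = """WEBVTT

00:01:14.000 --> 00:01:18.500
first line

00:01:18.700 --> 00:01:24.000
second line

00:01:24.300 --> 00:01:29.000
third line

00:01:35.000 --> 00:01:37.000
after a long pause
"""

segment = find_segment(parse_cues(sample))
print(segment)  # (74.0, 89.0)
print(ffmpeg_args("tmp/character_audio/VideoTitle.m4a", "ref.wav", *segment))
```

Subtitle timings only approximate when speech starts and stops. Captions also say nothing about background music, so it is still worth listening to the extracted clip before using it.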

Example: Extracting audio from 00:01:15 to 00:01:30

```bash
ffmpeg -y -i "tmp/character_audio/VideoTitle.m4a" -ss 00:01:15 -to 00:01:30 -c:a pcm_s16le -ar 24000 -ac 1 "skills/chat-with-anyone/character_name_ref.wav"
```

### 5. Generate Speech and Roleplay

Respond to the user's prompt while staying in character. Use the tts skill with the extracted audio as `--ref-audio` to generate the spoken response.

Example using tts skill

```bash
skills/tts/scripts/tts.sh speak -t "Hello there! I am ready to chat with you." --ref-audio "skills/chat-with-anyone/character_name_ref.wav" -o "output.wav"
```

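Before handing the clip to the tts skill, a quick format check can catch a wrong timestamp or a missing `-ar`/`-ac` flag. This sketch uses only Python's standard `wave` module. The expected format (24 kHz, mono, 16-bit, 10-30 s long) comes from step 4 of this workflow, not from any documented requirement of the tts skill. `check_reference` and `write_silence` are hypothetical helpers.

```python
from __future__ import annotations

import os
import tempfile
import wave


def check_reference(path: str, min_len: float = 10.0, max_len: float = 30.0) -> list[str]:
    """Return problems found in a reference WAV; an empty list means it matches step 4."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if width != 2:
        problems.append(f"expected 16-bit samples, got {8 * width}-bit")
    if rate != 24000:
        problems.append(f"expected 24000 Hz, got {rate} Hz")
    if not (min_len <= duration <= max_len):
        problems.append(f"duration {duration:.1f}s is outside {min_len:g}-{max_len:g}s")
    return problems


def write_silence(path: str, seconds: float, rate: int = 24000) -> None:
    """Write a silent mono 16-bit WAV, used here only to demonstrate the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "character_name_ref.wav")
    write_silence(demo, 5)
    print(check_reference(demo))  # ['duration 5.0s is outside 10-30s']
```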
## Example Conversation

**User:** 我想跟特朗普聊天，让他给我讲个睡前故事。 (I want to chat with Trump; have him tell me a bedtime story.)

**Agent:**

1. (Internal) Identify the character: Donald Trump.
2. (Internal) Find a good speech video URL: https://www.youtube.com/watch?v=xxxxxxxx.
3. (Internal) Download audio and subtitles: `python skills/youtube-downloader/scripts/download_video.py "https://www.youtube.com/watch?v=xxxxxxxx" -o tmp/trump --audio-only --subtitles`
4. (Internal) Read the subtitles and find a good segment (e.g., 00:02:10 to 00:02:30).
5. (Internal) Extract a clean WAV: `ffmpeg -y -i "tmp/trump/audio.m4a" -ss 00:02:10 -to 00:02:30 -c:a pcm_s16le -ar 24000 -ac 1 "skills/chat-with-anyone/trump_ref.wav"`
6. (Internal) Generate the TTS audio: `bash skills/tts/scripts/tts.sh speak -t "Let me tell you a tremendous story, maybe the best story ever told..." --ref-audio "skills/chat-with-anyone/trump_ref.wav" -o "trump_story.wav"`

**Agent replies:** "Here is the audio of the bedtime story from Donald Trump!" (Presents `trump_story.wav` along with the text.)
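The walkthrough's internal commands can also be assembled programmatically, which keeps the paths consistent from one step to the next. The sketch below only builds argument lists; `run_all` is provided but is not called.

- `plan_commands`, `run_all`, and their parameters are hypothetical.
- The caller must supply the downloaded audio filename, because it depends on how youtube-downloader names its output. The walkthrough's `audio.m4a` is a placeholder.
- The segment timestamps are only known after the download and subtitle scan. A real run would therefore execute the first command before building the other two.

```python
from __future__ import annotations

import shlex
import subprocess


def plan_commands(url: str, slug: str, audio_file: str, start: str, end: str,
                  reply_text: str, out_wav: str) -> list[list[str]]:
    """Build, without running, the download, trim, and TTS commands for one character."""
    out_dir = f"tmp/{slug}"
    ref_wav = f"skills/chat-with-anyone/{slug}_ref.wav"
    return [
        ["python", "skills/youtube-downloader/scripts/download_video.py", url,
         "-o", out_dir, "--audio-only", "--subtitles"],
        ["ffmpeg", "-y", "-i", f"{out_dir}/{audio_file}", "-ss", start, "-to", end,
         "-c:a", "pcm_s16le", "-ar", "24000", "-ac", "1", ref_wav],
        ["bash", "skills/tts/scripts/tts.sh", "speak", "-t", reply_text,
         "--ref-audio", ref_wav, "-o", out_wav],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = plan_commands(
    "https://www.youtube.com/watch?v=xxxxxxxx", "trump", "audio.m4a",
    "00:02:10", "00:02:30",
    "Let me tell you a tremendous story, maybe the best story ever told...",
    "trump_story.wav",
)
for argv in commands:
    print(shlex.join(argv))  # dry run: print instead of calling run_all(commands)
```

Passing argument lists instead of one shell string avoids quoting problems when the in-character reply contains quotes or apostrophes.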

## Dependencies

- **youtube-downloader**: For fetching videos and subtitles.
- **ffmpeg**: For trimming and converting audio formats.
- **tts**: For generating the final speech using `--ref-audio` (typically requires the Noiz backend for voice cloning).
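A quick preflight can confirm the dependencies are present before starting a roleplay. This is a sketch under the following assumptions:

- The skill scripts live at the relative paths used throughout this workflow.
- ffmpeg is looked up on `PATH` with `shutil.which`.

It cannot tell whether the tts skill is configured with a Noiz backend. `check_dependencies` is a hypothetical helper.

```python
from __future__ import annotations

import os
import shutil


def check_dependencies(root: str = ".") -> dict[str, bool]:
    """Report whether each dependency appears to be available under `root`."""
    return {
        "youtube-downloader": os.path.isfile(
            os.path.join(root, "skills", "youtube-downloader", "scripts", "download_video.py")),
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "tts": os.path.isfile(os.path.join(root, "skills", "tts", "scripts", "tts.sh")),
    }


missing = [name for name, ok in check_dependencies().items() if not ok]
print("missing:", ", ".join(missing) if missing else "none")
```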
