Turn Any Photo Into a Talking Avatar
Upload a portrait and audio — get a lip-synced video in seconds. Real-time conversational avatars via a simple REST API.
How It Works
Three simple steps to bring any portrait to life
1. Upload
Send a portrait image and an audio file via the API. Supports PNG, JPG, WAV, and MP3.
2. Process
SadTalker runs on serverless GPUs to generate precise lip-sync from your audio waveform.
3. Receive
Get back a video URL with your avatar speaking naturally. Stored on Cloudflare R2.
Live Demo
Upload a portrait photo and audio to generate a talking avatar video
Portrait Image
PNG or JPG, face clearly visible
Audio File
WAV or MP3, speech audio
API Reference
Simple REST API: two endpoints are all you need
Submit a portrait image and audio file for lip-sync generation. Accepts multipart/form-data with fields image and audio.
curl -X POST https://avatar.12brain.org/api/v1/generate \
  -F "image=@portrait.png" \
  -F "audio=@speech.wav"
{
  "job_id": "abc123-def456",
  "status": "IN_QUEUE"
}
Poll job status. Returns IN_QUEUE, IN_PROGRESS, COMPLETED, or FAILED.
curl https://avatar.12brain.org/api/v1/status/abc123-def456
{
  "status": "COMPLETED",
  "output": {
    "video_url": "https://r2.avatarbrain.com/outputs/abc123.mp4"
  }
}
Limits & Notes
- Max file size: 50 MB per file (applies to both the image and the audio)
- Supported image formats: PNG, JPG/JPEG
- Supported audio formats: WAV, MP3
- GPU cold start may add 30-60 s to the first request
- Videos are stored on Cloudflare R2; URLs are permanent
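The two endpoints and the limits above can be combined into a small client. What follows is a minimal sketch using only the Python standard library, not an official SDK. The function names (check_upload, encode_multipart, submit, fetch_status, wait_for_video), the 5-second polling interval, the 10-minute timeout, and the reading of "50 MB" as 50 MiB are all assumptions made for illustration. It also assumes the responses have exactly the JSON shapes shown in the API Reference and that no authentication header is required, since none is documented here.

```python
import json
import mimetypes
import os
import time
import urllib.request
import uuid

BASE_URL = "https://avatar.12brain.org/api/v1"
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" is treated as 50 MiB
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}


def check_upload(path, allowed_exts):
    """Raise ValueError if a file breaks the documented format or size limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in allowed_exts:
        raise ValueError(f"{path}: unsupported format {ext!r}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError(f"{path}: larger than 50 MB")


def encode_multipart(files):
    """Encode {field_name: path} as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for field, path in files.items():
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as fh:
            data = fh.read()
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{os.path.basename(path)}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit(image_path, audio_path):
    """Validate both files, POST them to /generate, and return the job_id."""
    check_upload(image_path, IMAGE_EXTS)
    check_upload(audio_path, AUDIO_EXTS)
    body, content_type = encode_multipart(
        {"image": image_path, "audio": audio_path}
    )
    req = urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout: a GPU cold start may delay the first request.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["job_id"]


def fetch_status(job_id):
    """GET /status/<job_id> and return the decoded JSON object."""
    url = f"{BASE_URL}/status/{job_id}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_video(job_id, fetch=fetch_status, interval=5.0, timeout=600.0):
    """Poll until the job is COMPLETED or FAILED; return the video URL."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] == "COMPLETED":
            return job["output"]["video_url"]
        if job["status"] == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']}")
        time.sleep(interval)  # IN_QUEUE or IN_PROGRESS: wait and retry
```

With these helpers, wait_for_video(submit("portrait.png", "speech.wav")) would upload the two files and block until a video URL is available. The fetch parameter exists so the polling loop can be exercised with a stub instead of the live service.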