Powered by SadTalker + RunPod Serverless

Turn Any Photo Into a Talking Avatar

Upload a portrait and an audio file and get back a lip-synced video in seconds. Build real-time conversational avatars with a simple REST API.

Generation Time: <30s
Max File Size: 50MB
API: Simple REST

How It Works

Three simple steps to bring any portrait to life

1. Upload

Send a portrait image and an audio file via the API. Images can be PNG or JPG; audio can be WAV or MP3.

2. Process

SadTalker runs on serverless GPUs to generate precise lip-sync from your audio waveform.

3. Receive

Get back a video URL with your avatar speaking naturally. The video is stored on Cloudflare R2.

Live Demo

Upload a portrait photo and audio to generate a talking avatar video

Portrait Image: PNG or JPG, face clearly visible

Audio File: WAV or MP3, speech audio

API Reference

Simple REST API: two endpoints are all you need

POST /v1/generate

Submit a portrait image and audio file for lip-sync generation. Accepts multipart/form-data with fields image and audio.

Request
curl -X POST https://avatar.12brain.org/api/v1/generate \
  -F "image=@portrait.png" \
  -F "audio=@speech.wav"
Response
{
  "job_id": "abc123-def456",
  "status": "IN_QUEUE"
}
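
For callers who prefer a script over curl, the request above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names (encode_multipart, build_generate_request, submit_job) are hypothetical, the base URL is copied from the curl example, and the 60-second timeout is an assumption. Any authentication the service may require is not covered here because the page does not describe it.

```python
import json
import mimetypes
import urllib.request
import uuid

# Base URL copied from the curl example; adjust if your deployment differs.
API_BASE = "https://avatar.12brain.org/api"


def encode_multipart(files):
    """Encode {field: (filename, data_bytes)} as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # Closing boundary marks the end of the form body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_generate_request(image_name, image_bytes, audio_name, audio_bytes):
    """Build (but do not send) the POST /v1/generate request."""
    body, content_type = encode_multipart({
        "image": (image_name, image_bytes),
        "audio": (audio_name, audio_bytes),
    })
    return urllib.request.Request(
        f"{API_BASE}/v1/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def submit_job(request):
    """Send the request and return the job_id from the JSON response."""
    # The 60-second timeout is an assumption, not a documented value.
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)["job_id"]
```

Building the request separately from sending it makes it easy to inspect the form body before anything goes over the network. A typical call would read both files as bytes, pass them to build_generate_request, and hand the result to submit_job.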

GET /v1/status/:jobId

Poll job status. Returns IN_QUEUE, IN_PROGRESS, COMPLETED, or FAILED.

Request
curl https://avatar.12brain.org/api/v1/status/abc123-def456
Response (completed)
{
  "status": "COMPLETED",
  "output": {
    "video_url": "https://r2.avatarbrain.com/outputs/abc123.mp4"
  }
}
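
Because generation is asynchronous, a client has to poll this endpoint until the job reaches COMPLETED or FAILED. The loop below is a minimal sketch of that pattern. The function names, the 2-second interval, and the 300-second overall timeout are assumptions chosen for illustration; the page does not recommend specific values, and the shape of a FAILED response is not documented, so the sketch only reads its status field.

```python
import json
import time
import urllib.request

# Base URL copied from the curl example.
API_BASE = "https://avatar.12brain.org/api"


def fetch_status(job_id):
    """GET /v1/status/:jobId and return the decoded JSON body."""
    url = f"{API_BASE}/v1/status/{job_id}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_video(job_id, fetch=fetch_status, interval=2.0,
                   timeout=300.0, sleep=time.sleep):
    """Poll until the job finishes and return its video_url.

    Raises RuntimeError if the job reports FAILED and TimeoutError if it
    is still pending after `timeout` seconds. `fetch` and `sleep` are
    injectable so the loop can be exercised without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(job_id)
        status = body.get("status")
        if status == "COMPLETED":
            return body["output"]["video_url"]
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        # IN_QUEUE or IN_PROGRESS: keep waiting unless we ran out of time.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        sleep(interval)
```

Since a GPU cold start can delay the first request, the overall timeout should comfortably exceed the typical generation time.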

Limits & Notes

  • Max file size: 50MB per file, for both the image and the audio
  • Supported image formats: PNG, JPG/JPEG
  • Supported audio formats: WAV, MP3
  • A GPU cold start may add 30-60s to the first request
  • Videos stored on Cloudflare R2 — URLs are permanent
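
The size and format limits above can be checked on the client before uploading, which avoids sending a request the server would reject. The following is a minimal sketch under stated assumptions: check_upload is a hypothetical helper, "50MB" is interpreted as 50 * 1024 * 1024 bytes (the page does not say whether it means MB or MiB), and formats are judged by file extension only, not by inspecting file contents.

```python
import os

# Assumption: "50MB" is read as 50 MiB; the page does not specify.
MAX_BYTES = 50 * 1024 * 1024
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}


def check_upload(filename, size_bytes, allowed_exts):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in allowed_exts:
        problems.append(
            f"{filename}: unsupported format {ext or '(none)'}, "
            f"expected one of {sorted(allowed_exts)}"
        )
    if size_bytes > MAX_BYTES:
        problems.append(
            f"{filename}: {size_bytes} bytes exceeds the {MAX_BYTES}-byte limit"
        )
    return problems
```

For files on disk, os.path.getsize(path) supplies size_bytes. The server remains the authority on what it accepts, so this check is a convenience rather than a guarantee.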