Turn Any Photo Into a Talking Avatar

Upload a portrait and an audio file and get a lip-synced video in under 30 seconds. Real-time conversational avatars via a simple REST API.

Generation Time: <30s

Max File Size: 50 MB

Simple API: REST

How It Works

Three simple steps to bring any portrait to life

1. Send a portrait image and an audio file via the API. Supported formats are PNG or JPG for the image and WAV or MP3 for the audio.

2. SadTalker runs on serverless GPUs to generate precise lip-sync from your audio waveform.

3. Get back a URL to a video of your avatar speaking naturally. The video is stored on Cloudflare R2.

Upload a portrait photo and an audio file to generate a talking avatar video.

Portrait Image: PNG or JPG, face clearly visible

Audio File: WAV or MP3, speech audio
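Before uploading, it can help to check files locally against the formats and size limit listed on this page. The sketch below is one way to do that in Python. It assumes the 50 MB cap applies to each file separately, that "50MB" means 50 MiB, and that a .jpeg extension is accepted alongside .jpg; none of this is confirmed by the API. The name check_upload is a hypothetical helper, not part of the service.

```python
import os

# Formats come from this page; accepting ".jpeg" is an assumption.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}
# Assumption: the 50 MB limit is per file and means 50 MiB.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int, allowed_exts: set) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in allowed_exts:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems
```

For a file on disk, call it as check_upload(path, os.path.getsize(path), IMAGE_EXTS). It only checks the extension and size; whether a face is clearly visible still has to be judged by eye.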

Simple REST API: two endpoints are all you need

POST /v1/generate

Submit a portrait image and audio file for lip-sync generation. Accepts multipart/form-data with fields image and audio.

Request

curl -X POST https://avatar.12brain.org/api/v1/generate \
  -F "image=@portrait.png" \
  -F "audio=@speech.wav"

Response

{
  "job_id": "abc123-def456",
  "status": "IN_QUEUE"
}
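The same request can be built from Python without third-party packages. This is a minimal sketch, not an official client: build_multipart and make_generate_request are hypothetical helper names, the base URL is copied from the curl example, and it assumes the server accepts whatever content type Python guesses from the file extension.

```python
import mimetypes
import os
import uuid
import urllib.request

API_BASE = "https://avatar.12brain.org/api/v1"  # taken from the curl example


def build_multipart(files: dict) -> tuple:
    """Encode {field: (filename, bytes)} as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, data) in files.items():
        # Guess the part's content type from the extension (an assumption
        # about what the server expects), falling back to a generic type.
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def make_generate_request(image_path: str, audio_path: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/generate request."""
    with open(image_path, "rb") as img, open(audio_path, "rb") as aud:
        body, content_type = build_multipart({
            "image": (os.path.basename(image_path), img.read()),
            "audio": (os.path.basename(audio_path), aud.read()),
        })
    return urllib.request.Request(
        f"{API_BASE}/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send it, pass the result of make_generate_request("portrait.png", "speech.wav") to urllib.request.urlopen and decode the response with json.load; the job_id field in that response is what the status endpoint expects.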

GET /v1/status/:jobId

Poll job status. Returns IN_QUEUE, IN_PROGRESS, COMPLETED, or FAILED.

Request

curl https://avatar.12brain.org/api/v1/status/abc123-def456

Response (completed)

{
  "status": "COMPLETED",
  "output": {
    "video_url": "https://r2.avatarbrain.com/outputs/abc123.mp4"
  }
}
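Because generation is asynchronous, a client usually polls the status endpoint until the job reaches COMPLETED or FAILED. The loop below is a sketch under stated assumptions: the 2 second interval and 120 second timeout are arbitrary choices, the shape of a FAILED response is not documented here so only its status field is read, and fetch_status and wait_for_video are hypothetical helper names.

```python
import json
import time
import urllib.request

API_BASE = "https://avatar.12brain.org/api/v1"  # taken from the curl example


def fetch_status(job_id: str) -> dict:
    """GET /v1/status/:jobId and decode the JSON body."""
    with urllib.request.urlopen(f"{API_BASE}/status/{job_id}") as resp:
        return json.load(resp)


def wait_for_video(job_id, fetch=fetch_status, interval=2.0, timeout=120.0,
                   sleep=time.sleep):
    """Poll until the job finishes and return the video URL.

    Raises RuntimeError if the job reports FAILED and TimeoutError if it
    is still queued or running when the (assumed) timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        status = job.get("status")
        if status == "COMPLETED":
            return job["output"]["video_url"]
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        sleep(interval)
```

The fetch and sleep parameters exist so the loop can be exercised without network access or real delays; in normal use, wait_for_video("abc123-def456") is enough.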