A single 780M-parameter model that speaks 50+ languages in any of 300+ voices — same voice, every language, one API call. Emotion tags, code-switching, and streaming, all built in.
A browser studio with 30+ emotion tags, multi-track timelines, and real-time previews. No installs, no GPUs, no audio engineering.
Pass voice="aria" with text in any of 50+ languages — even mixed mid-sentence. Emotion tags inline. Streams from token zero.
```python
# pip install svara
from svara import Client

c = Client()
audio = c.speak(
    text="नमस्ते, world! [laugh] こんにちは.",
    voice="aria",
    emotion="warm",
    stream=True,
)
```
Apache 2.0 weights, 1.5GB on disk; runs on a single T4, or quantized to 420MB on CPU. Fine-tune your own voice talent. Sign the deepfake-detection terms. Air-gapped if you need it.
Most TTS models force you to pick a language first, then a voice. Svara flips that — pick a voice, speak in any of 50+ languages with the same identity. The rest follows.
3–5× smaller than comparable models. Runs on a single T4 at a 0.18 real-time factor (synthesis takes 18% of the audio's duration), or quantized to 420MB on a CPU. Fine-tune on a laptop.
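The stated sizes are mutually consistent. A quick back-of-envelope check, assuming 16-bit weights for the full checkpoint (the per-parameter bit-width for the quantized model is inferred here, not stated anywhere in the docs):

```python
params = 780e6  # 780M parameters (stated)

# Full-precision checkpoint: 2 bytes/param at fp16/bf16
fp16_gb = params * 2 / 1e9
print(f"fp16 size ≈ {fp16_gb:.2f} GB")  # ≈ 1.56 GB, matches "1.5GB on disk"

# Quantized checkpoint: 420 MB on disk implies roughly 4-bit weights
quant_bits = 420e6 * 8 / params
print(f"quantized ≈ {quant_bits:.1f} bits/param")  # ≈ 4.3 bits, i.e. ~4-bit quantization
```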
Aria sounds like Aria in English, Hindi, Japanese, or all three in one sentence. No swapping voices when you switch language — the model handles code-switching natively.
30+ tags — [happy] [whisper] [sigh] [speed=1.2] — written inline. No emotion sliders, no separate API.
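The tags ride along in the text itself. A minimal sketch of how a script with inline markers might be split into plain text and tags — the regex and helper below are illustrative only, not Svara's actual parser:

```python
import re

# Matches bare tags like [happy] and parameterized ones like [speed=1.2]
TAG_RE = re.compile(r"\[([a-z]+(?:=[0-9.]+)?)\]")

def extract_tags(text: str):
    """Return (plain_text, tags) for a script with inline [tag] markers."""
    tags = TAG_RE.findall(text)
    plain = TAG_RE.sub("", text).strip()
    return plain, tags

script = "[happy] Welcome back! [whisper] Don't tell anyone. [speed=1.2] Let's go."
plain, tags = extract_tags(script)
print(tags)  # ['happy', 'whisper', 'speed=1.2']
```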
First audio byte at ~80ms. Pipe directly into a phone call, an LLM, or a browser <audio> tag.
No language router, no voice ID lookup, no emotion API. POST /v1/audio/speech takes text + voice + emotion. That's it.
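The request body really is just those three fields. A sketch using only the standard library — the base URL, auth header, and anything beyond text/voice/emotion are assumptions, not documented API surface:

```python
import json
import urllib.request

# The three fields the endpoint takes (stated); nothing else required.
payload = {
    "text": "नमस्ते, world! [laugh] こんにちは.",
    "voice": "aria",
    "emotion": "warm",
}
body = json.dumps(payload).encode("utf-8")

# Hypothetical host and auth scheme — substitute your own.
req = urllib.request.Request(
    "https://api.example.com/v1/audio/speech",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer $SVARA_API_KEY",
    },
)
# with urllib.request.urlopen(req) as resp:  # response streams audio bytes
#     audio = resp.read()
```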
Use the weights commercially, modify, redistribute. Watermarking and a deepfake-detector ship alongside.
100K characters / month free. No credit card. Apache 2.0 weights on day one.