SVARA · v8 · prompt is the interface

The prompt is the product. Write performance. Read speech.

No emotion sliders. No language picker. No voice ID lookup. Instead, 30+ inline tags (emotion, nonverbal, prosody, language) go straight into your text, the way a director writes stage directions. The 780M-parameter model reads them and performs.

example.txt · 312 chars · 3 languages · 9 tags

[lang=en] Hey there — welcome back to The Local Frequency. [excited] We have got a really special show today, [laugh] I am still a little jet-lagged from the flight in. [pause=400ms] [lang=hi] तो चलिए शुरू करते हैं — [speed=0.95] सबसे पहले हमारे आज के मेहमान का परिचय। [lang=ja] [warm] 東京から、ようこそ。 [whisper] ……what a journey.

🇺🇸 Aria · en-US · same voice across all three languages · 187ms P50 · 780M params
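To make the tag syntax concrete, here is a minimal Python sketch that splits a prompt into text segments and the tags written just before each one. It is not Svara's parser. The grammar (`[name]` or `[name=value]`) is inferred from the example above, and `TAG_RE`, `Segment`, and `parse` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed grammar, inferred from the example prompt: [name] or [name=value],
# e.g. [laugh], [lang=hi], [pause=400ms], [speed=0.95].
TAG_RE = re.compile(r"\[([a-z_]+)(?:=([^\]\s]+))?\]")


@dataclass
class Segment:
    text: str
    # (name, value) pairs written immediately before this text; value is None for bare tags.
    tags: list = field(default_factory=list)


def parse(prompt: str) -> list[Segment]:
    """Split a prompt into segments, each carrying the tags that precede it."""
    segments, pending, pos = [], [], 0
    for m in TAG_RE.finditer(prompt):
        text = prompt[pos:m.start()].strip()
        if text:
            # Text ends the current run of tags; start collecting a new run.
            segments.append(Segment(text, pending))
            pending = []
        pending.append((m.group(1), m.group(2)))
        pos = m.end()
    tail = prompt[pos:].strip()
    if tail or pending:
        segments.append(Segment(tail, pending))
    return segments


EXAMPLE = (
    "[lang=en] Hey there. [excited] Big show today, [laugh] still jet-lagged. "
    "[pause=400ms] [lang=hi] तो चलिए शुरू करते हैं। [whisper] what a journey."
)

segments = parse(EXAMPLE)
for seg in segments:
    print(seg.tags, "->", seg.text)
```

Consecutive tags such as `[pause=400ms] [lang=hi]` are grouped onto the text that follows them. How the real model scopes a tag (next phrase, rest of sentence, until overridden) is not stated on this page, so the sketch makes no claim about it.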

The full tag DSL

Four families. Inline. Compose anywhere. The same model reads them in any of 50+ languages.

Language · 50+
[lang=en] [lang=hi] [lang=ja] [lang=es] [lang=fr] [lang=auto]
Emotion · 18
Nonverbal · 8
Prosody · 8

Svara · what you write

"Hey there — welcome back. [excited] We have a special show today, [laugh] still jet-lagged. [pause=400ms] [lang=hi] तो चलिए शुरू करते हैं। [lang=ja] [warm] 東京から、ようこそ。 [whisper] ......what a journey."

Other providers · what you write

  • Three separate API calls — one per language, with three different voice IDs.
  • Stitch the audio yourself, hope the levels match.
  • For emotion: an emotion_id param. One emotion per call. No [whisper] mid-sentence.
  • For pauses: SSML <break time="400ms"/> in some providers, undocumented in others.
  • For nonverbals like [laugh]: not supported. Generate, edit in Audacity.
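The first two bullets can be sketched in Python: before a multi-call workflow can send anything, the script has to be cut into one run per language. This is an illustrative sketch only. `split_by_language` and `LANG_RE` are hypothetical helpers, the `"auto"` default is borrowed from the `[lang=auto]` tag listed above, and no real provider's API is modeled here.

```python
import re

# Matches language tags as written on this page, e.g. [lang=en], [lang=hi].
LANG_RE = re.compile(r"\[lang=([a-z\-]+)\]")


def split_by_language(prompt: str, default: str = "auto") -> list[tuple[str, str]]:
    """Return (language, text) runs; each run would be its own request."""
    runs, lang, pos = [], default, 0
    for m in LANG_RE.finditer(prompt):
        chunk = prompt[pos:m.start()].strip()
        if chunk:
            runs.append((lang, chunk))
        # Everything after this tag belongs to the newly selected language.
        lang, pos = m.group(1), m.end()
    tail = prompt[pos:].strip()
    if tail:
        runs.append((lang, tail))
    return runs


PROMPT = (
    "[lang=en] Hey there, welcome back. [excited] We have a special show today. "
    "[lang=hi] तो चलिए शुरू करते हैं। [lang=ja] [warm] 東京から、ようこそ。"
)

runs = split_by_language(PROMPT)
for lang, text in runs:
    print(lang, "->", text)
```

For this prompt the helper yields three runs (en, hi, ja), which is where the three calls and the audio stitching come from. The non-language tags stay embedded in each run and would still need translating into whatever emotion or pause controls the target provider offers.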

Why a single tag-aware model.

svara-global-v1 · 780M params · Apache 2.0
780M parameters · 1.5GB on disk
50+ languages, in every voice
30+ inline tags · one DSL
187ms P50 streaming, edge
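A quick sanity check on the disk figure, as a Python sketch. The page does not state the weight precision, so the 2 bytes per parameter (16-bit weights) below is an assumption, not a published spec.

```python
PARAMS = 780_000_000     # "780M parameters" from the stats above
BYTES_PER_PARAM = 2      # assumption: 16-bit weights (fp16 or bf16)

size_bytes = PARAMS * BYTES_PER_PARAM
size_gb = size_bytes / 1e9       # decimal gigabytes
size_gib = size_bytes / 2**30    # binary gibibytes

print(f"{size_gb:.2f} GB, {size_gib:.2f} GiB")  # prints "1.56 GB, 1.45 GiB"
```

Under that assumption the weights alone come to roughly 1.5GB, which is consistent with the stated on-disk size.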

Stop wiring three APIs. Write one prompt.

100K characters per month free. Apache 2.0 weights if you want to self-host. No credit card required.

Start free → · Read the tag spec · Apache weights on HF