Svara · V8 · Prompt is the interface

The prompt is the product. Write performance. Read speech.

No emotion sliders. No language picker. No voice ID lookup. Instead, 30+ inline tags for emotion, nonverbal sounds, and prosody, plus language tags covering 50+ languages, go straight into your text the way a director writes stage directions. The 780M-parameter model reads them and performs.

example.txt · 312 chars · 3 languages · 6 tags


[lang=en] Hey there — welcome back to The Local Frequency. [excited] We have got a really special show today, [laugh] I am still a little jet-lagged from the flight in. [pause=400ms] [lang=hi] तो चलिए शुरू करते हैं — [speed=0.95] सबसे पहले हमारे आज के मेहमान का परिचय। [lang=ja] [warm] 東京から、ようこそ。 [whisper] ……what a journey.

In English, the Hindi reads "So let's get started. First of all, an introduction to our guest for today," and the Japanese reads "Welcome, from Tokyo."


Aria · en-US · same voice across all three languages · 187ms · 780M
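To make "compose anywhere" concrete, here is a minimal Python sketch of how a client might split a tagged prompt like the one above into spans of plain text before sending or inspecting it. It is not Svara's parser, and the grammar is an assumption inferred from the examples on this page: a tag is [name] or [name=value], a [lang=...] tag stays in effect until the next language tag, and every other tag attaches only to the text span that follows it. Segment and segment() are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed tag grammar (inferred from this page, not from Svara docs):
# [name] or [name=value], with lowercase names such as "laugh" or "pause".
TAG_RE = re.compile(r"\[([a-z_]+)(?:=([^\]]+))?\]")


@dataclass
class Segment:
    lang: str        # language in effect for this span
    tags: list[str]  # non-language tags attached to this span
    text: str        # the words to be spoken


def segment(prompt: str, default_lang: str = "auto") -> list[Segment]:
    """Split a tagged prompt into text spans (hypothetical helper)."""
    segments: list[Segment] = []
    lang = default_lang
    pending: list[str] = []  # tags waiting for the next text span
    pos = 0
    for m in TAG_RE.finditer(prompt):
        text = prompt[pos:m.start()].strip()
        if text:
            segments.append(Segment(lang, pending, text))
            pending = []  # start a fresh list for the next span
        name, value = m.group(1), m.group(2)
        if name == "lang":
            # Assumption: a language switch persists until the next one.
            lang = value or default_lang
        else:
            # Assumption: other tags apply only to the next text span.
            pending.append(f"{name}={value}" if value else name)
        pos = m.end()
    tail = prompt[pos:].strip()
    if tail:
        segments.append(Segment(lang, pending, tail))
    return segments


if __name__ == "__main__":
    demo = "[lang=en] Hey there. [laugh] Still jet-lagged. [lang=ja] [warm] 東京から、ようこそ。"
    for seg in segment(demo):
        print(seg.lang, seg.tags, seg.text)
```

Running the demo prints three segments: two English spans (the second carrying laugh) and one Japanese span carrying warm. Applied to the full example above, the same function yields seven spans: three in English, two in Hindi, and two in Japanese.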

The full tag DSL

Four families. Inline. Compose anywhere. Same model reads them in any of 50+ languages.

Language (50+)

[lang=en] [lang=hi] [lang=ja] [lang=es] [lang=fr] [lang=auto]

Emotion (18 tags)

Nonverbal (8 tags)

Prosody (8 tags)
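The page lists the language tags but gives only counts for the other three families. Of the tags used in the examples, three take a value: [lang=...], [pause=...], and [speed=...]. The Python sketch below shows one way a client could check those values before sending a prompt. It is a hypothetical helper, not part of any Svara SDK, and its formats are assumptions: KNOWN_LANGS holds only the six language tags shown above (a placeholder for the full list), pause values are assumed to be whole milliseconds written like "400ms", and speed is assumed to be a positive rate multiplier such as 0.95.

```python
import re

# Placeholder: only the six language tags listed on this page.
KNOWN_LANGS = {"en", "hi", "ja", "es", "fr", "auto"}

PARAM_TAG_RE = re.compile(r"\[([a-z_]+)=([^\]]+)\]")
PAUSE_RE = re.compile(r"(\d+)ms")


def parse_param_tag(tag: str) -> tuple[str, object]:
    """Validate one [name=value] tag and return (name, parsed value).

    Hypothetical helper; the value formats are assumptions based on this page.
    """
    m = PARAM_TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a [name=value] tag: {tag!r}")
    name, raw = m.groups()
    if name == "lang":
        if raw not in KNOWN_LANGS:
            raise ValueError(f"unknown language code: {raw!r}")
        return name, raw
    if name == "pause":
        pm = PAUSE_RE.fullmatch(raw)
        if pm is None:
            raise ValueError(f"pause must look like '400ms': {raw!r}")
        return name, int(pm.group(1))  # duration in milliseconds
    if name == "speed":
        rate = float(raw)  # raises ValueError for non-numeric input
        if not 0 < rate < float("inf"):  # also rejects NaN
            raise ValueError(f"speed must be a positive finite number: {raw!r}")
        return name, rate
    raise ValueError(f"unrecognised parametric tag: {name!r}")


if __name__ == "__main__":
    for t in ["[lang=hi]", "[pause=400ms]", "[speed=0.95]"]:
        print(t, "->", parse_param_tag(t))
```

Failing loudly on an unrecognised tag is a deliberate choice in this sketch: a typo such as [paws=400ms] is caught on the client instead of being left to however the model happens to treat unknown tags.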

Svara · what you write

"Hey there — welcome back. [excited] We have a special show today, [laugh] still jet-lagged. [pause=400ms] [lang=hi] तो चलिए शुरू करते हैं। [lang=ja] [warm] 東京から、ようこそ。 [whisper] ......what a journey."

Other providers · what you write

Three separate API calls, one per language, each with a different voice ID.
Stitch the audio together yourself and hope the levels match.
For emotion: an emotion_id parameter, one emotion per call. No [whisper] mid-sentence.
For pauses: SSML <break time="400ms"/> with some providers, undocumented with others.
For nonverbals like [laugh]: not supported. Generate the audio, then edit it in Audacity.

Why a single tag-aware model.

Stop wiring three APIs. Write one prompt.