svara-global-v1 is a 780M-parameter open-weights speech model that holds one voice across 50+ languages, with emotion as a first-class input. Apache 2.0 licensed. Streaming with sub-200 ms latency. Production-ready today.
svara-global-v1 is the first sub-billion-parameter TTS model to hold a single, recognisable voice across 50+ languages — without re-training, voice-bank stitching, or per-language fine-tunes. Inline tags steer emotion and prosody at sentence level.
Each voice is a single conditioning vector — every voice can speak every language, with every emotion, at every speed. Tap any voice to hear it.
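The inline tags mentioned above can be illustrated with a minimal parser. This is a sketch under assumptions: the `[emotion]` bracket syntax and tag names here are hypothetical, not the model's documented tag format.

```python
import re

# Hypothetical inline-tag syntax: "[excited] Hello! [calm] Goodbye."
# Splits text into (emotion, segment) pairs for sentence-level conditioning.
TAG = re.compile(r"\[(\w+)\]\s*")

def parse_inline_tags(text, default="neutral"):
    segments = []
    emotion = default
    pos = 0
    for m in TAG.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append((emotion, chunk))
        emotion = m.group(1)
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((emotion, tail))
    return segments

# parse_inline_tags("[excited] Hello there! [calm] Goodbye.")
# → [("excited", "Hello there!"), ("calm", "Goodbye.")]
```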
2,400 blind A/B comparisons from listeners in 14 countries, with preference scores normalised to a 0–1 scale. svara-global-v1 (780M parameters) ranks above models 2–4× its size.
Pair Svara with any ASR. Identity carries across the chain — the listener hears the same person, in their language, in real time. Ideal for support, telehealth, and global product calls.
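The chain described above can be sketched as follows. All three components are stubs with illustrative names (a real deployment would swap in an actual ASR and MT system); the point is only that the same voice conditioning vector rides through every hop.

```python
# Stub speech-to-speech chain; component names are illustrative.

def asr(audio):
    # Stand-in transcription of source-language audio.
    return "hola, ¿cómo estás?"

def translate(text, target_lang):
    # Stand-in machine translation.
    return "hello, how are you?"

def tts(text, voice_vector, lang):
    # Stand-in synthesis: returns audio tagged with the voice used,
    # so the listener hears the same identity in the target language.
    return {"audio": b"...", "voice": voice_vector, "lang": lang}

def speech_to_speech(audio, voice_vector, target_lang="en"):
    text = asr(audio)
    text = translate(text, target_lang)
    return tts(text, voice_vector, target_lang)
```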
A compact transformer stack with shared cross-lingual conditioning, decoded into mel-spectrograms, then a small streaming vocoder. The full chain runs faster than real time on consumer hardware.
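The pipeline above can be sketched in terms of tensor shapes. Everything here is illustrative: random weights, made-up dimensions, and stand-in layers, not the actual architecture or its real configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, N_MELS, HOP = 512, 80, 256   # illustrative sizes, not the real config

def encode(token_ids, voice_vec):
    # Stand-in for the transformer stack: token embeddings plus the
    # shared cross-lingual voice conditioning vector, added per position.
    emb = rng.standard_normal((len(token_ids), D_MODEL))
    return emb + voice_vec  # broadcast the single conditioning vector

def decode_mel(hidden, frames_per_token=4):
    # Stand-in mel decoder: project hidden states to 80-bin mel frames.
    W = rng.standard_normal((D_MODEL, N_MELS)) * 0.01
    return np.repeat(hidden @ W, frames_per_token, axis=0)

def vocode(mel):
    # Stand-in streaming vocoder: each mel frame yields HOP waveform samples.
    return rng.standard_normal(mel.shape[0] * HOP)

voice = rng.standard_normal(D_MODEL)   # one conditioning vector per voice
mel = decode_mel(encode([3, 17, 42], voice))
wave = vocode(mel)                     # mel: (12, 80), wave: (3072,)
```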
Open weights on Hugging Face, hosted API for the production path. Free credits to start; pay only for what you stream.
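A streaming request to the hosted API might carry a payload like the one below. The field names, voice id, and tag syntax are assumptions for illustration, not the documented interface.

```python
import json

# Hypothetical request payload for the hosted API; field names and
# values are illustrative, not the documented interface.
payload = {
    "model": "svara-global-v1",
    "voice": "aria",                    # assumed voice id
    "language": "de",
    "text": "[warm] Willkommen zurück!",  # assumed inline-tag syntax
    "stream": True,
}

body = json.dumps(payload)
```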