An LP in nineteen Indian languages, pressed for developers. Open-source, low-latency, expressive — speech in milliseconds.
Svara is an open-source text-to-speech model trained on a corpus spanning nineteen Indian languages, including under-resourced ones: Magahi, Maithili, Bhojpuri, Bodo, and Dogri. Licensed under Apache 2.0: self-host the weights, or call our edge network and skip the GPUs altogether.
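If you plan to self-host, a quick first check is whether the weights fit on your GPU. The sketch below only does the arithmetic for Svara's stated nominal size of 3B parameters. The bytes-per-parameter values are the standard sizes of each numeric format. It deliberately ignores the KV cache, activations, the audio decoder, and framework overhead, so treat the result as a lower bound. The precision names are illustrative and are not a list of formats Svara is distributed in.

```python
# Rough lower bound on GPU memory needed just to hold model weights.
# Assumption: weight memory = parameter count x bytes per parameter.
# KV cache, activations, and the audio decoder are NOT included.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,  # fp16 has the same size
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the memory (GiB) needed to store the weights at `precision`."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision!r}")
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    svara_params = 3e9  # nominal "3B parameters"
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(svara_params, precision):.1f} GiB")
```

At bf16, 3 × 10⁹ parameters × 2 bytes comes to roughly 5.6 GiB for the weights alone. Real headroom needs to be larger than that.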
Average end-to-end latency is under 200 milliseconds. That is comfortably below the threshold past which a spoken exchange stops feeling like natural turn-taking and starts feeling transactional. Six emotion tags (happy, sad, anger, fear, surprise, clear) can be requested at the prompt level, with no retraining.
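The text above does not say how an emotion tag is written into the prompt. The sketch below assumes a hypothetical convention: an angle-bracketed tag such as `<happy>` prepended to the input text. It shows the client-side validation you might want before sending a request, which is rejecting anything outside the six documented tag names. Both the helper name `with_emotion` and the tag syntax are assumptions, so check Svara's documentation for the actual format.

```python
# Hypothetical helper: the "<tag> text" syntax is an ASSUMPTION, not a
# documented Svara format. Only the six tag names come from the docs.

EMOTION_TAGS = frozenset({"happy", "sad", "anger", "fear", "surprise", "clear"})


def with_emotion(text: str, emotion: str) -> str:
    """Prefix `text` with an emotion tag after validating the tag name."""
    tag = emotion.strip().lower()
    if tag not in EMOTION_TAGS:
        allowed = ", ".join(sorted(EMOTION_TAGS))
        raise ValueError(f"unsupported emotion {emotion!r}; expected one of: {allowed}")
    return f"<{tag}> {text}"


if __name__ == "__main__":
    print(with_emotion("आज का दिन तो सच में बहुत ख़ास है", "happy"))
```

Under this assumption, the tagged string would be passed as the request's input text. Because no retraining is involved, switching emotion only means changing this string.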
Fine-tuning needs only a few hours of audio. Voice cloning will arrive in the v2 pressing later this year. Under the hood, Svara has 3B parameters, is distilled from a Llama backbone, and generates discrete audio tokens through the Orpheus tokenizer.
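Fine-tuning is pitched at a few hours of audio, so it is worth measuring what you actually have before starting a run. This sketch uses Python's standard `wave` module to sum the durations of the `.wav` files in a directory tree. It assumes your clips are uncompressed PCM WAV, because `wave` does not decode compressed formats. The function names are illustrative. The sketch says nothing about the sample rate, transcript format, or data layout Svara's fine-tuning actually expects.

```python
# Sum the duration of PCM .wav clips in a directory tree, in hours.
# Assumption: clips are uncompressed PCM WAV, which the stdlib `wave`
# module can read; files with other extensions are not counted.

import wave
from pathlib import Path


def clip_seconds(path: Path) -> float:
    """Duration of a single WAV file: frame count / frame rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_hours(root: str) -> float:
    """Total duration of every *.wav under `root`, in hours."""
    total_seconds = sum(clip_seconds(p) for p in Path(root).rglob("*.wav"))
    return total_seconds / 3600
```

For example, `print(f"{dataset_hours('my_voice_data'):.2f} h")` reports the total length of a hypothetical `my_voice_data` folder.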
Built by Kenpath Labs in Bengaluru and trained on the SYSPIN, RASA, IndicTTS, and SPICOR datasets. The first ten thousand characters each month are free, with no card required.
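To stay inside the free tier, you can track usage on the client. This sketch assumes characters are counted as Unicode code points of the input text, which is what Python's `len()` returns. The page does not say how Svara actually counts them. For Devanagari and other Indic scripts the choice matters, because a single visible syllable often spans several code points (consonant, virama, vowel sign). The `FreeTierMeter` class is hypothetical, and it does not handle the monthly reset.

```python
# Hypothetical client-side meter for the free tier (10,000 characters/month).
# Assumption: usage = Unicode code points of the input text (Python len()).
# Svara's actual billing unit may differ; the monthly reset is left out.

from dataclasses import dataclass

FREE_CHARS_PER_MONTH = 10_000


@dataclass
class FreeTierMeter:
    used: int = 0
    limit: int = FREE_CHARS_PER_MONTH

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def charge(self, text: str) -> int:
        """Record `text` against the quota; refuse if it would exceed it."""
        cost = len(text)  # code points, not visible syllables
        if cost > self.remaining:
            raise RuntimeError(f"request needs {cost} chars, only {self.remaining} left")
        self.used += cost
        return cost
```

The meter checks usage before a request is sent. A request that would overrun the free allowance is therefore stopped locally and never reaches the API.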
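```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.svara.ai/v1",  # OpenAI-compatible Svara endpoint
    api_key="sk-svara-...",
)

# Stream the synthesized audio straight to disk.
with client.audio.speech.with_streaming_response.create(
    model="svara-v1-indic",
    voice="asha",
    input="आज का दिन तो सच में बहुत ख़ास है",  # Hindi: "Today really is a very special day."
    response_format="wav",  # match the .wav filename; assumes the server honours this field
) as response:
    response.stream_to_file("hello.wav")
```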