Credits · Audio

The voice behind the Sanskrit recitation

Every shloka you can play on Wisdom is synthesized, not recorded by a human reciter. Here’s the model that makes it sound the way it does, and the person who built it.

The audio playback next to each Sanskrit verse on this site is generated with Vāgdhenu, an open-source text-to-speech system built specifically for Sanskrit chant recitation (pārāyaṇa).

What Vāgdhenu is

Vāgdhenu synthesizes Devanagari text into chanted Sanskrit audio, tuned for the cadence of recitation rather than conversational speech. In an evaluation by expert listeners it received a mean opinion score (MOS) of roughly 4.6.

Base model
Fine-tuned from AI4Bharat’s IndicF5 / F5-TTS, a flow-matching Diffusion Transformer (~337M parameters) for mel-spectrogram infilling, with Sanskrit routed through a Kannada-script representation.
Vocoder
A fine-tuned NVIDIA BigVGAN-v2, adapted for extended vowel rendering typical of chant.
Training data
A roughly 5-hour single-speaker Sanskrit chant corpus, with additional voice-steering retraining on paired clips.
License
Apache-2.0 for Vāgdhenu’s own contributions, built on AI4Bharat IndicF5 (MIT) and NVIDIA BigVGAN-v2 components.
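
To make the "routed through a Kannada-script representation" idea concrete, here is a minimal sketch of a Devanagari-to-Kannada transliteration. It is illustrative only: the page does not say how Vāgdhenu or IndicF5 performs this step, and the function name devanagari_to_kannada is hypothetical. It relies on one fact about Unicode: the Devanagari (U+0900) and Kannada (U+0C80) blocks share a parallel layout, so most letters, vowel signs, and digits sit at the same offset in both. Code points without an assigned Kannada counterpart, such as the danda, are passed through unchanged.

```python
import unicodedata

# Offset between the Kannada block (U+0C80) and the Devanagari block (U+0900).
KANNADA_OFFSET = 0x0C80 - 0x0900

# Sub-ranges of the Devanagari block where the parallel layout is used here:
# signs, letters, vowel signs and virama; vocalic RR/LL forms; digits.
MAPPED_RANGES = ((0x0901, 0x094D), (0x0960, 0x0963), (0x0966, 0x096F))

# U+0904 (short A) lines up with an unrelated Kannada sign, so it is skipped.
EXCLUDED = {0x0904}


def devanagari_to_kannada(text: str) -> str:
    """Naive, offset-based transliteration (an assumption, not Vāgdhenu's code)."""
    out = []
    for ch in text:
        cp = ord(ch)
        in_range = any(lo <= cp <= hi for lo, hi in MAPPED_RANGES)
        if in_range and cp not in EXCLUDED:
            candidate = chr(cp + KANNADA_OFFSET)
            # Use the shifted character only if Unicode assigns a Kannada
            # character at that position; otherwise keep the original.
            if unicodedata.name(candidate, "").startswith("KANNADA"):
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(devanagari_to_kannada("धर्म"))
```

A production pipeline would likely also handle Vedic accents, normalization, and punctuation explicitly; this sketch only shows why a script-to-script routing step can be cheap.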

Prosody in Vāgdhenu is reference-driven rather than freely designable: recitation pacing follows the reference clips used during fine-tuning instead of being generated from scratch for each verse.

Credit

Vāgdhenu was created by prathoshap and released openly on Hugging Face, with an accompanying GitHub repository and dataset. Wisdom uses it as-is for verse recitation audio; all credit for the model and research belongs to its creator.