91.4% relative WER reduction · 12 Indian languages · 2-hour audio in 8 minutes
Overview
A government intelligence unit needed an on-premise system to transcribe and translate audio in 12 Indian languages, including regional dialects, without any data leaving their secure facility.
Challenge
Commercial ASR services (Google, AWS, Azure) were prohibited, and existing open-source models performed poorly on Indian-language dialects, particularly regional variants of Hindi, Tamil, and Telugu. The system also had to handle both clear audio and low-quality field recordings.
Solution
We fine-tuned OpenAI's Whisper model on a proprietary dataset of 200 hours of audio per language, with particular focus on regional dialects. A custom preprocessing pipeline improved accuracy on low-SNR recordings by 18%, and translation is handled by a locally deployed multilingual LLM. The entire stack runs on air-gapped servers, so no audio or text ever leaves the facility.
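The actual preprocessing pipeline is proprietary, but the kind of cleanup it performs on low-SNR field recordings can be sketched in a few lines. This is a minimal illustration, not the deployed code: the function name, the pre-emphasis coefficient, and the choice of steps are assumptions for the example.

```python
import numpy as np

def preprocess(audio: np.ndarray, pre_emphasis: float = 0.97) -> np.ndarray:
    """Illustrative cleanup pass for low-SNR field recordings (sketch only).

    Pre-emphasis boosts high frequencies that broadband noise tends to
    mask; peak normalization brings quiet recordings into the amplitude
    range an ASR feature extractor expects.
    """
    # Pre-emphasis filter: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(audio[0], audio[1:] - pre_emphasis * audio[:-1])
    # Peak-normalize to [-1, 1], guarding against all-silent input
    peak = np.max(np.abs(emphasized))
    return emphasized / peak if peak > 0 else emphasized
```

In practice such a pass would sit in front of Whisper's own feature extraction; the real pipeline reportedly also contributes the 18% accuracy gain on low-SNR audio mentioned above.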
Outcome
An average relative word error rate reduction of 91.4% versus base Whisper across all 12 languages. The system processes a 2-hour briefing in under 8 minutes, supports 12 language pairs, and runs fully air-gapped with zero external API calls.
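The throughput figure is easy to sanity-check: 2 hours of audio in under 8 minutes corresponds to a real-time factor of at least 15x. The helper name below is ours, used only for the arithmetic.

```python
def real_time_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Speed-up over real time: minutes of audio handled per minute of compute."""
    return audio_minutes / processing_minutes

# 2 hours (120 min) of audio processed in 8 minutes
print(real_time_factor(120, 8))  # → 15.0
```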