91.4% relative WER reduction · 12 Indian languages · 2-hour audio in 8 minutes
Overview
A government intelligence unit needed an on-premise system to transcribe and translate audio in 12 Indian languages, including regional dialects, without any data leaving their secure facility.
Challenge
Commercial ASR services (Google, AWS, Azure) were prohibited, and existing open-source models performed poorly on Indian-language dialects, particularly regional variants of Hindi, Tamil, and Telugu. The system also had to handle both clear audio and low-quality field recordings.
Solution
We fine-tuned OpenAI's Whisper model on a proprietary dataset of 200 hours of audio per language, with particular focus on regional dialects. A custom preprocessing pipeline improved accuracy on low-SNR recordings by 18%, and translation is handled by a locally deployed multilingual LLM. The entire stack runs on air-gapped servers, so no audio or text ever leaves the facility.
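The actual preprocessing pipeline is proprietary, but the kind of cleanup it performs on low-SNR field recordings can be sketched in a few lines. This is a minimal illustration, not the deployed code: the function name, the pre-emphasis coefficient, and the choice of steps are assumptions for the example.

```python
import numpy as np

def preprocess(audio: np.ndarray, pre_emphasis: float = 0.97) -> np.ndarray:
    """Illustrative cleanup pass for low-SNR field recordings (sketch only).

    Pre-emphasis boosts high frequencies that broadband noise tends to
    mask; peak normalization brings quiet recordings into the amplitude
    range an ASR feature extractor expects.
    """
    # Pre-emphasis filter: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(audio[0], audio[1:] - pre_emphasis * audio[:-1])
    # Peak-normalize to [-1, 1], guarding against all-silent input
    peak = np.max(np.abs(emphasized))
    return emphasized / peak if peak > 0 else emphasized
```

In practice such a pass would sit in front of Whisper's own feature extraction; the real pipeline reportedly also contributes the 18% accuracy gain on low-SNR audio mentioned above.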
Outcome
An average relative word error rate reduction of 91.4% versus base Whisper across all 12 languages. The system processes a 2-hour briefing in under 8 minutes, supports 12 language pairs, and runs fully air-gapped with zero external API calls.
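The throughput figure is easy to sanity-check: 2 hours of audio in under 8 minutes corresponds to a real-time factor of at least 15x. The helper name below is ours, used only for the arithmetic.

```python
def real_time_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Speed-up over real time: minutes of audio handled per minute of compute."""
    return audio_minutes / processing_minutes

# 2 hours (120 min) of audio processed in 8 minutes
print(real_time_factor(120, 8))  # → 15.0
```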