75-day delivery · 3 days → 8 minutes document processing · 94% extraction accuracy
Overview
A government agency managing thousands of classified documents needed an AI system to search, summarise, cross-reference, and extract insights — without any document ever leaving their secure network.
Challenge
Commercial LLM APIs (OpenAI, Anthropic) were off-limits due to data residency requirements. All processing had to occur on-premise using open-weight models. The system also needed to handle multi-format documents: PDFs, scanned images, handwritten forms.
Solution
We deployed Llama 3 70B, quantised to GGUF, on an air-gapped GPU cluster. Over the document corpus we built a custom RAG pipeline with hybrid search (BM25 + vector) backed by pgvector. Tesseract handled OCR preprocessing for scanned and handwritten documents, and Whisper provided ASR for audio transcription. The system went from brief to production in 75 days.
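The hybrid search step needs a way to merge the BM25 ranking and the vector-similarity ranking into one result list. The case study does not name the fusion method used; a common choice is reciprocal rank fusion (RRF), sketched below with hypothetical document IDs — a minimal illustration, not the deployed implementation.

```python
def rrf_merge(rankings, k=60):
    """Merge several ranked lists of doc IDs via reciprocal rank fusion.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by both BM25 and vector search rise to the top.
    k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: BM25 and vector search disagree on ordering
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_c", "doc_d"]
merged = rrf_merge([bm25_hits, vector_hits])
# doc_b ranks first: it appears near the top of both lists
```

RRF is attractive in a hybrid setup because it needs only ranks, not scores, so the incomparable BM25 and cosine-similarity scales never have to be normalised against each other.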
Outcome
Cross-referencing a 500-page classified report dropped from 3 days to 8 minutes. The system scores 94% on information extraction benchmarks, has recorded zero document egress since deployment, and handles 200+ queries per day.