An AI-powered multilingual voice agent that enables citizens to access government services through natural phone conversations. Built on ElevenLabs Conversational AI with a custom LangGraph-based agent backend, RAG-powered knowledge retrieval, and a real-time analytics dashboard.
Government call centers are overwhelmed. The US Social Security Administration handles 93.5 million calls per year, with average wait times reaching 99 minutes (ssa.gov). The UK's HMRC receives over 38 million calls annually, with only 71.5% answered (gov.uk). The US federal government spent $4 billion on call center contracts over five years (GAO-20-291).
When governments try to automate, they typically deploy IVR systems: "press 1 for passport, press 2 for ID card." Citizens hate these. The moment someone realizes they're talking to a recorded menu, they stop trying to solve their problem and start mashing buttons to reach a human operator. The frustration compounds the wait-time problem rather than solving it.
The missing piece isn't automation; it's conversational automation: an agent that understands natural speech, responds intelligently, and actually resolves the issue instead of routing the caller through a decision tree.
This project builds an always-on voice agent that handles citizen inquiries autonomously through natural conversation. Citizens call in, authenticate securely, and get immediate answers, whether they're checking an application status, booking an appointment, or asking about required documents. The agent speaks English and Turkish, retrieves answers from an official knowledge base, takes real actions (booking appointments, requesting documents), and escalates to a human operator when needed.
```text
┌──────────────┐     ┌──────────────────┐     ┌───────────────────────────┐
│              │     │                  │     │      LangGraph Agent      │
│   Citizen    │────>│   ElevenLabs     │────>│                           │
│   (Phone)    │<────│   Voice Layer    │<────│  ┌────────────────────┐   │
│              │     │                  │     │  │ Intent Classify    │   │
└──────────────┘     │ - STT (Speech    │     │  │ Service Router     │   │
                     │   to Text)       │     │  │ Status Check       │   │
                     │ - TTS (Text      │     │  │ Appointment Book   │   │
                     │   to Speech)     │     │  │ Appointment List   │   │
                     │ - Workflow       │     │  │ Appointment Cancel │   │
                     │   (Auth Gate)    │     │  │ Document Request   │   │
                     │ - Language       │     │  │ Document Status    │   │
                     │   Detection      │     │  │ FAQ Answer (RAG)   │   │
                     └──────────────────┘     │  │ Complaint          │   │
                              ▲               │  │ Escalate           │   │
                     Custom LLM Endpoint      │  └────────────────────┘   │
                       (FastAPI + SSE)        └───┬──────┬──────┬─────────┘
                                                  │      │      │
                                   ┌──────────────┘      │      └──────────────┐
                                   │                     │                     │
                                   ▼                     ▼                     ▼
                         ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
                         │  Gov Backend     │  │  Pinecone        │  │  Streamlit       │
                         │  (FastAPI)       │  │  Vector Store    │  │  Dashboard       │
                         │                  │  │                  │  │  (9 pages)       │
                         │ - /auth/verify   │  │ - Gov KB         │  │                  │
                         │ - /applications  │  │   (40 docs)      │  │ - Overview       │
                         │ - /appointments  │  │ - ElevenLabs     │  │ - Intents        │
                         │ - /documents     │  │   Docs (514      │  │ - Auth & Lang    │
                         │ - /services      │  │   chunks)        │  │ - Knowledge Gaps │
                         │ - /handoff       │  │                  │  │ - Conv. Flows    │
                         │ - Citizen DB     │  │                  │  │ - Node Details   │
                         └──────────────────┘  └──────────────────┘  │ - System Health  │
                                                                     │ - AI Insights    │
                                                                     │ - Logs           │
                                                                     └──────────────────┘
```
For a detailed architecture breakdown with sequence diagrams, see docs/ARCHITECTURE.md. For key design decisions and tradeoffs, see docs/DECISIONS.md.
The ElevenLabs voice layer is the interface citizens interact with. It handles speech-to-text, text-to-speech, and voice model selection, and it supports English and Turkish with automatic language detection and dynamic voice switching. It connects to our custom LLM backend through ElevenLabs' Custom LLM endpoint.
Key platform-native features used:
- Workflow engine: deterministic auth gating via a dispatch tool plus subagent isolation (not LLM-based)
- Language detection: a system tool that auto-detects the caller's language and switches the voice model
- Language presets: a single agent with an EN primary and a TR preset, so one endpoint serves both languages
- Native knowledge base: pre-auth FAQs are answered by the ElevenLabs KB (no Custom LLM call needed)
- Backup LLM: if our Custom LLM server is unreachable, ElevenLabs falls back to its native LLM (Level 2 graceful degradation)
The LangGraph agent is the brain of the system: a multi-step agent that manages the entire post-auth conversation flow. Unlike a simple prompt-response chatbot, it classifies intent from the full conversation history, decides autonomously which tools to call, and handles multi-intent conversations in which a citizen might check an application status, book a follow-up appointment, and request a document, all in one call.
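The routing step can be made concrete with a minimal, framework-free sketch. It maps a classified intent label to one of the ten handler nodes that sit downstream of intent_classify in the graph. The sketch is plain Python so it runs standalone. In LangGraph, a function like route_intent would typically be passed to StateGraph.add_conditional_edges after the intent_classify node. The function name and the fallback to service_router for unrecognized labels are assumptions, not the project's actual router.

```python
# Hypothetical sketch: map the label produced by intent_classify to the
# graph node that should handle it. Not the project's actual router.

# The ten handler nodes downstream of intent_classify in the 11-node graph.
HANDLER_NODES = frozenset({
    "service_router",
    "status_check",
    "appointment_book",
    "appointment_list",
    "appointment_cancel",
    "document_request",
    "document_status",
    "faq_answer",
    "complaint",
    "escalate",
})

# Assumption: labels the classifier cannot place fall back to the
# context-aware service_router node instead of failing the turn.
FALLBACK_NODE = "service_router"


def route_intent(intent: str) -> str:
    """Return the name of the node that should handle a classified intent."""
    label = intent.strip().lower()
    return label if label in HANDLER_NODES else FALLBACK_NODE


if __name__ == "__main__":
    print(route_intent("appointment_book"))  # appointment_book
    print(route_intent("weather_forecast"))  # service_router
```

Keeping the mapping in a lookup table means that once the LLM has produced a label, the choice of node involves no further model inference.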
Key capabilities:
- 11-node graph: intent_classify, service_router, status_check, appointment_book/list/cancel, document_request/status, faq_answer, complaint, escalate
- Deterministic tool chaining: an "additional_docs_needed" status automatically triggers a RAG lookup of the required documents, and a "rejected" status automatically generates appeal guidance. Native ElevenLabs agents can't guarantee this kind of fixed follow-up.
- Stateless message passing: ElevenLabs sends the full conversation history each turn, so no server-side session persistence is needed
- Sentence-level SSE streaming: responses are split into sentences and streamed with short delays for natural TTS delivery
- Smart service detection: infers the service type and document type from conversation context, and asks clarifying questions when either is ambiguous
- Gender-based honorifics: "Mr. John" / "Ms. Sarah" in English, "Ahmet Bey" / "Fatma Hanim" in Turkish
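The sentence-level streaming above can be sketched as follows. The sketch assumes the OpenAI-compatible chat.completion.chunk stream format that ElevenLabs' Custom LLM option consumes. Several details are illustrative assumptions:

- the regex splitter, which skips common honorifics so "Mr. Ahmet" stays in one sentence;
- the default model name;
- the function names.

The inter-sentence delay is left to the caller, for example an async server awaiting asyncio.sleep between events.

```python
import json
import re
import time
import uuid
from typing import Iterator, Optional

# Split on whitespace that follows ., ! or ?, except after common honorifics,
# so "Mr. Ahmet, your application..." is not cut after "Mr.".
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])(?<!Mr\.)(?<!Ms\.)(?<!Mrs\.)(?<!Dr\.)\s+")


def split_sentences(text: str) -> list[str]:
    """Heuristic sentence splitter for TTS-friendly streaming."""
    return [part for part in _SENTENCE_BREAK.split(text.strip()) if part]


def _chunk(completion_id: str, created: int, model: str,
           delta: dict, finish: Optional[str]) -> str:
    """Serialize one OpenAI-style streaming chunk as an SSE event."""
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def sse_sentence_stream(text: str, model: str = "langgraph-agent") -> Iterator[str]:
    """Yield one SSE event per sentence, then a stop chunk and [DONE]."""
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())
    for i, sentence in enumerate(split_sentences(text)):
        # Re-insert the space between sentences so the client can concatenate deltas.
        content = sentence if i == 0 else " " + sentence
        yield _chunk(completion_id, created, model, {"content": content}, None)
    yield _chunk(completion_id, created, model, {}, "stop")
    yield "data: [DONE]\n\n"
```

In a FastAPI server, a generator like this could be wrapped in a StreamingResponse with media_type="text/event-stream". Splitting at sentence boundaries lets TTS start speaking the first sentence while later ones are still being produced.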
Citizens must verify their identity before accessing personal information. The system uses an STT-friendly approach:
- Last 4 digits of the TC Kimlik (National ID), not the full 11-digit number
- Date of birth: day, month, and year
- Father's name initial: a single letter
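Because callers answer these questions by voice, the transcript may contain number words rather than digits. The following is a minimal normalization sketch for the last-four-digits answer. It covers English and Turkish number words, including the ASCII-folded Turkish spellings an STT engine might emit. The word table and the function names are assumptions rather than the project's validators.

```python
import re
from typing import Optional

# Hypothetical word-to-digit table for English and Turkish, including the
# ASCII-folded Turkish spellings ("sifir", "dort", ...) an STT engine may emit.
_DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "sıfır": "0", "sifir": "0", "bir": "1", "iki": "2", "üç": "3", "uc": "3",
    "dört": "4", "dort": "4", "beş": "5", "bes": "5", "altı": "6", "alti": "6",
    "yedi": "7", "sekiz": "8", "dokuz": "9",
}


def spoken_to_digits(utterance: str) -> str:
    """Keep literal digit runs and translate number words; ignore other words."""
    digits = []
    for token in re.findall(r"\w+", utterance.lower()):
        if token.isascii() and token.isdigit():
            digits.append(token)
        elif token in _DIGIT_WORDS:
            digits.append(_DIGIT_WORDS[token])
    return "".join(digits)


def parse_last4(utterance: str) -> Optional[str]:
    """Return exactly four digits, or None so the agent can re-ask."""
    digits = spoken_to_digits(utterance)
    return digits if len(digits) == 4 else None
```

One caveat of this approach is that the Turkish "bir" also serves as the indefinite article ("a"). Requiring exactly four digits and re-asking otherwise limits the damage from such collisions.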
The authentication flow follows ElevenLabs' Workflow best practices for secure caller identity verification. A deterministic dispatch tool (webhook) routes the caller to the authenticated or retry subagent based on a boolean result, so no LLM inference is involved in the auth decision. The design is KVKK (Turkish GDPR) compliant: PII is redacted in all logs, the TC Kimlik is stored as a SHA-256 hash, and the audit trail contains no raw PII.
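For the full-number verification path (api/auth.py lists TC Kimlik among its methods) and for audit logging, a validator along these lines could do two jobs. It rejects mistyped numbers before any database lookup, and it keeps raw IDs out of storage. The checksum rules are the published TC Kimlik check-digit rules. The function names and the plain, unkeyed SHA-256 call are illustrative assumptions rather than the code in agent/tools/tc_kimlik.py.

```python
import hashlib


def is_valid_tc_kimlik(number: str) -> bool:
    """Validate an 11-digit TC Kimlik number using its two check digits."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()) or number[0] == "0":
        return False
    d = [int(ch) for ch in number]
    odd_sum = d[0] + d[2] + d[4] + d[6] + d[8]   # 1st, 3rd, 5th, 7th, 9th digits
    even_sum = d[1] + d[3] + d[5] + d[7]         # 2nd, 4th, 6th, 8th digits
    # 10th digit: (7 * odd_sum - even_sum) mod 10 (Python's % is never negative here)
    if (7 * odd_sum - even_sum) % 10 != d[9]:
        return False
    # 11th digit: sum of the first ten digits mod 10
    return sum(d[:10]) % 10 == d[10]


def mask_tc_kimlik(number: str) -> str:
    """Show only the last four digits, e.g. for operator screens."""
    return "*" * (len(number) - 4) + number[-4:]


def hash_tc_kimlik(number: str) -> str:
    """SHA-256 hex digest for storage and audit records instead of the raw ID."""
    return hashlib.sha256(number.encode("ascii")).hexdigest()
```

The space of valid 11-digit IDs is small enough to enumerate. A keyed hash, for example hmac.new(secret_key, number.encode(), hashlib.sha256), therefore resists brute-force reversal better than a bare SHA-256 digest.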
The system degrades gracefully across three levels, ensuring citizens always get some level of service:
| Level | Condition | Behavior |
|---|---|---|
| 0 (Healthy) | Normal operation | Full LangGraph agent with all nodes and tool chaining |
| 1 (Degraded) | 3 consecutive LangGraph failures | Circuit breaker opens and requests go directly to OpenAI, bypassing the graph. The graph is retried after a 60 s cooldown |
| 2 (Down) | Custom LLM server unreachable | ElevenLabs falls back to its native LLM via backup_llm_config |
The /health endpoint reports current degradation level and consecutive failure count for monitoring.
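The Level 0 / Level 1 switch in the table is a standard circuit breaker. The sketch below takes its thresholds from the table (3 consecutive failures, 60 s cooldown). The run_graph and run_direct callables are hypothetical stand-ins for the LangGraph invocation and the direct OpenAI call. Level 2 needs no server code because ElevenLabs handles it. The class and function names are illustrative, not the project's server.py.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3                     # consecutive failures before opening
    cooldown_s: float = 60.0                       # wait before retrying the graph
    clock: Callable[[], float] = time.monotonic    # injectable for tests
    consecutive_failures: int = 0
    opened_at: Optional[float] = None

    def allow_graph(self) -> bool:
        """Closed, or open but past the cooldown (a half-open trial)."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown

    def health(self) -> dict:
        """Shape of what a /health endpoint could report."""
        return {
            "degradation_level": 0 if self.opened_at is None else 1,
            "consecutive_failures": self.consecutive_failures,
        }


def answer(breaker: CircuitBreaker, run_graph: Callable[[], str],
           run_direct: Callable[[], str]) -> str:
    """Try the LangGraph path when allowed; otherwise, or on error, use the direct LLM call."""
    if breaker.allow_graph():
        try:
            reply = run_graph()
        except Exception:
            breaker.record_failure()
        else:
            breaker.record_success()
            return reply
    return run_direct()
```

A failed half-open trial pushes the failure count past the threshold again, so the breaker reopens immediately and restarts the cooldown.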
A retrieval-augmented generation (RAG) pipeline gives the agent access to government service documentation:
- 40 documents: 5 categories (passport, ID card, driver's license, civil registry, general) × 4 document types × 2 languages
- 219 chunks embedded via OpenAI text-embedding-3-small into Pinecone serverless index
- 514 ElevenLabs documentation chunks in a separate Pinecone index for platform-aware AI insights
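rag/chunker.py is described as header-based chunking with size overlap, and that step can be sketched in a few lines. This version assumes markdown source documents. It splits at headings, then cuts long sections into overlapping character windows and attaches caller-supplied metadata (such as language and category) for Pinecone filtering. The 800/100 character sizes and the function name are assumptions, not the project's settings.

```python
import re
from typing import Optional

# Zero-width split point at the start of any markdown heading line.
_HEADING = re.compile(r"(?m)^(?=#{1,6} )")


def chunk_markdown(text: str, max_chars: int = 800, overlap: int = 100,
                   metadata: Optional[dict] = None) -> list[dict]:
    """Split by heading, then window long sections with `overlap` shared chars."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    for section in _HEADING.split(text):
        section = section.strip()
        if not section:
            continue
        first_line = section.splitlines()[0]
        header = first_line.lstrip("#").strip() if first_line.startswith("#") else ""
        start = 0
        while True:
            chunks.append({**(metadata or {}), "header": header,
                           "text": section[start:start + max_chars]})
            if start + max_chars >= len(section):
                break
            start += step
    return chunks
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk, which helps retrieval of facts that straddle the cut.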
Pre-auth FAQs are handled by the ElevenLabs native knowledge base. Post-auth tool chaining uses Pinecone for context-aware retrieval (status_check → RAG for required documents or appeal rights).
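Here is a minimal sketch of the status → RAG chaining. The query templates are hypothetical, and the metadata filters on language and category use Pinecone's $eq filter operator; the category values themselves are assumptions. Because a lookup table, rather than the LLM, chooses the follow-up, the same status always produces the same retrieval.

```python
from typing import Optional

# Hypothetical follow-up retrievals keyed by application status. Statuses not
# listed here (e.g. "approved", "under_review") trigger no chained lookup.
_FOLLOW_UP_QUERIES = {
    "additional_docs_needed": "required documents for a {service} application",
    "rejected": "how to appeal a rejected {service} application",
}


def chained_rag_request(status: str, service: str, language: str) -> Optional[dict]:
    """Build the retrieval request that status_check should run next, if any."""
    template = _FOLLOW_UP_QUERIES.get(status)
    if template is None:
        return None
    return {
        "query": template.format(service=service.replace("_", " ")),
        # Restrict matches to the caller's language and the relevant service.
        "filter": {"language": {"$eq": language}, "category": {"$eq": service}},
    }
```

The filter dictionary has the shape accepted by the filter argument of a Pinecone Index.query call. The query text would be embedded first and passed as the vector.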
The government backend is a FastAPI-based simulation of real government systems with full CRUD operations:
- 7 API endpoints: application status, appointment booking/listing/cancellation, document request/status, and service catalog
- 23 citizens with realistic data (Turkish + English names, multiple applications per citizen)
- 26 applications across 4 service types and 5 statuses
- 5 appointments and 3 document requests as seed data
- Conflict detection: prevents duplicate appointments for the same service
- Edge case handling: past-date rejection and slot availability checks
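The booking checks listed above can be expressed as a pure function that returns a rejection reason or None. This sketch assumes an in-memory Appointment shape, reason codes, and a per-slot capacity, all hypothetical. The real service works against the SQLAlchemy models in api/models.py.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import Optional


@dataclass(frozen=True)
class Appointment:  # hypothetical shape, not the SQLAlchemy model
    citizen_id: int
    service: str
    day: date
    slot: time


def check_booking(existing: list[Appointment], request: Appointment,
                  today: date, capacity: int = 1) -> Optional[str]:
    """Return a reason code if the request must be rejected, else None."""
    if request.day < today:
        return "past_date"
    # One upcoming appointment per citizen per service.
    if any(a.citizen_id == request.citizen_id and a.service == request.service
           and a.day >= today for a in existing):
        return "duplicate_service"
    # Slot availability: count bookings already holding this day and time.
    taken = sum(1 for a in existing if a.day == request.day and a.slot == request.slot)
    if taken >= capacity:
        return "slot_full"
    return None


def book(existing: list[Appointment], request: Appointment, today: date) -> Appointment:
    """Append the appointment if every check passes, else raise with the reason code."""
    reason = check_booking(existing, request, today)
    if reason is not None:
        raise ValueError(reason)
    existing.append(request)
    return request
```

Returning a reason code instead of raising lets the agent turn each rejection into a spoken explanation or an alternative offer.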
A 9-page Streamlit dashboard with sidebar navigation provides real-time visibility into agent performance:
| Page | What it shows |
|---|---|
| Overview | KPIs (total requests, conversations, avg response, resolution rate, escalation rate), call volume trends, turns per conversation |
| Intents & Performance | Intent distribution, per-intent response time, Min/P50/P95/Max latency, node-level timing breakdown |
| Auth & Language | Auth success rate, failure reasons, auth methods, language distribution |
| Knowledge Gaps | RAG score distribution, low-score queries, content improvement recommendations |
| Conversation Flows | Sankey diagram of intent transitions, most common paths, entry/exit points, conversation depth |
| Node Details | Per-node execution count and timing, tool chain analysis, API call volume by node |
| System Health | Anomaly detection (escalation spikes, latency, auth failures, volume), threshold monitoring |
| AI Insights | Recommendations generated by GPT-4o, grounded via RAG over the ElevenLabs documentation index |
| Logs | Recent conversation logs with all fields |
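Two of the computations behind these pages can be sketched with the standard library alone: the Min/P50/P95/Max latency breakdown and a threshold-style spike check. The nearest-rank percentile definition, the 2× baseline factor, and the function names are assumptions; the dashboard itself may compute these differently.

```python
import math
from statistics import mean


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in [0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float]) -> dict:
    """Min/P50/P95/Max, as shown on the Intents & Performance page."""
    return {
        "min": min(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "max": max(latencies_ms),
    }


def is_spike(current_rate: float, baseline_rates: list[float], factor: float = 2.0) -> bool:
    """Flag a rate (e.g. escalations per conversation) above factor x its baseline mean."""
    return current_rate > factor * mean(baseline_rates)
```

With small samples, P95 under the nearest-rank definition is often just the maximum. That is one reason to show both values side by side.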
| Component | Technology | Why |
|---|---|---|
| Voice Interface | ElevenLabs Conversational AI | Industry-leading voice quality, Workflow engine, Custom LLM integration |
| Agent Framework | LangGraph | Stateful graph-based workflows with conditional branching and deterministic tool chaining |
| LLM | OpenAI GPT-4o | Strong multilingual performance, reliable intent classification |
| Voice-Agent Bridge | FastAPI | Async streaming support, low latency, SSE for ElevenLabs Custom LLM |
| Vector Database | Pinecone (Serverless) | Managed service, metadata filtering, fast semantic search |
| Embeddings | OpenAI text-embedding-3-small | Good balance of quality and cost for multilingual documents |
| Database | SQLite (demo) / PostgreSQL (production) | SQLAlchemy ORM for easy migration |
| Dashboard | Streamlit | Rapid development, plotly integration for Sankey diagrams |
```text
Citizen: (calls in)
Agent:   Hello, welcome to Citizen Services. I'm Umut, how can I
         help you today?
Citizen: I want to check my application status.
Agent:   I need to ask you three quick questions to verify your
         identity. Could you please tell me the last four digits
         of your ID number?
Citizen: Zero zero one eight.
Agent:   Could you please tell me your date of birth?
Citizen: March fifteenth, nineteen ninety.
Agent:   And the first letter of your father's name?
Citizen: M.
Agent:   Verification complete. Mr. Ahmet, you have two applications
         on file. Your passport application is under review. Your
         ID card application is approved. Which one would you like
         more details about?
Citizen: The passport one.
Agent:   Mr. Ahmet, your passport application is currently under
         review. Biometric verification is complete. Estimated
         completion is five to ten business days. Is there anything
         else I can help you with?
Citizen: Yes, I'd also like to book an appointment.
Agent:   Mr. Ahmet, which service would you like to book an
         appointment for? Passport, ID card, driver's license,
         or civil registry?
Citizen: Driver's license.
Agent:   Mr. Ahmet, your driver's license appointment has been
         booked. Date: April seventh, Time: ten AM, Location:
         Kadikoy Office. Please bring your ID card and any
         required documents.
Citizen: That's all, thank you.
Agent:   You're welcome, have a great day!
```
```text
Vatandas: (arar)
Ajan:     Merhaba, Vatandas Hizmetleri'ne hos geldiniz. Ben Umut,
          size nasil yardimci olabilirim?
Vatandas: Basvurumun durumunu ogrenmek istiyorum.
Ajan:     Kimliginizi dogrulamam icin size uc kisa soru soracagim.
          TC Kimlik numaranizin son dort hanesini soyler misiniz?
Vatandas: Sifir sifir bir sekiz.
Ajan:     Dogum tarihinizi gun, ay ve yil olarak soyler misiniz?
Vatandas: On bes Mart bin dokuz yuz doksan.
Ajan:     Baba adinizin ilk harfini soyler misiniz?
Vatandas: M.
Ajan:     Dogrulama tamamlandi. Ahmet Bey, sistemde iki basvurunuz
          bulunuyor. Pasaport basvurunuz incelemede. Kimlik karti
          basvurunuz onaylandi. Hangisi hakkinda detayli bilgi almak
          istersiniz?
Vatandas: Pasaport.
Ajan:     Ahmet Bey, pasaport basvurunuz inceleme asamasindadir.
          Biyometrik dogrulama tamamlandi. Tahmini tamamlanma suresi
          bes ila on is gunu. Baska bir konuda yardimci olabilir
          miyim?
Vatandas: Evet, bir de randevu almak istiyorum.
Ajan:     Ahmet Bey, hangi hizmet icin randevu almak istiyorsunuz?
          Pasaport, kimlik karti, ehliyet veya nufus islemi?
Vatandas: Ehliyet.
Ajan:     Ahmet Bey, ehliyet randevunuz olusturuldu. Tarih: yedi
          Nisan, Saat: sabah on, Yer: Kadikoy Nufus Mudurlugu.
          Lutfen nufus cuzdaninizi ve gerekli belgeleri yaninizda
          getirin.
Vatandas: Baska bir sey yok, tesekkurler.
Ajan:     Rica ederim, iyi gunler!
```
Gartner predicts conversational AI will reduce contact center agent labor costs by $80 billion in 2026 (gartner.com). ElevenLabs signed an MoU with Ukraine's Ministry of Digital Transformation (September 2025) to integrate voice AI into the Diia government portal, which serves 21+ million users with 1.6 billion backend transactions (kmu.gov.ua). Turkey's e-Devlet gateway serves 66.75 million registered users (96% of the population aged 15+) with 4.23 billion logins per year (turkiye.gov.tr). Voice AI is the natural next layer on top of this digital infrastructure.
This is a demo, but the architecture is designed with production deployment in mind:
- Multi-region voice deployment: ElevenLabs supports global edge deployment for low-latency voice in any region
- Horizontal scaling: the FastAPI backend and LangGraph agent are stateless per request, so they can scale behind a load balancer
- Knowledge base updates: a re-indexing pipeline allows document updates without downtime
- Multi-tenant architecture: the same system can serve multiple government agencies with isolated knowledge bases and authentication backends
- Compliance: CORS policy, security headers (OWASP), rate limiting, GDPR right to erasure, PII redaction, SHA-256 hashed credentials, and an audit trail. The ElevenLabs platform holds SOC 2 Type II, ISO 27001, HIPAA, and GDPR certifications (elevenlabs.io)
- Observability: ConversationLog tracks intent, timing, RAG scores, tool chains, and API calls per request
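PII redaction in logs (agent/logging_config.py) is commonly implemented as a logging.Filter attached to the handlers. The sketch below uses two illustrative patterns: an 11-digit TC Kimlik number and a numeric date of birth. The patterns, the placeholder strings, and the logger name are assumptions, not the project's actual redaction rules.

```python
import logging
import re

# Assumed patterns: an 11-digit TC Kimlik (first digit non-zero) and a
# numeric date such as 15.03.1990, 15/03/1990 or 15-03-1990.
_TC_KIMLIK = re.compile(r"\b[1-9]\d{10}\b")
_NUMERIC_DATE = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{4}\b")


class PIIRedactionFilter(logging.Filter):
    """Rewrite each record's final message with PII replaced by placeholders."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args before redacting
        message = _TC_KIMLIK.sub("[TC_KIMLIK]", message)
        message = _NUMERIC_DATE.sub("[DATE]", message)
        record.msg, record.args = message, None
        return True  # never drop the record, only redact it


def configure_logging() -> logging.Logger:
    handler = logging.StreamHandler()
    # Attach to the handler so records propagated from child loggers are redacted too.
    handler.addFilter(PIIRedactionFilter())
    logger = logging.getLogger("citizen_services")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Filters on a logger only see records logged directly on that logger, so handler-level filters are the safer place for redaction.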
For ElevenLabs specifically, this project shows:
- Custom LLM integration: not just the default agent, but a sophisticated LangGraph backend connected via the Custom LLM endpoint with SSE streaming and system tool forwarding
- Workflow mastery: deterministic auth gating via a dispatch tool plus subagent isolation, LLM condition edges for service routing, and backward edges for auth retry
- Enterprise use case thinking: government services is a massive, underserved market for voice AI
- Security-first design: caller authentication is a core concern, not an afterthought, with KVKK compliance, PII redaction, and audit trails
- Full-stack ownership: voice layer, agent logic, API integrations, knowledge base, analytics dashboard, and test suite, all built end to end
- Multilingual capability: English + Turkish, with an architecture ready for more languages
- Production resilience: graceful degradation across 3 levels, a circuit breaker pattern, and health monitoring
- Measurable analytics: a 9-page dashboard with Sankey flow visualization, node-level timing, tool chain analysis, and knowledge gap detection
The current system is a solid foundation. The next evolution is making it self-improving through multi-agent intelligence:
- Multi-Agent Insight Engine: specialized agents (RAG Quality Agent, Performance Agent, Pattern Agent, Platform Agent) that analyze conversation data and produce actionable, platform-aware recommendations with source data attribution
- Sentiment Tracking: per-conversation sentiment analysis to detect citizen satisfaction trends and correlate them with specific intents or nodes
- A/B Testing Framework: compare prompt versions, LLM models, and conversation strategies with statistical significance tracking
- Self-Improving Nodes: each LangGraph node monitors its own performance metrics and automatically adjusts prompts, thresholds, or routing logic based on observed patterns
- Predictive Analytics: peak-hour prediction and cost optimization based on historical call patterns
These features require production-level conversation volume to be meaningful and are designed to activate as data accumulates.
- Python 3.11+
- An ElevenLabs account with Conversational AI access
- An OpenAI API key
- A Pinecone account (free tier)
```bash
git clone https://github.com/Automaticare/Government-Citizen-Services-Voice-Agent.git
cd Government-Citizen-Services-Voice-Agent
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Fill in your API keys; see .env.example for the required variables

# Seed the citizen database (23 citizens, 26 applications, 5 appointments, 3 doc requests)
python -m api.seed_data

# Start the server (single server: the gov API is mounted into the agent server)
uvicorn agent.server:app --reload --port 8080

# In a second terminal, start an ngrok tunnel so ElevenLabs can reach the server
ngrok http 8080

# Deploy the agent to ElevenLabs (set CUSTOM_LLM_URL for voice integration)
CUSTOM_LLM_URL="https://your-ngrok-url.ngrok-free.dev" python -m agent.deploy
python -m agent.deploy --dry-run  # Or preview the config without deploying

# Run the analytics dashboard
python -m streamlit run dashboard/app.py
```
```bash
# Run tests (231 tests)
python -m pytest tests/ -v
```

```text
Government-Citizen-Services-Voice-Agent/
├── agent/                           # LangGraph agent core
│   ├── config.py                    # AgentConfig dataclass (API keys, custom LLM URL)
│   ├── deploy.py                    # Deploy agent to ElevenLabs (EN primary + TR preset)
│   ├── graph.py                     # LangGraph StateGraph definition (11 nodes)
│   ├── logging_config.py            # PII redaction logging (KVKK compliance)
│   ├── server.py                    # Custom LLM proxy: SSE streaming, circuit breaker, analytics logging
│   ├── state.py                     # AgentState TypedDict (incl. node detail tracking)
│   ├── nodes/                       # Graph node implementations
│   │   ├── intent_classify.py       # LLM-based intent classification (10 intents, GPT-4o)
│   │   ├── service_router.py        # Context-aware response using conversation history
│   │   ├── status_check.py          # Application status + deterministic tool chaining (→ RAG)
│   │   ├── appointment_book.py      # Appointment booking with service type detection
│   │   ├── appointment_list.py      # List existing appointments
│   │   ├── appointment_cancel.py    # Cancel appointments with selection matching
│   │   ├── document_request.py      # Document request with type detection
│   │   ├── document_status.py       # Document request status listing
│   │   ├── faq_answer.py            # FAQ with RAG grounding (Pinecone) + edge case handling
│   │   ├── complaint.py             # Complaint recording with confirmation
│   │   ├── escalate.py              # Human transfer: empathetic message + operator summary
│   │   └── utils.py                 # Shared utilities (mark_completed, get_honorific)
│   ├── prompts/                     # Versioned system prompts
│   │   ├── loader.py                # Prompt loader with version management
│   │   └── v1.0/                    # Current prompt version
│   │       ├── system_prompt_en.md  # English system prompt
│   │       └── system_prompt_tr.md  # Turkish system prompt
│   └── tools/                       # Validators + system tool schemas
│       ├── tc_kimlik.py             # TC Kimlik checksum validator + masking
│       ├── app_ref.py               # Application reference format validator
│       ├── date_parser.py           # Multi-format date parser (voice-friendly)
│       └── schemas.py               # ElevenLabs system tool configs (end_call, language_detection)
├── api/                             # Government backend (mounted into agent/server.py)
│   ├── server.py                    # Standalone FastAPI app (for independent testing)
│   ├── auth.py                      # Identity verification: webhook, TC Kimlik, app ref methods
│   ├── handoff.py                   # Escalation logging + guest mode FAQ
│   ├── services.py                  # Application, appointment, document, service catalog endpoints
│   ├── models.py                    # SQLAlchemy models (Citizen, Application, Appointment, DocumentRequest, ConversationLog, AuthAuditLog)
│   └── seed_data.py                 # 23 citizens + 26 applications + 5 appointments + 3 doc requests
├── rag/                             # RAG pipeline
│   ├── chunker.py                   # Document chunking (header-based + size overlap)
│   ├── embed.py                     # Gov KB embedding pipeline (OpenAI → Pinecone)
│   ├── embed_elevenlabs.py          # ElevenLabs docs embedding (514 chunks, separate index)
│   └── retriever.py                 # Pinecone query with language/category metadata filters
├── dashboard/                       # Streamlit analytics dashboard (9 pages)
│   ├── app.py                       # Multi-page dashboard with sidebar navigation
│   └── insights.py                  # LLM-powered recommendation engine (GPT-4o + RAG)
├── data/                            # citizens.db (SQLite, gitignored), knowledge_base/ (40 docs)
├── docs/
│   ├── ARCHITECTURE.md              # Detailed system design with diagrams
│   ├── DECISIONS.md                 # Key design decisions and tradeoffs
│   ├── auth_flow.md                 # Authentication state diagram
│   ├── conversation_flow.md         # Conversation flow documentation
│   └── langgraph_value.md           # LangGraph capabilities and voice test scenarios
├── tests/                           # 231 tests
│   ├── conftest.py                  # Shared test DB setup, per-test audit log cleanup
│   ├── test_core_system.py          # Comprehensive core system tests (104 tests, all product endings)
│   ├── test_auth.py                 # Authentication flow tests (20 tests)
│   ├── test_services.py             # Government API endpoint tests
│   ├── test_edge_cases.py           # TC Kimlik normalization, date parsing edge cases
│   ├── test_handoff.py              # Escalation and handoff tests
│   ├── test_schemas.py              # System tool schema tests
│   ├── test_logging.py              # PII redaction filter tests
│   ├── test_tc_kimlik.py            # TC Kimlik validator tests
│   ├── test_agent_connection.py     # Agent config and Custom LLM config tests
│   ├── test_intent_detection.py     # ElevenLabs simulation API intent tests
│   ├── test_language_detection.py   # Language switching simulation tests
│   ├── voice_e2e_test_roadmap.md    # 39 manual voice test scenarios
│   └── eval/                        # Automated conversation evaluation (planned)
├── .env.example                     # Required environment variables
├── .gitignore
├── CLAUDE.md                        # Development guidelines and project context
├── Makefile                         # make test, make test-live, make serve
├── requirements.txt                 # Python dependencies
└── issues.md                        # Roadmap and issue tracking
```
