Reliability. Governance. Reproducible AI.
The TypeScript framework for AI systems you can trust in production.
AI systems must fail predictably. AI systems must be auditable. AI systems must be reproducible. AI systems must be governed by policy, not hope.
Every feature in ElsiumAI exists to serve one of these principles. If it doesn't, it doesn't ship.
Every AI framework helps you call an LLM. None of them help you trust the result.
ElsiumAI is built on three pillars that most frameworks ignore entirely:
| Pillar | The guarantee |
|---|---|
| Reliability | Your system stays up when providers break – circuit breakers, bulkhead isolation, request dedup, graceful shutdown |
| Governance | You control who does what, and you can prove it – policy engine, RBAC, approval gates, hash-chained audit trail, agent identity, runtime policy enforcement, memory integrity, MCP trust framework, compliance reporting (OWASP Agentic, EU AI Act, Colorado AI Act) |
| Reproducible AI | Tools to measure, pin, and reproduce AI outputs – seed propagation, output pinning, provenance tracking, determinism assertions |
It also does everything you'd expect – multi-provider gateway, agents, tools, RAG, workflows, MCP, streaming, cost tracking. But those are table stakes. The three pillars are what make ElsiumAI different.
```sh
npm install @elsium-ai/core @elsium-ai/gateway @elsium-ai/agents
```

```ts
import { gateway } from '@elsium-ai/gateway'
import { defineAgent } from '@elsium-ai/agents'
import { env } from '@elsium-ai/core'

const llm = gateway({
  provider: 'anthropic',
  model: 'claude-sonnet-4-6',
  apiKey: env('ANTHROPIC_API_KEY'),
})

const agent = defineAgent(
  { name: 'assistant', system: 'You are a helpful assistant.' },
  { complete: (req) => llm.complete(req) },
)

const result = await agent.run('What is TypeScript?')
```

Providers go down. Rate limits hit. Costs spiral. ElsiumAI treats failure as a first-class concern.
```ts
import { createProviderMesh } from '@elsium-ai/gateway'
import { env } from '@elsium-ai/core'

const mesh = createProviderMesh({
  providers: [
    { name: 'anthropic', config: { apiKey: env('ANTHROPIC_API_KEY') } },
    { name: 'openai', config: { apiKey: env('OPENAI_API_KEY') } },
  ],
  strategy: 'fallback',
  circuitBreaker: { // Provider failing? Circuit opens, traffic reroutes
    failureThreshold: 5,
    resetTimeoutMs: 30_000,
  },
})
```

| Feature | What it does |
|---|---|
| Circuit Breaker | Detects failing providers, stops sending traffic, auto-recovers |
| Bulkhead Isolation | Bounds concurrency – one slow consumer can't starve the rest |
| Request Dedup | Identical in-flight calls coalesce into one API request |
| Graceful Shutdown | Drains in-flight operations before process exit |
| Retry with Backoff | Exponential backoff with jitter, respects Retry-After headers |
| Stream Failover | Provider stream fails mid-request? Automatically switches to next provider |
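To make the circuit-breaker behavior concrete, here is a minimal sketch of the closed → open → half-open state machine. This is an illustration of the general pattern, not ElsiumAI's internals: `SimpleBreaker` and its injectable clock are hypothetical names, and a production breaker would also track rolling windows and half-open probe limits.

```typescript
// Illustrative circuit-breaker sketch, not ElsiumAI's implementation.
// States: 'closed' (traffic flows), 'open' (traffic blocked),
// 'half-open' (one probe allowed after the reset timeout elapses).
type BreakerState = 'closed' | 'open' | 'half-open'

class SimpleBreaker {
  state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0

  constructor(
    private failureThreshold: number,
    private resetTimeoutMs: number,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  allow(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open' // let a single probe request through
    }
    return this.state !== 'open'
  }

  onSuccess(): void {
    this.failures = 0
    this.state = 'closed'
  }

  onFailure(): void {
    this.failures += 1
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'
      this.openedAt = this.now()
    }
  }
}

// Usage: 5 consecutive failures open the circuit; after 30s a probe is allowed.
let fakeTime = 0
const breaker = new SimpleBreaker(5, 30_000, () => fakeTime)
for (let i = 0; i < 5; i++) breaker.onFailure()
const blockedWhileOpen = breaker.allow()  // false – circuit is open
fakeTime += 30_000
const probeAllowed = breaker.allow()      // true – half-open probe
breaker.onSuccess()
const stateAfterRecovery = breaker.state  // 'closed'
```

The injectable clock is what makes the reset timeout testable without real sleeps, which is the same property the benchmarked in-memory breaker would need for deterministic CI runs.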
Who called which model? Did they have permission? Can you prove the audit log hasn't been tampered with?
```ts
import { createPolicySet, policyMiddleware, modelAccessPolicy, costLimitPolicy, env } from '@elsium-ai/core'
import { createAuditTrail, auditMiddleware } from '@elsium-ai/observe'
import { createRBAC } from '@elsium-ai/app'
import { gateway } from '@elsium-ai/gateway'

// Policy: what's allowed
const policies = createPolicySet([
  modelAccessPolicy(['claude-sonnet-4-6', 'gpt-4o-mini']),
  costLimitPolicy(5.00),
])

// Audit: what happened (hash-chained, tamper-evident)
const audit = createAuditTrail({ hashChain: true })

// RBAC: who can do it
const rbac = createRBAC({
  roles: [{ name: 'analyst', permissions: ['model:use:gpt-4o-mini'], inherits: ['viewer'] }],
})

const llm = gateway({
  provider: 'anthropic',
  apiKey: env('ANTHROPIC_API_KEY'),
  middleware: [policyMiddleware(policies), auditMiddleware(audit)],
})
```

| Feature | What it does |
|---|---|
| Policy Engine | Declarative rules – deny by model, cost, token count, or content pattern |
| Runtime Policy Enforcement | Enforce policies inside the agent loop – check permissions before every tool call |
| RBAC | Role-based permissions with inheritance and wildcard matching |
| Approval Gates | Human-in-the-loop for high-stakes tool calls or expensive operations |
| Agent Identity | HMAC-SHA256-signed agent requests with replay protection and cross-agent verification |
| Memory Integrity | SHA-256 hash-chained message stores – detect tampering in agent memory |
| Audit Trail | SHA-256 hash-chained events with tamper-evident integrity verification, pluggable sinks (webhook, Splunk, Datadog) |
| Compliance Reporting | Generate reports against the OWASP Agentic, EU AI Act, and Colorado AI Act frameworks |
| MCP Trust Framework | Server allowlists, tool filtering, output validation, manifest integrity for MCP |
| PII Detection | Auto-redacts emails, phones, addresses, and API keys before they reach the model |
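The hash-chaining idea behind the audit trail and memory integrity features can be sketched in a few lines. This is the general technique, not ElsiumAI's event format: each entry's hash covers its payload plus the previous hash, so editing or deleting any earlier entry invalidates every hash after it.

```typescript
import { createHash } from 'node:crypto'

// Illustrative hash chain, not ElsiumAI's actual audit event schema.
interface ChainedEvent {
  payload: string
  prevHash: string
  hash: string // SHA-256 over prevHash + payload
}

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex')

function append(chain: ChainedEvent[], payload: string): void {
  const prevHash = chain.length ? chain[chain.length - 1].hash : 'genesis'
  chain.push({ payload, prevHash, hash: sha256(prevHash + payload) })
}

// Recompute every link; any edit upstream breaks all downstream hashes.
function verify(chain: ChainedEvent[]): boolean {
  let prev = 'genesis'
  for (const ev of chain) {
    if (ev.prevHash !== prev || ev.hash !== sha256(prev + ev.payload)) return false
    prev = ev.hash
  }
  return true
}

const trail: ChainedEvent[] = []
append(trail, 'model:call claude-sonnet-4-6')
append(trail, 'tool:call web_search')
const intactBefore = verify(trail) // true – chain is consistent

trail[0].payload = 'model:call gpt-4o-mini' // tamper with history
const intactAfter = verify(trail)  // false – tampering detected
```

Note that a hash chain makes tampering detectable, not impossible: an attacker who can rewrite the whole chain defeats it, which is why durable sinks (webhook, Splunk, Datadog) matter.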
LLMs are non-deterministic by nature. ElsiumAI gives you the tools to constrain, measure, and track output consistency.
```ts
import { assertDeterministic } from '@elsium-ai/testing'
import { createProvenanceTracker } from '@elsium-ai/observe'

// Verify: same input → same output
const result = await assertDeterministic(
  (seed) => llm.complete({
    messages: [{ role: 'user', content: [{ type: 'text', text: 'Classify: spam' }] }],
    temperature: 0,
    seed, // Propagated to provider API automatically
  }).then(r => extractText(r.message.content)),
  { runs: 5, seed: 42, tolerance: 0 },
)
// { deterministic: true, variance: 0, uniqueOutputs: 1 }

// Prove: who/what/when produced this output
const provenance = createProvenanceTracker()
provenance.record({ prompt, model, config, input, output, traceId })
```

| Feature | What it does |
|---|---|
| Seed Propagation | Passes seed through the stack to OpenAI, Google, and Anthropic APIs |
| Output Pinning | Locks expected outputs – model update changes your classifier? CI catches it |
| Determinism Assertions | Run N times, verify all outputs match, fail in CI if they don't |
| Provenance Tracking | SHA-256 hashes every prompt/config/input/output – full lineage per traceId |
| Request-Matched Fixtures | Replay test fixtures by content hash, not sequence order |
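At its core, the determinism check above boils down to: run the same seeded function N times and count distinct outputs. A self-contained synchronous sketch follows – `checkDeterminism` and the toy classifier are illustrative, and the real `assertDeterministic` is async and provider-aware:

```typescript
// Illustrative determinism check, not @elsium-ai/testing's implementation.
// Run the same seeded function N times and count distinct outputs.
function checkDeterminism(
  fn: (seed: number) => string,
  opts: { runs: number; seed: number },
): { deterministic: boolean; uniqueOutputs: number } {
  const outputs = new Set<string>()
  for (let i = 0; i < opts.runs; i++) outputs.add(fn(opts.seed))
  return { deterministic: outputs.size === 1, uniqueOutputs: outputs.size }
}

// A seeded pseudo-classifier standing in for a seeded LLM call.
const classify = (seed: number) => (seed % 2 === 0 ? 'spam' : 'ham')
const stable = checkDeterminism(classify, { runs: 5, seed: 42 })
// { deterministic: true, uniqueOutputs: 1 }

// A function that ignores its seed and varies per call is flagged:
let counter = 0
const flaky = (_seed: number) => `answer-${counter++}`
const unstable = checkDeterminism(flaky, { runs: 5, seed: 42 })
// { deterministic: false, uniqueOutputs: 5 }
```

Counting unique outputs (rather than asserting equality pairwise) is what lets a tolerance parameter generalize the check to "mostly deterministic" providers.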
The three pillars are what set ElsiumAI apart. Alongside them, it delivers the fundamentals:
- Multi-provider gateway – X-Ray mode, middleware, smart routing (fallback, cost-optimized, latency-racing, capability-aware)
- Agents – Memory, semantic guardrails, confidence scoring, state machines, multi-agent orchestration, ReAct reasoning loop
- Multimodal – Text, image, audio, and document content across all providers
- Structured output – Native JSON mode per provider (OpenAI json_schema, Anthropic tool-use, Google responseSchema)
- RAG – Document loading, PDF loading, chunking, embeddings, vector search, PgVector store, plugin registries
- Workflows – Retries, parallel execution, branching, checkpointing, resumable workflows
- MCP – Bidirectional client/server bridge, resources, prompts
- Custom providers – OpenAI-compatible adapter for Ollama, Groq, Together, or any OpenAI-compatible API
- Caching – LRU response cache with TTL, custom adapters, streaming bypass
- Output guardrails – PII/secret detection in responses, content policy, block/redact/warn modes
- Batch processing – Concurrent LLM requests with semaphore control, per-item retry, progress callbacks
- Token counting & context management – Model-aware estimation, truncate/summarize/sliding-window strategies
- SSE streaming – Server-Sent Events for HTTP endpoints, real-time response streaming
- Multi-tenant – Tenant extraction, per-tenant rate limiting, tier-based access control
- A/B experiments – Weight-based variant assignment, deterministic user hashing, metrics aggregation
- Client SDK – TypeScript HTTP client with SSE parsing for consuming ElsiumAI servers
- Persistent storage – SQLite memory stores for agents, PgVector for RAG
- Cost intelligence – Budgets, projections, loop detection
- Testing – Mock providers, evals, LLM-as-judge, prompt versioning, regression suites, dataset loading (JSON/CSV/JSONL), baseline comparison, multi-turn conversation testing, tool call assertions, automated red-teaming (44 adversarial probes, including multi-turn), agent metrics (efficiency, recovery, cost), a unified agent eval runner, CI reporters (JUnit XML, GitHub Actions, Markdown)
- Structured extraction – Zod schema → typed output, auto-retry on validation failure
- Dev Studio – Local web dashboard for live traces, X-Ray, costs, streaming events
- AI Proxy – OpenAI-compatible proxy with cost tracking, caching, and audit – any language, zero code changes
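To illustrate one of these building blocks, an LRU cache with TTL (the shape of the response cache above) fits in a short sketch. `LruTtlCache` is illustrative, not ElsiumAI's cache adapter; it leans on `Map` preserving insertion order, so the first key is always the least recently used.

```typescript
// Illustrative LRU + TTL cache, not @elsium-ai/gateway's cache adapter.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()

  constructor(
    private maxSize: number,
    private ttlMs: number,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) { // expired: drop it
      this.entries.delete(key)
      return undefined
    }
    // Re-insert to mark as most recently used.
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry.value
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key)
    else if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry (first key in insertion order).
      this.entries.delete(this.entries.keys().next().value as string)
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

let fakeTime = 0
const cache = new LruTtlCache<string>(2, 1_000, () => fakeTime)
cache.set('a', 'A')
cache.set('b', 'B')
cache.get('a')                  // touch 'a', so 'b' is now LRU
cache.set('c', 'C')             // evicts 'b'
const evicted = cache.get('b')  // undefined
const kept = cache.get('a')     // 'A'
fakeTime += 2_000
const expired = cache.get('a')  // undefined – TTL elapsed
```

A real response cache would key on a content hash of the request and bypass streaming responses, as the bullet above notes.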
```
┌──────────────────────────────────────────────────────────────────┐
│                          @elsium-ai/app                          │
│                HTTP server · RBAC · auth · routes                │
├────────────────────┬────────────────┬────────────────────────────┤
│ @elsium-ai/agents  │ @elsium-ai/mcp │       @elsium-ai/cli       │
│ memory · approval  │ client · server│ init · dev · eval · studio │
│ guardrails · multi │ resources      │ proxy                      │
│ ReAct              │ prompts        │                            │
├──────────┬─────────┼────────┬───────┼───────────┬────────────────┤
│ gateway  │ tools   │observe │ rag   │ workflows │ client         │
│ providers│ define  │ trace  │ load  │ steps     │ HTTP+SSE       │
│ mesh     │ toolkit │ audit  │ chunk │ parallel  │ parsing        │
│ security │         │ prove- │ embed │ branch    │                │
│ bulkhead │         │ nance  │vector │checkpoint │                │
│ cache    │         │ experi-│pgvect │ resumable │                │
│guardrail │         │ ment   │regist │           │                │
│ batch    │         │        │ PDF   │           │                │
│ openai-  │         │        │       │           │                │
│ compat   │         │        │       │           │                │
├──────────┴─────────┴────────┴───────┴───────────┴────────────────┤
│                         @elsium-ai/core                          │
│    types · errors · stream · logger · config · retry · result    │
│   circuit breaker · dedup · policy engine · shutdown manager     │
│    tokens · context manager · registry · schema · multimodal     │
└──────────────────────────────────────────────────────────────────┘
                         ·   ·   ·   ·   ·   ·
┌──────────────────────────────────────────────────────────────────┐
│                        @elsium-ai/testing                        │
│   mocks · evals · fixtures · pinning · determinism · snapshots   │
└──────────────────────────────────────────────────────────────────┘
```
Three Pillars – where each feature lives:

```
Reliability                 Governance                Determinism
───────────                 ──────────                ───────────
circuit breaker  [core]     policy engine    [core]   seed propagation [gw]
request dedup    [core]     RBAC             [app]    output pinning   [test]
shutdown manager [core]     approval gates   [agt]    determinism test [test]
retry + backoff  [core]     audit trail      [obs]    provenance       [obs]
bulkhead         [gw]       PII detection    [gw]     req-match fixts  [test]
provider mesh    [gw]       content classify [gw]     crypto hashing   [test]
```
| Package | Description |
|---|---|
| @elsium-ai/core | Types, errors, streaming, circuit breaker, dedup, policy engine, shutdown, tokens, context manager, registry, schema |
| @elsium-ai/gateway | Multi-provider gateway, X-Ray, provider mesh, OpenAI-compatible provider, bulkhead, PII detection, caching, output guardrails, batch processing |
| @elsium-ai/agents | Agents, ReAct agent, memory, persistent stores (in-memory, SQLite), guardrails, approval gates, multi-agent |
| @elsium-ai/tools | Tool definitions with Zod validation |
| @elsium-ai/rag | Document loading, PDF loading, chunking, embeddings, BM25, hybrid search, vector search, PgVector store, plugin registries |
| @elsium-ai/workflows | DAG workflows, sequential, parallel, branching, checkpointing, resumable workflows |
| @elsium-ai/observe | Tracing, cost intelligence, audit trail, provenance tracking, A/B experiments |
| @elsium-ai/mcp | Bidirectional MCP client and server, resources, prompts |
| @elsium-ai/app | HTTP server, CORS, auth, rate limiting, RBAC, SSE streaming, multi-tenant |
| @elsium-ai/client | TypeScript HTTP client with SSE parsing for consuming ElsiumAI servers |
| @elsium-ai/testing | Mocks, evals, multi-turn agent testing, tool assertions, red-teaming (single + multi-turn), agent metrics, CI reporters |
| @elsium-ai/cli | Scaffolding, dev server, X-Ray inspection |
Beyond agents, tools, RAG, and multi-provider routing, ElsiumAI ships production infrastructure out of the box:
| Category | Feature |
|---|---|
| Reliability | Circuit Breaker, Bulkhead Isolation, Request Dedup, Graceful Shutdown, Retry with Backoff, Stream Failover |
| Governance | Policy Engine, Runtime Policy Enforcement, RBAC, Approval Gates, Agent Identity, Memory Integrity, Hash-Chained Audit, Compliance Reporting, MCP Trust Framework, PII Detection, Output Guardrails, Multi-Tenant |
| Determinism | Seed Propagation, Output Pinning, Determinism Assertions, Provenance Tracking, A/B Experiments |
| Performance | Response Caching, Batch Processing, Token Counting, Context Management |
| Multimodal | Text, Image, Audio, Document across Anthropic, OpenAI, Google |
| Structured Output | Native JSON mode per provider, Zod schema validation |
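To ground one of the reliability entries above, exponential backoff with jitter is a small, standard computation. The sketch below is generic, not ElsiumAI's retry policy; `backoffDelayMs` is a hypothetical name, and "full jitter" here means picking a uniform delay below the exponential ceiling, with an explicit Retry-After hint taking precedence.

```typescript
// Generic exponential backoff with full jitter – not ElsiumAI's retry policy.
// delay = random(0, min(cap, base * 2^attempt)); a Retry-After hint wins.
function backoffDelayMs(
  attempt: number,          // 0-based retry attempt
  baseMs = 100,
  capMs = 30_000,
  retryAfterMs?: number,    // parsed from a Retry-After header, if present
  rand: () => number = Math.random,
): number {
  if (retryAfterMs !== undefined) return retryAfterMs
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt)
  return rand() * ceiling
}

// With rand pinned to 1, the worst-case delay per attempt is visible:
const worstCase = backoffDelayMs(3, 100, 30_000, undefined, () => 1) // 800
const capped = backoffDelayMs(20, 100, 30_000, undefined, () => 1)   // 30000
const honored = backoffDelayMs(0, 100, 30_000, 1_500)                // 1500
```

Jitter spreads simultaneous retries apart so a provider recovering from an outage is not hit by a synchronized thundering herd.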
Measured with a zero-latency mock provider to isolate framework cost. Full methodology and reproduction steps are in benchmarks/.
| Metric | P50 | P95 | Conditions |
|---|---|---|---|
| Core completion path | 2.3μs | 5.5μs | Agent, no middleware |
| Full governance stack | 6.2μs | 9.5μs | Security + audit + policy + cost + xray + logging |
| Under concurrency | 5.0μs | 6.4μs | 100 parallel requests, full stack |

| Metric | Value |
|---|---|
| Typical LLM network latency | 200–800ms |
| ElsiumAI overhead at P95 | <10μs |
| Framework cost contribution | <0.01% of total request time |
| Metric | Value |
|---|---|
| Cold start | <3ms |
| Bundle size (minified) | 77 KB |
| Memory per 10K requests | ~10 MB (full stack + tracing + audit, all in-memory, capped) |
| Per-request heap growth | ~1 KB |
| Circuit breaker throughput | >5M ops/sec |
Baselines are frozen per release and checked for regressions in CI. See benchmarks/results/ for historical data.
- Fail predictably – handle failure before you see it
- Trust but verify – every call auditable, every output traceable
- Reproducible by design – testable AI is trustworthy AI
- Zero magic – `createX(config)` everywhere, no hidden behavior
- Type safety end-to-end – from config to LLM output
- Modular – use what you need, tree-shake the rest
See CONTRIBUTING.md for development setup and guidelines.
Created and maintained by Eric Utrera (@ebutrera9103).
MIT - Copyright (c) 2026 Eric Utrera
