# SmarterRouter

> SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.

- **URL**: https://www.freshcrate.ai/projects/SmarterRouter
- **Author**: peva3
- **Category**: Infrastructure
- **Latest version**: `2.2.5` (2026-04-18)
- **License**: MIT
- **Source**: https://github.com/peva3/SmarterRouter
- **Language**: Python
- **GitHub**: 113 stars, 8 forks
- **Registry**: github
- **Tags**: `ai-cache`, `ai-gateway`, `docker`, `fastapi`, `gpu-monitoring`, `llm`, `llm-proxy`, `llm-router`, `python`

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `2.2.5` | 2026-04-18 | High | **[2.2.5] - 2026-04-17.** **New Features:** **Dynamic Model Metadata Registry** (`router/model_metadata.py`): Created comprehensive model metadata system with automatic capability detection from Ollama API, TTL caching, and pattern-based fallbacks. Supports vision, tool_calling, embedding, MoE, and quantization detection. **Gemma 4 Support**: Added Gemma 4 series (e2b, e4b, 26b, 31b) to modality detection heuristics for both vision and tool calling capabilities. **MoE-Aware VRAM Estimat… |
| `2.2.4` | 2026-04-06 | High | **[2.2.4] - 2026-04-06.** **Security Fixes:** **Weak MD5 hash in prompt analysis cache** (`router/router.py:1302`): Replaced `hashlib.md5()` with `hashlib.sha256()` for cryptographic security in cache key generation. **Pickle deserialization vulnerability in Redis cache** (`router/cache_redis.py:97`): Replaced `pickle.loads()/pickle.dumps()` with `json.loads()/json.dumps()` to prevent potential remote code execution from untrusted cache data. **Redis cache connection error handling** (`… |
| `2.2.3` | 2026-03-28 | Medium | **[2.2.3] - 2026-03-27.** **Security Fixes:** **SQL injection anti-pattern in index creation** (`database.py:278-281`): Changed f-string interpolation in DDL helper to parameterized query using `text(...).bindparams(...)`. The index name was hardcoded so not directly exploitable, but the pattern could be copied to user-facing code. **Timing attack on admin API key comparison** (`state.py:467`): Changed string `!=` comparison to `hmac.compare_digest()` to prevent timing side-channel attacks |
| `2.2.2` | 2026-03-23 | Medium | **[2.2.2] - 2026-03-16.** **Bug Fixes:** **Ollama backend multimodal transformation**: Fixed OpenAI-style multimodal message handling in Ollama backend to properly convert `image_url` content parts to Ollama's expected `images` field, stripping `data:image/...;base64,` prefixes so Ollama vision models can actually receive image data. This resolves the issue where image uploads appeared to route correctly but the image payload was not translated into the format Ollama expects. |
| `2.2.1` | 2026-03-16 | Low | **[2.2.1] - 2026-03-16.** **Highlights:** Added modality-aware routing to intelligently route requests based on input type (vision, tool-calling, text, embeddings). Enhanced changelog organization and documentation. **New Features:** *Modality-Aware Routing:* **Modality detection module** (`router/modality.py`) - Automatic detection of request modalities from request shape. Vision: Image URL content parts in messages. Tool Calling: Presence of tools in request. Text: Defa… |
| `2.2.0` | 2026-03-16 | Low | **[2.2.0] - 2026-03-16.** **Highlights:** Major platform update with performance improvements, reliability hardening, expanded security controls, and large documentation/testing expansion. Main application architecture refactored into focused modules (`router/state.py`, `router/middleware.py`, `router/lifecycle.py`, `router/api/*`) with `main.py` reduced to an app shell. **Performance & Scalability:** Added configurable response compression (`ROUTER_ENABLE_RESPONSE_COMPRESSION`, `ROUT… |
| `2.1.9` | 2026-03-04 | Low | **[2.1.9] - 2026-03-03.** **Performance Optimizations (Phase 2 - Quick Wins):** *Critical Performance Fixes:* 1. **Fixed blocking GPU I/O with async wrapper**: Added `get_memory_info_async()` method to GPU backend protocol (`router/gpu_backends/base.py:63-74`). Updated VRAM monitor to use async GPU queries (`router/vram_monitor.py:219-225`). Eliminates event loop blocking during GPU memory queries (5s timeout per GPU). 2. **Implemented batched VRAM estimates**: Added `g… |
| `2.1.8` | 2026-03-03 | Low | **[2.1.8] - 2026-03-03.** **Performance Optimizations:** *Reduced Backend API Calls:* **Model list caching**: Added 10-second TTL cache for `list_models()` calls, eliminating ~100-500ms latency per request (`router/router.py:33-155`, `main.py:125-184`). **Router engine accepts pre-fetched models**: `select_model()` now accepts optional `available_models` parameter to avoid redundant backend calls (`router/router.py:1064-1079`). *Lower Resource Consumption:* **Reduced model polling freq… |
| `2.1.7` | 2026-02-27 | Low | **[2.1.7] - 2026-02-27.** **Critical Bug Fixes & Stability Improvements:** *Concurrency & Race Condition Fixes:* **Fixed race condition in `SemanticCache._get_embedding()`**: Rewrote embedding cache to eliminate double lock acquisition that could cause deadlocks (`router/router.py:396-467`). **Fixed global cache race condition in `_get_all_profiles()`**: Added `asyncio.Lock()` and double-checked locking pattern to prevent concurrent cache corruption (`router/router.py:1363-1384`). **Fixe… |
| `2.1.6` | 2026-02-27 | Low | **[2.1.6] - 2026-02-27.** **Enhanced Cache Statistics & API:** *Detailed Cache Analytics:* **Time-series tracking**: Cache hits, misses, similarity hits, evictions, and embedding cache events tracked with timestamps. **Multi-dimensional metrics**: Per-model cache counts, access patterns, and eviction reasons. **Real-time analytics**: Cache hit rates, similarity hit rates, and adaptive threshold adjustments. *New Admin Endpoints:* `GET /admin/cache/stats` - Detailed cache stati… |
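
The security fixes listed for 2.2.3 and 2.2.4 follow three common Python hardening patterns: SHA-256 instead of MD5 for cache keys, `hmac.compare_digest()` instead of `!=` for secret comparison, and JSON instead of pickle for cached payloads. The sketch below illustrates those patterns using only the standard library. It is not SmarterRouter's actual code: the function names (`cache_key`, `is_admin_key_valid`, `serialize_entry`, `deserialize_entry`) are hypothetical and chosen for illustration.

```python
import hashlib
import hmac
import json


def cache_key(prompt: str) -> str:
    """Hypothetical helper: derive a cache key with SHA-256 rather than MD5."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def is_admin_key_valid(provided: str, expected: str) -> bool:
    """Hypothetical helper: compare secrets in constant time.

    A plain `!=` can return as soon as the first differing character is
    found, which leaks timing information; compare_digest avoids that.
    """
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


def serialize_entry(entry: dict) -> bytes:
    """Hypothetical helper: store cache entries as JSON bytes, not pickle."""
    return json.dumps(entry).encode("utf-8")


def deserialize_entry(raw: bytes) -> dict:
    """Hypothetical helper: json.loads only builds plain data structures,
    so untrusted cache bytes cannot trigger code execution the way
    pickle.loads can."""
    return json.loads(raw.decode("utf-8"))
```

One trade-off of the JSON approach is that only JSON-representable values (strings, numbers, booleans, `None`, lists, and dicts with string keys) survive a round trip, so richer objects would need an explicit conversion step before caching.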

## Dependency audit

- **Score**: 63/100
- **Total deps**: 17
- **Resolved**: 6
- **Unresolved**: 11
- **License conflicts**: 0
- **Warnings**: 12
- **Scanned**: 2026-05-11

## Citation

- HTML: https://www.freshcrate.ai/projects/SmarterRouter
- Markdown: https://www.freshcrate.ai/projects/SmarterRouter.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/SmarterRouter/deps

_Generated by freshcrate.ai, which indexes GitHub releases for AI-agent ecosystem packages._