# vmlx

> vMLX - Home of JANG_Q - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth

- **URL**: https://www.freshcrate.ai/projects/vmlx
- **Author**: jjang-ai
- **Category**: MCP Servers
- **Latest version**: `v1.5.54` (2026-06-02)
- **License**: Apache-2.0
- **Source**: https://github.com/jjang-ai/vmlx
- **Homepage**: https://vmlx.net
- **Language**: Python
- **GitHub**: 348 stars, 42 forks
- **Registry**: github
- **Tags**: `anthropic-api`, `kvcache-compression`, `kvcache-optimization`, `kvcache-reuse`, `llm`, `lmstudio`, `macbook`, `mcp-server`, `python`

## Description

vMLX - Home of JANG_Q - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v1.5.54` | 2026-06-02 | High | vMLX 1.5.54  Highlights: - Ships the model-owned generation-default fix in the public app bundle so explicit JANG defaults such as Step-3.7-Flash temperature 0.00, top-p 1.0, and top-k off are detected from bundle metadata, displayed in app settings, and applied before session config is saved. - Keeps startup sampling behavior derived from generation_config.json and jang_config.json instead of hidden app-side sampler forcing. - Carries the 1.5.53 emergency release fixes for Step metadata, ZAYA V |
| `v1.5.49` | 2026-05-24 | High | ## vMLX 1.5.49  This release tightens the Python/Electron runtime and packaged app after the v1.5.48 cache/settings audit.  ### Fixed  - DSV4 Flash now launches with the native SWA+CSA/HCA composite prefix cache path enabled by default, with 256-token DSV4 block indexing. - DSV4 CSA/HCA pool codec now uses the materialized pool implementation from clean JANG source, avoiding repeated historical pool dequant/concat on cache reads. - Generic KV q4/q8 cache quantization remains suppressed for DSV4; |
| `v1.5.48` | 2026-05-22 | High | vMLX 1.5.48  Highlights: - Aligned Qwen3.6 affine-JANG native-MTP VL routing across the engine registry, decode-speed launch rows, panel detection, and API policy. - Added local-path parity coverage for high-risk DSV4, Qwen, Hy3, and Nemotron artifacts so parser, reasoning, cache, modality, and launch policy do not silently diverge between UI and engine. - Added post-release guards proving explicit Chat/Responses output caps do not mutate server startup defaults.  Downloads: - The updater manife |
| `v1.5.36` | 2026-05-16 | High | vMLX 1.5.36  Fixes and release integrity: - Ships the installed-app Stream(gpu, 0) cache-hit fix for single-active JANG cache replay. - Restores bundled package assets required by the Python app, including chat templates, defaults, and Metal codebook kernels. - Rebuilds the macOS app bundle from the canonical JANG source checkout, not the unfinished JangStudio/profile-matrix worktree. - Adds release gates for bundled vmlx_engine assets, canonical jang_tools provenance, relocatable bundled-python |
| `v2.0.0-rc.1` | 2026-05-04 | High | Swift native Mac app, Developer-ID notarized.  **SHA256**: `a4f47a6db4f679a29c0191a04e640b1386971108b5d402cc7f110d227f24ae56` **Size**: 24.3 MB  Notarized + stapled. Hardened runtime. Apple Team ID: 55KGF2S5AY. Identifier: ai.jangq.vmlx. |
| `v1.5.0` | 2026-05-01 | High | Live audit 2026-04-30 caught Laguna-XS.2-mxfp4 crashing with 'Model type laguna not supported'. The architecture-specific routing branches were INSIDE the is_jang_model() gate, which only matches jang/jjqf/mxq/mxtq weight formats — MXFP4 bundles fell through to stock mlx_lm. Hoisted the Laguna + ministral3 branches to BEFORE the JANG gate so all weight formats land in the right loader. |
| `v1.3.34` | 2026-04-09 | High | Post-v1.3.33 fixes driven by user reports and a dedicated-machine test matrix across 5 models.  ## SSM Deferred Re-derive (Hybrid SSM + Thinking Models)  For thinking models on hybrid SSM architectures (Nemotron, Qwen3.5-VL), the post-generation SSM state was contaminated by thinking tokens and previously **skipped entirely** — causing 100% SSM companion cache miss on every multi-turn request.  Now the scheduler **queues a deferred re-derive** that runs during idle time: a separate prefill pass |
| `v1.3.33` | 2026-04-09 | Medium | Post-v1.3.32 audit cycle — fixes three GitHub issues and an internal release-audit sweep. Headline fix unblocks vision prompts on every batched-engine VLM.  ## Fix #1 — [Issue #56](https://github.com/jjang-ai/vmlx/issues/56): Vision/MLLM requests return empty response on BatchedEngine  Mistral 4 / Pixtral, Qwen3.5-VL, Gemma 4 — vision prompts on the batched engine silently dropped with `content=null`, `prompt_tokens=0`, `finish_reason=stop`.  **Root cause** — two-step dtype mismatch in `mllm_bat |
| `v1.3.32` | 2026-04-09 | High | Gemma 4 VLM image recognition fix (two-layer bug).  **Bug 1** — mlx_vlm MODEL_CONFIG missing `gemma4` entry → `apply_chat_template()` raised `Unsupported model: gemma4` → silent fallback to text-only tokenizer → image content parts dropped → model answered as if no image was attached.  **Fix**: Register `gemma4`/`gemma4_text` under `LIST_WITH_IMAGE_TYPE` in `vmlx_engine/__init__.py` at import time so the Jinja chat template renders `<\|image\|>` tokens correctly.  **Bug 2** — Gemma 4 vision tower |
| `v1.3.31` | 2026-04-09 | Medium | ## 4-agent audit 2026-04-07 — Phase 5 polish (22 fixes)  Cross-agent audit delivering cross-session cache sharing, Aho-Corasick stop matching, SSM companion cache extraction, Gemma 4 native shim, and the mlx-lm 0.31.2 bump.  **263/263 cache tests PASS** — 27/27 live cells across Qwen3 / Gemma 4 / Nemotron Cascade 2 / Mistral 4.  ### Agent 1 — Cache (Coordinator) - LRU+Trie cross-session prefix sharing on MemoryAwarePrefixCache (production default) — system → user → assistant priority eviction - |

## Citation

- HTML: https://www.freshcrate.ai/projects/vmlx
- Markdown: https://www.freshcrate.ai/projects/vmlx.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/vmlx/deps

_Generated by freshcrate.ai. Indexes github releases for AI-agent ecosystem packages._