2 packages • ⭐ 76,633 total stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.