Search results for "quantization"
FlashInfer: Kernel Library for LLM Serving
Faster Whisper transcription with CTranslate2
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend.
vMLX - Home of JANG_Q - Continuous Batching, Prefix Caching, Paged KV Cache, KV Cache Quantization, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anthropic APIs.
High-Performance Engine for Multi-Vector Search
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Benchmark for vector databases.
Unified framework for building enterprise RAG pipelines with small, specialized models
Autonomous AI agent that builds full-stack apps. Local models. No cloud. No API keys. Runs on your hardware.
Curated list of the best truly open-source AI projects, models, tools, and infrastructure.
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
A self-improving AI agent that learns from experience. Runs entirely on a local 9B model. Security by absence — dangerous capabilities were never built.
A local LLM-based autonomous agent orchestration platform featuring async background tasks, context-isolated sub-agents, dynamic knowledge injection, and strict security approval gates (Plan Mode).
⚡ Optimize vector searches with a hyper-efficient cache that uses machine learning for faster, smarter data access and reduced costs.
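Several of the results above (FlashInfer, CTranslate2, KV cache quantization in vMLX) rely on low-bit quantization to cut memory and bandwidth. As a minimal sketch of the underlying idea, not any of these projects' actual implementations, symmetric per-tensor int8 quantization maps the largest absolute weight to 127 and rounds everything else to that grid:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    One scale for the whole tensor: scale maps max|w| onto 127, so the
    round-trip error per element is bounded by scale / 2.
    """
    scale = max(abs(w) for w in weights) / 127.0
    # Round to the nearest int8 step and clamp to the representable range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]

weights = [0.1, -0.5, 0.25, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Real engines typically quantize per-channel or per-group rather than per-tensor, and pair the int8 (or int4) weights with fused dequantize-and-matmul kernels so the weights never materialize in float.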
