freshcrate

Search results for "quantization"

16 results found (Python)
flashinfer-python 📁 0.6.8.post1 🏛️ Flagship 5,467

FlashInfer: Kernel Library for LLM Serving

torchao 📁 0.17.0 🌳 Mature 2,790

Package for applying ao techniques to GPU models

faster-whisper 📁 1.2.1 🏛️ Flagship 22,327

Faster Whisper transcription with CTranslate2

vllm 📁 v0.19.1 🏛️ Flagship 77,587

A high-throughput and memory-efficient inference and serving engine for LLMs

vllm-mlx 📁 v0.2.8 🌳 Mature 917

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac

vmlx 📁 v1.3.34 🌿 Growing 348

vMLX - Home of JANG_Q - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth

fast-plaid 📁 1.4.5 🌿 Growing 245

High-Performance Engine for Multi-Vector Search

cognita 📁 0.0.0 🌳 Mature 4,405

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

llmware 📁 v0.4.6 🌿 Growing 14,862

Unified framework for building enterprise RAG pipelines with small, specialized models

tsunami 📁 main@2026-04-21 🌱 Seedling 16

Autonomous AI agent that builds full-stack apps. Local models. No cloud. No API keys. Runs on your hardware.

awesome-opensource-ai 📁 main@2026-04-20 🌿 Growing 2,849

Curated list of the best truly open-source AI projects, models, tools, and infrastructure.

rag-chatbot 📁 main@2026-04-14 🌿 Growing 407

RAG (Retrieval-Augmented Generation) chatbot that answers questions using contextual information extracted from a collection of Markdown files.

contemplative-agent 📁 v2.1.0 🌱 Seedling 4

A self-improving AI agent that learns from experience. Runs entirely on a local 9B model. Security by absence — dangerous capabilities were never built.

MOP 📁 0.0.0 🌱 Seedling 1

A local LLM-based autonomous agent orchestration platform featuring async background tasks, context-isolated sub-agents, dynamic knowledge injection, and strict security approval gates (Plan Mode).

vector-cache-optimizer 📁 base-setup@2026-04-21 🌱 Seedling 1

⚡ Optimize vector searches with a hyper-efficient cache that uses machine learning for faster, smarter data access and reduced costs.