freshcrate

Tag: #inference

5 packages • ⭐ 77,358 total stars

vllm v0.19.1 • 🌿 Growing • ⭐ 76,155

A high-throughput and memory-efficient inference and serving engine for LLMs

vllm-mlx v0.2.8 • 🌿 Growing • ⭐ 798

OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend.

oramacore v1.2.38 • 🌱 Seedling • ⭐ 249

OramaCore is the complete runtime you need for your projects, answer engines, copilots, and search. It includes a fully-fledged full-text search engine, vector database, LLM interface, and many more utilities.

llm7.io 0.0.0 • 🌿 Growing • ⭐ 139

LLM7.io offers a single API gateway that connects you to a wide array of leading AI models from various providers.

rasputin-memory v0.9.1 • 🌱 Seedling • ⭐ 17

The memory system your AI agent deserves. 4-stage hybrid retrieval — Vector + BM25 + Knowledge Graph + Neural Reranker — in <150ms. Self-hosted, $0/query, built for agents that need to actually remember.
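The rasputin-memory blurb names a Vector + BM25 fusion stage. As a rough illustration only (this is not the package's actual API), reciprocal rank fusion (RRF) is one common way to merge rankings from heterogeneous retrievers such as an embedding index and a keyword index:

```python
# Hypothetical sketch of hybrid-retrieval fusion, NOT rasputin-memory's API:
# merge ranked doc-id lists from different retrievers with reciprocal rank
# fusion (RRF). Each document scores 1/(k + rank); scores add across lists.

def rrf_fuse(rankings, k=60):
    """Fuse ranked doc-id lists; returns doc ids, best fused score first."""
    scores = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + pos)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d1", "d2", "d3"]  # e.g. from embedding similarity
bm25_hits = ["d2", "d4"]          # e.g. from keyword search

print(rrf_fuse([vector_hits, bm25_hits]))  # → ['d2', 'd1', 'd4', 'd3']
```

A document ranked by both retrievers (here `d2`) accumulates score from each list, so agreement between retrievers outranks a single high placement; the `k` constant damps the weight of top ranks.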