freshcrate — #inference


Tag: #inference

5 packages • ⭐ 77,358 total stars

vllm • v0.19.1 • 🌿 Growing • ⭐ 76,155

A high-throughput and memory-efficient inference and serving engine for LLMs

amd blackwell cuda deepseek deepseek-v3 gpt gpt-oss inference python • by vllm-project

vllm-mlx • v0.2.8 • 🌿 Growing • ⭐ 798

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac

anthropic apple-silicon audio-processing claude-code computer-vision image-understanding inference llm python • by waybarrios

oramacore • v1.2.38 • 🌱 Seedling • ⭐ 249

OramaCore is the complete runtime you need for your projects, answer engines, copilots, and search. It includes a fully-fledged full-text search engine, vector database, LLM interface, and many more u

fulltext-search inference llms rust vector-database vector-search • by oramasearch

llm7.io • 0.0.0 • 🌿 Growing • ⭐ 139

LLM7.io offers a single API gateway that connects you to a wide array of leading AI models from various providers.

ai api artificial-intelligence inference large-language-models llm models typescript • by chigwell

rasputin-memory • v0.9.1 • 🌱 Seedling • ⭐ 17

The memory system your AI agent deserves. 4-stage hybrid retrieval — Vector + BM25 + Knowledge Graph + Neural Reranker — in <150ms. Self-hosted, $0/query, built for agents that need to actually rememb

agent-memory ai ai-memory bm25 embeddings falkordb hybrid-search inference python rag • by jcartu