freshcrate

Tag: #inference

11 packages • ⭐ 107,710 total stars

vllm v0.22.1 🏛️ Flagship ⭐ 77,587

A high-throughput and memory-efficient inference and serving engine for LLMs

faster-whisper 1.2.1 🏛️ Flagship ⭐ 22,327

Faster Whisper transcription with CTranslate2

ctranslate2 v4.7.2 🌳 Mature ⭐ 4,444

Fast inference engine for Transformer models

xgrammar v0.2.1 🌳 Mature ⭐ 1,637

Efficient, Flexible and Portable Structured Generation

vllm-mlx v0.3.0 🌳 Mature ⭐ 917

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac

apache-tvm-ffi v0.1.11 🌿 Growing ⭐ 377

tvm ffi

oramacore v1.2.38 🌿 Growing ⭐ 249

OramaCore is the complete runtime you need for your projects, answer engines, copilots, and search. It includes a fully-fledged full-text search engine, vector database, LLM interface, and many more u

llm7.io main@2026-06-01 🌿 Growing ⭐ 142

LLM7.io offers a single API gateway that connects you to a wide array of leading AI models from various providers.

rasputin-memory v0.9.1 🌱 Seedling ⭐ 30

The memory system your AI agent deserves. 4-stage hybrid retrieval — Vector + BM25 + Knowledge Graph + Neural Reranker — in <150ms. Self-hosted, $0/query, built for agents that need to actually rememb

tritonclient 2.67.0 🌱 Seedling

Python client library and utilities for communicating with Triton Inference Server

foundation-ai-agent 1.0.1 🌱 Seedling

12 native protocol layers for AI-agent systems. No wrappers. No SDK dependencies.