Tag: #llm-inference
8 packages • 12,331 total stars
Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic.
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
The PHP Agentic Framework to build production-ready AI-driven applications. Connect components (LLMs, vector DBs, memory) to agents that can interact with your data. With its modular architecture it's
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses WebLLM, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
A command-line interface tool for serving LLMs using vLLM.
Monocle is a framework for tracing GenAI app code. This repo contains the implementation of Monocle for GenAI apps written in Python.
A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes pre-configured components for: AI Gateway (LiteLLM), LLM Serving (vLLM, SGLang, Ollama
Zero-code LLM security & observability proxy. Real-time prompt injection detection, PII scanning, and cost control for OpenAI-compatible APIs. Built in Rust.
