Search results for "inference"
Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.
OpenAI-compatible HTTP LLM proxy / gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch). Lightweight, extensible Python/FastAPI; use as library or standalone service.
The memory system your AI agent deserves. 4-stage hybrid retrieval (Vector + BM25 + Knowledge Graph + Neural Reranker) in <150ms. Self-hosted, $0/query, built for agents that need to actually rememb
423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.
Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API + knowledge graph + autonomous consolidation.
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi
Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
The Pinecone Python client
Droid LLM Hunter is a tool to scan for vulnerabilities in Android applications using Large Language Models (LLMs).
RAGLight is a modular framework for Retrieval-Augmented Generation (RAG). It makes it easy to plug in different LLMs, embeddings, and vector stores, and now includes seamless MCP integration to connec
"RAG-Anything: All-in-One RAG Framework"
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. Integrates with 50+ LLM Providers,
Give any AI agent a full desktop: it sees the screen, clicks, types, and runs apps like a human. Automate anything with a UI: browsers, legacy software, internal tools. No API needed. One Docker comm
RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker
JRVS AI Agent with JARCORE autonomous coding engine - RAG knowledge base, web scraping, calendar, code generation. Powered by whatever local AI you choose.
A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules, running loca
A high-throughput and memory-efficient inference and serving engine for LLMs
Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI apps written in Python.
Agentic RAG R1 Framework via Reinforcement Learning
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
Make AI work for Everyone - Monitoring and governing for your AI/ML
An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.
"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"
Official MCP Servers for AWS
754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Cop
A Multi-Agentic AI Assistant/Builder
Local AI server with persistent memory, RAG, and multi-backend inference (MLX / llama.cpp / Ollama). Runs entirely on your machine, with zero data sent to external services.
Automatically Update LLM-Agent Papers Daily using Github Actions (Updates Every 12 Hours)
A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding: they're redefining how software changes the world.
One API for 20+ LLM providers, your databases, and your files: self-hosted, open-source AI gateway with RAG, voice, and guardrails.
META-AGENTIC α-AGI. Mission: End-to-end: Identify → Out-Learn → Out-Think → Out-Design → Out-Strategise → Out-Execute
Conversational & memory-enabled AI research partner for multi-omics analysis. From biological idea to full research paper.
AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x
A model-driven approach to building AI agents in just a few lines of code.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Unified framework for building enterprise RAG pipelines with small, specialized models
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. Official updates only via twitter @Martin993
Harness Vibe Research with Self-evolving AI Scientists
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
Lightweight offline AI agent for local models. No cloud, no API keys, just your GPU.
Multi-agent swing trading system: automated screening, research, and execution with backtesting and live trading
The first autonomous hackathon agent: stop assisting and start competing (Hackathon Champion Project).
Agentic memory for CTI in Python: STIX knowledge graphs, threat-actor alias resolution, offline-first RAG, MCP server for Claude Code and LangChain agents
Local-first AI assistant: 9 specialized agents (code, web, debug, security…), 10M token vector memory, mobile relay via secure tunnel, real-time web search and document processing. Runs 100% on your
The most comprehensive directory of AI agent frameworks, platforms, tools, and resources - hundreds of curated entries covering open-source, no-code, enterprise, and autonomous solutions. NEW Boil
LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz
The API layer for AI agents. Dashboard + 22K APIs + 18 Direct Call providers. MCP native.
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
Lightweight semantic code search engine: 2-stage vector + FTS + RRF fusion + MCP server for Claude Code
A command-line interface tool for serving LLM using vLLM.
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features, fully self-hosted via Ollama.
Your AI-powered SWE teammate, built into your git workflow
Autonomous AI agent for Crustocean, powered by Hermes Agent from Nous Research
Lightweight hallucination detection framework for RAG applications
Control robots and physical hardware with natural language through Strands Agents.
CloneMe is an advanced AI platform that builds your digital twin, an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
A tool that compiles messy natural language prompts into a structured intermediate representation (IR) and optionally sends them to LLMs like ChatGPT for cleaner, more reliable responses.
Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate, built to help studen
Complete Workspace Template for OpenClaw - Full agent lifecycle with unified memory system (Markdown + SQLite), self-evolution, RAG. Not for SubAgent/Skill use.
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge base
Modular multi-agent orchestration framework powered by LangGraph and FastAPI.
Local-first autonomous coding agent that plans, executes, validates, and finishes software tasks end-to-end.
Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services
FlashInfer: Kernel Library for LLM Serving
Microsoft Azure AI Inference Client Library for Python
Official Python package for working with the Roboflow API
Faster Whisper transcription with CTranslate2
Efficient, Flexible and Portable Structured Generation
Fast inference engine for Transformer models
Calculate prices for calling LLM inference APIs.
PyTorch native Metrics
Client library for the Qdrant vector search engine
Python client library and utilities for communicating with Triton Inference Server
The Blis BLAS-like linear algebra library, as a self-contained C-extension.
Open source library for training and deploying models on Amazon SageMaker.
SGLang is a fast serving framework for large language models and vision language models.
An abstract syntax tree for Python with inference support.
Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
the blessed package to manage your versions by scm tags
Medical-AI is an AI framework specifically for Medical Applications https://aibharata.github.io/medicalAI/
