Search results for "inference"
OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for
The PHP Agentic Framework to build production-ready AI driven applications. Connect components (LLMs, vector DBs, memory) to agents that can interact with your data. With its modular architecture it's
Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing so you stay focused on your agents' core logic.
The memory system your AI agent deserves. 4-stage hybrid retrieval (Vector + BM25 + Knowledge Graph + Neural Reranker) in <150ms. Self-hosted, $0/query, built for agents that need to actually rememb
Local-first memory plugin for OpenClaw AI agents. LLM-powered extraction, plain markdown storage, hybrid search via QMD. Gives agents persistent long-term memory across conversations.
Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API + knowledge graph + autonomous consolidation.
Universal AI Development Platform with MCP server integration, multi-provider support, and professional CLI. Build, test, and deploy AI applications with multiple AI providers.
ByteRover CLI (brv) - The portable memory layer for autonomous coding agents (formerly Cipher)
Run a fleet of AI agents on Kubernetes. Administer your cluster agentically
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi
EdgeCrab: A Super Powerful Personal Assistant inspired by NousHermes and OpenClaw. Rust-native, blazing-fast terminal UI, ReAct tool loop, multi-provider LLM support, ACP protocol, gateway adapters
Curated directory of terminal-native AI coding agents and the harnesses that orchestrate them. Covers open-source tools (Pi, OpenCode, Aider, Goose), platform agents (Claude Code, Codex, Gemini CLI),
Own your AI. The native macOS harness for AI agents -- any model, persistent memory, autonomous execution, cryptographic identity. Built in Swift. Fully offline. Open source.
The world's fastest AI model gateway (450x less overhead than LiteLLM). Unified access to LLMs across endpoints (OpenAI, self-hosted, etc.) behind a single authentication layer - with API key generati
A community-driven collection of RAG (Retrieval-Augmented Generation) frameworks, projects, and resources. Contribute and explore the evolving RAG ecosystem.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
A composable framework for building AI applications.
Security scanner for AI-generated ("vibe-coded") code. Runs SAST, DAST, and sandboxed exploit simulation across 15+ languages using 30+ tools. Catches what LLMs introduce before it ships - wit
LLM7.io offers a single API gateway that connects you to a wide array of leading AI models from various providers.
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. Integrates with 50+ LLM Providers,
Give any AI agent a full desktop: it sees the screen, clicks, types, and runs apps like a human. Automate anything with a UI: browsers, legacy software, internal tools. No API needed. One Docker comm
Autonomous Agents (LLMs) research papers. Updated Daily.
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
Semiont supports human+ai collaborative knowledge work. Use it as: a Wiki, Semantic Layer, Context Graph, Knowledge Base, Annotator, Research Tool, or Agentic Memory...
A functional programming language optimized for LLM code generation. Compiles to Rust and WebAssembly.
Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. Hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.
JRVS AI Agent with JARCORE autonomous coding engine - RAG knowledge base, web scraping, calendar, code generation. Powered by whatever local AI you choose.
Generic rag framework to apply the power of LLMs on any given dataset
A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules - running loca
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses WebLLM, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
A high-throughput and memory-efficient inference and serving engine for LLMs
Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI apps written in Python.
A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes pre-configured components for: AI Gateway (LiteLLM), LLM Serving (vLLM, SGLang, Ollama
Agentic RAG R1 Framework via Reinforcement Learning
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
Make AI work for Everyone - Monitoring and governing for your AI/ML
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a c
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history,
AI Coding agent for the terminal: hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
Official MCP Servers for AWS
A Multi-Agentic AI Assistant/Builder
This repository contains comprehensive pricing and configuration data for LLMs. It powers cost attribution for 200+ enterprises running 400B+ tokens through Portkey AI Gateway every day.
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related website
Open-source alternative to Claude Code, built from scratch in Rust. Agentic coding CLI that thinks, plans, and executes with any LLM. Compatible with Claude Code workflows.
Curated list of chatgpt prompts from the top-rated GPTs in the GPTs Store. Prompt Engineering, prompt attack & prompt protect. Advanced Prompt Engineering papers.
Automatically Update LLM-Agent Papers Daily using GitHub Actions (Update Every 12 Hours)
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
2026, the year of the swarm Agent: a collection of AI Agent resources covering swarm Agent, Agent team, AI coding, skill, memory, evolve, and agentic RL
A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding β they're redefining how software changes the world.
One API for 20+ LLM providers, your databases, and your files β self-hosted, open-source AI gateway with RAG, voice, and guardrails.
META-AGENTIC α-AGI - Mission End-to-end: Identify → Out-Learn → Out-Think → Out-Design → Out-Strategise → Out-Execute
Conversational & memory-enabled AI research partner for multi-omics analysis. From biological idea to full research paper.
MCP server for token-efficient large document analysis via the use of REPL state
AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x
A model-driven approach to building AI agents in just a few lines of code.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
An open-source, cloud-native, high-performance gateway unifying multiple LLM providers, from local solutions like Ollama to major cloud providers such as OpenAI, Groq, Cohere, Anthropic, Cloudflare an
Unified framework for building enterprise RAG pipelines with small, specialized models
OramaCore is the complete runtime you need for your projects, answer engines, copilots, and search. It includes a fully-fledged full-text search engine, vector database, LLM interface, and many more u
Generate OpenAPI 3.1 specs from Go source code via static analysis: zero annotations, automatic framework detection
A curated list of awesome works related to high dimensional structure/vector search & database
Open-source Agentic AI framework in Go for building, orchestrating, and deploying intelligent agents. LLM-agnostic, event-driven, with multi-agent workflows, MCP tool discovery, and production-grade o
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. Official updates only via twitter @Martin993
Harness Vibe Research with Self-evolving AI Scientists
NextPlaid, ColGREP: Multi-vector search, from database to coding agents.
OpenAI-compatible HTTP LLM proxy / gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch). Lightweight, extensible Python/FastAPI; use as library or standalone service.
One API for 25+ LLMs, OpenAI, Anthropic, Bedrock, Azure. Caching, guardrails & cost controls. Go-native LiteLLM & Kong AI Gateway alternative.
DSPEx - Declarative Self-improving Elixir | A BEAM-Native AI Program Optimization Framework
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
Artificial Ecology For Thought and Emergent Reasoning. The Colony That Builds With You.
Declarative Self Improving Elixir - DSPy Orchestration in Elixir
Enable tool/function calling for any LLM, in OpenAI and Ollama API formats, adding universal function calling to models without native support. Use local or cloud models with full agent capabilities.
Security-first AI agent orchestration system. Built-in agents with predefined capabilities, strict guardrails on what they can and cannot do, and a four-layer defense system that enforces security at
The official TypeScript/Node client for the Pinecone vector database
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
The Pinecone Python client
The Go client for Chroma vector database
SQLite-Vector is a cross-platform, ultra-efficient SQLite extension that brings vector search capabilities to your embedded database.
754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Cop
TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
The SDK For Browser Agents
Droid LLM Hunter is a tool to scan for vulnerabilities in Android applications using Large Language Models (LLMs).
The AI-Native Search Database. Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.
RAGLight is a modular framework for Retrieval-Augmented Generation (RAG). It makes it easy to plug in different LLMs, embeddings, and vector stores, and now includes seamless MCP integration to connec
"RAG-Anything: All-in-One RAG Framework"
Lightweight semantic code search engine: 2-stage vector + FTS + RRF fusion + MCP server for Claude Code
Make your OpenClaw agents better, cheaper, and faster.
A type-safe, lightweight, modern, and performant Java binding of Microsoft's ONNX Runtime
Local AI anywhere, for everyone: LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.
We gave AI agents a brain. Memory, planning, continuity, and self-repair: the missing cognitive architecture layer. Runs on your Mac.
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features, fully self-hosted via Ollama.
Your AI-powered SWE teammate, built into your git workflow
Autonomous AI agent for Crustocean, powered by Hermes Agent from Nous Research
RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker
Self-hosted orchestrator for AI autonomous agents. Run Claude Code & Open Code in isolated linux workspaces. Manage your skills, configs and encrypted secrets with a git repo.
CloneMe is an advanced AI platform that builds your digital twin: an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.
Implement a PyTorch-like DL library in C++ from scratch, step by step
Self-hosted AI coding assistant
Complete Workspace Template for OpenClaw - Full agent lifecycle with unified memory system (Markdown + SQLite), self-evolution, RAG. Not for SubAgent/Skill use.
Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate - built to help studen
A self-operating entity with $50+ in real USDC that sells article summaries for $0.03, pays $0.018 in Ollama compute costs, and autonomously raises its price when running low, all while tracking itsel
Build and manage projects with an autonomous browser-based IDE featuring integrated multi-modal AI tools for efficient development workflows.
Lightweight, modular AI agent runtime that thinks (Hrafn) and remembers (MuninnDB)
Open-source autonomous AI assistant with 5-tier security, 62 tools, 14 LLM providers. Written in Rust. Single binary.
Process JSON data in batches with `llm-batch`, leveraging sequential or parallel modes for efficient interaction with LLMs.
Agent-ready telemetry SDK: enriches OpenTelemetry across Java, Go, Python, Node.js, and browser with structured context for AI-driven observability.
Modular multi-agent orchestration framework powered by LangGraph and FastAPI.
Local-first autonomous coding agent that plans, executes, validates, and finishes software tasks end-to-end.
Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services
A command-line interface tool for serving LLM using vLLM.
AIGNE DocSmith is a powerful, AI-driven documentation generation tool built on the AIGNE Framework. It automates the creation of detailed, structured, and multi-language documentation directly from yo
Superagent protects your AI applications against prompt injections, data leaks, and harmful outputs. Embed safety directly into your app and prove compliance to your customers.
Lightweight hallucination detection framework for RAG applications
Deterministic governance engine for AI agents. Enforce rules defined in .md governance files across AI systems.
TSUKUYOMI is an advanced modular intelligence framework designed for the democratization of Intelligence Analysis via systematic analysis, processing, and reporting across multiple domains. Built on a
A simple neural network inference framework
MCP (Model Context Protocol) Servers authored and maintained by the PulseMCP team. We build reliable servers thoughtfully designed specifically for MCP Client-powered workflows.
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge base
Pure C ONNX runtime with zero dependencies for embedded devices
A Model Context Protocol (MCP) server that provides secure, read-only access to BigQuery datasets. Enables Large Language Models (LLMs) to safely query and analyze data through a standardized interfac
LSP server leveraging LLMs for code completion (and more?)
aiFlows: The building blocks of your collaborative AI
Medical-AI is an AI framework specifically for Medical Applications https://aibharata.github.io/medicalAI/
