Search results for "benchmark"
#1 Terminal Benchmark 2.0 - AI that ships your tickets.
Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica
Local-first memory plugin for OpenClaw AI agents. LLM-powered extraction, plain markdown storage, hybrid search via QMD. Gives agents persistent long-term memory across conversations.
PraisonAI - Hire a 24/7 AI Workforce. Stop writing boilerplate and start shipping autonomous agents that research, plan, code, and execute tasks. Deployed in 5 lines of code with built-in memory, R
Persistent memory for AI coding agents
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API + knowledge graph + autonomous consolidation.
LeanKG: Stop Burning Tokens. Start Coding Lean.
ByteRover CLI (brv) - The portable memory layer for autonomous coding agents (formerly Cipher)
ARIS (Auto-Research-In-Sleep) - Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in - works wi
OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for
The leading, most token-efficient MCP server for GitHub source code exploration via tree-sitter AST parsing
Memory that lasts and compounds. MentisDB gives agents durable memory so they do not just remember, they improve over time. It stores append-only thought chains plus a Git-like skills registry, lett
Own your AI. The native macOS harness for AI agents -- any model, persistent memory, autonomous execution, cryptographic identity. Built in Swift. Fully offline. Open source.
The implementation for SIGIR 2026: Learning to Retrieve from Agent Trajectories.
LLM Agent that leverages cheminformatics tools to provide informed responses.
Group Evolving Agents: Open-Ended Self-Improvement via Experience Sharing
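ReLE evaluation: a capability evaluation of Chinese AI large models (continuously updated). Currently includes 359 large models, covering commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min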
Make AI work for Everyone - Monitoring and governing for your AI/ML
Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.
Enterprise-grade (40m+ lines) codebase intelligence in a zero-setup, private and local Claude Plugin or MCP: managed indexing, hybrid semantic search, polyglot code dependency graphs, and DB/API/infra
The agent-native LLM router for OpenClaw. 41+ models, <1ms routing, USDC payments on Base & Solana via x402.
Universal memory layer for AI Agents
SDL-MCP (Symbol Delta Ledger MCP Server) is a cards-first context system for coding agents that saves tokens and improves context.
Open-source sandboxes for code execution, browser use, and AI agents.
Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us
Lightweight persistent memory system for Claude Code - FTS5 search, episode batching, error-triggered recall
Autonomous Agents (LLMs) research papers. Updated Daily.
A functional programming language optimized for LLM code generation. Compiles to Rust and WebAssembly.
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pr
Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. Hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.
This repository contains comprehensive pricing and configuration data for LLMs. It powers cost attribution for 200+ enterprises running 400B+ tokens through Portkey AI Gateway every day.
A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding - they're redefining how software changes the world.
Unleash Next-Level AI! Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! OpenAI-Compatible.
Framework for benchmarking vector search engines
The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architect
Curated systems, benchmarks, and papers etc. on memory for LLMs/MLLMs --- long-term context, retrieval, and reasoning.
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
Fast Compiler for C# Expression Trees and the lightweight LightExpression alternative. Diagnostic and code generation tools for the expressions.
Memori is agent-native memory infrastructure. A SQL-native, LLM-agnostic layer that turns agent execution and conversation into structured, persistent state for production systems.
A modular MCP server that provides commonly used developer tools for AI coding agents
AgenticX is a unified, production-ready multi-agent platform - Python SDK + CLI (agx) + Studio server + Machi desktop app. Features Meta-Agent orchestration, 15+ LLM providers, MCP Hub, hierarchical m
Brain-inspired knowledge graph: spreading activation, Hebbian learning, memory consolidation.
Context window optimization for AI coding agents. Sandboxes tool output, 98% reduction. 12 platforms
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.
Self-evolving cognitive memory and context engine for AI agents in Java. Empowering 24/7 proactive agents like OpenClaw with understanding and SOTA performance.
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related website
Curated list of chatgpt prompts from the top-rated GPTs in the GPTs Store. Prompt Engineering, prompt attack & prompt protect. Advanced Prompt Engineering papers.
A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules - running loca
Automatically update LLM-agent papers daily using GitHub Actions (updates every 12 hours)
mkdir beats vector DB. B-tree NeuronFS: 0-byte folders govern AI - ₩0 infrastructure, ~200x token efficiency. OS-native constraint engine for LLM agents.
Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agent Teams (6 teammates), Aristotle First Principles methodology,
LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz
The Next-Gen Agent-Native Skill Recommendation Engine
Fast, small, and fully autonomous AI personal assistant infrastructure, ANY OS, ANY PLATFORM - deploy anywhere, swap anything
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
AI-first security scanner with 76 analyzers, 9,600+ detection rules, and repo poisoning detection for AI/ML, LLM agents, and MCP servers. Scan any GitHub repo with: medusa scan --git user/repo
A lightweight, lightning-fast, in-process vector database
Life sciences computational skills for scientific AI agents
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
MaverickMCP - Personal Stock Analysis MCP Server
BioMCP: Biomedical Model Context Protocol
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
High-performance zero-dependency L4/L7 load balancer written in Go. Single binary with Web UI, clustering, MCP/AI integration. 8.5K RPS, 39 E2E tests.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Pragmatic AI Labs MCP Agent Toolkit - An MCP Server designed to make code with agents more deterministic
Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.
Unified framework for building enterprise RAG pipelines with small, specialized models
A multi-agent LLM system for detecting and resolving cognitive dissonance.
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made by the humans at https://kilo.ai
Semantic code searcher and codebase utility
A curated list of awesome works related to high dimensional structure/vector search & database
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
The Mind Palace for AI Agents - Autonomous Cognitive OS with affect-tagged memory (valence engine), token-economic RL (surprisal gate + UBI), Hebbian learning, ACT-R spreading activation, Synapse Engi
Harness Vibe Research with Self-evolving AI Scientists
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Must-read papers on Repository-level Code Generation & Issue Resolution
Plugin suite + bundled MCP servers for Claude Code. Full delivery lifecycle: Agile pipeline with multi-model AI review, project bootstrap, documentation generation, codebase audits, performance optimi
Your personal AI knowledge system - self-hosted, agent-driven, and always private.
Token-efficient browser MCP server - structured web pages for AI agents, not raw accessibility dumps
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
Lightweight, embedded graph-based memory system for AI applications. Fast (<3ms recall), offline-first, with MCP server support for Claude and other AI tools.
Declarative Self Improving Elixir - DSPy Orchestration in Elixir
BigQuery MCP server for Claude - query any BigQuery dataset in natural language, with built-in SEO analysis tools for GSC bulk export data
Local-first identity, memory, and secrets for AI agents. Portable state across models and harnesses.
A powerful Model Context Protocol (MCP) server providing comprehensive Google Maps API integration with LLM processing capabilities.
Open-source DNS & email security scanner. One MCP endpoint, 57 checks, zero install. Cloudflare Workers.
Distributed AI/LLM for the people. Share compute privately or publicly to power your agents and chat.
Agentic memory for CTI in Python - STIX knowledge graphs, threat-actor alias resolution, offline-first RAG, MCP server for Claude Code and LangChain agents
The most comprehensive directory of AI agent frameworks, platforms, tools, and resources - hundreds of curated entries covering open-source, no-code, enterprise, and autonomous solutions. NEW Boil
Synthadoc: An open-source LLM knowledge compilation engine that turns raw documents into structured, local-first wikis. A transparent, human-readable alternative to traditional RAG, which can be self-
The LLM Evaluation Framework
Autonomous AI agent that researches viral content, generates posts, publishes them, measures engagement - and rewrites its own strategy based on what worked. Self-learning loop powered by LangGraph +
A collection of Agent Skills standards and best practices for programming languages and frameworks, helping AI agents follow best practices for each framework and programming language
Self-hosted AI Agent Memory + Code Intelligence Platform - one MCP endpoint for persistent memory, AST-aware code search, shared knowledge, and quality enforcement across all your AI coding agents.
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
A lock-free, in-memory fuzzy search engine for Kotlin Multiplatform. L2-normalized sparse vector embeddings with O(1) cosine similarity - handles typos, transpositions, and blind continuation. Zero-al
An open-source AI assistant framework with skills and agent architecture
C# .NET NOSQL ( key value, object store embedded TextSearch SemanticSearch Vector layer ) ACID multi-paradigm database management system.
TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
Autonomous VAPT platform. Give it a target (FQDN, IP, CIDR) - it hunts, it reports. Inspired by the Obsidian Order.
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.
Single-file memory layer for AI agents, sub-millisecond RAG on Apple Silicon. Metal Optimized On-Device. No Server. No API. One File. Pure Swift
My personal Claude Code and OpenAI Codex setup with battle-tested skills, commands, hooks, agents and MCP servers that I use daily.
Universal memory layer for AI applications. Self-host in minutes. Open source.
Benchmark for vector databases.
Mattermost Agents plugin supporting multiple LLMs
AI engineering framework with quality gates, persistent memory, and multi-platform support. Works inside Claude Code, Cursor, Copilot, Codex, and Gemini.
Local-first AI agent bootstrap: Playwright Browser MCP + ContextDB for Codex CLI, Claude Code, Gemini CLI, and OpenCode.
A deterministic development harness for Claude Code - MCP workflow engine, enforcement hooks, YAML workflows, and multi-agent consensus (Claude + Codex + Gemini)
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features - fully self-hosted via Ollama.
Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent - watches, understand
Local AI anywhere, for everyone - LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.
Open Framework for AI Agents to play Red Alert through Reinforcement Learning
AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting
Self-evolving AI agent framework with 5-layer safety gatekeeper. Agents observe failures, propose fixes, and safely apply them. Built on HKUDS/nanobot.
Discover and evaluate advanced benchmark datasets for Large Language Model agents to enhance performance assessment in real-world tasks.
MCP plugin that intercepts AI agent edits in RAM, validates them (TypeScript compiler + gopls + pyright), auto-heals missing imports, and commits atomically. If anything breaks, disk stays untouched
Implement a Pytorch-like DL library in C++ from scratch, step by step
Computer Environments Elicit General Agentic Intelligence in LLMs
Local-first Agentic Memory Layer for MCP Agents • 25 tools • Hybrid search (FTS5 + vector + MMR) • GDPR • 100% local
Autonomous local AI assistant in Go - 40+ tools, 20+ LLM providers, multi-agent orchestration, self-improving
The graph-native hybrid retrieval engine for AI and GraphRAG. Graph + Vector + Full-Text in a single transactional engine.
Supercharge Your LLM Application Evaluations
A tool that supports OpenAI and other LLMs with Claude Skills; it can also be used as a subagent
PromptManager is a desktop application for cataloguing, searching, and executing AI prompts, and much more.
Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.
ASAN: A conceptual architecture for a self-creating (autopoietic), energy-efficient, and governable multi-agent AI system.
Open infrastructure/control plane for Unchained
An AI guardian that remembers, watches, and acts.
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
A framework for optimizing textual system components (AI prompts, code snippets, etc.) using LLM-based reflection and Pareto-efficient evolutionary search.
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Faster Whisper transcription with CTranslate2
Embeddings, Retrieval, and Reranking
GraphQL Framework for Python
Client library to connect to the LangSmith Observability and Evaluation Platform.
Fast implementation of asyncio event loop on top of libuv
Production-ready AI agent library using AI SDK v6 ToolLoopAgent for GAIA benchmarks with swappable providers
Benchmark framework for evaluating crypto skills in AI agent ecosystems
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge base
A simple neural network inference framework
FlexRAG: A RAG Framework for Information Retrieval and Generation.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Efficient Retrieval Augmentation and Generation Framework
