Search results for "benchmarking"
Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica
Local-first memory plugin for OpenClaw AI agents. LLM-powered extraction, plain markdown storage, hybrid search via QMD. Gives agents persistent long-term memory across conversations.
A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.
LeanKG: Stop Burning Tokens. Start Coding Lean.
The worldβs fastest AI model gateway (450x less overhead than LiteLLM). Unified access to LLMs across endpoints (openAI, self-hosted, etc.) behind a single authentication layer - with API key generati
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
LLM Agent that leverages cheminformatics tools to provide informed responses.
Make AI work for Everyone - Monitoring and governing for your AI/ML
Autonomous Agents (LLMs) research papers. Updated Daily.
π₯ Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.
This repository contains comprehensive pricing and configuration data for LLMs. It powers cost attribution for 200+ enterprises running 400B+ tokens through Portkey AI Gateway every day.
Framework for benchmarking vector search engines
π The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architect
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l
Autonomous CLI agent integrations for the Spring AI ecosystem with Claude Code, Gemini CLI, and secure sandbox execution
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related website
Automatically Update LLM-Agent Papers Daily using Github Actions (Update Every 12th hours)
A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding β they're redefining how software changes the world.
Curated systems, benchmarks, and papers etc. on memory for LLMs/MLLMs --- long-term context, retrieval, and reasoning.
PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with π¦ by the humans at https://kilo.ai
π The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade archit
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Must-read papers on Repository-level Code Generation & Issue Resolution π₯
A collection of Agent Skills Standard and Best Practice for Programming Languages, Frameworks that help our AI Agent follow best practies on frameworks and programming laguages
CLI, MCP server, and npm library that turns any website into an API β no docs, no SDK, no browser.
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
Swift-based vector database for on-device RAG using MLTensor and MLX Embedders
Database for Android and JVM - first and fast, lightweight on-device vector database
JSON Agents - A universal JSON-native standard for describing AI agents, their capabilities, tools, runtimes, and governance in a portable, framework-agnostic format. Based on RFC 8259, JSON Schema 2
DSL and compiler framework for automated finite-differences and stencil computation
π‘βοΈAI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reportingπ‘βοΈ
Benchmark for vector databases.
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
