Search results for "benchmarking"
Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica
A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
Make AI work for everyone - Monitoring and governance for your AI/ML
Framework for benchmarking vector search engines
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l
Automatically Update LLM-Agent Papers Daily using GitHub Actions (Update Every 12 Hours)
A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding, they're redefining how software changes the world.
LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz
PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.
[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
JSON Agents - A universal JSON-native standard for describing AI agents, their capabilities, tools, runtimes, and governance in a portable, framework-agnostic format. Based on RFC 8259, JSON Schema 2
DSL and compiler framework for automated finite-differences and stencil computation
AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting
Benchmark for vector databases.
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
