awesome-agent-benchmarks

🧠 Discover and evaluate advanced benchmark datasets for Large Language Model agents to enhance performance assessment in real-world tasks.

agent-based-modeling agent-benchmark agentic agentic-ai ai ai-agent ai-models awesome

Why this rank: Recent release, healthy release cadence, strong adoption

Release History

Version	Changes	Urgency	Date
master@2026-07-18	Latest activity on master branch	High	7/18/2026
0.0.0	No release found — using repo HEAD	High	4/9/2026

Similar Packages

opentulpa: Self-hosted personal AI agent that lives in your DMs. Describe any workflow (triage Gmail, pull a Giphy feed, build a Slack bot, monitor markets). It writes the code, runs it, schedules it, and saves i... (main@2026-07-21)

evals: A comprehensive evaluation framework for AI agents and LLM applications. (v1.0.3)

Agenvoy: Agentic framework | Self-improving memory | Pluggable tool extensions | Sandbox execution (v0.28.27)

haystack-cookbook: 👩🏻‍🍳 A collection of example notebooks using Haystack (main@2026-07-23)

ClawRecipes: Save 120+ Hours of Setup Pain (I did it for you) – Launch Your OpenClaw Agent Teams with 1 Command (15+ Recipes) (v0.5.2)

More in Testing

multi-agent-ralph-loop: Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agent Teams (6 teammates), Aristotle First Principles methodology, ...

trulens: Evaluation and Tracking for LLM Experiments and AI Agents

Observal: Observal is an AI agent registry with a first-in-class observability and eval framework.

pilot: #1 on Terminal Benchmark 2.0. AI that ships your tickets.