freshcrate

Search results for "agent-benchmark"

3 results found
ai-agents-reality-check 📁 0.0.0 🌿 Growing 57

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

OpenClawProBench 📁 main@2026-04-15 🌿 Growing 340

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

awesome-agent-benchmarks 📁 master@2026-04-21 🌱 Seedling 3

🧠 Discover and evaluate advanced benchmark datasets for Large Language Model agents to enhance performance assessment in real-world tasks.