freshcrate

awesome-agent-benchmarks

🧠 Discover and evaluate advanced benchmark datasets for Large Language Model agents to improve performance assessment on real-world tasks.

Why this rank: Recent release, healthy release cadence, strong adoption

Release History

Version | Changes | Urgency | Date
master@2026-06-07 | Latest activity on master branch | High | 6/7/2026
0.0.0 | No release found; using repo HEAD | High | 4/9/2026

Similar Packages

opentulpa: Self-hosted personal AI agent that lives in your DMs. Describe any workflow: triage Gmail, pull a Giphy feed, build a Slack bot, monitor markets. It writes the code, runs it, schedules it, and saves i (main@2026-06-05)
uix-ai-agent: 🤖 Generate UI & UX flows for web and mobile apps using natural language prompts with UIX AI Agent, your intelligent design assistant. (main@2026-06-07)
Agenvoy: Agentic framework | Self-improving memory | Pluggable tool extensions | Sandbox execution (v0.26.4)
agentic-ai: 🤖 Explore AI agent architectures with agentic-ai, featuring ReAct agents, reflection-based designs, and modular LLM integrations using LangChain and LangGraph. (main@2026-06-05)
DeepAnalyze: Empower data scientists with DeepAnalyze, a tool that leverages large language models for automated data analysis and insights generation. (main@2026-06-05)

More in Testing

vector-db-benchmark: Framework for benchmarking vector search engines
fspec: FSPEC: The Spec-Driven, Multi-Agent Coding Factory. It is infrastructure for the "Dark Factory", the emerging model of fully autonomous software development where AI agents handle all implementation wh
Gito: An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.
mxcli: Mendix CLI tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a Starlark linter for quality v