octobench

Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.

agentic agents ai ai-workflow anthropic automation benchmark codex

Why this rank: Recent release, Healthy release cadence, Strong adoption

README

trading

Release History

Version	Changes	Urgency	Date
main@2026-07-19	Latest activity on main branch	High	7/19/2026
0.0.0	No release found — using repo HEAD	High	4/9/2026

Similar Packages

sim: Build, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce. (v0.7.44)

hatch3r: Install an agentic coding setup that adds multiple AI agents, skills, and rules to enhance automation across GitHub, Azure DevOps, or GitLab repositories. (main@2026-07-19)

samples: Agent samples built using the Strands Agents SDK. (main@2026-07-17)

sdk-python: A model-driven approach to building AI agents in just a few lines of code. (typescript/v1.10.0)

autonomous-agentic-research-swarm: File-based autonomous agentic research swarm template (Planner/Worker/Judge) with contracts, workstreams, and deterministic quality gates. (main@2026-07-11)

More in Testing

multi-agent-ralph-loop: Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agent Teams (6 teammates), Aristotle First Principles methodology,

trulens: Evaluation and Tracking for LLM Experiments and AI Agents

Observal: Observal is an AI agent registry with a first-in-class observability and eval framework

pilot: #1 Terminal Benchmark 2.0 - AI that ships your tickets.