octobench
Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.
Why this rank: recent release, healthy release cadence, strong adoption.
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| main@2026-06-02 | Latest activity on main branch | High | 6/2/2026 |
| 0.0.0 | No release found; using repo HEAD | High | 4/9/2026 |
Similar Packages
sim: Build, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce. (v0.6.103)
hatch3r: Install an agentic coding setup that adds multiple AI agents, skills, and rules to enhance automation across GitHub, Azure DevOps, or GitLab repositories. (main@2026-06-04)
claude-container: Run Claude Code safely in isolated Docker containers with persistent projects and easy setup on macOS using Justfile automation. (master@2026-06-02)
More in Testing
vector-db-benchmark: Framework for benchmarking vector search engines
Gito: An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.
mxcli: Mendix CLI tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a Starlark linter for quality v
llm_context_benchmarks: LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz
