freshcrate

octobench

Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.

Why this rank: Recent release, healthy release cadence, strong adoption


README

trading

Release History

Version | Changes | Urgency | Date
main@2026-06-02 | Latest activity on main branch | High | 6/2/2026
0.0.0 | No release found; using repo HEAD | High | 4/9/2026


Similar Packages

sim: Build, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce. (v0.6.103)
hatch3r: Install an agentic coding setup that adds multiple AI agents, skills, and rules to enhance automation across GitHub, Azure DevOps, or GitLab repositories. (main@2026-06-04)
samples: Agent samples built using the Strands Agents SDK. (main@2026-06-04)
claude-container: 🐳 Run Claude Code safely in isolated Docker containers with persistent projects and easy setup on macOS using Justfile automation. (master@2026-06-02)
sdk-python: A model-driven approach to building AI agents in just a few lines of code. (python/v1.42.0)

More in Testing

vector-db-benchmark: Framework for benchmarking vector search engines
Gito: An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.
mxcli: Mendix CLI tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a Starlark linter for quality v
llm_context_benchmarks: 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz