1 package • ⭐ 1 total stars
Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.