Tag: #benchmark
10 packages • 5,875 total stars
Efficient Retrieval Augmentation and Generation Framework
Fast Compiler for C# Expression Trees and the lightweight LightExpression alternative. Diagnostic and code generation tools for the expressions.
Benchmark for vector databases.
Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
Framework for benchmarking vector search engines
A coding agent optimized for smaller LLMs
Production-ready AI agent library using AI SDK v6 ToolLoopAgent for GAIA benchmarks with swappable providers
MCP server giving AI a knowledge graph over Obsidian vaults. 13-layer scoring that learns. Local-first, zero cloud.
Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.
