Tag: #evaluation-framework
3 packages • ⭐ 34,645 total stars
Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and…
The LLM Evaluation Framework
Define and control AI agents in markdown with full prompt transparency, persistent memory, and integrated tools via the Claude Agent SDK.
