freshcrate

Tag: #evaluation-framework

3 packages • ⭐ 35,294 total stars

promptfoo 0.121.14 🏛️ Flagship ⭐ 20,382

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and

harness master@2026-06-04 🌱 Seedling ⭐ 1

Define and control AI agents in markdown with full prompt transparency, persistent memory, and integrated tools via the Claude Agent SDK.