freshcrate

Tag: #evaluation

14 packages • ⭐ 130,490 total stars

mlflow v3.11.1 • 🌱 Seedling • ⭐ 25,285

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling…

langfuse v3.169.0 • 🌿 Growing • ⭐ 24,578

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

promptfoo code-scan-action-0.1.5 • 🌿 Growing • ⭐ 19,943

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and…
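The declarative configs mentioned above are YAML files. A minimal sketch (the provider ID, variable, and assertion values here are illustrative examples, not from this listing; check the promptfoo docs for current syntax):

```yaml
# promptfooconfig.yaml — minimal sketch; provider and test values are examples
prompts:
  - "Reply with a one-sentence summary of: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: "MLflow is an open source platform for the ML lifecycle."
    assert:
      - type: contains
        value: "MLflow"
```

Running `promptfoo eval` in the same directory executes each prompt against each provider and checks the assertions.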

opik 2.0.6 • 🌳 Mature • ⭐ 18,767

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

WeKnora v0.4.0 • 🌳 Mature • ⭐ 13,819

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.

ragas v0.4.3 • 🌱 Seedling • ⭐ 13,329

Supercharge Your LLM Application Evaluations 🚀

AutoRAG v0.3.22 • 🌱 Seedling • ⭐ 4,693

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

agenta v0.96.7 • 🌳 Mature • ⭐ 4,011

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

langwatch skills@v0.3.0 • 🌿 Growing • ⭐ 3,193

The platform for LLM evaluations and AI agent testing

OpenClawProBench main@2026-04-15 • 🌿 Growing • ⭐ 340

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

arag v0.1.0 • 🌿 Growing • ⭐ 247

A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.

evals v0.1.15 • 🌿 Growing • ⭐ 103

A comprehensive evaluation framework for AI agents and LLM applications.

arthur-engine 2.1.529 • 🌿 Growing • ⭐ 75

Make AI work for Everyone: monitoring and governance for your AI/ML.