Search results for "llm-eval"
MLflow is an open source platform for the complete machine learning lifecycle
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP Scan, AI Infra Scan, and LLM jailbreak evaluation.
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs.
🐢 Open-Source Evaluation & Testing library for LLM Agents
Evaluation and Tracking for LLM Experiments and AI Agents
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.
The LLM Evaluation Framework
