freshcrate

Search results for "evaluation"

76 results found (Python)
opik | 2.0.6 | 🌳 Mature | ⭐ 18,767

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

LLM-Agents-Ecosystem-Handbook | 0.0.0 | 🌳 Mature | ⭐ 508

One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.

ai-agents-reality-check | 0.0.0 | 🌿 Growing | ⭐ 57

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

openlit | openlit-1.18.1 | 🌿 Growing | ⭐ 2,358

Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers,

CodeGen | 0.0.0 | 🌳 Mature | ⭐ 773

Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pr

agent-framework | python-1.1.0 | 🌳 Mature | ⭐ 9,325

A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.

PraisonAI | v4.6.25 | 🌳 Mature | ⭐ 6,900

PraisonAI 🦞 - Hire a 24/7 AI Workforce. Stop writing boilerplate and start shipping autonomous agents that research, plan, code, and execute tasks. Deployed in 5 lines of code with built-in memory, R

OpenSandbox | docker/execd/v1.0.13 | 🌳 Mature | ⭐ 9,925

Secure, Fast, and Extensible Sandbox runtime for AI agents.

langchain | langchain-core==1.3.0 | 🌳 Mature | ⭐ 133,178

The agent engineering platform

RAPTOR | 0.0.0 | 🌱 Seedling | ⭐ 13

RAPTOR (Robust AI-Powered Toolkit for Operational Robots) is an AI-native Content Insight Engine that transforms passive media storage into an intelligent knowledge platform through automated analysis

tulip_agent | 0.0.0 | 🌱 Seedling | ⭐ 44

autonomous agent with access to a tool library

LRAT | 0.0.0 | 🌱 Seedling | ⭐ 34

The implementation for SIGIR 2026: Learning to Retrieve from Agent Trajectories.

GEA | 0.0.0 | 🌱 Seedling | ⭐ 23

Group Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

arthur-engine | 2.1.529 | 🌿 Growing | ⭐ 75

Make AI work for Everyone - Monitoring and governing for your AI/ML

cognithor | v0.92.2 | 🌿 Growing | ⭐ 94

Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us

arag | v0.1.0 | 🌿 Growing | ⭐ 247

A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.

pydantic-deepagents | 0.3.15 | 🌿 Growing | ⭐ 648

Python Deep Agent framework built on top of Pydantic-AI, designed to help you quickly build production-grade autonomous AI agents with planning, filesystem operations, subagent delegation, skills, and

MODULAR-RAG-MCP-SERVER | 0.0.0 | 🌳 Mature | ⭐ 783

A modular RAG (Retrieval-Augmented Generation) system with MCP Server architecture. Using Skill to make AI follow each step of the spec and complete the code 100% by AI.

evals | v0.1.15 | 🌿 Growing | ⭐ 103

A comprehensive evaluation framework for AI agents and LLM applications.

AI-Infra-Guard | v4.1.4 | 🌿 Growing | ⭐ 3,428

A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP scan, AI Infra scan and LLM jailbreak evaluation.

OpenClawProBench | main@2026-04-15 | 🌿 Growing | ⭐ 340

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

claw-eval | main@2026-04-15 | 🌿 Growing | ⭐ 394

Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.

unsloth-buddy | main@2026-04-15 | 🌿 Growing | ⭐ 212

Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc

Agentic-RAG-R1 | 0.0.0 | 🌿 Growing | ⭐ 412

Agentic RAG R1 Framework via Reinforcement Learning

AgenticX | v0.3.7 | 🌿 Growing | ⭐ 105

AgenticX is a unified, production-ready multi-agent platform - Python SDK + CLI (agx) + Studio server + Machi desktop app. Features Meta-Agent orchestration, 15+ LLM providers, MCP Hub, hierarchical m

ISC-Bench | v0.0.5 | 🌿 Growing | ⭐ 786

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

honcho | main@2026-04-21 | 🌿 Growing | ⭐ 2,030

Memory library for building stateful agents

aura | main@2026-04-21 | 🌱 Seedling | ⭐ 47

A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules - running loca

deer-flow | main@2026-04-21 | 🌿 Growing | ⭐ 60,446

An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of ta

LLM-Agent-Paper-daily | main@2026-04-21 | 🌱 Seedling | ⭐ 20

Automatically updates LLM-Agent papers daily using GitHub Actions (updates every 12 hours)

samples | main@2026-04-20 | 🌿 Growing | ⭐ 717

Agent samples built using the Strands Agents SDK.

agentscope | v1.0.19 | 🌿 Growing | ⭐ 23,421

Build and run agents you can see, understand and trust.

awesome-code-agents | main@2026-04-20 | 🌿 Growing | ⭐ 94

A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding - they're redefining how software changes the world.

auto-deep-researcher-24x7 | main@2026-04-19 | 🌿 Growing | ⭐ 261

🔥 An autonomous AI agent that runs your deep learning experiments 24/7 while you sleep. Zero-cost monitoring, Leader-Worker architecture, constant-size memory.

skills-vote | main@2026-04-19 | 🌱 Seedling | ⭐ 31

The Next-Gen Agent-Native Skill Recommendation Engine

medusa | v2026.5.5 | 🌿 Growing | ⭐ 252

AI-first security scanner with 76 analyzers, 9,600+ detection rules, and repo poisoning detection for AI/ML, LLM agents, and MCP servers. Scan any GitHub repo with: medusa scan --git user/repo

crewAI | 1.14.2 | 🌿 Growing | ⭐ 48,611

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.

giskard-oss | giskard-checks/v1.0.2b1 | 🌱 Seedling | ⭐ 5,225

🐢 Open-Source Evaluation & Testing library for LLM Agents

maverick-mcp | main@2026-04-17 | 🌿 Growing | ⭐ 479

MaverickMCP - Personal Stock Analysis MCP Server

trulens | trulens-2.7.2 | 🌱 Seedling | ⭐ 3,237

Evaluation and Tracking for LLM Experiments and AI Agents

AReaL | v1.0.3 | 🌿 Growing | ⭐ 5,017

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

mlflow | v3.11.1 | 🌱 Seedling | ⭐ 25,285

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controllin

cognitive-dissonance-dspy | main@2026-04-14 | 🌿 Growing | ⭐ 276

A multi-agent LLM system for detecting and resolving cognitive dissonance.

LLM-Wiki | main@2026-04-18 | 🌱 Seedling | ⭐ 7

Autonomous knowledge base plugin for Claude Code - captures research, ideas, and decisions into an interlinked wiki with research-on-miss, semantic search, and a Wikipedia-style web UI. Knowledge compou

ai-real-estate-assistant | dev@2026-04-13 | 🌿 Growing | ⭐ 159

Advanced AI Real Estate Assistant using RAG, LLMs, and Python. Features market analysis, property valuation, and intelligent search.

The Multi-Agent Custom Automation Engine Solution Accelerator is an AI-driven system that manages a group of AI agents to accomplish tasks based on user input. Powered by Microsoft Agent Framework, Az

AutoRAG | v0.3.22 | 🌱 Seedling | ⭐ 4,693

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

cyber-pilot | v3.7.0-beta | 🌿 Growing | ⭐ 53

Cyber Pilot is a traceable delivery system for requirements, design, plans, and code.

UltraRAG | v0.3.0.2 | 🌿 Growing | ⭐ 5,480

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

sv-excel-agent | 0.0.0 | 🌱 Seedling | ⭐ 179

An Excel AI agent that uses MCP tools to let LLMs read, edit, and automate Excel spreadsheets.

skill | v1.2.1 | 🌱 Seedling | ⭐ 978

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

llm_intents | 1.7.1 | 🌱 Seedling | ⭐ 111

Exposes internet search tools for use by LLM-backed Assist in Home Assistant

Standard | 0.0.0 | 🌱 Seedling | ⭐ 18

JSON Agents - A universal JSON-native standard for describing AI agents, their capabilities, tools, runtimes, and governance in a portable, framework-agnostic format. Based on RFC 8259, JSON Schema 2

any-agent | 1.18.0 | 🌱 Seedling | ⭐ 1,141

A single interface to use and evaluate different agent frameworks

camel | v0.2.90 | 🌱 Seedling | ⭐ 16,654

🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org

KawaiiGPT | KawaiiGPT | 🌱 Seedling | ⭐ 831

KawaiiGPT - Open-source LLM gateway accessing DeepSeek, Gemini, and Kimi-K2 through reverse-engineered Pollinations API with no API keys required, built-in prompt injection capabilities for security r

RAGElo | 0.4.0 | 🌱 Seedling | ⭐ 128

RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker

deltallm | v0.1.20-rc2 | 🌱 Seedling | ⭐ 3

Route, manage, and analyze your LLM requests across multiple providers with a unified API interface

ragas | v0.4.3 | 🌱 Seedling | ⭐ 13,329

Supercharge Your LLM Application Evaluations 🚀

LightAgent | v0.5.0 | 🌱 Seedling | ⭐ 831

LightAgent: Lightweight AI agent framework with memory, tools & tree-of-thought. Supports multi-agent collaboration, self-learning, and major LLMs (OpenAI/DeepSeek/Qwen). Open-source with MCP/SSE prot

prd-taskmaster | v3.0.0 | 🌱 Seedling | ⭐ 184

AI-powered PRD generation for Claude Code with taskmaster integration

PAI-RAG | v0.4.3 | 🌱 Seedling | ⭐ 450

An easy-to-use framework for modular RAG

py-gpt | v2.7.12 | 🌱 Seedling | ⭐ 1,724

Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, spe

evo-agents | master@2026-04-19 | 🌱 Seedling | ⭐ 3

Complete Workspace Template for OpenClaw - Full agent lifecycle with unified memory system (Markdown + SQLite), self-evolution, RAG. Not for SubAgent/Skill use.

uniAI | 0.0.0 | 🌱 Seedling | ⭐ 1

Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate - built to help studen

gptme | v0.31.0 | 🌱 Seedling | ⭐ 4,266

Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonomous agent on top!

multi-agent-orchestration-framework | v0.1.0 | 🌱 Seedling | ⭐ 26

Modular multi-agent orchestration framework powered by LangGraph and FastAPI.

Government-Citizen-Services-Voice-Agent | main@2026-04-15 | 🌱 Seedling | ⭐ 1

Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services

LettuceDetect | 0.1.8 | 💤 Dormant | ⭐ 545

Lightweight hallucination detection framework for RAG applications

HealthFlow | datasets | 💤 Dormant | ⭐ 40

HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research

RagaAI-Catalyst | v2.2.4 | 💤 Dormant | ⭐ 16,130

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, llm and tools tracing, debugging multi-agentic system, self-hosted dashboard and advanced anal

FlexRAG | 0.3.0 | 💤 Dormant | ⭐ 235

FlexRAG: A RAG Framework for Information Retrieval and Generation.

Qwen-Agent | v0.0.26 | 💤 Dormant | ⭐ 15,963

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

medicalAI | v1.2.9-rc | ⚰️ Archived | ⭐ 21

Medical-AI is an AI framework specifically for Medical Applications https://aibharata.github.io/medicalAI/