freshcrate

Search results for "eval"

96 results found (Python)
llama-index📁0.14.21🏛️ Flagship48,773

Interface between LLMs and your data

mlflow-skinny📁3.11.1🏛️ Flagship25,478

MLflow is an open source platform for the complete machine learning lifecycle

langsmith📁0.7.33🌳 Mature858

Client library to connect to the LangSmith Observability and Evaluation Platform.

google-cloud-aiplatform📁1.148.1🌳 Mature880

Vertex AI API client library

onyx📁v3.2.6🏛️ Flagship27,905

Open Source AI Platform - AI Chat with advanced features that works with every LLM

opik📁2.0.9🏛️ Flagship18,965

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

ragflow📁v0.25.0🏛️ Flagship78,674

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

adk-python📁v1.31.1🏛️ Flagship19,165

An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

evals📁v0.1.15🌿 Growing106

A comprehensive evaluation framework for AI agents and LLM applications.

arthur-engine📁2.1.529🌿 Growing77

Make AI work for Everyone - Monitoring and governing for your AI/ML

AI-Infra-Guard📁v4.1.4🌳 Mature3,521

A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP scan, AI Infra scan and LLM jailbreak evaluation.

fast-agent📁v0.6.17🌳 Mature3,750

Code, Build and Evaluate agents - excellent Model and Skills/MCP/ACP Support

logfire📁v4.32.1🌳 Mature4,185

AI observability platform for production LLM and agent systems.

mlflow📁ts/v0.2.0-rc.1🏛️ Flagship25,479

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controllin

giskard-oss📁giskard-checks/v1.0.2b1🏛️ Flagship5,289

🐢 Open-Source Evaluation & Testing library for LLM Agents

trulens📁trulens-2.7.2🌳 Mature3,261

Evaluation and Tracking for LLM Experiments and AI Agents

AutoRAG📁v0.3.22🌳 Mature4,713

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

RAG-Anything📁v1.2.10🏛️ Flagship16,790

"RAG-Anything: All-in-One RAG Framework"

fast-plaid📁1.4.5🌿 Growing245

High-Performance Engine for Multi-Vector Search

txtai📁v9.7.0🏛️ Flagship12,412

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

llm-wiki📁v1.1.0-rc8🌿 Growing139

LLM-powered knowledge base from your Claude Code, Codex CLI, Copilot, Cursor & Gemini sessions. Karpathy's LLM Wiki pattern — implemented and shipped.

qwe-qwe📁v0.17.6🌱 Seedling35

⚡ Lightweight offline AI agent for local models. No cloud, no API keys — just your GPU.

ai-plugin-scanner📁v2.0.45🌿 Growing158

Security and best-practices scanner for AI Plugins, covering Codex, Claude, Opencode, Gemini & more. Scores trust for plugins 0-100.

cognithor📁v0.92.3🌿 Growing115

Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us

PraisonAI📁v4.6.27🏛️ Flagship6,969

PraisonAI 🦞 — Hire a 24/7 AI Workforce. Stop writing boilerplate and start shipping autonomous agents that research, plan, code, and execute tasks. Deployed in 5 lines of code with built-in memory, R

claude-code-plugins-plus-skills📁v4.26.0🌳 Mature1,995

423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.

mcp-client-for-ollama📁v0.28.0🌳 Mature655

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l

ha-mcp📁v7.3.0.dev386🌳 Mature2,465

The Unofficial and Awesome Home Assistant MCP Server

restai📁v6.1.45🌿 Growing485

RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLMs supported by Ollama/vLLM/etc. Precise embeddings usage, tuning, analytics etc. Built-in image/audio generat

Auto-claude-code-research-in-sleep📁v0.4.4🏛️ Flagship7,173

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works wi

ISC-Bench📁v0.0.5🌳 Mature799

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

synaptic-memory📁v0.16.0🌱 Seedling27

Brain-inspired knowledge graph: spreading activation, Hebbian learning, memory consolidation.

connectonion📁v0.9.1🌳 Mature863

The Best AI Agent Framework for Agent Collaboration.

caveman📁v1.6.0🏛️ Flagship42,198

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

LRAT📁0.0.0🌱 Seedling39

The implementation for SIGIR 2026: Learning to Retrieve from Agent Trajectories.

any-agent📁1.18.0🌳 Mature1,153

A single interface to use and evaluate different agent frameworks

ai-agents-reality-check📁0.0.0🌿 Growing57

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

llmware📁v0.4.6🌿 Growing14,862

Unified framework for building enterprise RAG pipelines with small, specialized models

claude-codex-settings📁v2.3.0🌳 Mature623

My personal Claude Code and OpenAI Codex setup with battle-tested skills, commands, hooks, agents and MCP servers that I use daily.

LIA-Assistant📁v1.17.1🌱 Seedling17

Open-source multi-agent AI assistant powered by LangGraph, FastAPI & Next.js — 16+ agents, Human-in-the-Loop, MCP integration, voice TTS, RAG, 500+ metrics, 6 languages.

arag📁v0.1.0🌿 Growing252

A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.

cyllama📁0.2.11🌱 Seedling25

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

RAGElo📁0.4.0🌿 Growing128

RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker

JRVS📁0.0.0🌿 Growing236

JRVS AI Agent with JARCORE autonomous coding engine - RAG knowledge base, web scraping, calendar, code generation. Powered by whatever local AI you choose.

ragas📁v0.4.3🌳 Mature13,570

Supercharge Your LLM Application Evaluations 🚀

Observal📁v0.2.0🌿 Growing572

Observal is an AI agent registry with a first-in-class observability and eval framework

GTA📁v0.2.0🌿 Growing143

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

yao-meta-skill📁main@2026-04-19🌿 Growing297

YAO = Yielding AI Outcomes. A lightweight but rigorous system for creating, evaluating, packaging, and governing reusable agent skills.

OpenClawProBench📁main@2026-04-15🌿 Growing453

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

claw-eval📁main@2026-04-15🌿 Growing465

Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks verified by humans.

TrustRAG📁0.0.0🌳 Mature1,253

TrustRAG: The RAG Framework within Reliable input, Trusted output

AgenticX📁v0.3.7🌿 Growing114

AgenticX is a unified, production-ready multi-agent platform — Python SDK + CLI (agx) + Studio server + Machi desktop app. Features Meta-Agent orchestration, 15+ LLM providers, MCP Hub, hierarchical m

PageIndex📁main@2026-04-10🌿 Growing25,597

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

atomic-knowledge📁v0.2.0🌱 Seedling36

Markdown-first work-memory protocol for existing agents, with maintained knowledge, candidate notes, evals, and an example KB.

tulip_agent📁0.0.0🌱 Seedling44

Autonomous agent with access to a tool library

sec-edgar-mcp📁v1.0.8🌿 Growing253

A SEC EDGAR MCP (Model Context Protocol) Server

pdd📁main@2026-04-21🌿 Growing656

Prompt Driven Development Command Line Interface

deer-flow📁main@2026-04-21🌿 Growing63,234

An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of ta

agentic-chatops📁main@2026-04-20🌿 Growing100

3-tier agentic ChatOps (n8n + GPT-4o + Claude Code) implementing all 21 patterns from "Agentic Design Patterns" — solo operator managing 137 devices

auto-deep-researcher-24x7📁main@2026-04-19🌿 Growing622

🔥 An autonomous AI agent that runs your deep learning experiments 24/7 while you sleep. Zero-cost monitoring, Leader-Worker architecture, constant-size memory.

cognitive-dissonance-dspy📁main@2026-04-14🌿 Growing276

A multi-agent LLM system for detecting and resolving cognitive dissonance.

cdpilot📁v0.3.0🌱 Seedling25

Zero-dependency browser automation CLI. 70+ commands, 10 test assertions, smart commands (click/fill by text — no LLM needed). MCP server for AI agents with 500x fewer tokens. Extract, observe, script

claude-code-config📁0.0.0🌱 Seedling88

Claude Code skills, architectural principles, and alternative approaches for AI-assisted development

learn-hermes-agent📁0.0.0🌱 Seedling16

A 27-chapter hands-on tutorial for building an autonomous AI agent from zero in Python. Agent loop, tool system, memory, skills, MCP, multi-platform gateway, and self-evolution — inspired by Herme

NanoCoder-Pro📁0.0.0🌱 Seedling54

NanoCoder Pro — Autonomous Coding Agent with Master-SubAgent Architecture

simplenote-mcp-server📁v1.15.0🌱 Seedling17

MCP Server for Simplenote integration with Claude Desktop

llm_context_benchmarks📁0.0.0🌱 Seedling59

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

sinain-hud📁overlay-v2.8.0🌱 Seedling5

Ambient intelligence that sees what you see, hears what you hear, and acts on your behalf

dory📁v0.1.0🌱 Seedling14

One memory layer for every AI agent. Local-first, markdown source of truth, and CLI/HTTP/MCP native. Your agent forgot who you are. Again. Dory fixes that.

agent2📁v0.1.0🌱 Seedling26

The production runtime for AI agents. Schema in, API out. Built on PydanticAI + FastAPI.

claude-ruby-grape-rails📁v1.13.4🌱 Seedling5

Claude Code plugin for Ruby, Rails, Grape, PostgreSQL, Redis, and Sidekiq development

claude-skills📁v2.0.0🌿 Growing12,208

220+ Claude Code skills & agent plugins for Claude Code, Codex, Gemini CLI, Cursor, and 8 more coding agents — engineering, marketing, product, compliance, C-level advisory.

deltallm📁v0.1.21-rc1🌱 Seedling4

Route, manage, and analyze your LLM requests across multiple providers with a unified API interface

Geneclaw📁v0.1.0🌱 Seedling36

Self-evolving AI agent framework with 5-layer safety gatekeeper. Agents observe failures, propose fixes, and safely apply them. Built on HKUDS/nanobot.

Nightshift📁v0.0.7🌱 Seedling1

Autonomous overnight codebase improvement agent for Claude Code. Run it before bed, wake up to production-ready fixes.

RagaAI-Catalyst📁v2.2.4💤 Dormant16,141

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, llm and tools tracing, debugging multi-agentic system, self-hosted dashboard and advanced anal

DOX📁main@2026-04-15🌱 Seedling2

Broken RAG For The Broken Souls

surf📁0.0.0🌱 Seedling1

The open framework for extensible & grounded AI agent orchestration.

uniAI📁0.0.0🌱 Seedling1

Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate — built to help studen

geon-decoder📁main@2026-04-11🌱 Seedling3

GEON: Structure-first decoding via equivalence classes and field closure

sawzhang_skills📁0.0.0🌱 Seedling2

Claude Code skills collection — CCA study guides, Twitter research, MCP review, auto-iteration tools

pytorch_template📁v0.3.0🌱 Seedling10

AI-agent-friendly PyTorch research pipeline — one YAML config drives preflight, training, Optuna HPO, and real-time TUI monitoring

Agent_Life_Space📁v1.36.0🌱 Seedling1

Self-hosted autonomous AI agent — 9-layer cascade, Docker sandbox, encrypted vault, review/build/control plane, 1407+ tests

evo-agents📁master@2026-04-19🌱 Seedling3

Complete Workspace Template for OpenClaw - Full agent lifecycle with unified memory system (Markdown + SQLite), self-evolution, RAG. Not for SubAgent/Skill use.

Government-Citizen-Services-Voice-Agent📁main@2026-04-15🌱 Seedling1

Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services

idle-harness📁main@2026-04-18🌱 Seedling1

GAN-inspired multi-agent system that autonomously builds full-stack web apps from a single prompt using Claude AI agents

fastRAG📁v3.1.2💤 Dormant1,776

Efficient Retrieval Augmentation and Generation Framework

lmnr📁0.7.47🌱 Seedling

Python SDK for Laminar

boostedblob📁1.0.0🌱 Seedling

Command line tool and async library to perform basic file operations on local paths, Google Cloud Storage paths and Azure Blob Storage paths.