Search results for "evals"
Evaluation and Tracking for LLM Experiments and AI Agents
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonomous agent on top!
AI Agent Framework, the Pydantic way
The agent engineering platform
Structured outputs for LLMs
AI observability platform for production LLM and agent systems.
Code, Build and Evaluate agents - excellent Model and Skills/MCP/ACP Support
A comprehensive evaluation framework for AI agents and LLM applications.
A SEC EDGAR MCP (Model Context Protocol) Server
Supercharge Your LLM Application Evaluations
Memory library for building stateful agents
Automatically update LLM-Agent papers daily using GitHub Actions (updates every 12 hours)
Markdown-first work-memory protocol for existing agents, with maintained knowledge, candidate notes, evals, and an example KB.
An Excel AI agent that uses MCP tools to let LLMs read, edit, and automate Excel spreadsheets.
AI skills that turn coding agents into UiPath experts.
One memory layer for every AI agent. Local-first, markdown source of truth, and CLI/HTTP/MCP native. Your agent forgot who you are. Again. Dory fixes that.
The LLM Evaluation Framework
The production runtime for AI agents. Schema in, API out. Built on PydanticAI + FastAPI.
A production-ready research outreach AI agent that plans, discovers, reasons, uses tools, auto-builds cited briefings, and drafts tailored emails with tool-chaining, memory, tests, and turnkey Dock
Framework for large language model evaluations
