Search results for "evals"
AI Observability & Evaluation
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
AI Agent Framework, the Pydantic way
The agent engineering platform
A community-driven collection of RAG (Retrieval-Augmented Generation) frameworks, projects, and resources. Contribute and explore the evolving RAG ecosystem.
Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. YC W23
AI observability platform for production LLM and agent systems.
Build agentic AI solutions on AWS using the latest OSS agentic frameworks.
Code, build, and evaluate agents - excellent model and Skills/MCP/ACP support
Agent ensembles to design, generate, and select the best code for every task.
A comprehensive evaluation framework for AI agents and LLM applications.
Latitude is the open-source agent engineering platform
ReLE evaluation: capability evaluation of Chinese AI large models (continuously updated). Currently covers 359 large models, including chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, SenseTime SenseChat, and other commercial models, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min
The app framework built for AI coding agents. Own every line. Your AI already knows how to build on it.
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and
Curated list of ChatGPT prompts from the top-rated GPTs in the GPT Store. Prompt engineering, prompt attacks & prompt protection. Advanced prompt engineering papers.
Memory library for building stateful agents
Automatically update LLM-Agent papers daily using GitHub Actions (updated every 12 hours)
Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native.
Evaluation and Tracking for LLM Experiments and AI Agents
Framework to build resilient language agents as graphs.
Curated systems, benchmarks, papers, and more on memory for LLMs/MLLMs: long-term context, retrieval, and reasoning.
From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.
AI Agent Engineering Platform built on an Open Source TypeScript AI Agent Framework
An Excel AI agent that uses MCP tools to let LLMs read, edit, and automate Excel spreadsheets.
The LLM Evaluation Framework
A collection of Agent Skills standards and best practices for programming languages and frameworks, helping your AI agent follow framework and language best practices
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Structured outputs for LLMs
TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
The production runtime for AI agents. Schema in, API out. Built on PydanticAI + FastAPI.
A selective learning and memory substrate for agentic systems: typed, revisable, decayable memory with competence learning and trust-aware retrieval.
Mattermost Agents plugin supporting multiple LLMs
AI-indexed portfolio and CV site with machine-readable profile data, evidence-backed case studies, verification signals, and a live MCP endpoint for agent access.
A SEC EDGAR MCP (Model Context Protocol) Server
Supercharge Your LLM Application Evaluations
Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonomous agent on top!
A production-ready research outreach AI agent that plans, discovers, reasons, uses tools, auto-builds cited briefings, and drafts tailored emails with tool-chaining, memory, tests, and turnkey Dock
