Search results for "eval"
Agent ensembles to design, generate, and select the best code for every task.
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
The platform for LLM evaluations and AI agent testing
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. YC W23
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and
The Mind Palace for AI Agents - Autonomous Cognitive OS with affect-tagged memory (valence engine), token-economic RL (surprisal gate + UBI), Hebbian learning, ACT-R spreading activation, Synapse Engi
From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.
Codingbuddy orchestrates 29 specialized AI agents to deliver code quality comparable to a team of human experts through a PLAN → ACT → EVAL workflow.
Latitude is the open-source agent engineering platform
Universal AI Development Platform with MCP server integration, multi-provider support, and professional CLI. Build, test, and deploy AI applications with multiple AI providers.
A Model Context Protocol (MCP) server that gives Claude direct control over Strudel.cc for AI-assisted music generation and live coding.
OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for
The Execution Security Layer for the Agentic Era. Providing deterministic "Sudo" governance and audit logs for autonomous AI agents.
Generate a map of your codebase to help AI Agents understand your architecture, coding conventions and patterns. Discoverable with Semantic Search
The app framework built for AI coding agents. Own every line. Your AI already knows how to build on it.
Open-source security platform for AI agents -- audits skills before install, monitors 24/7, shares threat intelligence across all users.
An MCP server for interacting with Sentry via LLMs.
AI Agent Engineering Platform built on an Open Source TypeScript AI Agent Framework
A collection of Agent Skills standards and best practices for programming languages and frameworks, helping our AI Agent follow best practices in each
Anti-detection browser server for AI agents - REST API wrapping Camoufox engine with OpenClaw plugin support
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses WebLLM, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
Make your OpenClaw agents better, cheaper, and faster.
MAGI: Markdown for Agent Guidance & Instruction - A next-generation markdown extension designed specifically for AI systems. MAGI enhances standard markdown with structured metadata, embedded AI instr
Official Repo of Moss
The Self-Growing Karpathy LLM Wiki - grown by an AI agent yoyo from Karpathy's founding prompt
A Model Context Protocol (MCP) server that provides advanced code analysis and reasoning capabilities powered by Google's Gemini AI
AI agent security scanner. Detect vulnerabilities in agent configurations, MCP servers, and tool permissions. Available as CLI, GitHub Action, ECC plugin, and GitHub App integration.
Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native.
The most comprehensive MCP server for Polymarket - 48 tools spanning direct trading, market discovery, smart money tracking, copy trading, backtesting, risk management, and portfolio optimization. Wor
Production-ready AI agent framework - semantic memory, multi-agent mesh, MCP server, intelligent routing, governance, and 67+ platform integrations.
Production-grade TypeScript AI runtime focused on reliability, governance, and reproducible LLM systems. Multi-provider gateway, agents, RAG, workflows, policy engine, audit trails, and deterministic
kbot - the AI agent that dreams, learns, and evolves. 764+ tools, 35 agents, 20 providers. Music production, iPhone control, financial analysis, cyber threat intel. Always-on daemon. Runs offline. npm
Open-source Cloudflare Browser Rendering proxy - 10 MCP tools for Claude Code (content, screenshot, PDF, markdown, scrape, JSON AI extraction, links, a11y, crawl)
AI agent evaluation framework for Claude and beyond
A standalone library for AI agent regression testing using LLM-as-judge evaluation
Build semantic vector databases from code and docs to enable AI agents to understand and navigate your entire codebase effectively.
Define and control AI agents in markdown with full prompt transparency, persistent memory, and integrated tools via the Claude Agent SDK.
ChatFlow - AI-based chat flow framework, personalize your ChatGPT workflows and build the road to automation.
