Search results for "validation"
Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica
PraisonAI - Hire a 24/7 AI Workforce. Stop writing boilerplate and start shipping autonomous agents that research, plan, code, and execute tasks. Deployed in 5 lines of code with built-in memory, R
The python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more.
Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API + knowledge graph + autonomous consolidation.
ARIS (Auto-Research-In-Sleep) - Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in - works wi
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Enhanced Proxmox MCP server with advanced virtualization management and full OpenAPI integration.
AI Agent Framework, the Pydantic way
44 plug-and-play skills for OpenClaw - self-modifying AI agent with cron scheduling, security guardrails, persistent memory, knowledge graphs, and MCP health monitoring. Your agent teaches itself new
Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us
MCP server to manage Facebook and Instagram Ads (Meta Ads)
Python Deep Agent framework built on top of Pydantic-AI, designed to help you quickly build production-grade autonomous AI agents with planning, filesystem operations, subagent delegation, skills, and
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pr
Paper-first SPY options validation platform with broker-backed scorecards, hard risk gates, paired-trade accounting, and live dashboards.
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l
MCP server for Apple Mail - Manage emails with AI using Claude Desktop. Search, send, organize mail with natural language.
AgenticX is a unified, production-ready multi-agent platform - Python SDK + CLI (agx) + Studio server + Machi desktop app. Features Meta-Agent orchestration, 15+ LLM providers, MCP Hub, hierarchical m
Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.
Brain-inspired knowledge graph: spreading activation, Hebbian learning, memory consolidation.
Cyber Pilot is a traceable delivery system for requirements, design, plans, and code.
AINL helps turn AI from "a smart conversation" into "a structured worker." It is designed for teams building AI workflows that need multiple steps, state and memory, tool use, repeatable execution, v
MCP server for Fabric Real-Time Intelligence (https://aka.ms/fabricrti) supporting tools for Eventhouse (https://aka.ms/eventhouse), Azure Data Explorer (https://aka.ms/adx), and other RTI services (co
AI observability platform for production LLM and agent systems.
Official MCP Servers for AWS
Comprehensive paid advertising audit & optimization skill for Claude Code. 225+ checks across Google, Meta, YouTube, LinkedIn, TikTok, Microsoft & Apple Search Ads with weighted scoring, parallel agen
The fast, Pythonic way to build MCP servers and clients.
AI Agent Backend Platform on FastAPI - MCP server + AI orchestration + async DDD architecture. Zero-boilerplate CRUD, auto domain discovery, 14 Claude Code AI development skills.
Self-hosted orchestration layer for autonomous AI agent teams. Shared memory, heartbeat scheduling, vault-first secrets, and cross-model peer review - one command to deploy.
Memory library for building stateful agents
A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules - running loca
Automatically Update LLM-Agent Papers Daily using GitHub Actions (Update Every 12 hours)
Project Infinity leverages MCP and Graph RAG to turn LLMs into a professional D&D 5e Game Master, governed by a dedicated dice server and a persistent player database for a truly consistent adventure.
A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding - they're redefining how software changes the world.
The Next-Gen Agent-Native Skill Recommendation Engine
META-AGENTIC α-AGI - Mission End-to-end: Identify → Out-Learn → Out-Think → Out-Design → Out-Strategise → Out-Execute
Automated security investigation tool using Microsoft MCP Servers, GitHub Copilot, Python Modules and custom copilot-instructions.
A comprehensive evaluation framework for AI agents and LLM applications.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Life sciences computational skills for scientific AI agents
Self-hosted personal AI agent that lives in your DMs. Describe any workflow: triage Gmail, pull a Giphy feed, build a Slack bot, monitor markets. It writes the code, runs it, schedules it, and saves i
MaverickMCP - Personal Stock Analysis MCP Server
Open-Source Intelligent Command Layer
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
The Unofficial and Awesome Home Assistant MCP Server
Control Gmail, Google Calendar, Docs, Sheets, Slides, Chat, Forms, Tasks, Search & Drive with AI - Comprehensive Google Workspace / G Suite MCP Server & CLI Tool
A multi-agent LLM system for detecting and resolving cognitive dissonance.
AI-powered bug bounty hunting from your terminal - recon, 20 vuln classes, autonomous hunting, and report generation. All inside Claude Code.
MCP server for OpenAI's Deep Research APIs, Gemini Deep Research Agent, and Hugging Face's Open Deep Research
The Multi-Agent Custom Automation Engine Solution Accelerator is an AI-driven system that manages a group of AI agents to accomplish tasks based on user input. Powered by Microsoft Agent Framework, Az
Harness Vibe Research with Self-evolving AI Scientists
structured outputs for llms
Official data.gouv.fr Model Context Protocol (MCP) server that allows AI chatbots to search, explore, and analyze datasets from the French national Open Data platform, directly through conversation.
Enterprise-ready MCP Gateway & Registry that centralizes AI development tools with secure OAuth authentication, dynamic tool discovery, and unified access for both autonomous AI agents and AI coding a
Lightweight, embedded graph-based memory system for AI applications. Fast (<3ms recall), offline-first, with MCP server support for Claude and other AI tools.
The LLM Evaluation Framework
Autonomous quantitative trading research platform that transforms stock lists into fully backtested strategies using AI agents, real market data, and mathematical formulations, all without requiring a
Give your AI agents persistent memory.
Agentic AI assistant on Telegram, powered by Claude Code. Runs locally with shell access, spec-driven PR reviews, layered security, persistent memory, and scheduled jobs. Your machine, your data, your
Declarative framework for orchestrating multi-model LLM pipelines with context engineering and quality gates.
OSCAL tools for AI agents
The official Python SDK for Model Context Protocol servers and clients
OpenAI-compatible HTTP LLM proxy / gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch). Lightweight, extensible Python/FastAPI; use as library or standalone service.
The production runtime for AI agents. Schema in, API out. Built on PydanticAI + FastAPI.
Wraps any OpenAI API interface as Responses with MCP support so that it works with Codex, adding any missing stateful features. Ollama and vLLM compliant.
Autonomous VAPT platform. Give it a target (FQDN, IP, CIDR) - it hunts, it reports. Inspired by the Obsidian Order.
Tool that just makes your open source project better using LLM agents
JSON Agents - A universal JSON-native standard for describing AI agents, their capabilities, tools, runtimes, and governance in a portable, framework-agnostic format. Based on RFC 8259, JSON Schema 2
AI-powered web app builder - describe it, build it, ship it. 2-agent LangGraph system (Sonnet 4.5 + o4-mini) generates React apps from natural language with live preview and one-click deploy.
PromptDrifter - one-command CI guardrail that catches prompt drift and fails the build when your LLM answers change.
KawaiiGPT - Open-source LLM gateway accessing DeepSeek, Gemini, and Kimi-K2 through reverse-engineered Pollinations API with no API keys required, built-in prompt injection capabilities for security r
Project CodeGuard is an open-source, model-agnostic security framework that embeds secure-by-default practices into AI coding agent workflows. It provides comprehensive security rules that guide AI as
A thing that uses AI to write perfect applications. For those who want to know how: a governance runtime enforcing immutable constitutional rules on AI coding agents.
Published in CNCF Landscape: An MCP server for Kubernetes.
A Model Context Protocol server that provides task orchestration capabilities for AI assistants
AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting
Self-evolving AI agent framework with 5-layer safety gatekeeper. Agents observe failures, propose fixes, and safely apply them. Built on HKUDS/nanobot.
CloneMe is an advanced AI platform that builds your digital twin: an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
AI-powered PRD generation for Claude Code with taskmaster integration
Universal LLM Gateway: One API, every LLM. OpenAI/Anthropic-compatible endpoints with multi-provider translation and intelligent load-balancing.
Broken RAG For The Broken Souls
AI co-pilot for ComfyUI - 113 tools for workflow authoring, model provisioning, and iterative rendering. Multi-provider (Claude, GPT-4o, Gemini, Ollama). Ships as MCP server or standalone CLI.
PromptManager is a desktop application for cataloguing, searching, and executing AI prompts, and much more.
A collection of Summoner clients and agents featuring example implementations and reusable templates
UK due diligence MCP server - Companies House, corporate research, compliance checks
Local-first autonomous coding agent that plans, executes, validates, and finishes software tasks end-to-end.
An automated, agentic exploratory testing tool that performs comprehensive QA testing on web applications, simulating human user interactions through various input methods (mouse, keyboard, TAB naviga
A Python-based framework for building multi-agent systems with LLMs. Currently in pre-launch alpha.
A command-line interface tool for serving LLMs using vLLM.
A production-ready research outreach AI agent that plans, discovers, reasons, uses tools, auto-builds cited briefings, and drafts tailored emails with tool-chaining, memory, tests, and turnkey Dock
MCP server for 28 security frameworks (ISO 27001, NIST CSF 2.0, NIST 800-53, SOC 2, IEC 62443)
Medical-AI is an AI framework specifically for Medical Applications https://aibharata.github.io/medicalAI/
