freshcrate
Home > MCP Servers > ai-agent-handbook

ai-agent-handbook

Comprehensive guide to AI agent engineering: how 30+ frameworks actually work under the hood. Context rot, compaction, system prompt assembly, SOUL.md, agent loops, memory systems, tool sprawl, MCP,

Description

Comprehensive guide to AI agent engineering: how 30+ frameworks actually work under the hood. Context rot, compaction, system prompt assembly, SOUL.md, agent loops, memory systems, tool sprawl, MCP, progressive disclosure, multi-agent orchestration, Plan/Act, episodic memory. Code examples throughout. Pick the right stack, avoid the common traps

README

AI Agent Engineering Handbook

How modern AI agents are built โ€” from 30+ open-source framework codebases.

Should you use LangGraph or CrewAI? What makes OpenClaw tick? Why does Claude Code compact at 92%? How do you stop context rot from killing your agent at 25% fill?

This guide answers these questions โ€” and 200 more.

License: MIT PRs Welcome

Read the Guide ยท PDF


What is this

An organized collection of patterns, architectures, and implementation details from 30+ AI agent frameworks โ€” extracted by reading their actual source code, system prompts, and compaction logic.

Not opinions. Not tutorials. Just documented patterns from codebases that are running in production.

Why it exists

Most agent knowledge in 2026 is either surface-level blog posts or buried in source code nobody has time to read. We went through the codebases of OpenClaw, Claude Code, LangGraph, CrewAI, Hermes Agent, and 25+ others to document how they actually work โ€” the agent loops, the prompt assembly, the context management, the memory systems, the tool architectures.


Questions this answers

  • LangGraph vs CrewAI vs PydanticAI โ€” which one for what?
  • How does OpenClaw's SOUL.md / AGENTS.md pattern work and why does everyone copy it?
  • Why do agents get worse the longer they run? (context rot, and 12 ways to fight it)
  • MCP servers are eating 72% of my context window โ€” how do I fix tool sprawl?
  • Should I use one agent or multiple? When does multi-agent actually help?
  • How does Claude Code handle compaction? What survives, what gets dropped?
  • What's the actual agent loop code? Is it really just a while loop?
  • Skills as markdown vs. compiled tools โ€” what do the top frameworks use?
  • How does Hermes Agent improve itself over time? (episodic memory + self-evolution)
  • What's the minimum viable agent architecture for production?

What's covered

Part Topic Contents
Iโ€“II Agent Loops 8 loop variants with code: ReAct, Plan+Execute, Reflection, Compaction, Code-as-Action, Event-Driven, Graph State Machine, Heartbeat. Termination. Error recovery.
III System Prompts Assembly patterns. SOUL.md / AGENTS.md separation. Skill catalogs. Anti-patterns.
IV Context Management 6 compaction strategies with real prompts from Claude Code and OpenClaw. Trigger strategies. What survives compaction.
IV-B Context Rot 3 mechanisms. 12 defenses. The 40-60% rule. Agent Cognitive Compressor. Measuring degradation.
V Memory 5-tier hierarchy. File-based vs. vector vs. observational vs. episodic. 8 framework implementations compared.
VI Tools MCP. Code-as-action. Skills-as-markdown. JIT loading. Tool sprawl (72% context consumed). 7 solutions. Progressive disclosure. SDP.
VII Orchestration 6 multi-agent patterns. State passing. A2A protocol. Sizing and topology guidelines.
VIII Planning 5 strategies. Reflection loops. Cline's Plan/Act gold standard.
IXโ€“XI Human-in-the-Loop, State, Security Permission models. Checkpointing. Durable execution. Sandboxing. Prompt injection defense.
XIIโ€“XIII Testing, Deployment Benchmarks. Eval strategies. Cost optimization. Observability. Gateway architecture.
XIV Synthesis Reference architecture. Decision framework for choosing your stack.

Selected findings

Things we found that weren't obvious:

  • A 100-line agent scores 74% on SWE-bench. The loop is the easy part. Context assembly, tool design, and memory are what matter.
  • Context quality degrades starting at ~25% window fill, not at 100%. Every frontier model tested shows this (Chroma Research, 18 models).
  • Three MCP servers consumed 143K of 200K tokens with tool descriptions alone โ€” before the agent read a single user message. Tool selection accuracy drops from 43% to 14% with bloated toolsets.
  • Personality and operational instructions should be separate files. OpenClaw, Claude Code, and Hermes converged on this independently.
  • The primary reason to use sub-agents is context isolation, not parallelism. Anthropic measured 90.2% improvement.
  • Skills defined as markdown files (not code) is the dominant extensibility pattern across Claude Code, OpenClaw, and Cline. Progressive 3-tier loading cuts token cost by 94%.
  • Re-injecting instructions near the end of context defeats "instruction centrifugation" โ€” system prompt influence fading as context grows.

30+ Frameworks analyzed

# Framework Stars Category
1 OpenClaw 210k+ Personal AI Agent
2 AutoGPT 170k+ Autonomous Agent
3 n8n 150k+ Workflow
4 Dify 129k+ Agent Platform
5 OpenCode 120k+ Coding Agent
6 MS Agent Framework 75k+ Enterprise
7 Langflow 55k+ Visual Builder
8 browser-use 50k+ Browser Agent
9 OpenHands 50k+ Coding Agent
10 MetaGPT 50k+ Multi-Agent
11 CrewAI 46k+ Multi-Agent
12 LangGraph 44.6k+ Agent Framework
13 AG2 40k+ Multi-Agent
14 Cline 35k+ IDE Agent
15 Aider 30k+ Coding Agent
16 Mastra 25k+ Agent Framework
17 Goose 25k+ Coding Agent
18 Roo Code 22k+ IDE Agent
19 SWE-agent 20k+ Research Agent
20 Bolt.new 20k+ Web Dev Agent
21 Agno 18.5k+ Agent Runtime
22 Google ADK 17.8k+ Agent Toolkit
23 smolagents 15k+ Agent Framework
24 PydanticAI 15.1k+ Agent Framework
25 Claude Agent SDK 15k+ Agent SDK
26 Hermes Agent 8.3k+ Self-Improving
27 Composio 8k+ Orchestrator
28 Stagehand 8k+ Browser Agent
29 AWS Agent Squad 5k+ Orchestrator
30 Devika โ€” AI Engineer

Star counts as of March 2026.


Quick decision guide

Coding agent             โ†’ Claude Agent SDK, Cline, Roo Code
Visual / no-code         โ†’ n8n, Dify, Langflow
Personal assistant       โ†’ OpenClaw, Hermes Agent
Multi-agent (graphs)     โ†’ LangGraph
Multi-agent (roles)      โ†’ CrewAI
Enterprise .NET/Java     โ†’ MS Agent Framework, Google ADK
Type safety              โ†’ PydanticAI
TypeScript native        โ†’ Mastra
Minimal footprint        โ†’ smolagents
Self-improving           โ†’ Hermes Agent
Browser automation       โ†’ browser-use, Stagehand

Repo structure

โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ COMPREHENSIVE_AGENT_ENGINEERING_GUIDE_2026.md   โ† The guide (~5,000 lines)
โ”œโ”€โ”€ COMPREHENSIVE_AGENT_ENGINEERING_GUIDE_2026.pdf  โ† PDF version
โ”œโ”€โ”€ CONTRIBUTING.md
โ””โ”€โ”€ LICENSE

Contributing

Contributions welcome. See CONTRIBUTING.md.

Corrections, new framework analyses, production patterns you've discovered โ€” anything that makes this more accurate and useful.


Author

Dmitriy Vasilyev ยท AI Enthusiast


License

MIT

Release History

VersionChangesUrgencyDate
0.0.0No release found โ€” using repo HEADMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchMedium3/20/2026
main@2026-03-20Latest activity on main branchLow3/20/2026
main@2026-03-20Latest activity on main branchLow3/20/2026

Dependencies & License Audit

Loading dependencies...

Similar Packages

spaceship-mcp๐Ÿš€ Manage domains, DNS, contacts, and listings with spaceship-mcp, a community-built MCP server for the Spaceship API.main@2026-04-21
nmap-mcp๐Ÿ” Enable AI-driven network security scanning with a production-ready Nmap MCP server supporting diverse tools, scan types, and timing templates.main@2026-04-21
claude-container๐Ÿณ Run Claude Code safely in isolated Docker containers with persistent projects and easy setup on macOS using Justfile automation.master@2026-04-21
noapi-google-search-mcp๐Ÿ” Enable local LLMs with real-time Google search, live feeds, OCR, and video insights using noapi-google-search-mcp server tools.main@2026-04-21
website-design-systems-mcp๐ŸŽจ Extract complete design systems from websites and generate AI-ready skill.md files to replicate exact design elements efficiently.main@2026-04-21