How modern AI agents are built โ from 30+ open-source framework codebases.
Should you use LangGraph or CrewAI? What makes OpenClaw tick? Why does Claude Code compact at 92%? How do you stop context rot from killing your agent at 25% fill?
This guide answers these questions โ and 200 more.
An organized collection of patterns, architectures, and implementation details from 30+ AI agent frameworks โ extracted by reading their actual source code, system prompts, and compaction logic.
Not opinions. Not tutorials. Just documented patterns from codebases that are running in production.
Most agent knowledge in 2026 is either surface-level blog posts or buried in source code nobody has time to read. We went through the codebases of OpenClaw, Claude Code, LangGraph, CrewAI, Hermes Agent, and 25+ others to document how they actually work โ the agent loops, the prompt assembly, the context management, the memory systems, the tool architectures.
- LangGraph vs CrewAI vs PydanticAI โ which one for what?
- How does OpenClaw's SOUL.md / AGENTS.md pattern work and why does everyone copy it?
- Why do agents get worse the longer they run? (context rot, and 12 ways to fight it)
- MCP servers are eating 72% of my context window โ how do I fix tool sprawl?
- Should I use one agent or multiple? When does multi-agent actually help?
- How does Claude Code handle compaction? What survives, what gets dropped?
- What's the actual agent loop code? Is it really just a while loop?
- Skills as markdown vs. compiled tools โ what do the top frameworks use?
- How does Hermes Agent improve itself over time? (episodic memory + self-evolution)
- What's the minimum viable agent architecture for production?
| Part | Topic | Contents |
|---|---|---|
| IโII | Agent Loops | 8 loop variants with code: ReAct, Plan+Execute, Reflection, Compaction, Code-as-Action, Event-Driven, Graph State Machine, Heartbeat. Termination. Error recovery. |
| III | System Prompts | Assembly patterns. SOUL.md / AGENTS.md separation. Skill catalogs. Anti-patterns. |
| IV | Context Management | 6 compaction strategies with real prompts from Claude Code and OpenClaw. Trigger strategies. What survives compaction. |
| IV-B | Context Rot | 3 mechanisms. 12 defenses. The 40-60% rule. Agent Cognitive Compressor. Measuring degradation. |
| V | Memory | 5-tier hierarchy. File-based vs. vector vs. observational vs. episodic. 8 framework implementations compared. |
| VI | Tools | MCP. Code-as-action. Skills-as-markdown. JIT loading. Tool sprawl (72% context consumed). 7 solutions. Progressive disclosure. SDP. |
| VII | Orchestration | 6 multi-agent patterns. State passing. A2A protocol. Sizing and topology guidelines. |
| VIII | Planning | 5 strategies. Reflection loops. Cline's Plan/Act gold standard. |
| IXโXI | Human-in-the-Loop, State, Security | Permission models. Checkpointing. Durable execution. Sandboxing. Prompt injection defense. |
| XIIโXIII | Testing, Deployment | Benchmarks. Eval strategies. Cost optimization. Observability. Gateway architecture. |
| XIV | Synthesis | Reference architecture. Decision framework for choosing your stack. |
Things we found that weren't obvious:
- A 100-line agent scores 74% on SWE-bench. The loop is the easy part. Context assembly, tool design, and memory are what matter.
- Context quality degrades starting at ~25% window fill, not at 100%. Every frontier model tested shows this (Chroma Research, 18 models).
- Three MCP servers consumed 143K of 200K tokens with tool descriptions alone โ before the agent read a single user message. Tool selection accuracy drops from 43% to 14% with bloated toolsets.
- Personality and operational instructions should be separate files. OpenClaw, Claude Code, and Hermes converged on this independently.
- The primary reason to use sub-agents is context isolation, not parallelism. Anthropic measured 90.2% improvement.
- Skills defined as markdown files (not code) is the dominant extensibility pattern across Claude Code, OpenClaw, and Cline. Progressive 3-tier loading cuts token cost by 94%.
- Re-injecting instructions near the end of context defeats "instruction centrifugation" โ system prompt influence fading as context grows.
| # | Framework | Stars | Category |
|---|---|---|---|
| 1 | OpenClaw | 210k+ | Personal AI Agent |
| 2 | AutoGPT | 170k+ | Autonomous Agent |
| 3 | n8n | 150k+ | Workflow |
| 4 | Dify | 129k+ | Agent Platform |
| 5 | OpenCode | 120k+ | Coding Agent |
| 6 | MS Agent Framework | 75k+ | Enterprise |
| 7 | Langflow | 55k+ | Visual Builder |
| 8 | browser-use | 50k+ | Browser Agent |
| 9 | OpenHands | 50k+ | Coding Agent |
| 10 | MetaGPT | 50k+ | Multi-Agent |
| 11 | CrewAI | 46k+ | Multi-Agent |
| 12 | LangGraph | 44.6k+ | Agent Framework |
| 13 | AG2 | 40k+ | Multi-Agent |
| 14 | Cline | 35k+ | IDE Agent |
| 15 | Aider | 30k+ | Coding Agent |
| 16 | Mastra | 25k+ | Agent Framework |
| 17 | Goose | 25k+ | Coding Agent |
| 18 | Roo Code | 22k+ | IDE Agent |
| 19 | SWE-agent | 20k+ | Research Agent |
| 20 | Bolt.new | 20k+ | Web Dev Agent |
| 21 | Agno | 18.5k+ | Agent Runtime |
| 22 | Google ADK | 17.8k+ | Agent Toolkit |
| 23 | smolagents | 15k+ | Agent Framework |
| 24 | PydanticAI | 15.1k+ | Agent Framework |
| 25 | Claude Agent SDK | 15k+ | Agent SDK |
| 26 | Hermes Agent | 8.3k+ | Self-Improving |
| 27 | Composio | 8k+ | Orchestrator |
| 28 | Stagehand | 8k+ | Browser Agent |
| 29 | AWS Agent Squad | 5k+ | Orchestrator |
| 30 | Devika | โ | AI Engineer |
Star counts as of March 2026.
Coding agent โ Claude Agent SDK, Cline, Roo Code
Visual / no-code โ n8n, Dify, Langflow
Personal assistant โ OpenClaw, Hermes Agent
Multi-agent (graphs) โ LangGraph
Multi-agent (roles) โ CrewAI
Enterprise .NET/Java โ MS Agent Framework, Google ADK
Type safety โ PydanticAI
TypeScript native โ Mastra
Minimal footprint โ smolagents
Self-improving โ Hermes Agent
Browser automation โ browser-use, Stagehand
โโโ README.md
โโโ COMPREHENSIVE_AGENT_ENGINEERING_GUIDE_2026.md โ The guide (~5,000 lines)
โโโ COMPREHENSIVE_AGENT_ENGINEERING_GUIDE_2026.pdf โ PDF version
โโโ CONTRIBUTING.md
โโโ LICENSE
Contributions welcome. See CONTRIBUTING.md.
Corrections, new framework analyses, production patterns you've discovered โ anything that makes this more accurate and useful.
Dmitriy Vasilyev ยท AI Enthusiast
