📄 Our comprehensive survey paper on Context Engineering is now published! Check out our latest academic insights and theoretical foundations.
A comprehensive survey and collection of resources on Context Engineering - the evolution from static prompting to dynamic, context-aware AI systems, and increasingly to agent runtimes, memory systems, protocols, coding agents, and observability stacks.
For questions, suggestions, or collaboration opportunities, please feel free to reach out:
Lingrui Mei
📧 Email: meilingrui25b@ict.ac.cn or meilingrui22@mails.ucas.ac.cn
Note: the first version of the paper listed an incorrect email address; the addresses above are correct. You can also open an issue in this repository for general discussions and suggestions.
- [2025.07.17] 🔥🔥 Our paper is now published! Check out "A Survey of Context Engineering for Large Language Models" on arXiv and Hugging Face Papers
- [2025.07.03] Repository initialized with comprehensive outline
- [2025.07.03] Survey structure established following modern context engineering paradigms
In the era of Large Language Models (LLMs), the limitations of static prompting have become increasingly apparent. Context Engineering represents the natural evolution to address LLM uncertainty and achieve production-grade AI deployment. Unlike traditional prompt engineering, context engineering encompasses the complete information payload provided to LLMs at inference time, including all structured informational components necessary for plausible task completion.
This repository serves as a comprehensive survey of context engineering techniques, methodologies, and applications.
As of March 2026, context engineering remains a useful and necessary concept, but it is no longer the whole story. The center of gravity has shifted from "how to pack the best prompt" to how agent systems manage runtime state, memory, tools, protocols, approvals, and long-horizon execution. In practice, context engineering now sits inside a broader stack that also includes agent harnesses, interoperability protocols, project memory for coding agents, and trace-first observability.
This repository still preserves its original survey structure on long context, RAG, memory, agent communication, tool use, evaluation, and applications. At the same time, this README is being reorganized to better reflect the agent era through additional coverage of:
- Agent harnesses and runtime systems for planning, subagents, checkpoints, sandboxes, and human approval loops
- Context management in production through compaction, caching, artifact-backed context, and scoped instruction loading
- Memory artifacts and portability including persistent memory, memory interchange formats, persona packaging, and project memory
- Open protocols such as MCP, A2A, AG-UI, ACP, and portable agent schemas
- Coding agents and computer use as the most visible production setting for context engineering today
- Evaluation, observability, and telemetry for long-running agent systems rather than only static benchmarks
Readers primarily interested in the 2026 shift should jump to the expanded sections on:
- Agent harnesses and runtime systems, inspired by Anthropic's effective agents guide, OpenAI's Agents and Tools documentation, Google ADK, and LangChain Deep Agents
- Open protocols and interoperability, including Model Context Protocol, A2A, AG-UI, and AgentSchema
- Coding agents and project memory, including OpenAI Codex, Claude Code memory, and Letta memory blocks
- Evaluation and observability, including LangSmith observability and OpenTelemetry semantic conventions for GenAI
- Awesome Context Engineering
- 💬 Join Our Community
- 📧 Contact
- 📰 News
- 🎯 Introduction
- 🧭 2026 Agent Era Update
- 📚 Table of Contents
- 🔗 Related Survey
- 🏗️ Definition of Context Engineering
- 🌐 Related Blogs
- 🤔 Why Context Engineering?
- 🔧 Components, Techniques and Architectures
- 🛠️ Implementation and Challenges
- 📊 Evaluation Paradigms for Context-Driven Systems
- 🚀 Applications and Systems
- 🔮 Limitations and Future Directions
- 🤝 Contributing
- 📄 License
- 📑 Citation
- ⚠️ Disclaimer
- 📧 Contact
- 🙏 Acknowledgments
- Star History
- 📖 Our Paper
General AI Survey Papers
- A Survey of Large Language Models, Zhao et al.,
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models, Gao et al.,
Context and Reasoning
- A Survey on In-context Learning, Dong et al.,
- Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al.,
Memory Systems and Context Persistence
Survey
- A Survey on the Memory Mechanism of Large Language Model based Agents, Zhang et al.,
- From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, Wu et al.,
- Survey on Evaluation of LLM-based Agents, Anonymous et al.,
- A Survey of Personalized Large Language Models: Progress and Future Directions, Anonymous et al.,
- Agentic Retrieval-Augmented Generation: A Survey, Anonymous et al.,
- Retrieval-Augmented Generation with Graphs (GraphRAG), Anonymous et al.,
Benchmarks
- Evaluating Very Long-Term Conversational Memory of LLM Agents (LOCOMO), Maharana et al.,
- Episodic Memories Generation and Evaluation Benchmark for Large Language Models, Anonymous et al.,
- On the Structural Memory of LLM Agents, Anonymous et al.,
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, Yang et al.,
- Neural Turing Machines, Graves et al.,
- Differentiable Neural Computers, Graves et al.,
- Differentiable Neural Computers with Memory Demon, Anonymous et al.,
- Memorizing Transformers, Wu et al.,
- Recurrent Memory Transformer, Bulatov et al.,
- Memformer: A Memory-Augmented Transformer for Sequence Modeling, Wu et al.,
- Token Turing Machines, Ryoo et al.,
- TransformerFAM: Feedback Attention is Working Memory, Irie et al.,
Production Memory Systems
- MemGPT: Towards LLMs as Operating Systems, Packer et al.,
- Memory OS of AI Agent, Kang et al.,
- AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents, Anokhin et al.,
- Zep: A Temporal Knowledge Graph Architecture for Agent Memory, Anonymous et al.,
- GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models, Anonymous et al.,
- From Local to Global: A GraphRAG Approach to Query-Focused Summarization, Edge et al.,
- Larimar: Large Language Models with Episodic Memory Control, Goyal et al.,
- EM-LLM: Human-like Episodic Memory for Infinite Context LLMs, Anonymous et al.,
- Empowering Working Memory for Large Language Model Agents, Anonymous et al.,
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation, Anonymous et al.,
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory, Anonymous et al.,
- Generative Agents: Interactive Simulacra of Human Behavior, Park et al.,
- Self-Controlled Memory Framework for Large Language Models, Anonymous et al.,
- AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Shin et al.,
Additional RAG and Retrieval Surveys
- Retrieval-Augmented Generation for AI-Generated Content: A Survey, Various,
- Large language models (LLMs): survey, technical frameworks, and future challenges, Various,
Context is not just the single prompt users send to an LLM. Context is the complete information payload provided to an LLM at inference time, encompassing all structured informational components the model needs to plausibly accomplish a given task.
To formally define Context Engineering, we must first mathematically characterize the LLM generation process. Let us model an LLM as a probabilistic function:
$$P(\text{output} | \text{context}) = \prod_{t=1}^T P(\text{token}_t | \text{previous tokens}, \text{context})$$

Where:

- $\text{context}$ represents the complete input information provided to the LLM
- $\text{output}$ represents the generated response sequence
- $P(\text{token}_t | \text{previous tokens}, \text{context})$ is the probability of generating each token given the preceding tokens and the context
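The product form above is usually computed in log space, since multiplying many per-token probabilities underflows quickly. A minimal sketch (the per-token log-probabilities stand in for values an LLM API would return; the numbers are illustrative):

```python
import math

def sequence_log_prob(token_log_probs):
    """Log-probability of an output sequence given a context.

    Mirrors P(output | context) = prod_t P(token_t | previous tokens, context):
    multiplying per-token probabilities equals summing their logs.
    """
    return sum(token_log_probs)

# Three tokens generated with probabilities 0.9, 0.8, 0.5 under some context;
# the joint probability is their product, 0.36.
log_p = sequence_log_prob([math.log(0.9), math.log(0.8), math.log(0.5)])
```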
In traditional prompt engineering, the context is treated as a simple string:
$$\text{context} = \text{prompt}$$

However, in Context Engineering, we decompose the context into multiple structured components:

$$\text{context} = \text{Assemble}(\text{instructions}, \text{knowledge}, \text{tools}, \text{memory}, \text{state}, \text{query})$$

Where $\text{Assemble}$ is a context assembly function that orchestrates:

- $\text{instructions}$: System prompts and rules
- $\text{knowledge}$: Retrieved relevant information
- $\text{tools}$: Available function definitions
- $\text{memory}$: Conversation history and learned facts
- $\text{state}$: Current world/user state
- $\text{query}$: User's immediate request
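The six-component decomposition can be sketched in a few lines of Python. This is a deliberately naive illustration of the shape of $\text{Assemble}$, not a production implementation; the class and section names are ours, not from the survey:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """The six components of the decomposition above (illustrative names)."""
    instructions: str                                  # system prompts and rules
    knowledge: list[str] = field(default_factory=list) # retrieved information
    tools: list[dict] = field(default_factory=list)    # function definitions
    memory: list[str] = field(default_factory=list)    # history and learned facts
    state: dict = field(default_factory=dict)          # current world/user state
    query: str = ""                                    # user's immediate request

def assemble(ctx: Context) -> str:
    """Naive Assemble(): concatenate labeled components into one payload.
    Real systems would retrieve, select, and truncate before this step."""
    parts = [
        f"# Instructions\n{ctx.instructions}",
        "# Knowledge\n" + "\n".join(ctx.knowledge),
        f"# Tools\n{ctx.tools}",
        "# Memory\n" + "\n".join(ctx.memory),
        f"# State\n{ctx.state}",
        f"# Query\n{ctx.query}",
    ]
    return "\n\n".join(parts)

prompt = assemble(Context(
    instructions="Answer concisely.",
    knowledge=["Paris is the capital of France."],
    query="What is the capital of France?",
))
```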
Context Engineering is formally defined as the optimization problem:
$$\text{Assemble}^* = \arg\max_{\text{Assemble}} \mathbb{E} [\text{Reward}(\text{LLM}(\text{context}), \text{target})]$$

Subject to constraints:

- $|\text{context}| \leq \text{MaxTokens}$ (context window limitation)
- $\text{knowledge} = \text{Retrieve}(\text{query}, \text{database})$
- $\text{memory} = \text{Select}(\text{history}, \text{query})$
- $\text{state} = \text{Extract}(\text{world})$

Where:

- $\text{Reward}$ measures the quality of generated responses
- $\text{Retrieve}$, $\text{Select}$, and $\text{Extract}$ are functions for information gathering
The context assembly can be decomposed as:
$$\text{context} = \text{Concat}(\text{Format}(\text{instructions}), \text{Format}(\text{knowledge}), \text{Format}(\text{tools}), \text{Format}(\text{memory}), \text{Format}(\text{query}))$$

Where $\text{Format}$ represents component-specific structuring, and $\text{Concat}$ assembles the formatted components while respecting token limits and optimal positioning. Context Engineering is therefore the discipline of designing and optimizing these assembly and formatting functions to maximize task performance.
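One way to make the token-limit constraint concrete is a greedy priority packing: format each component, then drop lower-priority components once the budget is spent. A minimal sketch under stated assumptions (whitespace tokenization as a crude stand-in for a real tokenizer; tag-style formatting is our choice, not prescribed by the formalism):

```python
def format_component(name: str, content: str) -> str:
    """Format(): component-specific structuring; here just a tagged block."""
    return f"<{name}>\n{content}\n</{name}>"

def concat_with_budget(components: list[tuple[str, str]], max_tokens: int) -> str:
    """Concat(): assemble formatted components in priority order, skipping
    lower-priority ones once the (whitespace-token) budget is exhausted.
    A real implementation would use the model's tokenizer and smarter packing."""
    out, used = [], 0
    for name, content in components:      # earlier in the list = higher priority
        block = format_component(name, content)
        cost = len(block.split())         # crude token estimate
        if used + cost > max_tokens:
            continue                      # component does not fit; skip it
        out.append(block)
        used += cost
    return "\n".join(out)

context = concat_with_budget(
    [("instructions", "Be brief."),
     ("knowledge", "Long retrieved passage " * 50),  # too large for the budget
     ("query", "Summarize the passage.")],
    max_tokens=40,
)
```

With a 40-token budget, the oversized knowledge block is dropped while the instructions and query survive, which is exactly the trade-off $\text{Concat}$ must manage.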
From this formalization, we derive four fundamental principles:
1. System-Level Optimization: Context generation is a multi-objective optimization problem over assembly functions, not simple string manipulation.
2. Dynamic Adaptation: The context assembly function adapts to each $\text{query}$ and $\text{state}$ at inference time: $\text{Assemble}(\cdot \mid \text{query}, \text{state})$.
3. Information-Theoretic Optimality: The retrieval function maximizes relevant information: $\text{Retrieve} = \arg\max \text{Relevance}(\text{knowledge}, \text{query})$.
4. Structural Sensitivity: The formatting functions encode structure that aligns with LLM processing capabilities.
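The information-theoretic principle, $\text{Retrieve} = \arg\max \text{Relevance}(\text{knowledge}, \text{query})$, can be sketched with a toy relevance function. Word overlap is an assumption for illustration only; production systems would use embeddings or a learned reranker:

```python
def relevance(passage: str, query: str) -> float:
    """Toy Relevance(): fraction of query words that appear in the passage."""
    p, q = set(passage.lower().split()), set(query.lower().split())
    return len(p & q) / max(len(q), 1)

def retrieve(query: str, database: list[str], k: int = 1) -> list[str]:
    """Retrieve(query, database): top-k passages by descending relevance."""
    return sorted(database, key=lambda d: relevance(d, query), reverse=True)[:k]

docs = ["The Eiffel Tower is in Paris.", "Rust has a borrow checker."]
top = retrieve("where is the eiffel tower", docs)
```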
Context Engineering can be formalized within a Bayesian framework where the optimal context is inferred:
$$P(\text{context} | \text{query}, \text{history}, \text{world}) \propto P(\text{query} | \text{context}) \cdot P(\text{context} | \text{history}, \text{world})$$

Where:

- $P(\text{query} | \text{context})$ models query-context compatibility
- $P(\text{context} | \text{history}, \text{world})$ represents the prior context probability
The optimal context assembly becomes:
$$\text{context}^* = \arg\max_{\text{context}} P(\text{answer} | \text{query}, \text{context}) \cdot P(\text{context} | \text{query}, \text{history}, \text{world})$$

This Bayesian formulation enables:
- Uncertainty Quantification: Modeling confidence in context relevance
- Adaptive Retrieval: Updating context beliefs based on feedback
- Multi-step Reasoning: Maintaining context distributions across interactions
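In practice, the Bayesian argmax amounts to scoring each candidate context assembly by the sum of two log-terms and picking the best. A minimal sketch; the candidate names and log-scores are hypothetical, and the compatibility/prior values are assumed to come from external models:

```python
import math

def score_context(log_p_query_given_ctx: float, log_prior_ctx: float) -> float:
    """Unnormalized log-posterior:
    log P(context | query, history, world)
      ∝ log P(query | context) + log P(context | history, world)."""
    return log_p_query_given_ctx + log_prior_ctx

def best_context(candidates: dict[str, tuple[float, float]]) -> str:
    """Argmax over candidate context assemblies by log-posterior score."""
    return max(candidates, key=lambda c: score_context(*candidates[c]))

# Hypothetical (compatibility, prior) log-scores for two candidate assemblies:
choice = best_context({
    "retrieved_docs_only": (math.log(0.6), math.log(0.3)),  # 0.6 * 0.3 = 0.18
    "docs_plus_memory":    (math.log(0.7), math.log(0.4)),  # 0.7 * 0.4 = 0.28
})
```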
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Mathematical Model | $\text{context} = \text{prompt}$ (static) | $\text{context} = \text{Assemble}(...)$ (dynamic) |
| Optimization Target | $\arg\max_{\text{prompt}} P(\text{answer} \mid \text{query}, \text{prompt})$ | $\arg\max_{\text{Assemble}} \mathbb{E}[\text{Reward}(...)]$ |
| Complexity | $O(1)$ context assembly | $O(n)$ multi-component optimization |
| Information Theory | Fixed information content | Adaptive information maximization |
| State Management | Stateless function | Stateful with $\text{memory}(\text{history}, \text{query})$ |
| Scalability | Linear in prompt length | Sublinear through compression/filtering |
| Error Analysis | Manual prompt inspection | Systematic evaluation of assembly components |
- The rise of "context engineering"
- The New Skill in AI is Not Prompting, It's Context Engineering
- davidkimai/Context-Engineering: "Context engineering is the delicate art and science of filling the context window with just the right information for the next step."
- Context Engineering is Runtime of AI Agents | by Bijit Ghosh | Jun, 2025 | Medium
- Context Engineering
- Context Engineering for Agents
- Cognition | Don't Build Multi-Agents
- From Prompt Engineering to Context Engineering (53AI)
- Mastering Claude Code in 30 minutes
- Andrej Karpathy on X: "+1 for "context engineering" over "prompt engineering"
- Xipeng Qiu (Fudan University / Shanghai Innovation Institute): Context Scaling, the Next Act on the Road to AGI
The evolution from prompt engineering to context engineering represents a fundamental maturation in AI system design. As influential figures like Andrej Karpathy, Tobi Lutke, and Simon Willison have argued, the term "prompt engineering" has been diluted to mean simply "typing things into a chatbot," failing to capture the complexity required for industrial-strength LLM applications.
- Unclear Human Intent Expression: Human intentions are often unclear, incomplete, or ambiguous when expressed in natural language
- AI's Incomplete Understanding of Human Intent: AI systems struggle to fully comprehend complex human intentions, especially those involving implicit context or cultural nuances
- Overly Literal AI Interpretation: AI systems often interpret human instructions too literally, missing the underlying intent or contextual meaning
Single models alone cannot solve complex problems that require:
- Large-scale External Knowledge: Vast amounts of external knowledge that exceed model capacity
- Accurate External Knowledge: Precise, up-to-date information that models may not possess
- Novel External Knowledge: Emerging knowledge that appears after model training
Static Knowledge Limitations:
- Static Knowledge Problem: Pre-trained models contain static knowledge that becomes outdated
- Knowledge Cutoff: Models cannot access information beyond their training data
- Domain-Specific Gaps: Models lack specialized knowledge for specific industries or applications
- AI Hallucination: LLMs generate plausible but factually incorrect information when lacking proper context
- Lack of Provenance: Absence of clear source attribution for generated information
- Confidence Calibration: Models often appear confident even when generating false information
- Transparency Gaps: Inability to trace how conclusions were reached
- Accountability Issues: Difficulty in verifying the reliability of AI-generated content
Traditional prompting treats context as a static string, but enterprise applications require:
- Dynamic Information Assembly: Context created on-the-fly, tailored to specific users and queries
- Multi-Source Integration: Combining databases, APIs, documents, and real-time data
- State Management: Maintaining conversation history, user preferences, and workflow status
- Tool Orchestration: Coordinating external function calls and API interactions
If prompt engineering is writing a single line of dialogue for an actor, context engineering is the entire process of building the set, designing lighting, providing detailed backstory, and directing the scene. The dialogue only achieves its intended impact because of the rich, carefully constructed environment surrounding it.
Most failures in modern agentic systems are no longer attributable to core model reasoning capabilities but are instead "context failures". The true engineering challenge lies not in what question to ask, but in ensuring the model has all necessary background, data, tools, and memory to answer meaningfully and reliably.
While prompt engineering suffices for simple, self-contained tasks, it breaks down when scaled to:
- Complex, multi-step applications
- Data-rich enterprise environments
- Stateful, long-running workflows
- Multi-user, multi-tenant systems
Enterprise applications demand:
- Deterministic Behavior: Predictable outputs across different contexts and users
- Error Handling: Graceful degradation when information is incomplete or contradictory
- Audit Trails: Transparency in how context influences model decisions
- Compliance: Meeting regulatory requirements for data handling and decision making
Context Engineering enables:
- Cost Optimization: Strategic choice between RAG and long-context approaches
- Latency Management: Efficient information retrieval and context assembly
- Resource Utilization: Optimal use of finite context windows and computational resources
- Maintenance Scalability: Systematic approaches to updating and managing knowledge bases
Context Engineering provides the architectural foundation for managing state, integrating diverse data sources, and maintaining coherence across these demanding scenarios.
LLMs are essentially "brains in a vat" - powerful reasoning engines lacking connection to specific environments. Context Engineering provides:
- Synthetic Sensory Systems: Retrieval mechanisms as artificial perception
- Proxy Embodiment: Tool use as artificial action capabilities
- Artificial Memory: Structured information storage and retrieval
Context Engineering addresses the fundamental challenge of information retrieval where the "user" is not human but an AI agent. This requires:
- Semantic Understanding: Bridging the gap between intent and expression
- Relevance Optimization: Ranking and filtering vast knowledge bases
- Query Transformation: Converting ambiguous requests into precise retrieval operations
Context Engineering elevates AI development from a collection of "prompting tricks" to a rigorous discipline of systems architecture. It applies decades of knowledge in operating system design, memory management, and distributed systems to the unique challenges of LLM-based applications.
This discipline is foundational for unlocking the full potential of LLMs in production systems, enabling the transition from one-off text generation to autonomous agents and sophisticated AI copilots that can reliably operate in complex, dynamic environments.
Position Interpolation and Extension Techniques
- Extending Context Window of Large Language Models via Position Interpolation, Chen et al.,
Memory-Efficient Attention Mechanisms
- Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences, Kang et al.,
Ultra-Long Sequence Processing (100K+ Tokens)
- TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation, Wu et al.,
Comprehensive Extension Surveys and Methods
- Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models, Various,
- A Controlled Study on Long Context Extension and Generalization in LLMs, Various,
- Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques, An et al.,
- Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion, Wang et al.,
Audio-Visual Context Integration and Processing
- Aligned Better, Listen Better for Audio-Visual Large Language Models, Guo et al.,
- AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue, Chen et al.,
- SonicVisionLM: Playing Sound with Vision Language Models, Xie et al.,
Multi-Modal Prompt Engineering and Context Design
- CaMML: Context-Aware Multimodal Learner for Large Models, Chen et al.,
- Visual In-Context Learning for Large Vision-Language Models, Zhou et al.,
- CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention, Li et al.,
CVPR 2024 Vision-Language Advances
- CaMML: Context-Aware Multimodal Learner for Large Models, Chen et al.,


