
Awesome Context Engineering

Awesome Context Engineering Cover

💬 Join Our Community

WeChat Group

Join our WeChat group for discussions and updates!

Join our Discord server

Awesome · License: MIT · PRs Welcome · Paper

📄 Our comprehensive survey paper on Context Engineering is now published! Check out our latest academic insights and theoretical foundations.

A comprehensive survey and collection of resources on Context Engineering - the evolution from static prompting to dynamic, context-aware AI systems, and increasingly to agent runtimes, memory systems, protocols, coding agents, and observability stacks.

📧 Contact

For questions, suggestions, or collaboration opportunities, please feel free to reach out:

Lingrui Mei
📧 Email: meilingrui25b@ict.ac.cn or meilingrui22@mails.ucas.ac.cn

I WROTE THE WRONG EMAIL ADDRESS IN THE FIRST VERSION OF MY PAPER!! You can also open an issue in this repository for general discussions and suggestions.


📰 News


🎯 Introduction

In the era of Large Language Models (LLMs), the limitations of static prompting have become increasingly apparent. Context Engineering represents the natural evolution to address LLM uncertainty and achieve production-grade AI deployment. Unlike traditional prompt engineering, context engineering encompasses the complete information payload provided to LLMs at inference time, including all structured informational components necessary for plausible task completion.

This repository serves as a comprehensive survey of context engineering techniques, methodologies, and applications.


🧭 2026 Agent Era Update

From Context Engineering to Agent Engineering

As of March 2026, context engineering remains a useful and necessary concept, but it is no longer the whole story. The center of gravity has shifted from "how to pack the best prompt" to how agent systems manage runtime state, memory, tools, protocols, approvals, and long-horizon execution. In practice, context engineering now sits inside a broader stack that also includes agent harnesses, interoperability protocols, project memory for coding agents, and trace-first observability.

What This Repository Now Covers

This repository still preserves its original survey structure on long context, RAG, memory, agent communication, tool use, evaluation, and applications. At the same time, this README is being reorganized to better reflect the agent era through additional coverage of:

  • Agent harnesses and runtime systems for planning, subagents, checkpoints, sandboxes, and human approval loops
  • Context management in production through compaction, caching, artifact-backed context, and scoped instruction loading
  • Memory artifacts and portability including persistent memory, memory interchange formats, persona packaging, and project memory
  • Open protocols such as MCP, A2A, AG-UI, ACP, and portable agent schemas
  • Coding agents and computer use as the most visible production setting for context engineering today
  • Evaluation, observability, and telemetry for long-running agent systems rather than only static benchmarks

Reading Guide for 2026 Topics

Readers primarily interested in the 2026 shift should jump to the expanded sections on agent harnesses and runtime systems, runtime context management, memory artifacts and portability, open protocols, coding agents, and evaluation and observability for long-running agents.


📚 Table of Contents


🔗 Related Survey

General AI Survey Papers

  • A Survey of Large Language Models, Zhao et al., arXiv Badge GitHub stars
  • A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models, Gao et al., arXiv Badge GitHub stars

  Context and Reasoning

    • A Survey on In-context Learning, Dong et al., EMNLP Badge GitHub stars
    • Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al., arXiv Badge GitHub stars

    Memory Systems and Context Persistence

      Survey

      • A Survey on the Memory Mechanism of Large Language Model based Agents, Zhang et al., arXiv Badge GitHub stars
      • From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, Wu et al., arXiv Badge
      • Survey on Evaluation of LLM-based Agents, Anonymous et al., arXiv Badge
      • A Survey of Personalized Large Language Models: Progress and Future Directions, Anonymous et al., arXiv Badge
      • Agentic Retrieval-Augmented Generation: A Survey, Anonymous et al., arXiv Badge
      • Retrieval-Augmented Generation with Graphs (GraphRAG), Anonymous et al., arXiv Badge GitHub stars

        Benchmarks

        Memory-Augmented Transformers
        • Memorizing Transformers, Wu et al., arXiv Badge
        • Recurrent Memory Transformer, Bulatov et al., NeurIPS Badge GitHub stars
        • Memformer: A Memory-Augmented Transformer for Sequence Modeling, Wu et al., arXiv Badge
        • Token Turing Machines, Ryoo et al., arXiv Badge
        • TransformerFAM: Feedback Attention is Working Memory, Irie et al., arXiv Badge

        Production Memory Systems

        Episodic and Working Memory
        • Larimar: Large Language Models with Episodic Memory Control, Goyal et al., ICML Badge
        • EM-LLM: Human-like Episodic Memory for Infinite Context LLMs, Anonymous et al., ICLR Badge GitHub stars
        • Empowering Working Memory for Large Language Model Agents, Anonymous et al., arXiv Badge
        Conversational Memory
        • MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation, Anonymous et al., arXiv Badge
        • Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory, Anonymous et al., arXiv Badge
        • Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arXiv Badge
        • Self-Controlled Memory Framework for Large Language Models, Anonymous et al., arXiv Badge
        Foundational Survey Papers from Major Venues
        • AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Shin et al., EMNLP Badge GitHub stars

        Additional RAG and Retrieval Surveys


          🏗️ Definition of Context Engineering

          Context is not just the single prompt users send to an LLM. Context is the complete information payload provided to an LLM at inference time, encompassing all structured informational components that the model needs to plausibly accomplish a given task.

          LLM Generation

          To formally define Context Engineering, we must first mathematically characterize the LLM generation process. Let us model an LLM as a probabilistic function:

          $$P(\text{output} | \text{context}) = \prod_{t=1}^T P(\text{token}_t | \text{previous tokens}, \text{context})$$

          Where:

          • $\text{context}$ represents the complete input information provided to the LLM
          • $\text{output}$ represents the generated response sequence
          • $P(\text{token}_t | \text{previous tokens}, \text{context})$ is the probability of generating each token given the context
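
This chain-rule factorization can be checked numerically with a toy model; the token distributions below are invented purely for illustration, standing in for a real model's forward pass.

```python
import math

# Toy next-token distributions standing in for P(token_t | previous tokens, context).
# The probabilities are invented; a real LLM would compute them from its forward pass.
def next_token_probs(prev_tokens, context):
    if not prev_tokens:
        return {"The": 0.6, "A": 0.4}
    if prev_tokens[-1] == "The":
        return {"cat": 0.7, "dog": 0.3}
    return {".": 1.0}

def sequence_prob(tokens, context):
    """P(output | context) = product over t of P(token_t | previous tokens, context)."""
    prob = 1.0
    for t, token in enumerate(tokens):
        prob *= next_token_probs(tokens[:t], context)[token]
    return prob

p = sequence_prob(["The", "cat", "."], context="(ignored in this toy)")
assert math.isclose(p, 0.6 * 0.7 * 1.0)
```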

          Definition of Context

          In traditional prompt engineering, the context is treated as a simple string: $$\text{context} = \text{prompt}$$

          However, in Context Engineering, we decompose the context into multiple structured components:

          $$\text{context} = \text{Assemble}(\text{instructions}, \text{knowledge}, \text{tools}, \text{memory}, \text{state}, \text{query})$$

          Where $\text{Assemble}$ is a context assembly function that orchestrates:

          • $\text{instructions}$: System prompts and rules
          • $\text{knowledge}$: Retrieved relevant information
          • $\text{tools}$: Available function definitions
          • $\text{memory}$: Conversation history and learned facts
          • $\text{state}$: Current world/user state
          • $\text{query}$: User's immediate request

          Definition of Context Engineering

          Context Engineering is formally defined as the optimization problem:

          $$\text{Assemble}^* = \arg\max_{\text{Assemble}} \mathbb{E} [\text{Reward}(\text{LLM}(\text{context}), \text{target})]$$

          Subject to constraints:

          • $|\text{context}| \leq \text{MaxTokens}$ (context window limitation)
          • $\text{knowledge} = \text{Retrieve}(\text{query}, \text{database})$
          • $\text{memory} = \text{Select}(\text{history}, \text{query})$
          • $\text{state} = \text{Extract}(\text{world})$

          Where:

          • $\text{Reward}$ measures the quality of generated responses
          • $\text{Retrieve}$, $\text{Select}$, $\text{Extract}$ are functions for information gathering

          Dynamic Context Orchestration

          The context assembly can be decomposed as:

          $$\text{context} = \text{Concat}(\text{Format}(\text{instructions}), \text{Format}(\text{knowledge}), \text{Format}(\text{tools}), \text{Format}(\text{memory}), \text{Format}(\text{query}))$$

          Where $\text{Format}$ represents component-specific structuring, and $\text{Concat}$ assembles them respecting token limits and optimal positioning.

          Context Engineering is therefore the discipline of designing and optimizing these assembly and formatting functions to maximize task performance.
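
As a minimal sketch, $\text{Format}$ and $\text{Concat}$ can be written as ordinary functions; the whitespace "tokenizer" and section labels are simplifying assumptions, not part of the formalism.

```python
def fmt(name: str, body: str) -> str:
    """Format: component-specific structuring (here, a labeled section)."""
    return f"## {name}\n{body.strip()}"

def concat(sections: list[str], max_tokens: int) -> str:
    """Concat: assemble formatted sections in priority order under a token budget."""
    out, used = [], 0
    for section in sections:
        cost = len(section.split())   # crude whitespace "token" count
        if used + cost > max_tokens:
            break                     # later (lower-priority) sections are dropped
        out.append(section)
        used += cost
    return "\n\n".join(out)

context = concat(
    [fmt("Instructions", "Answer concisely."),
     fmt("Knowledge", "Paris is the capital of France."),
     fmt("Query", "What is the capital of France?")],
    max_tokens=50,
)
```

In practice priority is rarely positional as it is here: the query and instructions are protected while knowledge and memory are truncated or compressed first.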

          Mathematical Principles

          From this formalization, we derive four fundamental principles:

          1. System-Level Optimization: Context generation is a multi-objective optimization problem over assembly functions, not simple string manipulation.

          2. Dynamic Adaptation: The context assembly function adapts to each $\text{query}$ and $\text{state}$ at inference time: $\text{Assemble}(\cdot | \text{query}, \text{state})$.

          3. Information-Theoretic Optimality: The retrieval function maximizes relevant information: $\text{Retrieve} = \arg\max \text{Relevance}(\text{knowledge}, \text{query})$.

          4. Structural Sensitivity: The formatting functions encode structure that aligns with LLM processing capabilities.

          Theoretical Framework: Bayesian Context Inference

          Context Engineering can be formalized within a Bayesian framework where the optimal context is inferred:

          $$P(\text{context} | \text{query}, \text{history}, \text{world}) \propto P(\text{query} | \text{context}) \cdot P(\text{context} | \text{history}, \text{world})$$

          Where:

          • $P(\text{query} | \text{context})$ models query-context compatibility
          • $P(\text{context} | \text{history}, \text{world})$ represents prior context probability

          The optimal context assembly becomes:

          $$\text{context}^* = \arg\max_{\text{context}} P(\text{answer} | \text{query}, \text{context}) \cdot P(\text{context} | \text{query}, \text{history}, \text{world})$$

          This Bayesian formulation enables:

          • Uncertainty Quantification: Modeling confidence in context relevance
          • Adaptive Retrieval: Updating context beliefs based on feedback
          • Multi-step Reasoning: Maintaining context distributions across interactions
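
A toy rendering of this argmax: both probability terms are stand-ins (lexical overlap for query-context compatibility, a history-overlap bonus for the prior), where a real system would use a trained reranker and learned priors.

```python
import math

def log_compat(query: str, context: str) -> float:
    """Stand-in for log P(answer | query, context): lexical-overlap score."""
    q, c = set(query.lower().split()), set(context.lower().split())
    return math.log(len(q & c) + 1)

def log_prior(context: str, history: list[str]) -> float:
    """Stand-in for log P(context | query, history, world): rewards history overlap."""
    return sum(0.5 for item in history if item in context)

def best_context(query: str, candidates: list[str], history: list[str]) -> str:
    # context* = argmax over candidates of compatibility + prior (in log space)
    return max(candidates, key=lambda c: log_compat(query, c) + log_prior(c, history))
```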

          Comparison

          | Dimension | Prompt Engineering | Context Engineering |
          |---|---|---|
          | Mathematical Model | $\text{context} = \text{prompt}$ (static) | $\text{context} = \text{Assemble}(...)$ (dynamic) |
          | Optimization Target | $\arg\max_{\text{prompt}} P(\text{answer} \mid \text{query}, \text{prompt})$ | $\arg\max_{\text{Assemble}} \mathbb{E}[\text{Reward}(...)]$ |
          | Complexity | $O(1)$ context assembly | $O(n)$ multi-component optimization |
          | Information Theory | Fixed information content | Adaptive information maximization |
          | State Management | Stateless function | Stateful with $\text{memory}(\text{history}, \text{query})$ |
          | Scalability | Linear in prompt length | Sublinear through compression/filtering |
          | Error Analysis | Manual prompt inspection | Systematic evaluation of assembly components |

          🌐 Related Blogs

          Social Media & Talks


          🤔 Why Context Engineering?

          The Paradigm Shift: From Tactical to Strategic

          The evolution from prompt engineering to context engineering represents a fundamental maturation in AI system design. As influential figures like Andrej Karpathy, Tobi Lutke, and Simon Willison have argued, the term "prompt engineering" has been diluted to mean simply "typing things into a chatbot," failing to capture the complexity required for industrial-strength LLM applications.

          1. Fundamental Challenges with Current Approaches

          Human Intent Communication Challenges

          • Unclear Human Intent Expression: Human intentions are often unclear, incomplete, or ambiguous when expressed in natural language
          • AI's Incomplete Understanding of Human Intent: AI systems struggle to fully comprehend complex human intentions, especially those involving implicit context or cultural nuances
          • Overly Literal AI Interpretation: AI systems often interpret human instructions too literally, missing the underlying intent or contextual meaning

          Complex Knowledge Requirements

          Single models alone cannot solve complex problems that require:

          • (1) Large-scale External Knowledge: Vast amounts of external knowledge that exceed model capacity
          • (2) Accurate External Knowledge: Precise, up-to-date information that models may not possess
          • (3) Novel External Knowledge: Emerging knowledge that appears after model training

          Static Knowledge Limitations:

          • Static Knowledge Problem: Pre-trained models contain static knowledge that becomes outdated
          • Knowledge Cutoff: Models cannot access information beyond their training data
          • Domain-Specific Gaps: Models lack specialized knowledge for specific industries or applications

          Reliability and Trustworthiness Issues

          • AI Hallucination: LLMs generate plausible but factually incorrect information when lacking proper context
          • Lack of Provenance: Absence of clear source attribution for generated information
          • Confidence Calibration: Models often appear confident even when generating false information
          • Transparency Gaps: Inability to trace how conclusions were reached
          • Accountability Issues: Difficulty in verifying the reliability of AI-generated content

          2. Limitations of Static Prompting

          From Strings to Systems

          Traditional prompting treats context as a static string, but enterprise applications require:

          • Dynamic Information Assembly: Context created on-the-fly, tailored to specific users and queries
          • Multi-Source Integration: Combining databases, APIs, documents, and real-time data
          • State Management: Maintaining conversation history, user preferences, and workflow status
          • Tool Orchestration: Coordinating external function calls and API interactions

          The "Movie Production" Analogy

          If prompt engineering is writing a single line of dialogue for an actor, context engineering is the entire process of building the set, designing lighting, providing detailed backstory, and directing the scene. The dialogue only achieves its intended impact because of the rich, carefully constructed environment surrounding it.

          3. Enterprise and Production Requirements

          Context Failures Are the New Bottleneck

          Most failures in modern agentic systems are no longer attributable to core model reasoning capabilities but are instead "context failures". The true engineering challenge lies not in what question to ask, but in ensuring the model has all necessary background, data, tools, and memory to answer meaningfully and reliably.

          Scalability Beyond Simple Tasks

          While prompt engineering suffices for simple, self-contained tasks, it breaks down when scaled to:

          • Complex, multi-step applications
          • Data-rich enterprise environments
          • Stateful, long-running workflows
          • Multi-user, multi-tenant systems

          Reliability and Consistency

          Enterprise applications demand:

          • Deterministic Behavior: Predictable outputs across different contexts and users
          • Error Handling: Graceful degradation when information is incomplete or contradictory
          • Audit Trails: Transparency in how context influences model decisions
          • Compliance: Meeting regulatory requirements for data handling and decision making

          Economic and Operational Efficiency

          Context Engineering enables:

          • Cost Optimization: Strategic choice between RAG and long-context approaches
          • Latency Management: Efficient information retrieval and context assembly
          • Resource Utilization: Optimal use of finite context windows and computational resources
          • Maintenance Scalability: Systematic approaches to updating and managing knowledge bases

          Context Engineering provides the architectural foundation for managing state, integrating diverse data sources, and maintaining coherence across these demanding scenarios.

          4. Cognitive and Information Science Foundations

          Artificial Embodiment

          LLMs are essentially "brains in a vat" - powerful reasoning engines lacking connection to specific environments. Context Engineering provides:

          • Synthetic Sensory Systems: Retrieval mechanisms as artificial perception
          • Proxy Embodiment: Tool use as artificial action capabilities
          • Artificial Memory: Structured information storage and retrieval

          Information Retrieval at Scale

          Context Engineering addresses the fundamental challenge of information retrieval where the "user" is not human but an AI agent. This requires:

          • Semantic Understanding: Bridging the gap between intent and expression
          • Relevance Optimization: Ranking and filtering vast knowledge bases
          • Query Transformation: Converting ambiguous requests into precise retrieval operations

          5. The Future of AI System Architecture

          Context Engineering elevates AI development from a collection of "prompting tricks" to a rigorous discipline of systems architecture. It applies decades of knowledge in operating system design, memory management, and distributed systems to the unique challenges of LLM-based applications.

          This discipline is foundational for unlocking the full potential of LLMs in production systems, enabling the transition from one-off text generation to autonomous agents and sophisticated AI copilots that can reliably operate in complex, dynamic environments.


          🔧 Components, Techniques and Architectures

          Context Scaling

          Position Interpolation and Extension Techniques

In the agent era, context engineering increasingly means runtime context management rather than only prompt construction. Production systems now rely on compaction, caching, artifact-backed state, and scoped instruction loading to keep long-horizon agents efficient and controllable.

Runtime Context Management Patterns

  • OpenAI Agents Guide, OpenAI, OpenAI Badge
  • OpenAI Tools: Conversation State, Prompt Caching, and Compaction, OpenAI, OpenAI Badge
  • Google ADK: Context Caching and Context Compression, Google, Google Badge
  • Claude Code Memory and Scoped Project Instructions, Anthropic, Anthropic Badge
  • LangChain Deep Agents: Filesystem-Based Context Management, LangChain, LangChain Badge

Production Design Questions

  • When should state stay in the prompt versus move into files, memory stores, or external tools?
  • How should long-running threads be compacted without losing provenance, instructions, or active plans?
  • How should project rules be loaded conditionally by path, task, or subagent instead of globally?
  • How should prompt caching be combined with memory writes and retrieval freshness?
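
One widely used compaction pattern, sketched under the assumption of chat-style message dicts: keep the system prompt and recent turns verbatim, and replace older turns with a summary that records how many messages it stands for. `summarize` is a placeholder for an LLM call.

```python
def compact(messages, keep_recent=4, summarize=lambda msgs: "…"):
    """Compact a long thread into: system prompt + summary-of-old + recent turns.

    `messages` are dicts like {"role": ..., "content": ...}; `summarize`
    stands in for an LLM summarization call.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    if not old:
        return system + recent
    summary = {
        "role": "assistant",
        "content": f"[Compacted {len(old)} earlier messages] " + summarize(old),
    }
    return system + [summary] + recent
```

Provenance survives because the summary message records what it replaced; production systems additionally keep the uncompacted thread in an artifact store for replay.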

Structured Data Integration

Knowledge Graph-Enhanced Language Models

  • Learn Together: Joint Multitask Finetuning of Pretrained KG-enhanced LLM for Downstream Tasks, Martynova et al., ICCL Badge GitHub stars
  • Knowledge Graph-Guided Retrieval Augmented Generation, Zhu et al., arXiv Badge GitHub stars

Graph Neural Networks Combined with Language Models

  • Are Large Language Models In-Context Graph Learners?, Li et al., arXiv Badge GitHub stars
  • NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models, Ji et al., arXiv Badge

Structured Data Integration

  • CoddLLM: Empowering Large Language Models for Data Analytics, Authors et al., arXiv Badge
  • Structure-Guided Large Language Models for Text-to-SQL Generation, Authors et al., arXiv Badge
  • StructuredRAG: JSON Response Formatting with Large Language Models, Authors et al., arXiv Badge GitHub stars

Foundational KG-LLM Integration Methods

    Self-Generated Context

    Self-Supervised Context Generation and Augmentation

    • SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models, Chuang et al., arXiv Badge GitHub stars

    Reasoning Models That Generate Their Own Context

      • Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR Badge
      • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., arXiv Badge GitHub stars

        Iterative Context Refinement and Self-Improvement

        • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arXiv Badge GitHub stars
        • Large Language Models Can Self-Improve in Long-context Reasoning, Li et al., arXiv Badge GitHub stars

          Meta-Learning and Autonomous Context Evolution

          Foundational Chain-of-Thought Research

          • Chain-of-thought prompting elicits reasoning in large language models, Wei et al., NeurIPS Badge

          🛠️ Implementation and Challenges

          0. Agent Harnesses and Runtime Systems

          In 2026, many of the most important advances in context engineering no longer live only inside the prompt. They live inside the agent harness: the runtime loop that manages plans, subagents, checkpoints, files, approvals, tool execution, and recovery from failure. This is where context engineering becomes agent engineering.

          Harness and Runtime Design References

          • Building Effective Agents, Anthropic, Anthropic Badge
          • OpenAI Agents Guide, OpenAI, OpenAI Badge
          • Google Agent Development Kit (ADK), Google, Google Badge
          • LangChain Deep Agents Overview, LangChain, LangChain Badge
          • Microsoft Agent Framework Overview, Microsoft, Microsoft Badge

          Core Runtime Concerns

          • Planning and decomposition: how long tasks are split into manageable units
          • Durable execution: how agent state is checkpointed, resumed, or replayed
          • Context isolation: how subagents and tools avoid polluting each other's working state
          • Sandboxing and artifacts: how file systems, shells, browsers, and outputs become part of the context pipeline
          • Human approvals and interrupts: how production agents remain controllable during risky or long-running actions
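
The concerns above can be made concrete in a skeletal harness loop; every callable and the RISKY tool list are illustrative placeholders, not any framework's API.

```python
import json

RISKY = {"shell", "delete_file", "send_email"}  # tools gated behind human approval

def run_agent(plan, execute_tool, approve, checkpoint):
    """Minimal harness loop: execute plan steps, checkpoint durable state after
    each step, and interrupt for human approval before risky tool calls."""
    state = {"done": [], "pending": list(plan)}
    while state["pending"]:
        step = state["pending"][0]
        if step["tool"] in RISKY and not approve(step):
            state["done"].append({**step, "status": "rejected"})
        else:
            result = execute_tool(step["tool"], step.get("args", {}))
            state["done"].append({**step, "status": "ok", "result": result})
        state["pending"].pop(0)
        checkpoint(json.dumps(state))   # durable execution: resume/replay from here
    return state["done"]
```

Real harnesses add retries, subagent dispatch, and sandboxed tool execution around the same skeleton.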

          1. Retrieval-Augmented Generation (RAG)

          Survey

          • Retrieval-Augmented Generation for Large Language Models: A Survey, Yunfan Gao et al., arXiv Badge GitHub stars
          • Evaluation of Retrieval-Augmented Generation: A Survey, Hao Yu et al., arXiv Badge GitHub stars

          Naive RAG

          • Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models, Xindi Wang et al., arXiv Badge
          • In-context Examples Selection for Machine Translation, Sweta Agrawal et al., arXiv Badge
          • In Defense of RAG in the Era of Long-Context Language Models, Tan Yu et al., arXiv Badge
          • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis et al., arXiv Badge
          • LightRAG: Simple and Fast Retrieval-Augmented Generation, Zirui Guo et al., arXiv Badge GitHub stars

            Advanced RAG

            • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity, Soyeong Jeong et al., arXiv Badge GitHub stars
            • FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering, Tianchi Cai et al.
            • IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues, Diji Yang et al., arXiv Badge
            • RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation, Chao Jin et al., arXiv Badge
            • Corrective Retrieval Augmented Generation, Shi-Qi Yan et al., arXiv Badge GitHub stars
            • Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models, Fei Wang et al., arXiv Badge
            • Learning to Filter Context for Retrieval-Augmented Generation, Zhiruo Wang et al., arXiv Badge GitHub stars
            • IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions, Zhebin Zhang et al., arXiv Badge
            • Retrieval Meets Long Context Large Language Models, Peng Xu et al., arXiv Badge
            • Dense x retrieval: What retrieval granularity should we use?, Tong Chen et al., arXiv Badge GitHub stars

              Modular RAG

              Graph-Based RAG

              • Don't Forget to Connect! Improving RAG with Graph-based Reranking, Jialin Dong et al., arXiv Badge
              • From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Darren Edge et al., arXiv Badge
              • GRAG: Graph Retrieval-Augmented Generation, Yuntong Hu et al., arXiv Badge GitHub stars

                Agentic RAG

                Real-Time and Streaming RAG

                • StreamingRAG: Real-time Contextual Retrieval and Generation Framework, Sankaradas et al., arXiv Badge GitHub starsarXiv Badge

                2. Memory Systems

                Runtime Memory Design Patterns

                Modern memory systems are no longer a single retrieval store. Production agents increasingly separate:

                • Session / thread state for active work in progress
                • Long-term semantic memory for user or project facts
                • Episodic memory for trajectories, past actions, and reusable experiences
                • Procedural memory for learned workflows, instructions, and stable operating preferences
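
This separation can be sketched as four stores with different write and read policies; the class below is a schematic, not any particular framework's memory API.

```python
from collections import defaultdict

class AgentMemory:
    """Schematic four-part memory: session, semantic, episodic, procedural."""
    def __init__(self):
        self.session = []                      # active thread state, cleared per task
        self.semantic = {}                     # long-lived user or project facts
        self.episodic = []                     # past trajectories and their outcomes
        self.procedural = defaultdict(list)    # stable instructions, keyed by scope

    def end_task(self, trajectory, outcome):
        # Session state is distilled into an episode, then dropped.
        self.episodic.append({"trajectory": trajectory, "outcome": outcome})
        self.session.clear()

    def recall(self, scope):
        # Read path: stable rules first, then facts, then reusable successes.
        return {
            "rules": self.procedural.get(scope, []),
            "facts": self.semantic,
            "episodes": [e for e in self.episodic if e["outcome"] == "success"],
        }
```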

                Memory Design References

                • LangGraph Memory Overview, LangChain, LangChain Badge
                • Letta Memory Blocks, Letta, Letta Badge
                • Claude Code Memory, Anthropic, Anthropic Badge

                Project Memory and Instruction Artifacts

                Coding agents have made project memory concrete. In practice, memory now often lives in artifacts such as repository instruction files, scoped rules, reusable skills, and long-lived project notes rather than only in vector stores.
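
A minimal sketch of scoped instruction loading, assuming per-directory instruction files named AGENTS.md (one common convention; CLAUDE.md plays the same role for Claude Code): walk from the repo root down to the edited file, so general rules come first and more specific rules are appended after.

```python
from pathlib import Path

def load_scoped_instructions(file_path: str, root: str, name: str = "AGENTS.md") -> str:
    """Collect instruction files from the repo root down to the file's directory."""
    root_dir = Path(root).resolve()
    d = Path(file_path).resolve().parent
    chain = [d]
    while d != root_dir and d != d.parent:   # stop at repo root (or filesystem top)
        d = d.parent
        chain.append(d)
    chunks = []
    for directory in reversed(chain):        # root first, most specific last
        candidate = directory / name
        if candidate.is_file():
            chunks.append(candidate.read_text())
    return "\n\n".join(chunks)
```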

                Project Memory References

                • Introducing Codex, OpenAI, OpenAI Badge
                • Claude Code Memory, Anthropic, Anthropic Badge
                • Claude Code Subagents, Anthropic, Anthropic Badge
                • LangChain Deep Agents Overview, LangChain, LangChain Badge

                Persistent Memory Architecture

                 • MemGPT: Towards LLMs as Operating Systems, Packer et al., arXiv Badge GitHub stars
                • Memory-Augmented Generative Adversarial Transformers, Anonymous et al., arXiv Badge

                Memory Interchange Standards

                 • PAM (Portable AI Memory): An Open Interchange Format for AI User Memories, Daniel Gines, Spec Badge GitHub stars

                 Memory-Augmented Neural Networks

                  Episodic Memory and Context Persistence

                  • The Role of Memory in LLMs: Persistent Context for Smarter Conversations, Porcu, IJSRM Badge
                  • Episodic Memory in AI Agents Poses Risks that Should Be Studied and Mitigated, Christiano et al., arXiv Badge
                  • Larimar: Large Language Models with Episodic Memory Control, Goyal et al., ICML Badge
                  • EM-LLM: Human-like Episodic Memory for Infinite Context LLMs, Anonymous et al., ICLR Badge GitHub stars
                  • Empowering Working Memory for Large Language Model Agents, Anonymous et al., arXiv Badge

                  Continual Learning and Memory Consolidation

                  • Prediction Error-Driven Memory Consolidation for Continual Learning, Anonymous et al., NeurIPS Badge
                  • Overcoming Catastrophic Forgetting in Continual Learning by Exploring Eigenvalues of Hessian Matrix, Anonymous et al., NeurIPS Badge
                  • Probabilistic Metaplasticity for Continual Learning with Memristors in Spiking Networks, Anonymous et al., arXiv Badge

                  Conversational Memory

                  • MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation, Anonymous et al., arXiv Badge
                  • Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory, Anonymous et al., arXiv Badge
                  • Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arXiv Badge
                  • Self-Controlled Memory Framework for Large Language Models, Anonymous et al., arXiv Badge

                  Personalization and Memory

                  • Personalized LLM Response Generation with Parameterized User Memory Injection, Anonymous et al., arXiv Badge
                  • Soul-Driven Interaction Design: A Position Paper on Declarative Persona Specifications for AI Agents, Lee, Zenodo Badge
                  • Soul Spec — Open Specification for AI Agent Persona Packages, ClawSouls, Spec Badge GitHub stars

                    Safety and Alignment with Memory

                    Tool Integration and Memory

                    • WebGPT: Browser-assisted question-answering with human feedback, Nakano et al., arXiv Badge
                    • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., arXiv Badge

                    Learning and Reflection

                    • Language Models are Few-Shot Learners (GPT-3), Brown et al., arXiv Badge
                    • Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al., NeurIPS Badge GitHub stars

3. Agent Communication

Survey

  • A Survey of AI Agent Protocols, Yingxuan Yang et al., arXiv Badge GitHub stars
  • Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems, Bingyu Yan et al., arXiv Badge
  • Large Language Model based Multi-Agents: A Survey of Progress and Challenges, Taicheng Guo et al., arXiv Badge GitHub stars

Open Agent Protocols and Interoperability

Open protocols have become a major part of agent engineering. In practice, modern agent systems increasingly separate:

  • agent-to-tool protocols such as MCP
  • agent-to-agent protocols such as A2A and ACP-style remote invocation
  • agent-to-UI protocols such as AG-UI
  • portable agent definitions such as AgentSchema
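
At the wire level, agent-to-tool traffic in MCP is JSON-RPC 2.0; the helper below sketches a `tools/call` request (simplified from the MCP specification; capability negotiation, transports, and error handling are omitted).

```python
import itertools
import json

_ids = itertools.count(1)   # JSON-RPC request ids must be unique per session

def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Build an MCP-style JSON-RPC 2.0 request invoking a tool on a server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",   # MCP method for tool invocation
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = mcp_tool_call("search_docs", {"query": "context compaction"})
```

Agent-to-agent protocols such as A2A layer task and message semantics on top of similarly structured exchanges rather than raw tool calls.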

Official Protocol and Interoperability References

  • Model Context Protocol Specification, MCP Working Group, Spec Badge
  • Model Context Protocol Architecture, MCP Working Group, Docs Badge
  • Agent2Agent Protocol (A2A), Google, Protocol Badge
  • AG-UI Documentation, CopilotKit Team, Protocol Badge
  • ACP Connect, AGNTCY, Protocol Badge
  • AgentSchema, Microsoft, Schema Badge

Agent Interoperability Protocols

  • A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), and Agent-to-Agent Protocol (A2A), Zhang et al., arXiv Badge
  • Expressive Multi-Agent Communication via Identity-Aware Learning, Du et al., AAAI Badge
  • Context-aware Communication for Multi-agent Reinforcement Learning (CACOM), Li et al., arXiv Badge GitHub stars
  • Agent Capability Negotiation and Binding Protocol (ACNBP), Ken Huang et al., arXiv Badge
  • A Scalable Communication Protocol for Networks of Large Language Models, Samuele Marro et al., arXiv Badge GitHub stars

Structured Communication Frameworks

Context Quality Assessment

Foundational Long-Context Benchmarks

Long-running agent systems need more than offline benchmark scores. They require trace-level visibility into plans, tool calls, memory reads and writes, approvals, retries, and failure modes. Observability is increasingly the verification layer for context engineering in production.
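
Trace-first observability can start as structured events emitted around every plan step, tool call, memory operation, and approval; the event shape here is illustrative rather than an OpenTelemetry schema.

```python
import json
import time
import uuid

def trace_event(trace_id, kind, **attrs):
    """Emit one structured trace event (plan, tool call, memory op, approval, retry)."""
    event = {
        "trace_id": trace_id,
        "ts": time.time(),
        "kind": kind,            # e.g. "plan", "tool_call", "memory_write", "approval"
        **attrs,
    }
    print(json.dumps(event))     # in production: ship to a trace backend
    return event

trace_id = str(uuid.uuid4())
trace_event(trace_id, "plan", steps=3)
trace_event(trace_id, "tool_call", tool="search_docs", status="ok", retries=0)
```

Correlating every event on one trace_id is what makes long-horizon failures (stale memory reads, silent retries, rejected approvals) reconstructable after the fact.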

Observability and Telemetry References


🚀 Applications and Systems

Complex Research Systems

Hypothesis Generation and Data-Driven Discovery

Automated Scientific Discovery