
Awesome Context Engineering

Awesome Context Engineering Cover

💬 Join Our Community

WeChat Group

Join our WeChat group for discussions and updates!

Join our Discord server

Awesome · License: MIT · PRs Welcome · Paper

📄 Our comprehensive survey paper on Context Engineering is now published! Check out our latest academic insights and theoretical foundations.

A comprehensive survey and collection of resources on Context Engineering - the evolution from static prompting to dynamic, context-aware AI systems, and increasingly to agent runtimes, memory systems, protocols, coding agents, and observability stacks.

📧 Contact

For questions, suggestions, or collaboration opportunities, please feel free to reach out:

Lingrui Mei
📧 Email: meilingrui25b@ict.ac.cn or meilingrui22@mails.ucas.ac.cn

I WROTE THE WRONG EMAIL ADDRESS IN THE FIRST VERSION OF MY PAPER!! You can also open an issue in this repository for general discussions and suggestions.


📰 News


🎯 Introduction

In the era of Large Language Models (LLMs), the limitations of static prompting have become increasingly apparent. Context Engineering represents the natural evolution to address LLM uncertainty and achieve production-grade AI deployment. Unlike traditional prompt engineering, context engineering encompasses the complete information payload provided to LLMs at inference time, including all structured informational components necessary for plausible task completion.

This repository serves as a comprehensive survey of context engineering techniques, methodologies, and applications.


🧭 2026 Agent Era Update

From Context Engineering to Agent Engineering

As of March 2026, context engineering remains a useful and necessary concept, but it is no longer the whole story. The center of gravity has shifted from "how to pack the best prompt" to how agent systems manage runtime state, memory, tools, protocols, approvals, and long-horizon execution. In practice, context engineering now sits inside a broader stack that also includes agent harnesses, interoperability protocols, project memory for coding agents, and trace-first observability.

What This Repository Now Covers

This repository still preserves its original survey structure on long context, RAG, memory, agent communication, tool use, evaluation, and applications. At the same time, this README is being reorganized to better reflect the agent era through additional coverage of:

  • Agent harnesses and runtime systems for planning, subagents, checkpoints, sandboxes, and human approval loops
  • Context management in production through compaction, caching, artifact-backed context, and scoped instruction loading
  • Memory artifacts and portability including persistent memory, memory interchange formats, persona packaging, and project memory
  • Open protocols such as MCP, A2A, AG-UI, ACP, and portable agent schemas
  • Coding agents and computer use as the most visible production setting for context engineering today
  • Evaluation, observability, and telemetry for long-running agent systems rather than only static benchmarks

Reading Guide for 2026 Topics

Readers primarily interested in the 2026 shift should jump to the expanded sections on agent harnesses and runtime systems, runtime context management, memory artifacts and portability, open protocols, coding agents, and evaluation and observability for long-running agents.


📚 Table of Contents


🔗 Related Survey

General AI Survey Papers

  • A Survey of Large Language Models, Zhao et al., arXiv Badge GitHub stars
  • A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models, Gao et al., arXiv Badge GitHub stars

  Context and Reasoning

    • A Survey on In-context Learning, Dong et al., EMNLP Badge GitHub stars
    • Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al., arXiv Badge GitHub stars

    Memory Systems and Context Persistence

      Survey

      • A Survey on the Memory Mechanism of Large Language Model based Agents, Zhang et al., arXiv Badge GitHub stars
      • From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, Wu et al., arXiv Badge
      • Survey on Evaluation of LLM-based Agents, Anonymous et al., arXiv Badge
      • A Survey of Personalized Large Language Models: Progress and Future Directions, Anonymous et al., arXiv Badge
      • Agentic Retrieval-Augmented Generation: A Survey, Anonymous et al., arXiv Badge
      • Retrieval-Augmented Generation with Graphs (GraphRAG), Anonymous et al., arXiv Badge GitHub stars

        Benchmarks

        Memory-Augmented Transformers
        • Memorizing Transformers, Wu et al., arXiv Badge
        • Recurrent Memory Transformer, Bulatov et al., NeurIPS Badge GitHub stars
        • Memformer: A Memory-Augmented Transformer for Sequence Modeling, Wu et al., arXiv Badge
        • Token Turing Machines, Ryoo et al., arXiv Badge
        • TransformerFAM: Feedback Attention is Working Memory, Irie et al., arXiv Badge

        Production Memory Systems

        Episodic and Working Memory
        • Larimar: Large Language Models with Episodic Memory Control, Goyal et al., ICML Badge
        • EM-LLM: Human-like Episodic Memory for Infinite Context LLMs, Anonymous et al., ICLR Badge GitHub stars
        • Empowering Working Memory for Large Language Model Agents, Anonymous et al., arXiv Badge
        Conversational Memory
        • MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation, Anonymous et al., arXiv Badge
        • Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory, Anonymous et al., arXiv Badge
        • Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arXiv Badge
        • Self-Controlled Memory Framework for Large Language Models, Anonymous et al., arXiv Badge
        Foundational Survey Papers from Major Venues
        • AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Shin et al., EMNLP Badge GitHub stars

        Additional RAG and Retrieval Surveys


          🏗️ Definition of Context Engineering

          Context is not just the single prompt users send to an LLM. Context is the complete information payload provided to an LLM at inference time, encompassing all structured informational components that the model needs to plausibly accomplish a given task.

          LLM Generation

          To formally define Context Engineering, we must first mathematically characterize the LLM generation process. Let us model an LLM as a probabilistic function:

          $$P(\text{output} | \text{context}) = \prod_{t=1}^T P(\text{token}_t | \text{previous tokens}, \text{context})$$

          Where:

          • $\text{context}$ represents the complete input information provided to the LLM
          • $\text{output}$ represents the generated response sequence
          • $P(\text{token}_t | \text{previous tokens}, \text{context})$ is the probability of generating each token given the context
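
This chain-rule factorization can be checked numerically with a toy model; the token distributions below are invented purely for illustration, standing in for a real model's forward pass.

```python
import math

# Toy next-token distributions standing in for P(token_t | previous tokens, context).
# The probabilities are invented; a real LLM would compute them from its forward pass.
def next_token_probs(prev_tokens, context):
    if not prev_tokens:
        return {"The": 0.6, "A": 0.4}
    if prev_tokens[-1] == "The":
        return {"cat": 0.7, "dog": 0.3}
    return {".": 1.0}

def sequence_prob(tokens, context):
    """P(output | context) = product over t of P(token_t | previous tokens, context)."""
    prob = 1.0
    for t, token in enumerate(tokens):
        prob *= next_token_probs(tokens[:t], context)[token]
    return prob

p = sequence_prob(["The", "cat", "."], context="(ignored in this toy)")
assert math.isclose(p, 0.6 * 0.7 * 1.0)
```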

          Definition of Context

          In traditional prompt engineering, the context is treated as a simple string: $$\text{context} = \text{prompt}$$

          However, in Context Engineering, we decompose the context into multiple structured components:

          $$\text{context} = \text{Assemble}(\text{instructions}, \text{knowledge}, \text{tools}, \text{memory}, \text{state}, \text{query})$$

          Where $\text{Assemble}$ is a context assembly function that orchestrates:

          • $\text{instructions}$: System prompts and rules
          • $\text{knowledge}$: Retrieved relevant information
          • $\text{tools}$: Available function definitions
          • $\text{memory}$: Conversation history and learned facts
          • $\text{state}$: Current world/user state
          • $\text{query}$: User's immediate request

          Definition of Context Engineering

          Context Engineering is formally defined as the optimization problem:

          $$\text{Assemble}^* = \arg\max_{\text{Assemble}} \mathbb{E} [\text{Reward}(\text{LLM}(\text{context}), \text{target})]$$

          Subject to constraints:

          • $|\text{context}| \leq \text{MaxTokens}$ (context window limitation)
          • $\text{knowledge} = \text{Retrieve}(\text{query}, \text{database})$
          • $\text{memory} = \text{Select}(\text{history}, \text{query})$
          • $\text{state} = \text{Extract}(\text{world})$

          Where:

          • $\text{Reward}$ measures the quality of generated responses
          • $\text{Retrieve}$, $\text{Select}$, $\text{Extract}$ are functions for information gathering

          Dynamic Context Orchestration

          The context assembly can be decomposed as:

          $$\text{context} = \text{Concat}(\text{Format}(\text{instructions}), \text{Format}(\text{knowledge}), \text{Format}(\text{tools}), \text{Format}(\text{memory}), \text{Format}(\text{query}))$$

          Where $\text{Format}$ represents component-specific structuring, and $\text{Concat}$ assembles them respecting token limits and optimal positioning.

          Context Engineering is therefore the discipline of designing and optimizing these assembly and formatting functions to maximize task performance.
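
As a minimal sketch, $\text{Format}$ and $\text{Concat}$ can be written as ordinary functions; the whitespace "tokenizer" and section labels are simplifying assumptions, not part of the formalism.

```python
def fmt(name: str, body: str) -> str:
    """Format: component-specific structuring (here, a labeled section)."""
    return f"## {name}\n{body.strip()}"

def concat(sections: list[str], max_tokens: int) -> str:
    """Concat: assemble formatted sections in priority order under a token budget."""
    out, used = [], 0
    for section in sections:
        cost = len(section.split())   # crude whitespace "token" count
        if used + cost > max_tokens:
            break                     # later (lower-priority) sections are dropped
        out.append(section)
        used += cost
    return "\n\n".join(out)

context = concat(
    [fmt("Instructions", "Answer concisely."),
     fmt("Knowledge", "Paris is the capital of France."),
     fmt("Query", "What is the capital of France?")],
    max_tokens=50,
)
```

In practice priority is rarely positional as it is here: the query and instructions are protected while knowledge and memory are truncated or compressed first.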

          Mathematical Principles

          From this formalization, we derive four fundamental principles:

          1. System-Level Optimization: Context generation is a multi-objective optimization problem over assembly functions, not simple string manipulation.

          2. Dynamic Adaptation: The context assembly function adapts to each $\text{query}$ and $\text{state}$ at inference time: $\text{Assemble}(\cdot | \text{query}, \text{state})$.

          3. Information-Theoretic Optimality: The retrieval function maximizes relevant information: $\text{Retrieve} = \arg\max \text{Relevance}(\text{knowledge}, \text{query})$.

          4. Structural Sensitivity: The formatting functions encode structure that aligns with LLM processing capabilities.

          Theoretical Framework: Bayesian Context Inference

          Context Engineering can be formalized within a Bayesian framework where the optimal context is inferred:

          $$P(\text{context} | \text{query}, \text{history}, \text{world}) \propto P(\text{query} | \text{context}) \cdot P(\text{context} | \text{history}, \text{world})$$

          Where:

          • $P(\text{query} | \text{context})$ models query-context compatibility
          • $P(\text{context} | \text{history}, \text{world})$ represents prior context probability

          The optimal context assembly becomes:

          $$\text{context}^* = \arg\max_{\text{context}} P(\text{answer} | \text{query}, \text{context}) \cdot P(\text{context} | \text{query}, \text{history}, \text{world})$$

          This Bayesian formulation enables:

          • Uncertainty Quantification: Modeling confidence in context relevance
          • Adaptive Retrieval: Updating context beliefs based on feedback
          • Multi-step Reasoning: Maintaining context distributions across interactions
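
A toy rendering of this argmax: both probability terms are stand-ins (lexical overlap for query-context compatibility, a history-overlap bonus for the prior), where a real system would use a trained reranker and learned priors.

```python
import math

def log_compat(query: str, context: str) -> float:
    """Stand-in for log P(answer | query, context): lexical-overlap score."""
    q, c = set(query.lower().split()), set(context.lower().split())
    return math.log(len(q & c) + 1)

def log_prior(context: str, history: list[str]) -> float:
    """Stand-in for log P(context | query, history, world): rewards history overlap."""
    return sum(0.5 for item in history if item in context)

def best_context(query: str, candidates: list[str], history: list[str]) -> str:
    # context* = argmax over candidates of compatibility + prior (in log space)
    return max(candidates, key=lambda c: log_compat(query, c) + log_prior(c, history))
```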

          Comparison

          | Dimension | Prompt Engineering | Context Engineering |
          |---|---|---|
          | Mathematical Model | $\text{context} = \text{prompt}$ (static) | $\text{context} = \text{Assemble}(...)$ (dynamic) |
          | Optimization Target | $\arg\max_{\text{prompt}} P(\text{answer} \mid \text{query}, \text{prompt})$ | $\arg\max_{\text{Assemble}} \mathbb{E}[\text{Reward}(...)]$ |
          | Complexity | $O(1)$ context assembly | $O(n)$ multi-component optimization |
          | Information Theory | Fixed information content | Adaptive information maximization |
          | State Management | Stateless function | Stateful with $\text{memory}(\text{history}, \text{query})$ |
          | Scalability | Linear in prompt length | Sublinear through compression/filtering |
          | Error Analysis | Manual prompt inspection | Systematic evaluation of assembly components |

          🌐 Related Blogs

          Social Media & Talks


          🤔 Why Context Engineering?

          The Paradigm Shift: From Tactical to Strategic

          The evolution from prompt engineering to context engineering represents a fundamental maturation in AI system design. As influential figures like Andrej Karpathy, Tobi Lutke, and Simon Willison have argued, the term "prompt engineering" has been diluted to mean simply "typing things into a chatbot," failing to capture the complexity required for industrial-strength LLM applications.

          1. Fundamental Challenges with Current Approaches

          Human Intent Communication Challenges

          • Unclear Human Intent Expression: Human intentions are often unclear, incomplete, or ambiguous when expressed in natural language
          • AI's Incomplete Understanding of Human Intent: AI systems struggle to fully comprehend complex human intentions, especially those involving implicit context or cultural nuances
          • Overly Literal AI Interpretation: AI systems often interpret human instructions too literally, missing the underlying intent or contextual meaning

          Complex Knowledge Requirements

          Single models alone cannot solve complex problems that require:

          • (1) Large-scale External Knowledge: Vast amounts of external knowledge that exceed model capacity
          • (2) Accurate External Knowledge: Precise, up-to-date information that models may not possess
          • (3) Novel External Knowledge: Emerging knowledge that appears after model training

          Static Knowledge Limitations:

          • Static Knowledge Problem: Pre-trained models contain static knowledge that becomes outdated
          • Knowledge Cutoff: Models cannot access information beyond their training data
          • Domain-Specific Gaps: Models lack specialized knowledge for specific industries or applications

          Reliability and Trustworthiness Issues

          • AI Hallucination: LLMs generate plausible but factually incorrect information when lacking proper context
          • Lack of Provenance: Absence of clear source attribution for generated information
          • Confidence Calibration: Models often appear confident even when generating false information
          • Transparency Gaps: Inability to trace how conclusions were reached
          • Accountability Issues: Difficulty in verifying the reliability of AI-generated content

          2. Limitations of Static Prompting

          From Strings to Systems

          Traditional prompting treats context as a static string, but enterprise applications require:

          • Dynamic Information Assembly: Context created on-the-fly, tailored to specific users and queries
          • Multi-Source Integration: Combining databases, APIs, documents, and real-time data
          • State Management: Maintaining conversation history, user preferences, and workflow status
          • Tool Orchestration: Coordinating external function calls and API interactions

          The "Movie Production" Analogy

          If prompt engineering is writing a single line of dialogue for an actor, context engineering is the entire process of building the set, designing lighting, providing detailed backstory, and directing the scene. The dialogue only achieves its intended impact because of the rich, carefully constructed environment surrounding it.

          3. Enterprise and Production Requirements

          Context Failures Are the New Bottleneck

          Most failures in modern agentic systems are no longer attributable to core model reasoning capabilities but are instead "context failures". The true engineering challenge lies not in what question to ask, but in ensuring the model has all necessary background, data, tools, and memory to answer meaningfully and reliably.

          Scalability Beyond Simple Tasks

          While prompt engineering suffices for simple, self-contained tasks, it breaks down when scaled to:

          • Complex, multi-step applications
          • Data-rich enterprise environments
          • Stateful, long-running workflows
          • Multi-user, multi-tenant systems

          Reliability and Consistency

          Enterprise applications demand:

          • Deterministic Behavior: Predictable outputs across different contexts and users
          • Error Handling: Graceful degradation when information is incomplete or contradictory
          • Audit Trails: Transparency in how context influences model decisions
          • Compliance: Meeting regulatory requirements for data handling and decision making

          Economic and Operational Efficiency

          Context Engineering enables:

          • Cost Optimization: Strategic choice between RAG and long-context approaches
          • Latency Management: Efficient information retrieval and context assembly
          • Resource Utilization: Optimal use of finite context windows and computational resources
          • Maintenance Scalability: Systematic approaches to updating and managing knowledge bases

          Context Engineering provides the architectural foundation for managing state, integrating diverse data sources, and maintaining coherence across these demanding scenarios.

          4. Cognitive and Information Science Foundations

          Artificial Embodiment

          LLMs are essentially "brains in a vat" - powerful reasoning engines lacking connection to specific environments. Context Engineering provides:

          • Synthetic Sensory Systems: Retrieval mechanisms as artificial perception
          • Proxy Embodiment: Tool use as artificial action capabilities
          • Artificial Memory: Structured information storage and retrieval

          Information Retrieval at Scale

          Context Engineering addresses the fundamental challenge of information retrieval where the "user" is not human but an AI agent. This requires:

          • Semantic Understanding: Bridging the gap between intent and expression
          • Relevance Optimization: Ranking and filtering vast knowledge bases
          • Query Transformation: Converting ambiguous requests into precise retrieval operations

          5. The Future of AI System Architecture

          Context Engineering elevates AI development from a collection of "prompting tricks" to a rigorous discipline of systems architecture. It applies decades of knowledge in operating system design, memory management, and distributed systems to the unique challenges of LLM-based applications.

          This discipline is foundational for unlocking the full potential of LLMs in production systems, enabling the transition from one-off text generation to autonomous agents and sophisticated AI copilots that can reliably operate in complex, dynamic environments.


          🔧 Components, Techniques and Architectures

          Context Scaling

          Position Interpolation and Extension Techniques

In the agent era, context engineering increasingly means runtime context management rather than only prompt construction. Production systems now rely on compaction, caching, artifact-backed state, and scoped instruction loading to keep long-horizon agents efficient and controllable.

Runtime Context Management Patterns

  • OpenAI Agents Guide, OpenAI, OpenAI Badge
  • OpenAI Tools: Conversation State, Prompt Caching, and Compaction, OpenAI, OpenAI Badge
  • Google ADK: Context Caching and Context Compression, Google, Google Badge
  • Claude Code Memory and Scoped Project Instructions, Anthropic, Anthropic Badge
  • LangChain Deep Agents: Filesystem-Based Context Management, LangChain, LangChain Badge

Production Design Questions

  • When should state stay in the prompt versus move into files, memory stores, or external tools?
  • How should long-running threads be compacted without losing provenance, instructions, or active plans?
  • How should project rules be loaded conditionally by path, task, or subagent instead of globally?
  • How should prompt caching be combined with memory writes and retrieval freshness?
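
One widely used compaction pattern, sketched under the assumption of chat-style message dicts: keep the system prompt and recent turns verbatim, and replace older turns with a summary that records how many messages it stands for. `summarize` is a placeholder for an LLM call.

```python
def compact(messages, keep_recent=4, summarize=lambda msgs: "…"):
    """Compact a long thread into: system prompt + summary-of-old + recent turns.

    `messages` are dicts like {"role": ..., "content": ...}; `summarize`
    stands in for an LLM summarization call.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    if not old:
        return system + recent
    summary = {
        "role": "assistant",
        "content": f"[Compacted {len(old)} earlier messages] " + summarize(old),
    }
    return system + [summary] + recent
```

Provenance survives because the summary message records what it replaced; production systems additionally keep the uncompacted thread in an artifact store for replay.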

Structured Data Integration

Knowledge Graph-Enhanced Language Models

  • Learn Together: Joint Multitask Finetuning of Pretrained KG-enhanced LLM for Downstream Tasks, Martynova et al., ICCL Badge GitHub stars
  • Knowledge Graph-Guided Retrieval Augmented Generation, Zhu et al., arXiv Badge GitHub stars

Graph Neural Networks Combined with Language Models

  • Are Large Language Models In-Context Graph Learners?, Li et al., arXiv Badge GitHub stars
  • NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models, Ji et al., arXiv Badge

Structured Data Integration

  • CoddLLM: Empowering Large Language Models for Data Analytics, Authors et al., arXiv Badge
  • Structure-Guided Large Language Models for Text-to-SQL Generation, Authors et al., arXiv Badge
  • StructuredRAG: JSON Response Formatting with Large Language Models, Authors et al., arXiv Badge GitHub stars

Foundational KG-LLM Integration Methods

    Self-Generated Context

    Self-Supervised Context Generation and Augmentation

    • SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models, Chuang et al., arXiv Badge GitHub stars

    Reasoning Models That Generate Their Own Context

      • Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR Badge
      • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., arXiv Badge GitHub stars

        Iterative Context Refinement and Self-Improvement

        • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arXiv Badge GitHub stars
        • Large Language Models Can Self-Improve in Long-context Reasoning, Li et al., arXiv Badge GitHub stars

          Meta-Learning and Autonomous Context Evolution

          Foundational Chain-of-Thought Research

          • Chain-of-thought prompting elicits reasoning in large language models, Wei et al., NeurIPS Badge

          🛠️ Implementation and Challenges

          0. Agent Harnesses and Runtime Systems

          In 2026, many of the most important advances in context engineering no longer live only inside the prompt. They live inside the agent harness: the runtime loop that manages plans, subagents, checkpoints, files, approvals, tool execution, and recovery from failure. This is where context engineering becomes agent engineering.

          Harness and Runtime Design References

          • Building Effective Agents, Anthropic, Anthropic Badge
          • OpenAI Agents Guide, OpenAI, OpenAI Badge
          • Google Agent Development Kit (ADK), Google, Google Badge
          • LangChain Deep Agents Overview, LangChain, LangChain Badge
          • Microsoft Agent Framework Overview, Microsoft, Microsoft Badge

          Core Runtime Concerns

          • Planning and decomposition: how long tasks are split into manageable units
          • Durable execution: how agent state is checkpointed, resumed, or replayed
          • Context isolation: how subagents and tools avoid polluting each other's working state
          • Sandboxing and artifacts: how file systems, shells, browsers, and outputs become part of the context pipeline
          • Human approvals and interrupts: how production agents remain controllable during risky or long-running actions
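
The concerns above can be made concrete in a skeletal harness loop; every callable and the RISKY tool list are illustrative placeholders, not any framework's API.

```python
import json

RISKY = {"shell", "delete_file", "send_email"}  # tools gated behind human approval

def run_agent(plan, execute_tool, approve, checkpoint):
    """Minimal harness loop: execute plan steps, checkpoint durable state after
    each step, and interrupt for human approval before risky tool calls."""
    state = {"done": [], "pending": list(plan)}
    while state["pending"]:
        step = state["pending"][0]
        if step["tool"] in RISKY and not approve(step):
            state["done"].append({**step, "status": "rejected"})
        else:
            result = execute_tool(step["tool"], step.get("args", {}))
            state["done"].append({**step, "status": "ok", "result": result})
        state["pending"].pop(0)
        checkpoint(json.dumps(state))   # durable execution: resume/replay from here
    return state["done"]
```

Real harnesses add retries, subagent dispatch, and sandboxed tool execution around the same skeleton.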

          1. Retrieval-Augmented Generation (RAG)

          Survey

          • Retrieval-Augmented Generation for Large Language Models: A Survey, Yunfan Gao et al., arXiv Badge GitHub stars
          • Evaluation of Retrieval-Augmented Generation: A Survey, Hao Yu et al., arXiv Badge GitHub stars

          Naive RAG

          • Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models, Xindi Wang et al., arXiv Badge
          • In-context Examples Selection for Machine Translation, Sweta Agrawal et al., arXiv Badge
          • In Defense of RAG in the Era of Long-Context Language Models, Tan Yu et al., arXiv Badge
          • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis et al., arXiv Badge
          • LightRAG: Simple and Fast Retrieval-Augmented Generation, Zirui Guo et al., arXiv Badge GitHub stars

            Advanced RAG

            • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity, Soyeong Jeong et al., arXiv Badge GitHub stars
            • FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering, Tianchi Cai et al.
            • IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues, Diji Yang et al., arXiv Badge
            • RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation, Chao Jin et al., arXiv Badge
            • Corrective Retrieval Augmented Generation, Shi-Qi Yan et al., arXiv Badge GitHub stars
            • Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models, Fei Wang et al., arXiv Badge
            • Learning to Filter Context for Retrieval-Augmented Generation, Zhiruo Wang et al., arXiv Badge GitHub stars
            • IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions, Zhebin Zhang et al., arXiv Badge
            • Retrieval Meets Long Context Large Language Models, Peng Xu et al., arXiv Badge
            • Dense x retrieval: What retrieval granularity should we use?, Tong Chen et al., arXiv Badge GitHub stars

              Modular RAG

              Graph-Based RAG

              • Don't Forget to Connect! Improving RAG with Graph-based Reranking, Jialin Dong et al., arXiv Badge
              • From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Darren Edge et al., arXiv Badge
              • GRAG: Graph Retrieval-Augmented Generation, Yuntong Hu et al., arXiv Badge GitHub stars

                Agentic RAG

                Real-Time and Streaming RAG

                • StreamingRAG: Real-time Contextual Retrieval and Generation Framework, Sankaradas et al., arXiv Badge GitHub starsarXiv Badge

                2. Memory Systems

                Runtime Memory Design Patterns

                Modern memory systems are no longer a single retrieval store. Production agents increasingly separate:

                • Session / thread state for active work in progress
                • Long-term semantic memory for user or project facts
                • Episodic memory for trajectories, past actions, and reusable experiences
                • Procedural memory for learned workflows, instructions, and stable operating preferences
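
This separation can be sketched as four stores with different write and read policies; the class below is a schematic, not any particular framework's memory API.

```python
from collections import defaultdict

class AgentMemory:
    """Schematic four-part memory: session, semantic, episodic, procedural."""
    def __init__(self):
        self.session = []                      # active thread state, cleared per task
        self.semantic = {}                     # long-lived user or project facts
        self.episodic = []                     # past trajectories and their outcomes
        self.procedural = defaultdict(list)    # stable instructions, keyed by scope

    def end_task(self, trajectory, outcome):
        # Session state is distilled into an episode, then dropped.
        self.episodic.append({"trajectory": trajectory, "outcome": outcome})
        self.session.clear()

    def recall(self, scope):
        # Read path: stable rules first, then facts, then reusable successes.
        return {
            "rules": self.procedural.get(scope, []),
            "facts": self.semantic,
            "episodes": [e for e in self.episodic if e["outcome"] == "success"],
        }
```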

                Memory Design References

                • LangGraph Memory Overview, LangChain, LangChain Badge
                • Letta Memory Blocks, Letta, Letta Badge
                • Claude Code Memory, Anthropic, Anthropic Badge

                Project Memory and Instruction Artifacts

                Coding agents have made project memory concrete. In practice, memory now often lives in artifacts such as repository instruction files, scoped rules, reusable skills, and long-lived project notes rather than only in vector stores.
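
A minimal sketch of scoped instruction loading, assuming per-directory instruction files named AGENTS.md (one common convention; CLAUDE.md plays the same role for Claude Code): walk from the repo root down to the edited file, so general rules come first and more specific rules are appended after.

```python
from pathlib import Path

def load_scoped_instructions(file_path: str, root: str, name: str = "AGENTS.md") -> str:
    """Collect instruction files from the repo root down to the file's directory."""
    root_dir = Path(root).resolve()
    d = Path(file_path).resolve().parent
    chain = [d]
    while d != root_dir and d != d.parent:   # stop at repo root (or filesystem top)
        d = d.parent
        chain.append(d)
    chunks = []
    for directory in reversed(chain):        # root first, most specific last
        candidate = directory / name
        if candidate.is_file():
            chunks.append(candidate.read_text())
    return "\n\n".join(chunks)
```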

                Project Memory References

                • Introducing Codex, OpenAI, OpenAI Badge
                • Claude Code Memory, Anthropic, Anthropic Badge
                • Claude Code Subagents, Anthropic, Anthropic Badge
                • LangChain Deep Agents Overview, LangChain, LangChain Badge

                Persistent Memory Architecture

                 • MemGPT: Towards LLMs as Operating Systems, Packer et al., arXiv Badge GitHub stars
                • Memory-Augmented Generative Adversarial Transformers, Anonymous et al., arXiv Badge

                Memory Interchange Standards

                 • PAM (Portable AI Memory): An Open Interchange Format for AI User Memories, Daniel Gines, Spec Badge GitHub stars

                 Memory-Augmented Neural Networks

                  Episodic Memory and Context Persistence

                  • The Role of Memory in LLMs: Persistent Context for Smarter Conversations, Porcu, IJSRM Badge
                  • Episodic Memory in AI Agents Poses Risks that Should Be Studied and Mitigated, Christiano et al., arXiv Badge
                  • Larimar: Large Language Models with Episodic Memory Control, Goyal et al., ICML Badge
                  • EM-LLM: Human-like Episodic Memory for Infinite Context LLMs, Anonymous et al., ICLR Badge GitHub stars
                  • Empowering Working Memory for Large Language Model Agents, Anonymous et al., arXiv Badge

                  Continual Learning and Memory Consolidation

                  • Prediction Error-Driven Memory Consolidation for Continual Learning, Anonymous et al., NeurIPS Badge
                  • Overcoming Catastrophic Forgetting in Continual Learning by Exploring Eigenvalues of Hessian Matrix, Anonymous et al., NeurIPS Badge
                  • Probabilistic Metaplasticity for Continual Learning with Memristors in Spiking Networks, Anonymous et al., arXiv Badge

                  Conversational Memory

                  • MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation, Anonymous et al., arXiv Badge
                  • Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory, Anonymous et al., arXiv Badge
                  • Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arXiv Badge
                  • Self-Controlled Memory Framework for Large Language Models, Anonymous et al., arXiv Badge

                  Personalization and Memory

                  • Personalized LLM Response Generation with Parameterized User Memory Injection, Anonymous et al., arXiv Badge
                  • Soul-Driven Interaction Design: A Position Paper on Declarative Persona Specifications for AI Agents, Lee, Zenodo Badge
                  • Soul Spec — Open Specification for AI Agent Persona Packages, ClawSouls, Spec Badge GitHub stars

                    Safety and Alignment with Memory

                    Tool Integration and Memory

                    • WebGPT: Browser-assisted question-answering with human feedback, Nakano et al., arXiv Badge
                    • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., arXiv Badge

                    Learning and Reflection

                    • Language Models are Few-Shot Learners (GPT-3), Brown et al., arXiv Badge
                    • Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al., NeurIPS Badge GitHub stars

3. Agent Communication

Survey

  • A Survey of AI Agent Protocols, Yingxuan Yang et al., arXiv Badge GitHub stars
  • Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems, Bingyu Yan et al., arXiv Badge
  • Large Language Model based Multi-Agents: A Survey of Progress and Challenges, Taicheng Guo et al., arXiv Badge GitHub stars

Open Agent Protocols and Interoperability

Open protocols have become a major part of agent engineering. In practice, modern agent systems increasingly separate:

  • agent-to-tool protocols such as MCP
  • agent-to-agent protocols such as A2A and ACP-style remote invocation
  • agent-to-UI protocols such as AG-UI
  • portable agent definitions such as AgentSchema
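
At the wire level, agent-to-tool traffic in MCP is JSON-RPC 2.0; the helper below sketches a `tools/call` request (simplified from the MCP specification; capability negotiation, transports, and error handling are omitted).

```python
import itertools
import json

_ids = itertools.count(1)   # JSON-RPC request ids must be unique per session

def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Build an MCP-style JSON-RPC 2.0 request invoking a tool on a server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",   # MCP method for tool invocation
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = mcp_tool_call("search_docs", {"query": "context compaction"})
```

Agent-to-agent protocols such as A2A layer task and message semantics on top of similarly structured exchanges rather than raw tool calls.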

Official Protocol and Interoperability References

  • Model Context Protocol Specification, MCP Working Group, Spec Badge
  • Model Context Protocol Architecture, MCP Working Group, Docs Badge
  • Agent2Agent Protocol (A2A), Google, Protocol Badge
  • AG-UI Documentation, CopilotKit Team, Protocol Badge
  • ACP Connect, AGNTCY, Protocol Badge
  • AgentSchema, Microsoft, Schema Badge

Agent Interoperability Protocols

  • A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), and Agent-to-Agent Protocol (A2A), Zhang et al., arXiv Badge
  • Expressive Multi-Agent Communication via Identity-Aware Learning, Du et al., AAAI Badge
  • Context-aware Communication for Multi-agent Reinforcement Learning (CACOM), Li et al., arXiv Badge GitHub stars
  • Agent Capability Negotiation and Binding Protocol (ACNBP), Ken Huang et al., arXiv Badge
  • A Scalable Communication Protocol for Networks of Large Language Models, Samuele Marro et al., arXiv Badge GitHub stars

Structured Communication Frameworks

Context Quality Assessment

Foundational Long-Context Benchmarks

Long-running agent systems need more than offline benchmark scores. They require trace-level visibility into plans, tool calls, memory reads and writes, approvals, retries, and failure modes. Observability is increasingly the verification layer for context engineering in production.
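
Trace-first observability can start as structured events emitted around every plan step, tool call, memory operation, and approval; the event shape here is illustrative rather than an OpenTelemetry schema.

```python
import json
import time
import uuid

def trace_event(trace_id, kind, **attrs):
    """Emit one structured trace event (plan, tool call, memory op, approval, retry)."""
    event = {
        "trace_id": trace_id,
        "ts": time.time(),
        "kind": kind,            # e.g. "plan", "tool_call", "memory_write", "approval"
        **attrs,
    }
    print(json.dumps(event))     # in production: ship to a trace backend
    return event

trace_id = str(uuid.uuid4())
trace_event(trace_id, "plan", steps=3)
trace_event(trace_id, "tool_call", tool="search_docs", status="ok", retries=0)
```

Correlating every event on one trace_id is what makes long-horizon failures (stale memory reads, silent retries, rejected approvals) reconstructable after the fact.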

Observability and Telemetry References


🚀 Applications and Systems

Complex Research Systems

Hypothesis Generation and Data-Driven Discovery

Automated Scientific Discovery