A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding, they're redefining how software changes the world.
🔥 We are actively tracking frontier research on code agents.
🧹 We periodically curate our collection, retaining only published papers and interesting arXiv preprints from the last six months.
Currently collected: 516 papers (Last update: 2026-04-11)
AI agents that operate within terminal environments, executing shell commands, managing system operations, and automating command-line workflows through natural language interfaces and autonomous task execution.
AI agents that autonomously generate, scaffold, and synthesize code at the repository level, leveraging external tools and APIs to create new modules, build complete projects, and construct large-scale codebases.
AI agents for operating system development, low-level systems programming, compiler/toolchain engineering, and large-scale systems infrastructure.
This includes OS kernel code, runtime systems, device drivers, and system-level code generation.
🗄️ Database Engineering Agents
Autonomous agents for solving SQL challenges in real-world database systems (e.g., query generation and optimization, issue resolution).
Code agents for automated creation, modification, and optimization of backend services, APIs, and server-side logic.
BaxBench: Can LLMs Generate Correct and Secure Backends? Mark Vero, Niels Mündler, Victor Chibotaru, Veselin Raychev, Maximilian Baader, Nikola Jovanović, Jingxuan He, Martin Vechev. ICML 2025.
Agents designed to autonomously utilize specialized scientific software (such as simulation engines, data analysis suites, and visualization platforms) to automate and enhance domain-specific scientific workflows.
Autonomous agents that generate and execute code to interact with game environments, enabling tasks like gameplay, content creation, and environment manipulation through code.
Agents that interact with physical or simulated environments by executing code for embodied tasks including reasoning, navigation, and manipulation. These agents shift the representation of plans from action sequences to code and embed task queries, robot actions, solution samples, and fallback behaviors as programs.
We are a young team passionate about the future of code agents, and we look forward to discussing exciting ideas with the community.
This field sits at the intersection of software engineering, artificial intelligence (especially LLMs and agentic reasoning), and automated code development, and it has evolved extremely rapidly since 2023.
Vision
Advancing toward general-purpose agents capable of understanding, modifying, and creating complex codebases, collaborating with humans, and autonomously driving end-to-end software engineering processes, from requirements through implementation, testing, deployment, and maintenance.
🧩 Open Problems
Long-horizon planning: Enabling agents to reason and act coherently over many steps in large, realistic codebases.
Robust evaluation: Designing benchmarks and metrics that reflect real-world complexity, generalizability, and value beyond short snippets.
Interpretability & safety: Ensuring agent actions are understandable, controllable, and safe for deployment on critical systems.
Collaboration: Seamlessly integrating multiple agents and human-in-the-loop workflows.
Repository-level grounding: Equipping agents with persistent context over evolving, multi-file software.
Resource efficiency: Addressing compute/memory requirements for large-scale agentic work.
Conferences and Workshops
ICSE: International Conference on Software Engineering [SE]
FSE (ESEC/FSE): Foundations of Software Engineering [SE]
ASE: Automated Software Engineering [SE]
ISSTA: International Symposium on Software Testing and Analysis [SE/Testing]
ICLR: International Conference on Learning Representations [ML]
ICML: International Conference on Machine Learning [ML]
NeurIPS: Conference on Neural Information Processing Systems [ML]
ACL: Annual Meeting of the Association for Computational Linguistics [NLP]
EMNLP: Empirical Methods in Natural Language Processing [NLP]
NAACL: North American Chapter of the ACL [NLP]
TheWebConf (WWW): The Web Conference (formerly WWW) [Web]
🧪 Frontier Labs and Teams
OpenAI: Work on MLE-bench, large-scale evaluations, and agent architecture.
Google DeepMind: Pioneering code-centric models and embodied agent applications.
Microsoft Research: Advances in multi-agent collaboration, feature-benchmarks, and tool-assisted agents.
THUDM (Tsinghua): SWE-Dev, general SE agent architecture research.
Place items in the right category, in reverse chronological order.
Include badges for GitHub stars, arXiv, and website, if available.
We're grateful to all our amazing contributors who have made this project what it is today!
If you have any questions or encounter issues, please feel free to reach out. For quick queries, you can also check our Issues page for common questions and solutions.
Star History
Acknowledgements
Thanks to all contributors and the research community.
We would also like to thank the maintainers of many inspiring awesome agent repositories, including: