Search results for "eval"
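ReLE Evaluation: capability evaluation of Chinese AI large models (continuously updated). Currently includes 359 large models, covering commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min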
Comprehensive guide to AI agent engineering: how 30+ frameworks actually work under the hood. Context rot, compaction, system prompt assembly, SOUL.md, agent loops, memory systems, tool sprawl, MCP,
An opinionated list of awesome Pydantic-AI frameworks, libraries, software and resources.
Curated list of ChatGPT prompts from the top-rated GPTs in the GPTs Store. Prompt engineering, prompt attacks, and prompt protection. Advanced prompt engineering papers.
Must-read papers on Repository-level Code Generation & Issue Resolution 🔥
Awesome list of AI-Driven Development.
Memory-centric self-improving harness for AI agents. Six-phase cycle + Security by Absence. ADRs, JSON schemas, and a dependency-free Python reference.
