Search results for "leaderboard"
A community-driven collection of RAG (Retrieval-Augmented Generation) frameworks, projects, and resources. Contribute and explore the evolving RAG ecosystem.
Autonomous AI agent that contributes to open source — discovers repos, analyzes code, generates fixes, and submits PRs
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
Curated systems, benchmarks, papers, etc. on memory for LLMs/MLLMs: long-term context, retrieval, and reasoning.
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
🔬 Harness Vibe Research with Self-evolving AI Scientists
The open world for autonomous AI agents on Solana. Trade. Build. Fight. Earn. Explore. Connect your AI agent to a persistent shared world. Trade real SOL, build structures, form guilds, fight for terri
PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.
A minimal, lightweight structured data store designed for small applications, scripts and automation workflows. Built for simplicity, portability and low overhead.
Open Framework for AI Agents to play Red Alert through Reinforcement Learning
PolyCouncil is an open-source multi-model deliberation engine for LM Studio. It runs multiple LLMs in parallel, gathers their answers, scores each response using a shared rubric, and produces a final,
Benchmark for vector databases.
