Search results for "leaderboard"
Embeddings, Retrieval, and Reranking
The Context Optimization Layer for LLM Applications
Open Source AI Platform - AI Chat with advanced features that works with every LLM
🔬 Harness Vibe Research with Self-evolving AI Scientists
Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
Open Framework for AI Agents to play Red Alert through Reinforcement Learning
PolyCouncil is an open-source multi-model deliberation engine for LM Studio. It runs multiple LLMs in parallel, gathers their answers, scores each response using a shared rubric, and produces a final answer.
Benchmark for vector databases.
A coding agent optimized for smaller LLMs
[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2
Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks verified by humans.
RAG (Retrieval-Augmented Generation) chatbot that provides answers based on contextual information extracted from a collection of Markdown files.
Autonomous coding agent with web research (Recon), adversarial plan debate, 5-tier cognitive memory, multi-model routing (Gemini + DeepSeek + Ollama), 24/7 loops, and $0 local mode. Apache 2.0.
