freshcrate

Search results for "leaderboard"

16 results found (Python)
sentence-transformers 📁 5.4.1 · 🏛️ Flagship · 18,570

Embeddings, Retrieval, and Reranking

headroom 📁 v0.8.3 · 🌳 Mature · 1,474

The Context Optimization Layer for LLM Applications

onyx 📁 v3.2.6 · 🏛️ Flagship · 27,905

Open Source AI Platform - AI Chat with advanced features that works with every LLM

EvoScientist 📁 v0.0.8 · 🌳 Mature · 2,796

🔬 Harness Vibe Research with Self-evolving AI Scientists

letta 📁 0.16.7 · 🏛️ Flagship · 22,205

Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time.

txtai 📁 v9.7.0 · 🏛️ Flagship · 12,412

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

skill 📁 v1.2.1 · 🌿 Growing · 1,039

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

OpenClawProBench 📁 main@2026-04-15 · 🌿 Growing · 453

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

OpenRA-RL 📁 v0.4.1 · 🌿 Growing · 120

Open Framework for AI Agents to play Red Alert through Reinforcement Learning

PolyCouncil 📁 v1.2.0-beta.1 · 🌱 Seedling · 31

PolyCouncil is an open-source multi-model deliberation engine for LM Studio. It runs multiple LLMs in parallel, gathers their answers, scores each response using a shared rubric, and produces a final,

GTA 📁 v0.2.0 · 🌿 Growing · 143

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

claw-eval 📁 main@2026-04-15 · 🌿 Growing · 465

Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.

rag-chatbot 📁 main@2026-04-14 · 🌿 Growing · 407

RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.

forgegod 📁 main@2026-04-19 · 🌱 Seedling · 4

Autonomous coding agent with web research (Recon), adversarial plan debate, 5-tier cognitive memory, multi-model routing (Gemini + DeepSeek + Ollama), 24/7 loops, and $0 local mode. Apache 2.0.