freshcrate

Search results for "benchmark"

102 results found (Python)
trafilatura📁2.0.0🏛️ Flagship5,758

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.

faster-whisper📁1.2.1🏛️ Flagship22,327

Faster Whisper transcription with CTranslate2

browser-use📁0.12.6🏛️ Flagship89,240

Make websites accessible for AI agents

timm📁1.0.26🏛️ Flagship36,678

PyTorch Image Models

keras📁3.14.0🏛️ Flagship64,025

Multi-backend Keras

sentence-transformers📁5.4.1🏛️ Flagship18,570

Embeddings, Retrieval, and Reranking

graphene📁3.4.3🏛️ Flagship8,244

GraphQL Framework for Python

langsmith📁0.7.33🌳 Mature858

Client library to connect to the LangSmith Observability and Evaluation Platform.

arthur-engine📁2.1.529🌿 Growing77

Make AI work for Everyone - Monitoring and governing for your AI/ML

ISC-Bench📁v0.0.5🌳 Mature799

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

AutoRAG📁v0.3.22🌳 Mature4,713

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

headroom📁v0.8.3🌳 Mature1,474

The Context Optimization Layer for LLM Applications

onyx📁v3.2.6🏛️ Flagship27,905

Open Source AI Platform - AI Chat with advanced features that works with every LLM

cognithor📁v0.92.3🌿 Growing115

Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us

PraisonAI📁v4.6.27🏛️ Flagship6,969

PraisonAI 🦞 — Hire a 24/7 AI Workforce. Stop writing boilerplate and start shipping autonomous agents that research, plan, code, and execute tasks. Deployed in 5 lines of code with built-in memory, R

opik📁2.0.9🏛️ Flagship18,965

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

EvoScientist📁v0.0.8🌳 Mature2,796

🔬 Harness Vibe Research with Self-evolving AI Scientists

jcodemunch-mcp📁v1.71.0🌳 Mature1,636

The leading, most token-efficient MCP server for GitHub source code exploration via tree-sitter AST parsing

skill📁v1.2.1🌿 Growing1,039

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

mcp-memory-service📁v10.39.1🌳 Mature1,712

Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API + knowledge graph + autonomous consolidation.

Memori📁v3.3.0🏛️ Flagship13,450

Memori is agent-native memory infrastructure. A SQL-native, LLM-agnostic layer that turns agent execution and conversation into structured, persistent state for production systems.

mem0📁openclaw-v1.0.7🏛️ Flagship53,724

Universal memory layer for AI Agents

Auto-claude-code-research-in-sleep📁v0.4.4🏛️ Flagship7,173

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works wi

SmolVM📁v0.0.10🌿 Growing367

Open-source sandboxes for code execution, browser use, and AI agents.

zettelforge📁v2.4.0🌱 Seedling25

Agentic memory for CTI in Python — STIX knowledge graphs, threat-actor alias resolution, offline-first RAG, MCP server for Claude Code and LangChain agents

SmarterRouter📁2.2.5🌿 Growing113

SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.

medusa📁v2026.5.5🌿 Growing256

AI-first security scanner with 76 analyzers, 9,600+ detection rules, and repo poisoning detection for AI/ML, LLM agents, and MCP servers. Scan any GitHub repo with: medusa scan --git user/repo

Vibe-Skills📁v3.0.4🌳 Mature1,645

Vibe-Skills is an all-in-one AI skills package. It seamlessly integrates expert-level capabilities and context management into a general-purpose skills package, enabling any AI agent to instantly upgr

synaptic-memory📁v0.16.0🌱 Seedling27

Brain-inspired knowledge graph: spreading activation, Hebbian learning, memory consolidation.

rasputin-memory📁v0.9.1🌱 Seedling30

The memory system your AI agent deserves. 4-stage hybrid retrieval — Vector + BM25 + Knowledge Graph + Neural Reranker — in <150ms. Self-hosted, $0/query, built for agents that need to actually rememb

AReaL📁v1.0.3🏛️ Flagship5,075

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

AutoGPT📁autogpt-platform-beta-v0.6.56🏛️ Flagship183,638

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

mcp-context-forge📁v1.0.0-RC-3🌳 Mature3,604

An AI Gateway, registry, and proxy that sits in front of any MCP, A2A, or REST/gRPC APIs, exposing a unified endpoint with centralized discovery, guardrails and management. Optimizes Agent & Tool call

ContextPilot📁v0.4.1🌿 Growing79

Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.

vllm-mlx📁v0.2.8🌳 Mature917

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac

openakita📁v1.27.9🌳 Mature1,655

An open-source AI assistant framework with skills and agent architecture

ainativelang📁v1.4.6🌿 Growing72

AINL helps turn AI from "a smart conversation" into "a structured worker." It is designed for teams building AI workflows that need multiple steps, state and memory, tool use, repeatable execution, v

UltraRAG📁v0.3.0.2🌳 Mature5,510

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

LRAT📁0.0.0🌱 Seedling39

The implementation for SIGIR 2026: Learning to Retrieve from Agent Trajectories.

powermem📁v1.1.0🌳 Mature633

PowerMem: Your AI-Powered Long-Term Memory — Accurate, Agile, Affordable. Also friendly support for the OpenClaw Memory Plugin.

basic-memory📁v0.20.3🌳 Mature2,899

AI conversations that actually remember. Never re-explain your project to your AI again. Join our Discord: https://discord.gg/tyvKNccgqN

ai-agents-reality-check📁0.0.0🌿 Growing57

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

claude-codex-settings📁v2.3.0🌳 Mature623

My personal Claude Code and OpenAI Codex setup with battle-tested skills, commands, hooks, agents and MCP servers that I use daily.

CodeGen📁0.0.0🌳 Mature774

Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pr

GEA📁0.0.0🌱 Seedling26

Group Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

awesome-code-agents📁main@2026-04-20🌿 Growing98

A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding — they're redefining how software changes the world.

GTA📁v0.2.0🌿 Growing143

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

CASSIA📁v1.3.1🌿 Growing89

CASSIA: A Multi-Agent LLM-Based Single-Cell Cell Type Annotation Framework

DeepClaude📁v1.0.1🌳 Mature2,794

Unleash Next-Level AI! 🚀 💻 Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! 📝 Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! 🔌 OpenAI-Compatible.

vector-db-benchmark📁master@2026-04-17🌿 Growing356

Framework for benchmarking vector search engines

Zen-Ai-Pentest📁v3.0.0🌿 Growing355

🛡⚔️AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting🛡⚔️

Kiln📁v0.5.0🌱 Seedling17

Describe it or draw it. Kiln makes it real. — 461 MCP tools for AI-agent-controlled 3D printing. OctoPrint, Moonraker, Bambu Lab, Prusa Link, and Elegoo.

OpenClawProBench📁main@2026-04-15🌿 Growing453

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

OpenRA-RL📁v0.4.1🌿 Growing120

Open Framework for AI Agents to play Red Alert through Reinforcement Learning

AgenticX📁v0.3.7🌿 Growing114

AgenticX is a unified, production-ready multi-agent platform — Python SDK + CLI (agx) + Studio server + Machi desktop app. Features Meta-Agent orchestration, 15+ LLM providers, MCP Hub, hierarchical m

DeepCode📁v1.2.0🏛️ Flagship15,244

"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"

llmware📁v0.4.6🌿 Growing14,862

Unified framework for building enterprise RAG pipelines with small, specialized models

arag📁v0.1.0🌿 Growing252

A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.

OpenDQV📁v2.2.5🌱 Seedling10

Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.

kuzu-memory📁v1.12.9🌱 Seedling23

Lightweight, embedded graph-based memory system for AI applications. Fast (<3ms recall), offline-first, with MCP server support for Claude and other AI tools.

ragas📁v0.4.3🌳 Mature13,570

Supercharge Your LLM Application Evaluations 🚀

awesome-opensource-ai📁main@2026-04-20🌿 Growing2,849

Curated list of the best truly open-source AI projects, models, tools, and infrastructure.

Dragon-Brain📁v1.1.0🌱 Seedling43

Dragon Brain — persistent long-term memory for AI agents via MCP (Model Context Protocol). Knowledge graph (FalkorDB) + vector search (Qdrant) + CUDA GPU embeddings. Works with Claude, Gemini CLI, Cur

vektori📁main@2026-04-19🌿 Growing111

Memory that remembers the story not just the facts. Three layer sentence graph for AI agents -> Facts, Episodes, raw Sentences. One DB. Zero config.

llm_context_benchmarks📁0.0.0🌱 Seedling59

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

skills-vote📁main@2026-04-19🌿 Growing50

The Next-Gen Agent-Native Skill Recommendation Engine

yao-meta-skill📁main@2026-04-19🌿 Growing297

YAO = Yielding AI Outcomes. A lightweight but rigorous system for creating, evaluating, packaging, and governing reusable agent skills.

SciAgent-Skills📁main@2026-04-17🌿 Growing122

Life sciences computational skills for scientific AI agents

maverick-mcp📁main@2026-04-17🌿 Growing497

MaverickMCP - Personal Stock Analysis MCP Server

claw-eval📁main@2026-04-15🌿 Growing465

Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.

cognitive-dissonance-dspy📁main@2026-04-14🌿 Growing276

A multi-agent LLM system for detecting and resolving cognitive dissonance.

rag-chatbot📁main@2026-04-14🌿 Growing407

RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.

PageIndex📁main@2026-04-10🌿 Growing25,597

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

hippograph-pro📁main@2026-04-10🌱 Seedling26

Self-hosted graph-based associative memory for personal AI agents. Spreading activation, emotional weighting, zero LLM cost.

m3-memory📁v2026.4.20🌱 Seedling10

Local-first Agentic Memory Layer for MCP Agents • 25 tools • Hybrid search (FTS5 + vector + MMR) • GDPR • 100% local

Ultimate-Agent-Directory📁0.0.0🌱 Seedling51

🤖 The most comprehensive directory of AI agent frameworks, platforms, tools, and resources - hundreds of curated entries covering open-source, no-code, enterprise, and autonomous solutions. NEW Boil

synthadoc📁v0.1.0🌱 Seedling66

Synthadoc: An open-source LLM knowledge compilation engine that turns raw documents into structured, local-first wikis. A transparent, human-readable alternative to traditional RAG, which can be self-

moralstack📁v0.3.1🌱 Seedling8

MoralStack is a governance and safety layer for LLM applications. It analyzes user requests before generation, evaluates risk and intent, and decides whether the AI should answer normally, answer safe

Open-Sable📁v1.7.0🌱 Seedling19

Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int

Compiler📁v2🌱 Seedling20

A tool that compiles messy natural language prompts into a structured intermediate representation (IR) and optionally sends them to LLMs like ChatGPT for cleaner, more reliable responses.

vikramaditya📁main@2026-04-20🌱 Seedling5

Autonomous VAPT platform. Give it a target (FQDN, IP, CIDR) — it hunts, it reports. Inspired by the Obsidian Order.

AutoViralAI📁0.0.0🌱 Seedling11

Autonomous AI agent that researches viral content, generates posts, publishes them, measures engagement — and rewrites its own strategy based on what worked. Self-learning loop powered by LangGraph +

openclaw-model-bridge📁main@2026-04-21🌱 Seedling9

Connect any LLM to OpenClaw — production-tested middleware for Qwen3-235B and beyond

Geneclaw📁v0.1.0🌱 Seedling36

Self-evolving AI agent framework with 5-layer safety gatekeeper. Agents observe failures, propose fixes, and safely apply them. Built on HKUDS/nanobot.

llm-in-sandbox📁v0.2.0🌱 Seedling221

Computer Environments Elicit General Agentic Intelligence in LLMs

LLM-Agent-Paper-daily📁main@2026-04-21🌱 Seedling20

Automatically Update LLM-Agent Papers Daily using GitHub Actions (Update Every 12 Hours)

Somi📁Mineralization🌱 Seedling20

Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features — fully self-hosted via Ollama.

KAG📁v0.8.0💤 Dormant8,688

KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge base

FlexRAG📁0.3.0💤 Dormant236

FlexRAG: A RAG Framework for Information Retrieval and Generation.

Qwen-Agent📁v0.0.26💤 Dormant16,132

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

PromptManager📁master@2026-04-12🌱 Seedling3

PromptManager is a desktop application for cataloguing, searching, and executing AI prompts, and much more.

seraph📁develop@2026-04-13🌱 Seedling1

An AI guardian that remembers, watches, and acts.

fastRAG📁v3.1.2💤 Dormant1,776

Efficient Retrieval Augmentation and Generation Framework

cogames0.25.7🌱 Seedling

Multi-agent cooperative games

pyannote-audio4.0.4🌱 Seedling

State-of-the-art speaker diarization toolkit

HealthFlow📁datasets💤 Dormant41

HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research