freshcrate

Search results for "benchmarking"

20 results found (Python)
ai-agents-reality-check 0.0.0 🌿 Growing ⭐57

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

agent-framework python-1.1.0 🌳 Mature ⭐9,325

A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.

LLM-Agents-Ecosystem-Handbook 0.0.0 🌳 Mature ⭐508

One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.

arthur-engine 2.1.529 🌿 Growing ⭐75

Make AI work for everyone: monitoring and governance for your AI/ML

vector-db-benchmark master@2026-04-17 🌿 Growing ⭐356

Framework for benchmarking vector search engines

mcp-client-for-ollama v0.28.0 🌿 Growing ⭐599

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l

LLM-Agent-Paper-daily main@2026-04-21 🌱 Seedling ⭐20

Automatically updates LLM-agent papers daily using GitHub Actions (updates every 12 hours)

awesome-code-agents main@2026-04-20 🌿 Growing ⭐94

A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding, they're redefining how software changes the world.

llm_context_benchmarks 0.0.0 🌱 Seedling ⭐59

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

skill v1.2.1 🌱 Seedling ⭐978

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

AutoRAG v0.3.22 🌱 Seedling ⭐4,693

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

claude-code-plugins-plus-skills v4.26.0 🌱 Seedling ⭐1,995

423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.

GTA v0.2.0 🌱 Seedling ⭐143

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

Open-Sable v1.7.0 🌱 Seedling ⭐18

Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int

Standard 0.0.0 🌱 Seedling ⭐18

JSON Agents - A universal JSON-native standard for describing AI agents, their capabilities, tools, runtimes, and governance in a portable, framework-agnostic format. Based on RFC 8259, JSON Schema 2

devito v4.8.21 🌱 Seedling ⭐689

DSL and compiler framework for automated finite-differences and stencil computation

Zen-Ai-Pentest v3.0.0 🌱 Seedling ⭐279

🛡⚔️ AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting 🛡⚔️

HealthFlow datasets 💀 Dormant ⭐40

HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research

asyncpg 0.31.0 🌱 Seedling

An asyncio PostgreSQL driver