freshcrate — Search

Search results for "benchmarking"

20 results found (Python)

ai-agents-reality-check 📁0.0.0🌿 Growing⭐57

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

agent-architecture agent-benchmark agent-evaluation agent-performance agentic-ai agentic-workflow ai-benchmarking architectural-evaluation llm-agent python · by Cre4T3Tiv3 · Python

agent-framework 📁python-1.1.0🌳 Mature⭐9,325

A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET.

agent-framework agentic-ai agents ai dotnet multi-agent orchestration python · by microsoft · Python

LLM-Agents-Ecosystem-Handbook 📁0.0.0🌳 Mature⭐508

One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.

ai ai-agent ai-agents fine-tuning finetuning-llms freamework llm llmops python · by oxbshw · Python

arthur-engine 📁2.1.529🌿 Growing⭐75

Make AI work for everyone: monitoring and governance for your AI/ML

agentic benchmarking evaluation genai guardrails llm ml monitoring python · by arthur-ai · Python

vector-db-benchmark 📁master@2026-04-17🌿 Growing⭐356

Framework for benchmarking vector search engines

benchmark python vector-database vector-search vector-search-engine · by qdrant · Python

mcp-client-for-ollama 📁v0.28.0🌿 Growing⭐599

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l

agentic-ai ai command-line-tool generative-ai linux llm local-llm macos python · by jonigl · Python

LLM-Agent-Paper-daily 📁main@2026-04-21🌱 Seedling⭐20

Automatically updates LLM-Agent papers daily using GitHub Actions (updates every 12 hours)

llm llm-agent python · by Lyz103 · Python

awesome-code-agents 📁main@2026-04-20🌿 Growing⭐94

A curated list of products, benchmarks, and research papers on autonomous code agents. Beyond coding — they're redefining how software changes the world.

python · by EuniAI · Python

llm_context_benchmarks 📁0.0.0🌱 Seedling⭐59

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

ai benchmarking llms python · by ivanfioravanti · Python

skill 📁v1.2.1🌱 Seedling⭐978

PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

python · by pinchbench · Python

AutoRAG 📁v0.3.22🌱 Seedling⭐4,693

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

analysis automl benchmarking document-parser embeddings evaluation llm llm-evaluation python · by Marker-Inc-Korea · Python

claude-code-plugins-plus-skills 📁v4.26.0🌱 Seedling⭐1,995

423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.

agent-skills ai ai-agents anthropic automation claude-code claude-code-plugins developer-tools mcp python · by jeremylongshore · Python

GTA 📁v0.2.0🌱 Seedling⭐143

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

llm-agent llm-evaluation python · by open-compass · Python

Open-Sable 📁v1.7.0🌱 Seedling⭐18

Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int

agentic agentic-ai ai ai-assistant open-source python · by IdeoaLabs · Python

Standard 📁0.0.0🌱 Seedling⭐18

JSON Agents - A universal JSON-native standard for describing AI agents, their capabilities, tools, runtimes, and governance in a portable, framework-agnostic format. Based on RFC 8259, JSON Schema 2

agent-governance agent-manifest agent-orchestration agent-specification ai-agents ai-framework interoperability json python · by JSON-Agents · Python

devito 📁v4.8.21🌱 Seedling⭐689

DSL and compiler framework for automated finite-differences and stencil computation

code-generation compiler dsl finite-difference fwi gpu hpc jit python · by devitocodes · Python

Zen-Ai-Pentest 📁v3.0.0🌱 Seedling⭐279

🛡⚔️ AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting 🛡⚔️

ai automation compliance cybersecurity ethical-hacking framework penetration-testing pentesting python · by SHAdd0WTAka · Python

VectorDBBench 📁v1.0.20🌱 Seedling⭐1,068

Benchmark for vector databases.

benchmark cost-effectiveness performance python vector-database vector-search vectordb · by zilliztech · Python

HealthFlow 📁datasets💤 Dormant⭐40

HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research

ai-for-healthcare ai-for-science ehr llm llm-agent multi-agent python · by yhzhu99 · Python

asyncpg 📁0.31.0🌱 Seedling

An asyncio PostgreSQL driver

database postgres pypi · by pypi · Python