freshcrate — open source packages for agents

Browse: Testing

multi-agent-ralph-loop main@2026-07-26

Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agent Teams (6 teammates), Aristotle First Principles methodology,

Why this rank: Strong adoption, Recent release, Healthy release cadence

ai-orchestration automation bats-testing claude-code code-quality codex codex-cli dynamic-contexts shell by alfredolopez80

trulens trulens-2.9.0

Evaluation and Tracking for LLM Experiments and AI Agents

Why this rank: Strong adoption, Recent release, Healthy release cadence

agent-evaluation agentops ai-agents ai-monitoring ai-observability evals explainable-ml llm-eval python by truera

Observal v1.10.1

Observal is an AI agent registry with a first-in-class observability and eval framework.

Why this rank: Strong adoption, Recent release, Healthy release cadence

agents claude-code cli-tool cursor evaluation gemini-cli kiro large-language-models python by BlazeUp-AI

pilot v2.243.0

#1 Terminal Benchmark 2.0 — AI that ships your tickets.

Why this rank: Strong adoption, Recent release, Healthy release cadence

agentic agentic-workflow ai-agent ai-bots ai-tools autonomous-coding claude claude-code go by qf-studio

phoenix arize-phoenix-v19.0.0

AI Observability & Evaluation

Why this rank: Strong adoption, Recent release, Healthy release cadence

agents ai-monitoring ai-observability aiengineering anthropic datasets evals jupyter notebook langchain prompt-engineering by Arize-ai

vector-db-benchmark master@2026-07-16

Framework for benchmarking vector search engines

Why this rank: Strong adoption, Recent release, Healthy release cadence

benchmark python vector-database vector-search vector-search-engine by qdrant

ring main@2026-07-17

89 skills and 38 specialized agents that enforce proven engineering practices for AI-assisted development. TDD, systematic debugging, parallel code review, and 10-gate development cycles — as a Claude

Why this rank: Strong adoption, Recent release, Healthy release cadence

ai-agents ai-coding claude-code code-quality code-review developer-tools devops finops prompt-engineering python by LerianStudio

fspec main@2026-07-22

FSPEC: The Spec-Driven, Multi-Agent Coding Factory. It is infrastructure for the "Dark Factory"—the emerging model of fully autonomous software development where AI agents handle all implementation wh

Why this rank: Strong adoption, Recent release, Healthy release cadence

agentic-ai ai-guardrails bdd cucumber dark-factory ddd domain-driven-design example-mapping typescript by sengac

llm_context_benchmarks master@2026-07-16

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

Why this rank: Strong adoption, Recent release, Healthy release cadence

ai benchmarking llms python by ivanfioravanti

promptfoo 0.121.19

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and

Why this rank: Strong adoption, Recent release, Healthy release cadence

ci ci-cd cicd evaluation evaluation-framework llm llm-eval llm-evaluation typescript by promptfoo

ISC-Bench main@2026-07-13

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

Why this rank: Strong adoption, Recent release, Healthy release cadence

adversarial-attacks agent-safety ai-safety benchmark frontier-models jailbreak large-language-models llm-safety python by wuyoscar

giskard-oss giskard-scan/v1.0.0b3

🐢 Open-Source Evaluation & Testing library for LLM Agents

Why this rank: Strong adoption, Recent release, Healthy release cadence

agent-evaluation ai-red-team ai-security ai-testing fairness-ai llm llm-eval llm-evaluation python by Giskard-AI

Gito v4.4.0

An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues—such as security vulnerabilities, bugs, and maintainability concerns.

Why this rank: Strong adoption, Recent release, Healthy release cadence

ai ai-code-analysis ai-code-review ai-code-reviewer ai-coding ai-coding-assistant code-analysis code-audit python by Nayjest

mxcli v0.16.0

Mendix CLI tool, a headless way to work with Mendix projects. Makes Mendix projects usable with third-party agentic coding tools like Claude Code and Copilot. Includes a Starlark linter for quality v

Why this rank: Strong adoption, Recent release, Healthy release cadence

agentic claude cli genai go mdl mendix by mendixlabs

little-coder v1.11.0

A coding agent optimized for smaller LLMs

Why this rank: Strong adoption, Recent release, Healthy release cadence

ai-coding-assistant aider-polygot benchmark code-generation coding-agent coding-agents local-llm ollama python by itayinbarr

OpenClawProBench main@2026-06-28

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

Why this rank: Strong adoption, Recent release, Healthy release cadence

agent benchmark evaluation harness leaderboard llm openclaw python by suyoumo

mlflow v3.14.0

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controllin

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agentops agents ai ai-governance apache-spark evaluation langchain llm-evaluation python by mlflow

DesignCode v1.2.2

Agent-powered, professional-grade graphic design workbench that uses HTML/CSS/SVG as the design medium, supporting vector-quality output, editable elements, multi-layer PSD export, lossless text ren

Why this rank: Strong adoption, Recent release, Healthy release cadence

agent ai ai-agent ai-design code-generation cross-platform design design-tool javascript by Haruhiyuki

claw-eval main@2026-05-17

Claw-Eval is an evaluation harness for LLMs acting as agents. All tasks are verified by humans.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agent harness llm openclaw python by claw-eval

VectorDBBench v1.0.22

Benchmark for vector databases.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

benchmark cost-effectiveness performance python vector-database vector-search vectordb by zilliztech

ContribAI v6.8.0

Autonomous AI agent that contributes to open source — discovers repos, analyzes code, generates fixes, and submits PRs

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agent ai ai-agent automation autonomous-agent code-analysis code-quality contributions rust by tang-vu

chinese-llm-benchmark v5.10

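ReLE Evaluation: capability evaluation of Chinese-language large AI models (continuously updated). It currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark and SenseTime senseChat, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min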

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agentic-ai artificial-intelligence llm-agent llm-evaluation by jeinlee1991

autospec main@2026-05-15

Autospec is an open-source AI agent that takes a web app URL, autonomously QAs it, and saves its passing specs as E2E test code.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agent ai e2e end-to-end gemini gemini-flash gpt gpt-4 typescript by zachblume

GTA v0.2.0

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

Why this rank: Strong adoption, Release freshness, Healthy release cadence

llm-agent llm-evaluation python by open-compass

FastExpressionCompiler v5.4.1

Fast Compiler for C# Expression Trees and the lightweight LightExpression alternative. Diagnostic and code generation tools for the expressions.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

benchmark c# closure code-generation compiler delegate delegates dryioc expression-tree by dadhi

Riverbed-Community-Toolkit v26.6

Riverbed Community Toolkit is a public toolkit for Riverbed Solutions engineering and integration

Why this rank: Strong adoption, Release freshness, Healthy release cadence

acceleration accelerator agent agentic agentic-ai agentic-skills agentic-workflow aiops shell by riverbed

agent-actions v0.2.7

Declarative framework for orchestrating multi-model LLM pipelines with context engineering and quality gates.

Why this rank: Recent release, Strong adoption, Healthy release cadence

ai-agents anthropic context-engineering-framework llm orchestration prompt-engineering prompt-engineering-tool python yaml by Muizzkolapo

watchtower 1.0.2

Watchtower is a simple AI-powered penetration testing automation CLI tool that leverages LLMs and LangGraph to orchestrate agentic workflows that you can use to test your websites locally. Generate us

Why this rank: Strong adoption, Release freshness, Healthy release cadence

ai-cybersecurity automation-testing claude langgraph openrouter pentest pentesting python red-team by fzn0x

ruby_llm-contract v0.10.4

Handle LLM output variance for ruby_llm — retry on malformed JSON or rule violations, escalate to a smarter model, measure variance on datasets, gate CI on regressions.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai anthropic cost-tracking eval json-schema llm model-comparison openai prompt-engineering ruby by justi

DeepClaude v1.0.1

Unleash Next-Level AI! 🚀 💻 Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! 📝 Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! 🔌 OpenAI-Compatible.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

ai claude-3-7-sonnet deepseek gemini python by ErlichLiu

ai-agents-reality-check 0.0.0

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agent-architecture agent-benchmark agent-evaluation agent-performance agentic-ai agentic-workflow ai-benchmarking architectural-evaluation llm-agent python by Cre4T3Tiv3

skillfoundry v5.30.0

AI engineering framework with quality gates, persistent memory, and multi-platform support. Works inside Claude Code, Cursor, Copilot, Codex, and Gemini.

Why this rank: Recent release, Strong adoption, Healthy release cadence

ai-agents ai-coding ai-framework claude-code code-quality copilot cursor developer-tools typescript by samibs

AgentLint v1.1.13

Lint your repo for AI agent compatibility.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

agentic agents-md ai-agent ai-friendly ai-tools anthropic claude-code claude-code-plugin javascript by 0xmariowu

autonomous-agentic-research-swarm main@2026-07-11

File-based autonomous agentic research swarm template (Planner/Worker/Judge) with contracts, workstreams, and deterministic quality gates.

Why this rank: Recent release, Strong adoption, Healthy release cadence

agentic automation claude codex git-worktrees html reproducible-research research swarm by AysajanE

weave-cli v0.12.3

A universal CLI for Weaviate, Milvus, Chroma, Qdrant, and other vector DBs to help view, list, create, delete, and search collections and documents in collections for development, test, and debugging

Why this rank: Strong adoption, Release freshness, Healthy release cadence

ai-agents cli go golang vector-database by maximilien

awesome-agent-benchmarks master@2026-07-18

🧠 Discover and evaluate advanced benchmark datasets for Large Language Model agents to enhance performance assessment in real-world tasks.

Why this rank: Recent release, Healthy release cadence, Strong adoption

agent-based-modeling agent-benchmark agentic agentic-ai ai ai-agent ai-models awesome by axxafo

rchub-qa main@2026-07-19

Provide token-efficient, distilled QA docs for AI coding agents to generate accurate test code quickly and reduce token usage significantly

Why this rank: Recent release, Healthy release cadence, Strong adoption

ai astro chatbot chatgpt gpt-35-turbo javascript lstm nlp oicq rag by itzfarhanullah

ios-agentic-skills master@2026-07-19

🔍 Discover and utilize agentic iOS/watchOS audit skills and playbooks for consistent quality assurance in your applications.

Why this rank: Recent release, Healthy release cadence, Strong adoption

ai ai-agent ai-agents appdevelopment claude-code claude-skills ios-development javascript llm-agent by mohahasan

structured-prompt-skill main@2026-07-19

✍️ Write effective AI prompts with this structured prompt engineering library and Claude Code skill, featuring 300+ curated examples for high-quality results.

Why this rank: Recent release, Healthy release cadence, Strong adoption

ai ai-agents ai-prompts ai-tools anthropic chatgpt claude claude-code prompt-engineering by Marwane83930

octobench main@2026-07-19

Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.

Why this rank: Recent release, Healthy release cadence, Strong adoption

agentic agents ai ai-workflow anthropic automation benchmark codex by xInfer123

agent-review main@2026-07-19

Analyze git code changes to generate structured review reports using flexible AI models and integrated workflows.

Why this rank: Recent release, Healthy release cadence, Strong adoption

agent ai-writing aigc-detection automation chinese-novel cicd claude claude-code prompt-engineering typescript by ridwan230598

ComfyUI-None-upup master@2026-07-18

🎨 Enhance cinematic image quality with ComfyUI-None-upup. This AI engine offers nodes for clarity, brightness, and video processing to elevate your visuals.

Why this rank: Recent release, Healthy release cadence, Strong adoption

ai cemu-emulator comfy comfyui cqhttp document-analysis emulation flux javascript llm-agent by rudramishra4117

Reproducible-Photorealistic-Nano-Banana-Pro-JSON-Prompts main@2026-07-18

🍌 Generate JSON prompts for ultra-photorealistic images of nano bananas and related subjects, ensuring reproducible and high-quality visual outputs.

Why this rank: Recent release, Healthy release cadence, Strong adoption

awesome gemini gemini-ai nano-banana nano-banana-pro nanobanana nanobanana-pro nanobanana2 prompt-engineering by vivekanandan22

awesome-nano-banana-pro main@2026-06-13

🖼️ Master advanced techniques for Google's Nano Banana Pro to create stunning, professional-quality images up to 4K resolution.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

awesome banana chatgpt claude claude-code docs / meta gemini gemini-ai gemini3 prompt-engineering by yug69

ComfyUI-AudioSR main@2026-06-15

🎶 Enhance audio quality with ComfyUI-AudioSR, a versatile tool for upscaling sounds to 48kHz for better clarity and listening experience.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

cemu comfy comfyui-nodes copilot cpp deepseek dit emulator llm-agent python by xaeksx

ai-lead-qualifier main@2026-06-14

🧠 Qualify leads with an AI-driven system that understands intent, asks key questions, and structures quality leads without hardcoding processes.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai ai-agents automation conversational-ai fastapi groq large-language-models lead prompt-engineering python by Veeksha29

ai-test-case main@2026-06-14

🤖 Generate automated test cases for your GitHub repositories using AI, ensuring comprehensive coverage with seamless integration and multi-language support.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai ai-powered-testing aitesting booking-system code-generation framework fuzzer golang-test javascript by iytfut

web-quality-skills main@2026-06-23

🌐 Optimize web projects with essential skills for performance, accessibility, and SEO, based on Google Lighthouse and Core Web Vitals guidelines.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

bme680 claude-skills embeded-systems frontend html hygraphcms lighthouse nextjs prompt-engineering shell by Bugrasemerkant

qa-agent v0.3.0

An automated, agentic exploratory testing tool that performs comprehensive QA testing on web applications, simulating human user interactions through various input methods (mouse, keyboard, TAB naviga

Why this rank: Release freshness, Strong adoption, Healthy release cadence

agentic agentic-qa playwright playwright-python python qa qa-automation qa-automation-test qaautomation by billrichards

Enhance-Prompt main@2026-06-11

Enhance prompts by injecting real project context to create clear, professional, and actionable instructions with quality and risk insights.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

agent-skills ai ai-agents awesome gemini github-copilot hunyuan hunyuan-image prompt-engineering by wtfhanin

maestro-skill main@2026-06-11

Generate production-ready Maestro YAML test flows for mobile and web apps with accurate selectors, project setup, CI/CD configurations, and test reports.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai-agent android automation command-framework copilot developer-experience e2e-testing maestro by kaua433

sora2-free-watermark-remover main@2026-05-24

🛠 Remove watermarks from OpenAI Sora 2 videos using precise spectral analysis to keep video quality intact and watermark-free.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

addwatermark chatgpt cli discord openai prompt-engineering prompt-generator python reddit by rasytoun3399

Nightshift v0.0.7

Autonomous overnight codebase improvement agent for Claude Code. Run it before bed, wake up to production-ready fixes.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai-agent anthropic automation claude claude-code code-quality developer-tools overnight-agent python by Recusive

codex-simplify-skill main@2026-05-08

Provide a structured code refactoring process for OpenAI Codex with guardrails, decision gates, and parallelism awareness to simplify and improve code quality.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai ai-agents ai-coding cli codex codex-skill codex-skills copilot prompt-engineering by PEDRINHSOUZZX777

qodo-cover 0.3.10

Qodo-Cover: An AI-Powered Tool for Automated Test Generation and Code Coverage Enhancement! 💻🤖🧪🐞

Why this rank: Strong adoption, Healthy release cadence

agents ai python test-automation testing by qodo-ai

kit v5.2.8

Trust-Grade AI Development Framework for software development — Zero dependencies.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

ai-agents ai-development ai-development-tools ai-framework antigravity antigravity-extension antigravity-ide claude-extensions javascript by devran-ai

selfmodel v0.3.0

A self-evolving AI Agent Team — agents that rewrite their own operating manual.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

agent-orchestration ai-agents ai-coding ai-framework ai-slop-detection autonomous-agents chaos-testing claude shell by VictorVVedtion

fastRAG v3.1.2

Efficient Retrieval Augmentation and Generation Framework

Why this rank: Strong adoption, Healthy release cadence

benchmark colbert diffusion generative-ai information-retrieval knowledge-graph llm multi-modal python by IntelLabs

Promptgpt V1.3

PromptGPT is an open-source framework that lets users automatically generate high-quality prompts with no installation, coding, or technical knowledge required. Promptgpt follows industry best

Why this rank: Strong adoption, Healthy release cadence

by howard9192