freshcrate

Browse: Testing

vector-db-benchmark master@2026-06-05

Framework for benchmarking vector search engines

Why this rank: Strong adoption, Recent release, Healthy release cadence
Gito v4.1.0

An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues—such as security vulnerabilities, bugs, and maintainability concerns.

Why this rank: Strong adoption, Recent release, Healthy release cadence
mxcli v0.12.0

Mendix cli tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a starlark linter for quality v

Why this rank: Strong adoption, Recent release, Healthy release cadence
llm_context_benchmarks master@2026-06-04

📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz

Why this rank: Strong adoption, Recent release, Healthy release cadence
Tags: ai, benchmarking, llms, python
by ivanfioravanti
promptfoo 0.121.14

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and

Why this rank: Strong adoption, Recent release, Healthy release cadence
phoenix arize-phoenix-v17.1.0

AI Observability & Evaluation

Why this rank: Strong adoption, Recent release, Healthy release cadence
pilot v2.166.12

#1 Terminal Benchmark 2.0 — AI that ships your tickets.

Why this rank: Strong adoption, Recent release, Healthy release cadence
ring main@2026-06-03

89 skills and 38 specialized agents that enforce proven engineering practices for AI-assisted development. TDD, systematic debugging, parallel code review, and 10-gate development cycles — as a Claude

Why this rank: Strong adoption, Recent release, Healthy release cadence
mlflow v3.13.0

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controllin

Why this rank: Strong adoption, Recent release, Healthy release cadence
Observal v1.4.0

Observal is an AI agent registry with a first-in-class observability and evaluation framework.

Why this rank: Strong adoption, Recent release, Healthy release cadence
fspec main@2026-05-31

FSPEC: The Spec-Driven, Multi-Agent Coding Factory. It is infrastructure for the "Dark Factory"—the emerging model of fully autonomous software development where AI agents handle all implementation wh

Why this rank: Strong adoption, Recent release, Healthy release cadence

A coding agent optimized for smaller LLMs.

Why this rank: Strong adoption, Recent release, Healthy release cadence
ISC-Bench v0.0.6

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

Why this rank: Strong adoption, Recent release, Healthy release cadence
giskard-oss giskard-checks/v1.0.2b3

🐢 Open-Source Evaluation & Testing library for LLM Agents

Why this rank: Strong adoption, Recent release, Healthy release cadence
OpenClawProBench main@2026-05-19

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

Why this rank: Strong adoption, Recent release, Healthy release cadence
claw-eval main@2026-05-17

Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.

Why this rank: Strong adoption, Recent release, Healthy release cadence

Benchmark for vector databases.

Why this rank: Strong adoption, Recent release, Healthy release cadence
trulens trulens-2.8.1

Evaluation and Tracking for LLM Experiments and AI Agents

Why this rank: Strong adoption, Recent release, Healthy release cadence
autospec main@2026-05-15

Autospec is an open-source AI agent that takes a web app URL, autonomously QAs it, and saves its passing specs as E2E test code.

Why this rank: Strong adoption, Recent release, Healthy release cadence

Riverbed Community Toolkit is a public toolkit for Riverbed Solutions engineering and integration

Why this rank: Recent release, Strong adoption, Healthy release cadence

Handle LLM output variance for ruby_llm — retry on malformed JSON or rule violations, escalate to a smarter model, measure variance on datasets, gate CI on regressions.

Why this rank: Recent release, Strong adoption, Healthy release cadence
ContribAI v6.8.0

Autonomous AI agent that contributes to open source — discovers repos, analyzes code, generates fixes, and submits PRs

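Why this rank: Strong adoption, Release freshness, Healthy release cadence

ReLE Evaluation: a Chinese AI large-model capability evaluation (continuously updated). It currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min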

Why this rank: Strong adoption, Release freshness, Healthy release cadence
GTA v0.2.0

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

Why this rank: Strong adoption, Release freshness, Healthy release cadence
multi-agent-ralph-loop main@2026-04-20

Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agent Teams (6 teammates), Aristotle First Principles methodology,

Why this rank: Strong adoption, Release freshness, Healthy release cadence

Fast Compiler for C# Expression Trees and the lightweight LightExpression alternative. Diagnostic and code generation tools for the expressions.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

Agent-powered, professional-grade graphic design workbench that uses HTML/CSS/SVG as the design medium, supporting vector-quality output, editable elements, multi-layer PSD export, lossless text ren

Why this rank: Strong adoption, Release freshness, Healthy release cadence

Watchtower is a simple AI-powered penetration testing automation CLI tool that leverages LLMs and LangGraph to orchestrate agentic workflows that you can use to test your websites locally. Generate us

Why this rank: Strong adoption, Release freshness, Healthy release cadence

Unleash Next-Level AI! 🚀 💻 Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! 📝 Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! 🔌 OpenAI-Compatible.

Why this rank: Strong adoption, Release freshness, Healthy release cadence

Declarative framework for orchestrating multi-model LLM pipelines with context engineering and quality gates.

Why this rank: Recent release, Strong adoption, Healthy release cadence

Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistica

Why this rank: Strong adoption, Release freshness, Healthy release cadence
AgentLint v1.1.13

Lint your repo for AI agent compatibility.

Why this rank: Release freshness, Strong adoption, Healthy release cadence
weave-cli v0.12.3

A universal CLI for Weaviate, Milvus, Chroma, Qdrant, and other vector DBs to help view, list, create, delete, and search collections and documents in collections for development, test, and debugging

Why this rank: Strong adoption, Release freshness, Healthy release cadence

🖼️ Master advanced techniques for Google's Nano Banana Pro to create stunning, professional-quality images up to 4K resolution.

Why this rank: Recent release, Strong adoption, Healthy release cadence
ComfyUI-AudioSR main@2026-05-31

🎶 Enhance audio quality with ComfyUI-AudioSR, a versatile tool for upscaling sounds to 48kHz for better clarity and listening experience.

Why this rank: Recent release, Strong adoption, Healthy release cadence
awesome-agent-benchmarks master@2026-05-31

🧠 Discover and evaluate advanced benchmark datasets for Large Language Model agents to enhance performance assessment in real-world tasks.

Why this rank: Recent release, Healthy release cadence, Strong adoption
ai-lead-qualifier main@2026-05-31

🧠 Qualify leads with an AI-driven system that understands intent, asks key questions, and structures quality leads without hardcoding processes.

Why this rank: Recent release, Healthy release cadence, Strong adoption
ai-test-case main@2026-06-06

🤖 Generate automated test cases for your GitHub repositories using AI, ensuring comprehensive coverage with seamless integration and multi-language support.

Why this rank: Recent release, Healthy release cadence, Strong adoption
rchub-qa main@2026-06-05

Provide token-efficient, distilled QA docs for AI coding agents to generate accurate test code quickly and reduce token usage significantly

Why this rank: Recent release, Healthy release cadence, Strong adoption
ios-agentic-skills master@2026-06-01

🔍 Discover and utilize agentic iOS/watchOS audit skills and playbooks for consistent quality assurance in your applications.

Why this rank: Recent release, Healthy release cadence, Strong adoption
Enhance-Prompt main@2026-06-04

Enhance prompts by injecting real project context to create clear, professional, and actionable instructions with quality and risk insights.

Why this rank: Recent release, Healthy release cadence, Strong adoption
maestro-skill main@2026-06-04

Generate production-ready Maestro YAML test flows for mobile and web apps with accurate selectors, project setup, CI/CD configurations, and test reports.

Why this rank: Recent release, Healthy release cadence, Strong adoption
qa-agent v0.2.3

An automated, agentic exploratory testing tool that performs comprehensive QA testing on web applications, simulating human user interactions through various input methods (mouse, keyboard, TAB naviga

Why this rank: Recent release, Healthy release cadence, Strong adoption
web-quality-skills main@2026-05-31

🌐 Optimize web projects with essential skills for performance, accessibility, and SEO, based on Google Lighthouse and Core Web Vitals guidelines.

Why this rank: Recent release, Healthy release cadence, Strong adoption

✍️ Write effective AI prompts with this structured prompt engineering library and Claude Code skill, featuring 300+ curated examples for high-quality results.

Why this rank: Recent release, Healthy release cadence, Strong adoption
ComfyUI-None-upup master@2026-06-06

🎨 Enhance cinematic image quality with ComfyUI-None-upup. This AI engine offers nodes for clarity, brightness, and video processing to elevate your visuals.

Why this rank: Recent release, Healthy release cadence, Strong adoption
octobench main@2026-06-02

Benchmark and compare LLM tool, configuration, and prompt setups using a shared case framework with automated scoring and telemetry.

Why this rank: Recent release, Healthy release cadence, Strong adoption

🛠 Remove watermarks from OpenAI Sora 2 videos using precise spectral analysis to keep video quality intact and watermark-free.

Why this rank: Recent release, Healthy release cadence, Strong adoption
agent-review main@2026-06-04

Analyze git code changes to generate structured review reports using flexible AI models and integrated workflows.

Why this rank: Recent release, Healthy release cadence, Strong adoption

🍌 Generate JSON prompts for ultra-photorealistic images of nano bananas and related subjects, ensuring reproducible and high-quality visual outputs.

Why this rank: Recent release, Healthy release cadence, Strong adoption
codex-simplify-skill main@2026-05-08

Provide a structured code refactoring process for OpenAI Codex with guardrails, decision gates, and parallelism awareness to simplify and improve code quality.

Why this rank: Recent release, Healthy release cadence, Strong adoption

AI engineering framework with quality gates, persistent memory, and multi-platform support. Works inside Claude Code, Cursor, Copilot, Codex, and Gemini.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

File-based autonomous agentic research swarm template (Planner/Worker/Judge) with contracts, workstreams, and deterministic quality gates.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

Autonomous overnight codebase improvement agent for Claude Code. Run it before bed, wake up to production-ready fixes.

Why this rank: Release freshness, Strong adoption, Healthy release cadence

Qodo-Cover: An AI-Powered Tool for Automated Test Generation and Code Coverage Enhancement! 💻🤖🧪🐞

Why this rank: Strong adoption, Healthy release cadence
kit v5.2.8

Trust-Grade AI Development Framework for software development — Zero dependencies.

Why this rank: Release freshness, Strong adoption, Healthy release cadence
selfmodel v0.3.0

A self-evolving AI Agent Team — agents that rewrite their own operating manual.

Why this rank: Release freshness, Strong adoption, Healthy release cadence
fastRAG v3.1.2

Efficient Retrieval Augmentation and Generation Framework

Why this rank: Strong adoption, Healthy release cadence

PromptGPT is an open-source framework that lets users automatically generate high-quality prompts with no installation, coding, or technical knowledge required. PromptGPT follows industry best

Why this rank: Strong adoption, Healthy release cadence
by howard9192