freshcrate
Home > Frameworks > crypto-skill-bench

crypto-skill-bench

Benchmark framework for evaluating crypto skills in AI agent ecosystems

Description

Benchmark framework for evaluating crypto skills in AI agent ecosystems

Release History

VersionChangesUrgencyDate
0.1.7Imported from npm (0.1.7)Low4/21/2026
v0.1.7## What's New - **53 official skills benchmarked** (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - **Three-tier Safety Gate**: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - **`retry` command** — re-run failed scenarios from a previous evaluation and update reports - **10x faster `pull`** —Medium4/3/2026
v0.1.6## What's New - **Multi-turn safety rubric** — separate scoring criteria for multi-turn scenarios; confirmation and execution in the same turn = score 0 - **Auto-retry failed scenarios** — invocation errors (timeout/crash) are collected and retried after all scenarios complete - **Invocation failure no longer triggers Safety Gate** — infrastructure issues are not safety violations - **120s timeout** (was 60s) — multi-turn scenarios with multiple API round-trips no longer time out - **200ms stagMedium4/2/2026
v0.1.5## What's Changed - Validates SKILL.md exists in each directory before starting evaluation - Exits early with clear error message if path is not a valid skill directory - Prevents wasting API calls on invalid inputs ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ```Medium4/1/2026
v0.1.4## What's New - **`--version` / `-v`** — Show current version and check for updates - **Auto update check** — Every CLI run checks npm for newer versions (non-blocking, 3s timeout) - Shows `Update available: 0.1.3 → 0.1.4` with install command when outdated ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ```Medium4/1/2026
v0.1.3## What's Changed - Reports now output to `./reports/` in the current working directory (not the package install directory) - Works for both single skill and batch evaluate ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ```Medium4/1/2026
v0.1.2## What's New - **Contributing guide** — 3 ways to contribute: add skills, update scores, add scenarios - **Detailed scoring rubrics** — per-dimension criteria tables in README - **Official vs community skills** — `pull --all` / `--community` / `--category` filtering - **MIT License** added - Updated docs: interactive API key setup, 76-scenario cost estimates, cleaner CLI options ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ``` [Full benchmark report](https://github.Medium4/1/2026
v0.1.1## What's New - Interactive API key setup — no manual env var configuration needed - npm OIDC trusted publishing (no token required) - Key stored in `~/.crypto-skill-bench/config.json` ```bash npm install -g crypto-skill-bench@latest ```Medium4/1/2026
v0.1.0# Crypto Skill Bench v0.1.0 Open-source benchmark for evaluating crypto trading skills in AI agent ecosystems. - 76 scenarios (37 core + 39 adversarial) - 5 dimensions: Safety, Coverage, Robustness, Routing, UX - 20 skills from cryptoskill.org - LLM-as-Judge: Sonnet 4.6 + Opus 4.6 ```bash npm install -g crypto-skill-bench ``` [Full report](https://github.com/Minara-AI/crypto-skill-benchmark/blob/main/latest-report/summary.md)Medium4/1/2026

Dependencies & License Audit

Loading dependencies...

Similar Packages

@vertz/agentsDeclarative AI agent framework for Vertz — agents, tools, and workflows on Cloudflare0.2.48
agentvizSession replay visualizer for AI agent workflows (Claude Code, VS Code, Copilot CLI)0.7.0
@poofnew/vibe-checkAI agent evaluation framework for Claude and beyond0.1.1
@petriflow/gateFramework-agnostic Petri net gating for AI agent tool access control. Define safety constraints as Petri nets — tools are only allowed when an enabled transition permits them.0.3.2
synaposSynapos Framework — AI agent orchestration for multi-IDE development2.8.0