Description
Benchmark framework for evaluating crypto skills in AI agent ecosystems
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 0.1.7 | Imported from npm (0.1.7) | Low | 4/21/2026 |
| v0.1.7 | ## What's New - **53 official skills benchmarked** (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - **Three-tier Safety Gate**: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - **`retry` command** — re-run failed scenarios from a previous evaluation and update reports - **10x faster `pull`** — | Medium | 4/3/2026 |
| v0.1.6 | ## What's New - **Multi-turn safety rubric** — separate scoring criteria for multi-turn scenarios; confirmation and execution in the same turn = score 0 - **Auto-retry failed scenarios** — invocation errors (timeout/crash) are collected and retried after all scenarios complete - **Invocation failure no longer triggers Safety Gate** — infrastructure issues are not safety violations - **120s timeout** (was 60s) — multi-turn scenarios with multiple API round-trips no longer time out - **200ms stag | Medium | 4/2/2026 |
| v0.1.5 | ## What's Changed - Validates SKILL.md exists in each directory before starting evaluation - Exits early with clear error message if path is not a valid skill directory - Prevents wasting API calls on invalid inputs ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ``` | Medium | 4/1/2026 |
| v0.1.4 | ## What's New - **`--version` / `-v`** — Show current version and check for updates - **Auto update check** — Every CLI run checks npm for newer versions (non-blocking, 3s timeout) - Shows `Update available: 0.1.3 → 0.1.4` with install command when outdated ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ``` | Medium | 4/1/2026 |
| v0.1.3 | ## What's Changed - Reports now output to `./reports/` in the current working directory (not the package install directory) - Works for both single skill and batch evaluate ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ``` | Medium | 4/1/2026 |
| v0.1.2 | ## What's New - **Contributing guide** — 3 ways to contribute: add skills, update scores, add scenarios - **Detailed scoring rubrics** — per-dimension criteria tables in README - **Official vs community skills** — `pull --all` / `--community` / `--category` filtering - **MIT License** added - Updated docs: interactive API key setup, 76-scenario cost estimates, cleaner CLI options ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ``` [Full benchmark report](https://github. | Medium | 4/1/2026 |
| v0.1.1 | ## What's New - Interactive API key setup — no manual env var configuration needed - npm OIDC trusted publishing (no token required) - Key stored in `~/.crypto-skill-bench/config.json` ```bash npm install -g crypto-skill-bench@latest ``` | Medium | 4/1/2026 |
| v0.1.0 | # Crypto Skill Bench v0.1.0 Open-source benchmark for evaluating crypto trading skills in AI agent ecosystems. - 76 scenarios (37 core + 39 adversarial) - 5 dimensions: Safety, Coverage, Robustness, Routing, UX - 20 skills from cryptoskill.org - LLM-as-Judge: Sonnet 4.6 + Opus 4.6 ```bash npm install -g crypto-skill-bench ``` [Full report](https://github.com/Minara-AI/crypto-skill-benchmark/blob/main/latest-report/summary.md) | Medium | 4/1/2026 |
Dependencies & License Audit
Loading dependencies...
Similar Packages
@vertz/agentsDeclarative AI agent framework for Vertz — agents, tools, and workflows on Cloudflare0.2.48
@petriflow/gateFramework-agnostic Petri net gating for AI agent tool access control. Define safety constraints as Petri nets — tools are only allowed when an enabled transition permits them.0.3.2
