crypto-skill-bench

Home > Frameworks > crypto-skill-bench

Benchmark framework for evaluating crypto skills in AI agent ecosystems

Why this rank:Strong adoptionRelease freshnessHealthy release cadence

Description

Benchmark framework for evaluating crypto skills in AI agent ecosystems

Release History

Version	Changes	Urgency	Date
0.1.7	Imported from npm (0.1.7)	Low	4/21/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Medium	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.7	## What's New - 53 official skills benchmarked (was 20) — Binance, OKX, Kraken, KuCoin, Gate.io, Bitget, Uniswap, GMX, Aave, Lido, Pendle, MoonPay, Circle, and more - Three-tier Safety Gate: ✅ PASS / ⚠️ CAUTION / ❌ FAIL - Single-turn safety=0 → FAIL (definitive SKILL.md deficiency) - Multi-turn safety=0 → evaluated by pass rate (≥2/3 PASS, <2/3 CAUTION, all=0 FAIL) - `retry` command — re-run failed scenarios from a previous evaluation and update reports - 10x faster `pull` —	Low	4/3/2026
v0.1.6	## What's New - Multi-turn safety rubric — separate scoring criteria for multi-turn scenarios; confirmation and execution in the same turn = score 0 - Auto-retry failed scenarios — invocation errors (timeout/crash) are collected and retried after all scenarios complete - Invocation failure no longer triggers Safety Gate — infrastructure issues are not safety violations - 120s timeout (was 60s) — multi-turn scenarios with multiple API round-trips no longer time out - **200ms stag	Medium	4/2/2026
v0.1.5	## What's Changed - Validates SKILL.md exists in each directory before starting evaluation - Exits early with clear error message if path is not a valid skill directory - Prevents wasting API calls on invalid inputs ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ```	Medium	4/1/2026
v0.1.4	## What's New - `--version` / `-v` — Show current version and check for updates - Auto update check — Every CLI run checks npm for newer versions (non-blocking, 3s timeout) - Shows `Update available: 0.1.3 → 0.1.4` with install command when outdated ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ```	Medium	4/1/2026
v0.1.3	## What's Changed - Reports now output to `./reports/` in the current working directory (not the package install directory) - Works for both single skill and batch evaluate ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ```	Medium	4/1/2026
v0.1.2	## What's New - Contributing guide — 3 ways to contribute: add skills, update scores, add scenarios - Detailed scoring rubrics — per-dimension criteria tables in README - Official vs community skills — `pull --all` / `--community` / `--category` filtering - MIT License added - Updated docs: interactive API key setup, 76-scenario cost estimates, cleaner CLI options ## Install / Upgrade ```bash npm install -g crypto-skill-bench@latest ``` [Full benchmark report](https://github.	Medium	4/1/2026
v0.1.1	## What's New - Interactive API key setup — no manual env var configuration needed - npm OIDC trusted publishing (no token required) - Key stored in `~/.crypto-skill-bench/config.json` ```bash npm install -g crypto-skill-bench@latest ```	Medium	4/1/2026
v0.1.0	# Crypto Skill Bench v0.1.0 Open-source benchmark for evaluating crypto trading skills in AI agent ecosystems. - 76 scenarios (37 core + 39 adversarial) - 5 dimensions: Safety, Coverage, Robustness, Routing, UX - 20 skills from cryptoskill.org - LLM-as-Judge: Sonnet 4.6 + Opus 4.6 ```bash npm install -g crypto-skill-bench ``` [Full report](https://github.com/Minara-AI/crypto-skill-benchmark/blob/main/latest-report/summary.md)	Medium	4/1/2026

Dependencies & License Audit

Loading dependencies...

Similar Packages

@workflow-cannon/workspace-kitAI agents: read **`./.ai/`** first (see repo-root [`AGENTS.md`](AGENTS.md), [`.ai/agent-source-of-truth-order.md`](.ai/agent-source-of-truth-order.md), [`.cursor/rules/agent-doc-routing.mdc`](.cursor/v0.99.28

@falai/agentStandalone, strongly-typed AI Agent framework with route DSL and AI provider strategymain@2026-06-05

@mindstudio-ai/agentTypeScript SDK for MindStudio direct step executionv0.1.65

penclipPaperclip CN CLI — orchestrate AI agent teams to run a businessv2026.605.0

night-orchNightly GitHub/Forgejo issue orchestrator — autonomous AI agent coding toolv0.20.0

More from GitHub Actions

paperclipaiPaperclip CLI — orchestrate AI agent teams to run a business

@aaif/gooseGoose - an open-source AI agent

@n8n-as-code/skillsAI Agent skills library for n8nac (internal — use npx n8nac skills)

@zhin.js/agentZhin AI Agent — session, ZhinAgent, init; composes @zhin.js/core providers and tools

More in Frameworks

spec_driven_developSpec-Driven Develop is a platform-agnostic AI agent skill that automates the pre-development workflow for large-scale complex tasks. It is not a framework, not a runtime, not a package manager — it is

deer-flowAn open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of ta

simBuild, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce.

ctranslate2Fast inference engine for Transformer models