freshcrate
AgentLint

Lint your repo for AI agent compatibility.

Why this rank: Release freshness, strong adoption, healthy release cadence

README

AgentLint

Your AI agent is only as good as your repo.
33 checks. 5 dimensions. Evidence-backed.


AgentLint finds what's broken (file structure, instruction quality, build setup, session continuity, security posture) and fixes it.

We analyzed 265 versions of Anthropic's Claude Code system prompt, documented the hard limits, audited thousands of real repos, and reviewed the academic research. The result: a single command that tells you exactly what your AI agent is struggling with and why.

Install

npm install -g @0xmariowu/agent-lint

Then start a new Claude Code session:

/al

That's it. AgentLint scans your projects, scores them, shows what's wrong, and fixes what it can.

What you get

$ /al

AgentLint - Score: 68/100

Findability      ██████████████░░░░░░  7/10
Instructions     ████████████████░░░░  8/10
Workability      ████████████░░░░░░░░  6/10
Safety           ██████████░░░░░░░░░░  5/10
Continuity       ██████████████░░░░░░  7/10

Fix Plan (7 items):
  [guided]   Pin 8 GitHub Actions to SHA (supply chain risk)
  [guided]   Add .env to .gitignore (AI exposes secrets)
  [assisted] Generate HANDOFF.md
  [guided]   Reduce IMPORTANT keywords (7 found, Anthropic uses 4)

Select items → AgentLint fixes → re-scores → saves HTML report

The HTML report shows a segmented gauge, expandable dimension breakdowns with per-check detail, and a prioritized issues list. When fixes are applied, it also shows a before/after comparison.

Why this matters

AI coding agents read your repo structure, docs, CI config, and handoff notes. They git push, trigger pipelines, and write files. A well-structured repo gets dramatically better AI output. A poorly structured one wastes tokens, ignores rules, repeats mistakes, and may expose secrets.

AgentLint is built on data most developers never see:

  • 265 versions of Anthropic's Claude Code system prompt: every word added, deleted, and rewritten
  • Claude Code internals: hard limits (40K char max, 256KB file read limit, pre-commit hook behavior) that silently break your setup
  • Production security audits across open-source codebases: the gaps AI agents walk into
  • 6 academic papers on instruction-following, context files, and documentation decay

What it checks

Findability: can AI find what it needs?

Check | What | Why
F1 | Entry file exists | No CLAUDE.md = AI starts blind
F2 | Project description in first 10 lines | AI needs context before rules
F3 | Conditional loading guidance | "If working on X, read Y" prevents context bloat
F4 | Large directories have INDEX | >10 files without index = AI reads everything
F5 | All references resolve | Broken links waste tokens on dead-end reads
F6 | Standard file naming | README.md, CLAUDE.md are auto-discovered
F7 | @include directives resolve | Missing targets are silently ignored: you think it's loaded, it isn't
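
As an illustration of what a check like F4 involves, here is a minimal Python sketch. It is not AgentLint's implementation: the accepted index file names (`INDEX.md`, `index.md`) and the helper names are assumptions made for this example; only the "more than 10 files" threshold comes from the table above.

```python
import os
import tempfile

INDEX_NAMES = {"INDEX.md", "index.md"}  # assumed names; the real check may accept others
MAX_FILES_WITHOUT_INDEX = 10            # threshold taken from the F4 row

def dirs_missing_index(root):
    """Return directories under root that hold more than 10 files but no index file."""
    flagged = []
    for current, subdirs, files in os.walk(root):
        subdirs[:] = [d for d in subdirs if d != ".git"]  # skip git internals
        if len(files) > MAX_FILES_WITHOUT_INDEX and not INDEX_NAMES & set(files):
            flagged.append(os.path.relpath(current, root))
    return sorted(flagged)

def demo():
    """Build a throwaway tree: 'docs' has 11 files and no index, 'notes' has an index."""
    with tempfile.TemporaryDirectory() as root:
        for name, with_index in (("docs", False), ("notes", True)):
            os.makedirs(os.path.join(root, name))
            for i in range(11):
                open(os.path.join(root, name, f"page{i}.md"), "w").close()
            if with_index:
                open(os.path.join(root, name, "INDEX.md"), "w").close()
        return dirs_missing_index(root)

print(demo())  # ['docs']
```

Only `docs` is reported, because `notes` carries an index file even though it is just as large.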

Instructions: are your rules well-written?

Check | What | Why
I1 | Emphasis keyword count | Anthropic cut IMPORTANT from 12 to 4 across 265 versions
I2 | Keyword density | More emphasis = less compliance. Anthropic: 7.5 → 1.4 per 1K words
I3 | Rule specificity | "Don't X. Instead Y. Because Z." is Anthropic's golden formula
I4 | Action-oriented headings | Anthropic deleted all "You are a..." identity sections
I5 | No identity language | "Follow conventions" removed: model already does this
I6 | Entry file length | 60-120 lines is the sweet spot. Longer dilutes priority
I7 | Under 40,000 characters | Claude Code hard limit. Above this, your file is truncated
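
To make I1, I2, and I7 concrete, the sketch below counts emphasis keywords outside fenced code, converts the count to a per-1,000-word density, and compares the file length with the 40,000-character limit. The keyword list and function name are assumptions for illustration; AgentLint's actual list and tokenization are not documented here.

```python
import re

FENCE_MARK = "`" * 3  # a Markdown code fence delimiter
FENCED_BLOCK = re.compile(FENCE_MARK + r".*?" + FENCE_MARK, re.DOTALL)
EMPHASIS = re.compile(r"\b(IMPORTANT|MUST|NEVER|ALWAYS|CRITICAL)\b")  # assumed keyword list
CHAR_LIMIT = 40_000  # hard limit quoted in the I7 row

def emphasis_stats(text):
    """Count emphasis keywords in prose only and report density per 1,000 words."""
    prose = FENCED_BLOCK.sub(" ", text)  # ignore keywords inside code blocks
    words = prose.split()
    hits = len(EMPHASIS.findall(prose))
    per_1k = round(1000 * hits / len(words), 1) if words else 0.0
    return {
        "keywords": hits,
        "words": len(words),
        "per_1k": per_1k,
        "over_char_limit": len(text) > CHAR_LIMIT,
    }

# 500 prose words, two emphasis keywords, plus one keyword hidden in a code block.
sample = (
    "IMPORTANT: run tests.\n"
    + "word " * 494
    + "\nYou MUST lint.\n"
    + FENCE_MARK + "\necho IMPORTANT\n" + FENCE_MARK + "\n"
)
stats = emphasis_stats(sample)
print(stats)  # keywords=2, words=500, per_1k=4.0, over_char_limit=False
```

The keyword inside the fenced block is not counted, so the density is 4.0 per 1,000 words rather than 6.0.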

Workability: can AI build and test?

Check | What | Why
W1 | Build/test commands documented | AI can't guess your test runner
W2 | CI exists | Rules without enforcement are suggestions
W3 | Tests exist (not empty shell) | A CI that runs pytest with 0 test files always "passes"
W4 | Linter configured | Mechanical formatting frees AI from guessing style
W5 | No files over 256 KB | Claude Code cannot read them: hard error
W6 | Pre-commit hooks are fast | Claude Code never uses --no-verify. Slow hooks = stuck commits
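
A check like W5 can be approximated with a directory walk. This is a sketch only: it assumes "256 KB" means 262,144 bytes and that every file outside `.git` counts, neither of which is confirmed by the README.

```python
import os
import tempfile

READ_LIMIT_BYTES = 256 * 1024  # assumption: "256 KB" means 262,144 bytes

def oversized_files(root, limit=READ_LIMIT_BYTES):
    """Return (relative path, size) pairs for files larger than the limit."""
    found = []
    for current, subdirs, files in os.walk(root):
        subdirs[:] = [d for d in subdirs if d != ".git"]  # skip git internals
        for name in files:
            path = os.path.join(current, name)
            size = os.path.getsize(path)
            if size > limit:
                found.append((os.path.relpath(path, root), size))
    return sorted(found)

def demo():
    """Create one small file and one file a single byte over the limit."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "small.txt"), "wb") as fh:
            fh.write(b"hello")
        with open(os.path.join(root, "big.bin"), "wb") as fh:
            fh.write(b"\0" * (READ_LIMIT_BYTES + 1))
        return oversized_files(root)

print(demo())  # [('big.bin', 262145)]
```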

Continuity: can next session pick up?

Check | What | Why
C1 | Document freshness | Stale instructions are worse than no instructions
C2 | Handoff file exists | Without it, every session starts from zero
C3 | Changelog has "why" | "Updated INDEX" says nothing. "Fixed broken path" says everything
C4 | Plans in repo | Plans in Jira don't exist for AI
C5 | CLAUDE.local.md not in git | Private per-user file. Claude Code requires .gitignore
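
For C5, one simple approach is to look for an exact-match entry in `.gitignore`. The sketch below is deliberately naive and is not how AgentLint necessarily does it: it ignores glob patterns and negations, and the function name is made up for this example.

```python
def missing_ignore_entries(gitignore_text, required=("CLAUDE.local.md",)):
    """Return required names that have no exact-match line in the .gitignore text."""
    entries = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entries.add(line.lstrip("/"))  # treat "/name" and "name" alike
    return [name for name in required if name not in entries]

sample = "# local files\nnode_modules/\n/CLAUDE.local.md\n"
print(missing_ignore_entries(sample))                               # []
print(missing_ignore_entries(sample, ("CLAUDE.local.md", ".env")))  # ['.env']
```

Passing extra names in `required` lets the same helper cover other files that should stay out of git.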

Safety: is AI working securely?

Check | What | Why
S1 | .env in .gitignore | AI's Glob tool ignores .gitignore by default, so secrets are visible
S2 | Actions SHA pinned | AI push triggers CI. Floating tags = supply chain attack vector
S3 | Secret scanning configured | AI won't self-check for accidentally written API keys
S4 | SECURITY.md exists | AI needs security context for sensitive code decisions
S5 | Workflow permissions minimized | AI-triggered workflows shouldn't have write access by default
S6 | No hardcoded secrets | Detects sk-, ghp_, AKIA, private key patterns in source
S7 | No personal paths | /Users/xxx/ in source = AI copies and spreads the leak
S8 | No pull_request_target | AI pushes trigger CI. Elevated permissions = attack vector
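
The idea behind S2 can be sketched with two regular expressions: find every `uses:` reference in a workflow file, including the `- uses:` list-item form, and flag those that do not end in a full 40-character commit SHA. This is an illustrative sketch, not AgentLint's scanner; the function name is hypothetical, the SHA in the sample is a placeholder, and `docker://` references are skipped as out of scope.

```python
import re

# Matches both "uses: x" and "- uses: x", with optional quotes around the reference.
USES = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")

def unpinned_actions(workflow_text):
    """Return action references that are not pinned to a 40-character commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1)
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions have no tag; docker refs are out of scope here
        if not FULL_SHA.search(ref):
            unpinned.append(ref)
    return unpinned

SAMPLE = """\
steps:
  - uses: actions/checkout@v4
  - name: Setup
    uses: actions/setup-node@1a2b3c4d5e6f7a8b9c0d1a2b3c4d5e6f7a8b9c0d  # placeholder SHA
  - uses: ./local-action
"""
print(unpinned_actions(SAMPLE))  # ['actions/checkout@v4']
```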

Optional: AI Deep Analysis

Spawns AI subagents to find what mechanical checks can't:

  • Contradictory rules that confuse the model
  • Dead-weight rules the model would follow without being told
  • Vague rules without decision boundaries

Optional: Session Analysis

Reads your Claude Code session logs to find:

  • Instructions you repeat across sessions (should be in CLAUDE.md)
  • Rules AI keeps ignoring (need rewriting)
  • Friction hotspots by project

How scoring works

Each check produces a 0-1 score, weighted by dimension, scaled to 100.

Dimension | Weight | Why?
Instructions | 30% | Unique value. No other tool checks CLAUDE.md quality
Findability | 20% | AI can't follow rules it can't find
Workability | 20% | Can AI actually run your code?
Safety | 15% | Is AI working without exposing secrets or triggering vulnerabilities?
Continuity | 15% | Does knowledge survive across sessions?

Scores are measurements, not judgments. Reference values come from Anthropic's own data. You decide what to fix.
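
As a worked example of the weighting, the sketch below combines the dimension scores from the sample output earlier (7/10, 8/10, 6/10, 5/10, 7/10) using the weights in the table. It assumes each dimension score is already a 0-1 value and that the total is rounded to the nearest integer; how per-check scores roll up into a dimension score is not specified here, so that step is left out.

```python
WEIGHTS = {
    "Instructions": 0.30,
    "Findability": 0.20,
    "Workability": 0.20,
    "Safety": 0.15,
    "Continuity": 0.15,
}

def total_score(dimension_scores):
    """Weighted sum of 0-1 dimension scores, scaled to 100 and rounded."""
    weighted = sum(WEIGHTS[name] * score for name, score in dimension_scores.items())
    return round(100 * weighted)

sample = {
    "Findability": 0.7,
    "Instructions": 0.8,
    "Workability": 0.6,
    "Safety": 0.5,
    "Continuity": 0.7,
}
print(total_score(sample))  # 68
```

The result, 24 + 14 + 12 + 7.5 + 10.5 = 68, matches the 68/100 shown in the sample output.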

Update

claude plugin update agent-lint@agent-lint

Evidence

Every check cites its source. Full citations in standards/evidence.json.

Source | Type
Anthropic 265 versions | Primary dataset
Claude Code internals | Hard limits and observed behavior
IFScale (NeurIPS) | Instruction compliance at scale
ETH Zurich | Do context files help coding agents?
Codified Context | Stale content as #1 failure mode
Agent READMEs | Concrete vs abstract effectiveness

Requirements

License

MIT

Release History

Version | Changes | Urgency | Date
v1.1.13 | Hotfix: revert v1.1.12 F003 (the branch-protection.yml realignment that re-added `CodeQL` + `check-test-pairing`). The v1.1.12 release.yml run timed out at attempt 20/20 with `Still missing: CodeQL check-test-pairing`, confirming the rationale for re-adding them was wrong. **Key insight (worth reading)**: - `CodeQL` is a **workflow** name. The check-runs API returns **job** names. The CodeQL workflow's only job is `analyze`; that's what lands as a check-run on every commit. Listing `CodeQL` in... | High | 4/26/2026
v0.8.6 | ### New checks (4) - **W7** You can now see when `CLAUDE.md` is missing a documented local fast test command: AI agents need a single runnable command (e.g. `pytest tests/unit/` or `npm test`) to verify before pushing. - **W8** You can now detect Node.js projects where `package.json` has no `scripts.test` entry; `npm test` silently fails with "missing script" when agents try to run it. - **H7** You can now detect gate workflows (`test-required`, `*-check`, etc.) that always `exit 0`: warn-on... | High | 4/23/2026
v0.8.5 | ### Infrastructure - **`scripts/sanitize.sh`**: new read-only pre-release PII audit. Eight checks cover author emails (git log), personal paths (tracked files + commit messages + recent history), Tailscale and mDNS machine hostnames, and optional `.internal-codenames` enforcement across files / commits / branches. Scans tracked-only so untracked test artifacts don't create noise. Mirrors the placeholder filter from `.husky/pre-commit` so documentation examples don't trip it. - **Commit-message... | High | 4/18/2026
v0.8.4 | ### Fixed - **`docs/ship-boundary.md`** no longer references artifacts that don't exist in this repo. The v0.8.3 import from VibeKit left behind pointers to `standards/ship-boundary.json`, `.ship-boundary-deny.local`, `bootstrap.sh`, `tests/e2b/`, `configs/**`, `hooks/**`, and rule IDs like `SB-L-01` / `SB-N-05`. SHIP / LOCAL / NEVER examples now match agent-lint's actual layout, and a new "How this is enforced today" section points to the real enforcement surface (`.husky/pre-commit`, `hygiene... | High | 4/18/2026
v0.8.3 | ### Infrastructure - **Public Repo Hygiene workflow.** New `hygiene.yml` enforces codename, personal-path, and container-image-pin checks on every PR, complementing the existing `author-email.yml` commit-identity gate. - **Workflow Sanity workflow.** New `workflow-sanity.yml` runs actionlint (with shellcheck) plus no-tabs and no-conflict-marker checks whenever `.github/workflows/**` changes. - **CodeQL analysis.** New `codeql.yml` runs `javascript-typescript` scans on every PR, push to `main`,... | High | 4/18/2026
v0.8.2 | ### Security - **Symlink attack protection across scanner, fixer, and analyzers.** A malicious repository could place a symlinked `CLAUDE.md` (or `AGENTS.md`, `.cursorrules`, `.cursor/rules/*.mdc`) pointing to sensitive host files like `~/.ssh/id_rsa` or `/etc/passwd`. Running scanner, fixer, deep-analyzer, or session-analyzer on such a repo would read, leak (to LLM prompts or output), or overwrite the symlink target. All entry-file resolution now uses `lstat`-based checks that reject symlinks. | High | 4/16/2026
v0.8.1 | ### Fixed - **S7 personal paths check no longer silently fails on git < 2.40.** The `:!__tests__/*` pathspec exclusion triggered `fatal: Unimplemented pathspec magic '_'` on git 2.39 (Debian 12 default). The error was swallowed by `|| true`, causing the check to always report "no personal paths", even when files contained `/Users/` or `/home/` paths. Fix moves the exclusion from git pathspec to a grep pipe filter. - **I1 emphasis keywords are no longer counted inside code blocks.** `IMPORTANT`... | High | 4/16/2026
v0.8.0 | ### Added - **You can now get AgentLint findings in GitHub's Security tab and as inline PR annotations.** Enable with `sarif-upload: 'true'` in your workflow. Findings appear alongside CodeQL and Dependabot alerts: persistent, trackable, and integrated with your existing security notification workflow. SARIF upload requires Code scanning enabled (free for public repos, GHAS for private). - **Inline PR annotations now appear on every run**, even without SARIF/Code scanning. AgentLint emits ... | High | 4/16/2026
v0.7.1 | ### Added - **You can now install AgentLint on Windows** from inside Git Bash or WSL (#82). `npm install -g @0xmariowu/agent-lint` previously rejected Windows with `EBADPLATFORM` before the installer could even run; that block is gone. - **Clear guidance when bash is missing on Windows.** Running the installer from `cmd.exe` or PowerShell now exits with a message pointing to Git for Windows or WSL instead of a cryptic shell error. ### Fixed - Postinstall detects `claude` cross-platform (`wher... | High | 4/16/2026
v0.7.0 | Audit-driven minor release. Dimension count grows from 6 to 8, check count from 42 to 49; your repo score will shift, because the scanner is now finally counting checks it had been silently dropping. ### Added: two new dimensions, seven new checks `deep-analyzer` and `session-analyzer` were already emitting `D1`-`D3` and `SS1`-`SS4` results, but `weights.json` had no dimension entry for either prefix, so `scorer.js` silently dropped every contribution. The audit caught this and the dimension... | High | 4/15/2026
v0.6.2 | Scanner correctness fixes. Your repo score may change; it is now what scanner says it is. ### Fixed: six silent-failure bugs in scanner All of these made scanner report higher scores than reality: - **S2 (SHA pinning)** now actually checks standard workflow syntax. The grep only matched bare `uses:` keys; the `- uses:` list-item form used by every normal workflow slipped through, so wf_total stayed at 0 and S2 always scored 1 regardless of pinning. - **W6 (pre-commit hook speed)** no... | High | 4/14/2026
v0.6.1 | Docs patch. No functional changes. The npm listing for v0.6.0 carried the pre-release README which still said "33 checks / 5 dimensions" and only mentioned Claude Code. This publishes the updated README that documents all 42 checks, 6 dimensions, and multi-platform support (Claude Code, Cursor, Copilot, Gemini, Windsurf, Cline). | High | 4/14/2026
v0.6.0 | You can now check Claude Code hook and permission config, scan more AI platforms, and wire AgentLint into CI as a GitHub Action. **42 checks across 6 dimensions** (up from 33 across 5). ### New dimension: Harness - H1: Hook event names valid (catches typos like `preCommit`, `sessionStart` that silently never fire) - H2: PreToolUse hooks have matcher field (91% of corpus hooks fire on every tool call; major perf tax) - H3: Stop hook has loop-protection guard (only 5/92 corpus Stop ho... | Medium | 4/14/2026
v0.5.1 | Release pipeline verification. No functional changes. - Chore: CI, allow `dependabot[bot]` in author name check (#55) - Chore: bump actions/setup-node 4.4.0 → 6.3.0 (#51) - Chore: bump actions/upload-artifact 4.6.2 → 7.0.0 (#52) - Chore: bump actions/checkout 4.2.2 → 6.0.2 (#53) - Chore: bump actions/labeler 5.0.0 → 6.0.1 (#54) - Chore: untrack HANDOFF.md as local dev notes (#56) | High | 4/11/2026
v0.5.0 | You can now measure scanner accuracy against 4,533 real repos. - New: Corpus-wide accuracy benchmark: 149,589 labeled data points (4,533 repos x 33 checks) - New: Deterministic + LLM labeling pipeline (auto-label-full.js + DashScope qwen-plus batch) - New: Cross-validation merge with conflict detection (merge-labels.js) - New: Per-check precision/recall/F1 comparison with regression detection (compare-results.js) - New: CI accuracy workflow: blocks PRs if scanner precision or recall drops >5% | Medium | 4/10/2026
v0.4.3 | Release pipeline test. No functional changes. - Test: full release pipeline verification (bump → CI → docs → website) | Medium | 4/6/2026
v0.4.2 | Docs site consolidated, release pipeline simplified. - Changed: Docusaurus docs moved into main repo (was separate AgentLint repo) - Changed: Push to main auto-deploys docs via GitHub Pages workflow - Changed: Release pipeline no longer needs cross-repo sync for docs - Changed: Release validates check_count consistency (weights.json vs metadata vs README) - New: SVG favicon (green A on brand color) - Fix: MDX angle-bracket parsing (switched to markdown format) | Medium | 4/6/2026
v0.4.1 | Docs site, npm fixes, release automation. - New: Docusaurus docs site at docs.agentlint.app (replaces Jekyll) - New: Ionic-inspired theme with SCSS component partials, dark mode, custom Prism syntax colors - New: `release-metadata.json`: single source of truth for version, check counts, dimension data - New: `scripts/generate-metadata.sh`: auto-derives counts from weights.json - New: Cross-repo release sync: tag push auto-updates docs site and website - Changed: `scripts/bump-version.sh` now... | Medium | 4/6/2026
v0.4.0 | 33 checks. Two new safety checks, hardened dev workflow. - New: S7 detects personal filesystem paths in source files - New: S8 detects `pull_request_target` trigger in GitHub Actions workflows - New: pre-commit hook with author whitelist, codename scan, PII scan, secret detection, shellcheck - New: CI author-email check: validates commit author uses noreply email and pseudonym - Fix: `set -euo pipefail` in all shell scripts with guarded pipe exits - Fix: gitleaks allowlist for documentatio... | Medium | 4/4/2026
v0.3.2 | Security hardening + privacy cleanup. - Fix: eliminate RCE in W6 hook check: static analysis replaces direct execution of user repo hooks - Fix: `pull_request_target` → `pull_request` in PR lint workflow - Fix: path traversal guard in fixer (rejects non-git directories) - Fix: file probe oracle in scanner (skips absolute paths in reference resolution) - Fix: XSS escaping for all HTML report template values - Fix: Python injection in bump-version.sh (environment variables, not string interpolat... | Medium | 4/4/2026
v0.3.1 | HTML report redesign. - New: HTML report matches approved visual design: segmented arc gauge, expandable dimension rows, check items with status dots, numbered issues list - New: Before/after comparison in HTML: ghost gauge segments, delta pills, fixed/improved badges on checks - New: HTML escaping for all user-provided content (XSS safety) - New: Version badge in report header (read from package.json) - Removed: radar chart, metric cards grid, data table, topbar from HTML report | Medium | 4/4/2026
v0.1.4 | You can now see what needs fixing before choosing. Scanner finds nested repos. - Fix plan prints a readable summary before asking which items to fix (was hidden in collapsed output) - Scanner discovers projects up to 3 levels deep (was 1; missed nested repos) - Fix: plan-generator now includes score in output items (was dropped, showed as -1) - Fix: severity thresholds in docs now match code (<0.5 = high) - Fix: 3 security workflows using nonexistent actions/checkout@v6 - Fix: broken reference... | Medium | 4/4/2026
v0.1.3 | Fix: plugin was not discoverable (missing marketplace.json). - You can now actually install with `extraKnownMarketplaces` + `enabledPlugins` - Fix: added `.claude-plugin/marketplace.json` (tells Claude Code this repo is a marketplace with a plugin) | Medium | 4/4/2026
v0.1.2 | Fix: `/al` is a user command, not an internal skill. - You can now `/al` in any Claude Code session after install - Fix: moved `skills/hh/SKILL.md` → `commands/al.md` (command = user-invocable, skill = internal) - Fix: simplified plugin.json to match official plugins (name + description + author only) - Fix: added `allowed-tools` to command frontmatter | Medium | 4/4/2026
v0.1.1 | Fix: plugin format was wrong, users couldn't install from GitHub. - You can now install via `extraKnownMarketplaces` and it actually works - Fix: moved `plugin.json` to `.claude-plugin/plugin.json` - Fix: moved `skills/al.md` to `skills/hh/SKILL.md` (directory format) - Fix: removed explicit skills array from plugin.json (auto-discovered) | Medium | 4/4/2026
v0.1.0 | First release. You can now: - Run `/al` in Claude Code to diagnose all your projects - See a score out of 100 across 4 dimensions (Findability, Instructions, Workability, Continuity) - Get a fix plan grouped by severity with auto/assisted/guided actions - Execute fixes automatically (broken references, missing files) or get guidance - Optionally run AI Deep Analysis to find contradictions, dead-weight rules, and vague rules - Optionally run Session Analysis to find repeated instructions and fri... | Medium | 4/4/2026
v0.3.0 | ## v0.3.0 (2026-04-04) | Medium | 4/4/2026
v0.2.0 | ## v0.2.0 (2026-04-04) | Medium | 4/4/2026

Similar Packages

pilot | #1 Terminal Benchmark 2.0: AI that ships your tickets. | v2.166.12
maestro-orchestrate | Multi-agent orchestration platform for Gemini CLI and Claude Code: 22 specialists, parallel subagents, persistent sessions, and built-in code review, debugging, security, SEO, accessibility, and comp... | v1.6.4
Autosearch | Self-evolving deep research system for Claude Code. Zero API keys. | v2026.04.26.2
flow-next | Plan-first AI workflow plugin for Claude Code, OpenAI Codex, and Factory Droid. Zero-dep task tracking, worker subagents, Ralph autonomous mode, cross-model reviews. | flow-next-v1.6.0
learn-claude-code | Build a minimal Claude-style AI agent from scratch that processes messages, interacts with tools, and generates dynamic responses. | main@2026-06-04

More from 0xmariowu

Autosearch | Self-evolving deep research system for Claude Code. Zero API keys.

More in Testing

vector-db-benchmark | Framework for benchmarking vector search engines
Gito | An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.
mxcli | Mendix cli tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a starlark linter for quality v...
llm_context_benchmarks | 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz...