promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

ci ci-cd cicd evaluation evaluation-framework llm llm-eval llm-evaluation typescript

Why this rank: strong adoption, recent release, healthy release cadence

README

Promptfoo: LLM evals & red teaming

promptfoo is a CLI and library for evaluating and red-teaming LLM apps. Stop the trial-and-error approach and start shipping secure, reliable AI apps.

Website · Getting Started · Red Teaming · Documentation · Discord

Promptfoo is now part of OpenAI. Promptfoo remains open source and MIT licensed. Read the company update.

Quick Start

npm install -g promptfoo
promptfoo init --example getting-started

Also available via brew install promptfoo and pip install promptfoo. You can also use npx promptfoo@latest to run any command without installing.

Most LLM providers require an API key. Set yours as an environment variable:

export OPENAI_API_KEY=sk-abc123

Once you're in the example directory, run an eval and view results:

cd getting-started
promptfoo eval
promptfoo view

See Getting Started (evals) or Red Teaming (vulnerability scanning) for more.
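Conceptually, an eval is a matrix: each prompt is run against each test case (and each configured provider), with the test's `vars` substituted into the prompt template and the output graded by the test's `assert` list. The TypeScript below is a toy model of that loop under stated assumptions, not promptfoo's code. `runEval`, `render`, `check`, and `echoProvider` are hypothetical names. Only the `contains` and `icontains` assertion types are modeled. The provider is a synchronous offline stub, so the sketch runs without an API key.

```typescript
// Toy model of a promptfoo-style eval loop (NOT promptfoo's implementation).
// Every prompt template is rendered with every test case's vars, sent to a
// provider, and the output is checked against that test's assertions.

type Vars = Record<string, string>;

// Two of promptfoo's simplest deterministic assertion types, modeled minimally.
type Assertion =
  | { type: "contains"; value: string }
  | { type: "icontains"; value: string };

interface TestCase {
  vars: Vars;
  assert: Assertion[];
}

// Hypothetical provider shape: a synchronous function from prompt to output.
// Real providers call an LLM API and are asynchronous.
type Provider = (prompt: string) => string;

interface EvalRow {
  prompt: string;
  output: string;
  pass: boolean;
}

// Replace {{name}} placeholders with test variables. promptfoo uses Nunjucks,
// which supports much more than this plain substitution.
function render(template: string, vars: Vars): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => vars[name] ?? "");
}

// contains is case-sensitive; icontains compares lowercased strings.
function check(output: string, assertion: Assertion): boolean {
  return assertion.type === "contains"
    ? output.includes(assertion.value)
    : output.toLowerCase().includes(assertion.value.toLowerCase());
}

// One row per (prompt, test case) pair; a row passes only if all its assertions pass.
function runEval(prompts: string[], provider: Provider, tests: TestCase[]): EvalRow[] {
  const rows: EvalRow[] = [];
  for (const template of prompts) {
    for (const test of tests) {
      const prompt = render(template, test.vars);
      const output = provider(prompt);
      const pass = test.assert.every((assertion) => check(output, assertion));
      rows.push({ prompt, output, pass });
    }
  }
  return rows;
}

// Offline stub provider so the sketch runs without an API key.
const echoProvider: Provider = (prompt) => `You said: ${prompt}`;

const demoRows = runEval(
  ["Translate to French: {{text}}", "Summarize: {{text}}"],
  echoProvider,
  [
    { vars: { text: "Hello" }, assert: [{ type: "icontains", value: "hello" }] },
    { vars: { text: "Bye" }, assert: [{ type: "contains", value: "French" }] },
  ],
);
console.log(demoRows.map((row) => row.pass)); // [ true, true, true, false ]
```

The last row fails because the "Summarize" prompt never produces the word "French". In promptfoo itself, prompts, providers, and tests are declared in a config file rather than in code, and the resulting matrix is browsed with `promptfoo view`.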

What can you do with Promptfoo?

Test your prompts and models with automated evaluations
Secure your LLM apps with red teaming and vulnerability scanning
Compare models side-by-side (OpenAI, Anthropic, Azure, Bedrock, Ollama, and more)
Automate checks in CI/CD
Review pull requests for LLM-related security and compliance issues with code scanning
Share results with your team

[Screenshots: web results viewer · command-line output · security vulnerability report]

Why Promptfoo?

Developer-first: Fast, with features like live reload and caching
Private: LLM evals run 100% locally; your prompts never leave your machine
Flexible: Works with any LLM API or programming language
Battle-tested: Powers LLM apps serving 10M+ users in production
Data-driven: Make decisions based on metrics, not gut feel
Open source: MIT licensed, with an active community

Contributing

We welcome contributions! Check out our contributing guide to get started.

Join our Discord community for help and discussion.

Release History

Version	Changes	Urgency	Date
0.121.14	## [0.121.14](https://github.com/promptfoo/promptfoo/compare/0.121.13...0.121.14) (2026-06-02) ### Features * add A2A provider ([#9586](https://github.com/promptfoo/promptfoo/issues/9586)) ([963b264](https://github.com/promptfoo/promptfoo/commit/963b264ba22d621282d0bf82efdae2b5defe6d59)) * assertions: add agent-rubric grader ([#9453](https://github.com/promptfoo/promptfoo/issues/9453)) ([cadb3c5](https://github.com/promptfoo/promptfoo/commit/cadb3c500277464f05244c8bc8525c2725aa5c22)) * **	High	6/2/2026
code-scan-action-0.1.7	## [0.1.7](https://github.com/promptfoo/promptfoo/compare/code-scan-action-0.1.6...code-scan-action-0.1.7) (2026-05-29) ### Bug Fixes * code-scan: emit structured fork PR skip output ([#9426](https://github.com/promptfoo/promptfoo/issues/9426)) ([61c624c](https://github.com/promptfoo/promptfoo/commit/61c624c7f91808a6f59d8b837dbb3896dd9a74c0)) * code-scan: honor minimum-severity alias when min-severity is unset ([#9433](https://github.com/promptfoo/promptfoo/issues/9433)) ([ea5ea9e](ht	High	5/29/2026
code-scan-action-0.1.6	## [0.1.6](https://github.com/promptfoo/promptfoo/compare/code-scan-action-0.1.5...code-scan-action-0.1.6) (2026-05-21) ### Features * code-scan: add SARIF output support ([#9161](https://github.com/promptfoo/promptfoo/issues/9161)) ([4da26e9](https://github.com/promptfoo/promptfoo/commit/4da26e95e4837ad9fd3363dfb52a86e5e1ceb66d)) * code-scan: refine SARIF output ergonomics ([#9159](https://github.com/promptfoo/promptfoo/issues/9159)) ([ea3a655](https://github.com/promptfoo/promptfoo/	High	5/21/2026
0.121.11	## [0.121.11](https://github.com/promptfoo/promptfoo/compare/0.121.10...0.121.11) (2026-05-08) ### Features * quiverai: add Arrow 1.1 models, vectorize endpoint, and GPT Image-2 pipeline ([#9139](https://github.com/promptfoo/promptfoo/issues/9139)) ([ce2c62d](https://github.com/promptfoo/promptfoo/commit/ce2c62d4f9cfd92bd8e48f45db2314271946c467)) ### Bug Fixes * redteam: handle MCP target prompt materialization ([#9149](https://github.com/promptfoo/promptfoo/issues/9149)) ([a050023	High	5/8/2026
0.121.9	## [0.121.9](https://github.com/promptfoo/promptfoo/compare/0.121.8...0.121.9) (2026-04-27) ### Features * providers: add gpt-5.5 model support ([#8884](https://github.com/promptfoo/promptfoo/issues/8884)) ([8c5dc92](https://github.com/promptfoo/promptfoo/commit/8c5dc929a15e3f9c859f930cc71a6f7093bf666e)) ### Bug Fixes * cli: align command-line reference with CLI ([#8900](https://github.com/promptfoo/promptfoo/issues/8900)) ([c4ce0d4](https://github.com/promptfoo/promptfoo/commit/c4	High	4/27/2026
0.121.8	## [0.121.8](https://github.com/promptfoo/promptfoo/compare/0.121.7...0.121.8) (2026-04-24) ### Features * claude-agent-sdk: bump to 0.2.116 and add title option ([#8858](https://github.com/promptfoo/promptfoo/issues/8858)) ([9bca53a](https://github.com/promptfoo/promptfoo/commit/9bca53a2502be2395690019fad65b5a008f14c05)) * providers: add GPT-5.5 OpenAI support ([#8873](https://github.com/promptfoo/promptfoo/issues/8873)) ([6488623](https://github.com/promptfoo/promptfoo/commit/648862	High	4/24/2026
0.121.6	## [0.121.6](https://github.com/promptfoo/promptfoo/compare/0.121.5...0.121.6) (2026-04-18) ### Features * anthropic: add support for Claude Opus 4.7 ([#8763](https://github.com/promptfoo/promptfoo/issues/8763)) ([bcde21d](https://github.com/promptfoo/promptfoo/commit/bcde21d90731ca20781c3d7ebb34567de13e3044)) * claude-agent-sdk: bump to 0.2.112 and expose exclude_dynamic_sections ([#8767](https://github.com/promptfoo/promptfoo/issues/8767)) ([7abb3b7](https://github.com/promptfoo/pro	High	4/22/2026
code-scan-action-0.1.5	## [0.1.5](https://github.com/promptfoo/promptfoo/compare/code-scan-action-0.1.4...code-scan-action-0.1.5) (2026-04-14) ### Bug Fixes * app: clarify attack success rate label ([#8387](https://github.com/promptfoo/promptfoo/issues/8387)) ([7482eff](https://github.com/promptfoo/promptfoo/commit/7482eff88f193e857822b43da040638eb4ae1565)) * code-scan: avoid npm before env for MCP npx ([#8515](https://github.com/promptfoo/promptfoo/issues/8515)) ([7d2eacd](https://github.com/promptfoo/prom	High	4/14/2026
0.121.5	## [0.121.5](https://github.com/promptfoo/promptfoo/compare/0.121.4...0.121.5) (2026-04-14) ### Features * providers: add Abliteration provider ([b29fa9a](https://github.com/promptfoo/promptfoo/commit/b29fa9a475315cc97d57a5616d08e9b099d8f66b)) * providers: add OpenAI Codex app-server provider ([#8578](https://github.com/promptfoo/promptfoo/issues/8578)) ([a403dd1](https://github.com/promptfoo/promptfoo/commit/a403dd17b012029bbd4323e3d95e44e5366d08a3)) * providers: let anthropic:me	Medium	4/14/2026
0.121.4	## [0.121.4](https://github.com/promptfoo/promptfoo/compare/0.121.3...0.121.4) (2026-04-10) ### Features * allow per-test opt-out of defaultTest assertions ([5e5959e](https://github.com/promptfoo/promptfoo/commit/5e5959ecc6984fe34df0c3fa74aa231fdc9ea972)) * codex: expand Codex SDK eval controls and docs ([#8433](https://github.com/promptfoo/promptfoo/issues/8433)) ([80c3f7f](https://github.com/promptfoo/promptfoo/commit/80c3f7f25431e7a6319df54b46b4cd283f4b6b8c)) * eval: group serial g	High	4/11/2026
0.121.3	## [0.121.3](https://github.com/promptfoo/promptfoo/compare/0.121.2...0.121.3) (2026-03-24) ### Features * add block-no-verify PreToolUse hook to .claude/settings.json ([#8234](https://github.com/promptfoo/promptfoo/issues/8234)) ([29a856a](https://github.com/promptfoo/promptfoo/commit/29a856a8fa2defba5bc8362ea6e14364b7e624ce)) * add new config options to composite jailbreak strategy ([#7693](https://github.com/promptfoo/promptfoo/issues/7693)) ([071d345](https://github.com/promptfoo/promptfo	Medium	3/24/2026
0.121.2	## [0.121.2](https://github.com/promptfoo/promptfoo/compare/0.121.1...0.121.2) (2026-03-12) ### Bug Fixes * add node-addon-api to devDependencies for sharp build ([#8102](https://github.com/promptfoo/promptfoo/issues/8102)) ([1d4e959](https://github.com/promptfoo/promptfoo/commit/1d4e9596f2199ade67e4b65207b8f99b7c2b1b3b)) * deps: update dependency @tanstack/react-virtual to ^3.13.20 ([#8083](https://github.com/promptfoo/promptfoo/issues/8083)) ([5e5f774](https://github.com/promptfoo/promp	Low	3/12/2026
0.121.1	## [0.121.1](https://github.com/promptfoo/promptfoo/compare/0.121.0...0.121.1) (2026-03-09) ### Bug Fixes * providers: support newer opencode sdk api ([#8060](https://github.com/promptfoo/promptfoo/issues/8060)) ([7ec80b2](https://github.com/promptfoo/promptfoo/commit/7ec80b2e173dc99438002c8f5d16feb7b6643aa1))	Low	3/9/2026
0.121.0	## [0.121.0](https://github.com/promptfoo/promptfoo/compare/0.120.27...0.121.0) (2026-03-09) ### ⚠ BREAKING CHANGES * providers: resolve relative config paths against config dir in claude-agent-sdk ([#8030](https://github.com/promptfoo/promptfoo/issues/8030)) ### Features * redteam: generalize insurance plugins for all insurance types ([#8002](https://github.com/promptfoo/promptfoo/issues/8002)) ([945c3bc](https://github.com/promptfoo/promptfoo/commit/945c3bc6725ca8bc7369f7d0efed6f8	Low	3/9/2026
0.120.27	## [0.120.27](https://github.com/promptfoo/promptfoo/compare/0.120.26...0.120.27) (2026-03-06) ### Features * add promptfoo-evals agent skill for Claude Code and Codex ([#7985](https://github.com/promptfoo/promptfoo/issues/7985)) ([71160fe](https://github.com/promptfoo/promptfoo/commit/71160fea6aaa3471de9c6027b929830bdd7acfb0)) * app: add media library page ([#6901](https://github.com/promptfoo/promptfoo/issues/6901)) ([4eba85a](https://github.com/promptfoo/promptfoo/commit/4eba85aac7a310	Low	3/6/2026
0.120.26	## [0.120.26](https://github.com/promptfoo/promptfoo/compare/0.120.25...0.120.26) (2026-03-03) ### Features * Add financial:sox-compliance plugin ([#7780](https://github.com/promptfoo/promptfoo/issues/7780)) ([b7cfc8e](https://github.com/promptfoo/promptfoo/commit/b7cfc8e47c5594a498c8a472460896424e5ada52)) * add model-identification plugin ([#7883](https://github.com/promptfoo/promptfoo/issues/7883)) ([a2ac7c6](https://github.com/promptfoo/promptfoo/commit/a2ac7c6139aaed061f41fd7ab83c7562e167	Low	3/3/2026
0.120.25	## [0.120.25](https://github.com/promptfoo/promptfoo/compare/0.120.24...0.120.25) (2026-02-18) ### Features * add regenerate button for suggested policies ([#7652](https://github.com/promptfoo/promptfoo/issues/7652)) ([2b09693](https://github.com/promptfoo/promptfoo/commit/2b096935c26ee148ad28f7bfe47ddad292ce9bd3)) * app: add renderOption prop to Combobox component ([#7723](https://github.com/promptfoo/promptfoo/issues/7723)) ([a609016](https://github.com/promptfoo/promptfoo/commit/a60901	Low	2/18/2026
0.120.24	## [0.120.24](https://github.com/promptfoo/promptfoo/compare/0.120.23...0.120.24) (2026-02-10) ### Features * add --filter-prompts option with MCP alignment ([#7451](https://github.com/promptfoo/promptfoo/issues/7451)) ([e9b53e2](https://github.com/promptfoo/promptfoo/commit/e9b53e2ac83df1f6e98bf9561a6a3c8d87d271af)) * eval: add hidden column indicators and schema-based column visibility persistence ([#7536](https://github.com/promptfoo/promptfoo/issues/7536)) ([8fbeb60](https://github.co	Low	2/10/2026
0.120.23	## [0.120.23](https://github.com/promptfoo/promptfoo/compare/0.120.22...0.120.23) (2026-02-06) ### Bug Fixes * blobs: restore cloud blob upload for shared evals ([#7484](https://github.com/promptfoo/promptfoo/issues/7484)) ([7eb1009](https://github.com/promptfoo/promptfoo/commit/7eb100939c07b0b414459682cdd057e024b39814)) * deps: update dependency @opencode-ai/sdk to ^1.1.48 ([#7499](https://github.com/promptfoo/promptfoo/issues/7499)) ([b081a54](https://github.com/promptfoo/promptfoo/	Low	2/6/2026
0.120.22	## [0.120.22](https://github.com/promptfoo/promptfoo/compare/0.120.21...0.120.22) (2026-02-04) ### Features * redteam: enable multilingual support for audio/video/image strategies ([#7485](https://github.com/promptfoo/promptfoo/issues/7485)) ([01b62ce](https://github.com/promptfoo/promptfoo/commit/01b62cee55c4c06edc510d9684a4696bbc62633b)) ### Bug Fixes * app: move rows useMemo after table declaration to fix build ([#7475](https://github.com/promptfoo/promptfoo/issues/7475)) ([d1c2	Low	2/4/2026
0.120.21	## [0.120.21](https://github.com/promptfoo/promptfoo/compare/0.120.20...0.120.21) (2026-02-03) ### Features * app: add print styles to DataTable for light mode printing ([#7365](https://github.com/promptfoo/promptfoo/issues/7365)) ([167b27c](https://github.com/promptfoo/promptfoo/commit/167b27c4b9483173cebe6eb7b3467e72a723fd3f)) * app: improve HTTP endpoint request body editor ([#7438](https://github.com/promptfoo/promptfoo/issues/7438)) ([cfadb37](https://github.com/promptfoo/promptf	Low	2/3/2026
0.120.20	## [0.120.20](https://github.com/promptfoo/promptfoo/compare/0.120.19...0.120.20) (2026-01-29) ### Features * redteam: add email validation to generate command ([#7314](https://github.com/promptfoo/promptfoo/issues/7314)) ([4fffc3a](https://github.com/promptfoo/promptfoo/commit/4fffc3a2827cfadc9213f06167755fa880a2099d)) ### Bug Fixes * deps: update dependency @openai/agents to ^0.4.3 ([#7352](https://github.com/promptfoo/promptfoo/issues/7352)) ([7fbb175](https://github.com/promptf	Low	1/29/2026
0.120.19	## [0.120.19](https://github.com/promptfoo/promptfoo/compare/0.120.18...0.120.19) (2026-01-28) ### Features * app: enhance DataTable with column alignment and styling improvements ([#7349](https://github.com/promptfoo/promptfoo/issues/7349)) ([8b8b122](https://github.com/promptfoo/promptfoo/commit/8b8b1223cef96257783caf69079976090db0062c)) * app: extend UI component interfaces for data-testid support ([#7339](https://github.com/promptfoo/promptfoo/issues/7339)) ([d9dc48a](https://gith	Low	1/28/2026
0.120.18	## [0.120.18](https://github.com/promptfoo/promptfoo/compare/0.120.17...0.120.18) (2026-01-28) ### Features * eval: support multiple --filter-metadata flags with AND logic ([#7317](https://github.com/promptfoo/promptfoo/issues/7317)) ([61d8d17](https://github.com/promptfoo/promptfoo/commit/61d8d174ee756881edac31dcad9861bb14530803)) * providers: add collaboration_mode support to OpenAI Codex SDK ([#7275](https://github.com/promptfoo/promptfoo/issues/7275)) ([a3e6d58](https://github.com	Low	1/28/2026
0.120.17	## [0.120.17](https://github.com/promptfoo/promptfoo/compare/0.120.16...0.120.17) (2026-01-23) ### Features * redteam: add telecom vertical red team plugins ([#7182](https://github.com/promptfoo/promptfoo/issues/7182)) ([678fd1e](https://github.com/promptfoo/promptfoo/commit/678fd1e9828aeece905f6d17984f3478846749ee)) * redteam: add VLSU compositional safety plugin ([#6855](https://github.com/promptfoo/promptfoo/issues/6855)) ([3e30cb0](https://github.com/promptfoo/promptfoo/commit/3e3	Low	1/23/2026
0.120.16	## [0.120.16](https://github.com/promptfoo/promptfoo/compare/0.120.15...0.120.16) (2026-01-21) ### Features * config: add per-test structured output support ([#6239](https://github.com/promptfoo/promptfoo/issues/6239)) ([4629892](https://github.com/promptfoo/promptfoo/commit/4629892c14c8df37d298229209a8a932d607de9f)) * eval: re-enable SIGINT graceful shutdown for eval pause/resume ([#7012](https://github.com/promptfoo/promptfoo/issues/7012)) ([06364ef](https://github.com/promptfoo/pro	Low	1/21/2026
0.120.15	## [0.120.15](https://github.com/promptfoo/promptfoo/compare/0.120.14...0.120.15) (2026-01-20) ### Features * app: add NavigationSidebar component and enhance Tabs ([#7073](https://github.com/promptfoo/promptfoo/issues/7073)) ([55a3125](https://github.com/promptfoo/promptfoo/commit/55a3125e5627c2f1a4981744f536fde71b21a5de)) * app: add Storybook with stories for all UI components ([#7066](https://github.com/promptfoo/promptfoo/issues/7066)) ([53f51cf](https://github.com/promptfoo/promp	Low	1/20/2026
0.120.14	## [0.120.14](https://github.com/promptfoo/promptfoo/compare/0.120.13...0.120.14) (2026-01-14) ### Features * redteam: add numTests config option for strategy test capping ([#7030](https://github.com/promptfoo/promptfoo/issues/7030)) ([0ca5ded](https://github.com/promptfoo/promptfoo/commit/0ca5deda234c22482d43fd0a07f7855a444696ff)) ### Bug Fixes * deps: update @actions/github to v7 and fix workspace config ([#7037](https://github.com/promptfoo/promptfoo/issues/7037)) ([c6b2496](htt	Low	1/14/2026
0.120.13	## [0.120.13](https://github.com/promptfoo/promptfoo/compare/0.120.12...0.120.13) (2026-01-13) ### Features * ui: Add Ink-based interactive list UI foundation ([#7013](https://github.com/promptfoo/promptfoo/issues/7013)) ([84a2ac7](https://github.com/promptfoo/promptfoo/commit/84a2ac7b2f8697d4cdafd0f868778348e6211c0e)) ### Bug Fixes * ui: preserve exact getRowId values in DataTable row selection ([#7032](https://github.com/promptfoo/promptfoo/issues/7032)) ([e78f083](https://github	Low	1/13/2026
0.120.12	## [0.120.12](https://github.com/promptfoo/promptfoo/compare/0.120.11...0.120.12) (2026-01-12) ### Features * app: show provider config details on hover in eval results ([#6757](https://github.com/promptfoo/promptfoo/issues/6757)) ([c790f80](https://github.com/promptfoo/promptfoo/commit/c790f809a0baab2c186ea5be4a787229baad6d2d)) * assertions: add word-count assertion type ([#7028](https://github.com/promptfoo/promptfoo/issues/7028)) ([d21f7a0](https://github.com/promptfoo/promptfoo/co	Low	1/12/2026
0.120.11	## [0.120.11](https://github.com/promptfoo/promptfoo/compare/0.120.10...0.120.11) (2026-01-10) ### Features * app: add Combobox component ([#6946](https://github.com/promptfoo/promptfoo/issues/6946)) ([a1fb9ed](https://github.com/promptfoo/promptfoo/commit/a1fb9ed64d49d4fc4cf58c96dfa89454eddc59d0)) * codeScan: add fork PR authentication support ([#6958](https://github.com/promptfoo/promptfoo/issues/6958)) ([9c0fee4](https://github.com/promptfoo/promptfoo/commit/9c0fee4904af3135492545	Low	1/10/2026
0.120.10	## [0.120.10](https://github.com/promptfoo/promptfoo/compare/0.120.9...0.120.10) (2026-01-06) ### Features * evaluator: enrich error results with provider context and metadata ([#6913](https://github.com/promptfoo/promptfoo/issues/6913)) ([a004182](https://github.com/promptfoo/promptfoo/commit/a0041825b8149e94d25828a43b77787896ba8dc6)) * providers: add Azure AI Foundry video provider (Sora) ([#6890](https://github.com/promptfoo/promptfoo/issues/6890)) ([1479e74](https://github.com/pro	Low	1/6/2026
0.120.9	## [0.120.9](https://github.com/promptfoo/promptfoo/compare/0.120.8...0.120.9) (2025-12-30) ### Features - app: add apiBaseUrl field to provider configuration UI ([#6884](https://github.com/promptfoo/promptfoo/issues/6884)) — @mldangelo - app: add design system, navigation, model audit, and eval creator ([#6823](https://github.com/promptfoo/promptfoo/issues/6823)) — @faizanminhas - cli: add wildcard support for prompt filters ([#6853](https://github.com/promptfoo/promptfoo/issues/6	Low	12/30/2025
0.120.8	## [0.120.8](https://github.com/promptfoo/promptfoo/compare/0.120.7...0.120.8) (2025-12-21) ### Features * redteam: add --description flag to redteam run command ([#6796](https://github.com/promptfoo/promptfoo/issues/6796)) ([95cc2ff](https://github.com/promptfoo/promptfoo/commit/95cc2ffe1075b00620647369beb9bb331af95858)) * server: add configurable base path support ([#6758](https://github.com/promptfoo/promptfoo/issues/6758)) ([9395a28](https://github.com/promptfoo/promptfoo/commit/9	Low	12/21/2025
0.120.7	## [0.120.7](https://github.com/promptfoo/promptfoo/compare/0.120.6...0.120.7) (2025-12-19) ### Features * blob storage ([#6708](https://github.com/promptfoo/promptfoo/issues/6708)) ([73fcd51](https://github.com/promptfoo/promptfoo/commit/73fcd5183bfaa37b76326f21eaeaaaddee264bb9))	Low	12/19/2025
0.120.6	## [0.120.6](https://github.com/promptfoo/promptfoo/compare/0.120.5...0.120.6) (2025-12-19) ### Features * auth: add interactive team selection during login ([#6760](https://github.com/promptfoo/promptfoo/issues/6760)) ([11c7037](https://github.com/promptfoo/promptfoo/commit/11c7037d229d0cfb335fc2e3419de6c210fec7bc)) * bedrock: configurable numberOfResults for Bedrock Knowledge Base ([#6738](https://github.com/promptfoo/promptfoo/issues/6738)) ([f8f0b8b](https://github.com/promptfoo/p	Low	12/19/2025
0.120.5	## [0.120.5](https://github.com/promptfoo/promptfoo/compare/0.120.4...0.120.5) (2025-12-16) ### Features * cli: support multiple --env-file flags ([#6622](https://github.com/promptfoo/promptfoo/issues/6622)) ([015f2df](https://github.com/promptfoo/promptfoo/commit/015f2dfb76be0710a2c98d87fe957060e18de162)) * esm: add resolvePackageEntryPoint for ESM-only packages ([#6586](https://github.com/promptfoo/promptfoo/issues/6586)) ([fbc0eca](https://github.com/promptfoo/promptfoo/commit/fbc0	Low	12/16/2025
0.120.4	## [0.120.4](https://github.com/promptfoo/promptfoo/compare/0.120.3...0.120.4) (2025-12-11) ### Features * providers: add ElevenLabs provider integration ([#6022](https://github.com/promptfoo/promptfoo/issues/6022)) ([8d54faa](https://github.com/promptfoo/promptfoo/commit/8d54faa1c240e28557b1eb652c358a3f8eb4b0a2)) * providers: add GPT-5.2 model support ([#6628](https://github.com/promptfoo/promptfoo/issues/6628)) ([b105980](https://github.com/promptfoo/promptfoo/commit/b105980f121d1b0	Low	12/11/2025
0.120.3	## [0.120.3](https://github.com/promptfoo/promptfoo/compare/0.120.2...0.120.3) (2025-12-10) ### Features * providers: add multi-turn session persistence to browser provider ([#6585](https://github.com/promptfoo/promptfoo/issues/6585)) ([873241e](https://github.com/promptfoo/promptfoo/commit/873241ee0b5692edc74fcb33815b99adfab68a52)) ### Bug Fixes * build: exclude Nunjucks template fixture from TypeScript ([#6588](https://github.com/promptfoo/promptfoo/issues/6588)) ([6f02eec](https	Low	12/10/2025
0.120.2	## [0.120.2](https://github.com/promptfoo/promptfoo/compare/0.120.1...0.120.2) (2025-12-09) ### Features * assertions: tool calling f1 score ([#6548](https://github.com/promptfoo/promptfoo/issues/6548)) ([1327195](https://github.com/promptfoo/promptfoo/commit/13271958b5a48b7d26586daf5f06d98bcdf4d063)) * providers: add Amazon Nova 2 model support with reasoning capabilities ([#6531](https://github.com/promptfoo/promptfoo/issues/6531)) ([3a99c2b](https://github.com/promptfoo/promp	Low	12/9/2025
0.120.1	## [0.120.1](https://github.com/promptfoo/promptfoo/compare/0.120.0...0.120.1) (2025-12-08) ### Features * providers: update claude-agent-sdk to ^0.1.60 with betas and dontAsk support ([#6557](https://github.com/promptfoo/promptfoo/issues/6557)) ([cc3d857](https://github.com/promptfoo/promptfoo/commit/cc3d85763606facb615965ad9288c33650e01512)) ### Bug Fixes * ci: trigger Docker build from release-please workflow ([#6572](https://github.com/promptfoo/promptfoo/issues/6572)) ([6b1790	Low	12/8/2025
0.120.0	## [0.120.0](https://github.com/promptfoo/promptfoo/compare/0.119.14...0.120.0) (2025-12-08) ### Features - build: migrate to ESM (ECMAScript Modules) ([#5594](https://github.com/promptfoo/promptfoo/issues/5594)) ([9cdf09b](https://github.com/promptfoo/promptfoo/commit/9cdf09b1c681454ed3fa047dee41a43fea48028a)) - cli: toggle debug log live ([#6517](https://github.com/promptfoo/promptfoo/issues/6517)) ([6beebce](https://github.com/promptfoo/promptfoo/commit/6beebce4134f0e0dfd54e7f1	Low	12/8/2025
0.119.14	## [0.119.14](https://github.com/promptfoo/promptfoo/compare/0.119.13...0.119.14) (2025-12-01) ### Features * Add web search assertion type ([#5111](https://github.com/promptfoo/promptfoo/issues/5111)) ([11c01cc](https://github.com/promptfoo/promptfoo/commit/11c01cc637efd6867a1e99e44bc8633d324ac66a)) * examples: add Strands Agents SDK example ([#6384](https://github.com/promptfoo/promptfoo/issues/6384)) ([28c3d58](https://github.com/promptfoo/promptfoo/commit/28c3d584f2f820de40a17e6	Low	12/1/2025
0.119.13	## [0.119.13](https://github.com/promptfoo/promptfoo/compare/promptfoo-v0.119.12...promptfoo-v0.119.13) (2025-11-25) ### Features * ecommerce plugin pack ([#6168](https://github.com/promptfoo/promptfoo/issues/6168)) ([152b1ff](https://github.com/promptfoo/promptfoo/commit/152b1ff3f3fdb6ca43a0a5718d463757f63a1814)) ### Bug Fixes * deps: bump posthog-node from 5.13.2 to 5.14.0 for sha1-hulud mitigation ([6a44eda](https://github.com/promptfoo/promptfoo/commit/6a44eda819f48273230853cc8692b	Low	11/25/2025
0.119.12	## [0.119.12](https://github.com/promptfoo/promptfoo/compare/promptfoo-v0.119.11...promptfoo-v0.119.12) (2025-11-24) ### Features * changelog automation and validation ([#6252](https://github.com/promptfoo/promptfoo/issues/6252)) ([ee74c4a](https://github.com/promptfoo/promptfoo/commit/ee74c4ae7dc01c35dd52d835a19188f06a334a1a)) * providers: add Anthropic structured outputs support ([#6226](https://github.com/promptfoo/promptfoo/issues/6226)) ([1b1b9d2](https://github.com/promptfoo/promptf	Low	11/24/2025
0.119.11	## What's Changed Bug Fixes - fix(deps): update dependency @apidevtools/json-schema-ref-parser to v15 by @renovate[bot] in https://github.com/promptfoo/promptfoo/pull/6300 - fix(redteam): fix template bug in agentic strategies by @mldangelo in https://github.com/promptfoo/promptfoo/pull/6240 - fix: avoid sending target output to cloud if excludeTargetOutputFromAgenticAttackGeneration is set to true @MrFlounder in https://github.com/promptfoo/promptfoo/pull/6320 Chores - revert:	Low	11/24/2025
0.119.10	## What's Changed ### Bug Fixes - fix(providers): LiteLLM API key authentication with LITELLM_API_KEY env var by @mldangelo in #6322 - fix(webui): Basic strategy checkbox behavior in red team setup by @minhle1291 in #6313 - fix(code-scan): prevent GitHub API error when startLine equals line by @yash2998chhabria in #6314 - fix(app): Test generation tooltips remain visible after dialog is rendered by @will-holley in #6309 - fix(webui): allow thumbs up/down ratings to toggle off and remov	Low	11/23/2025
0.119.9	## What's Changed ### Features - feat(webui): add custom policy generation to red team setup by @typpo in https://github.com/promptfoo/promptfoo/pull/6181 - feat(webui): add strategy test generation to red team setup by @will-holley in https://github.com/promptfoo/promptfoo/pull/6005 - feat(webui): add visibility button for PFX passphrase field in red team target configuration by @faizanminhas in https://github.com/promptfoo/promptfoo/pull/6258 ### Bug Fixes - fix(auth): allow CI e	Low	11/20/2025
0.119.8	## What's Changed ### Features - feat(providers): add Gemini 3 Pro support with thinking configuration by @mldangelo in #6241 - feat(plugins): organize domain-specific risks into vertical suites by @typpo in #6215 ### Bug Fixes - fix(code-scan): point at correct cloud production url by @danenania in #6247 - fix(code-scan): ensure no non-json output in 'code-scans run' command with --json flag by @danenania in #6248 - fix: exclude source maps from npm package to reduce bundle size	Low	11/19/2025
0.119.7	## Features - feat(assertions): add dot product and euclidean distance metrics for similarity assertion - use `similar:dot` and `similar:euclidean` assertion types to match production vector database metrics and support different similarity use cases in [#6202](https://github.com/promptfoo/promptfoo/pull/6202) - feat(webui): expose Hydra strategy configuration (max turns and stateful toggle) in red team setup UI in [#6165](https://github.com/promptfoo/promptfoo/pull/6165) - **feat(p	Low	11/18/2025
0.119.6	## What's Changed ### Bug Fixes - fix(redteam): respect redteam.provider configuration for local grading by @mldangelo in #5959 - fix(cli): correct port type handling in view command by @iitslamaa in #6071 - fix(redteam): dynamically update crescendo system prompt with currentRound and successFlag by @yash2998chhabria in #6133 - fix(cli): format object and array variables with pretty-printed JSON by @zsarkis in #6175 - fix(webui): filter hidden metadata keys from metadata filter dropdo	Low	11/12/2025
0.119.5	## What's Changed Features - feat: FERPA red team plugin by @typpo in https://github.com/promptfoo/promptfoo/pull/6130 - feat(redteam): show granular subcategory metrics for harmful plugins by @MrFlounder in https://github.com/promptfoo/promptfoo/pull/6134 - feat: hydra the new advanced multi-turn red team strategy by @MrFlounder in https://github.com/promptfoo/promptfoo/pull/6151 - feat(providers): add variable templating support for initialMessages in simulated-user provider by @mld	Low	11/10/2025
0.119.4	## What's Changed Features - feat(redteam): make meta agent a default strategy by @typpo in https://github.com/promptfoo/promptfoo/pull/6109 Bug Fixes - fix(redteam): make intent for policy more accurate by @MrFlounder in https://github.com/promptfoo/promptfoo/pull/6116 Chores - chore: bump @aws-sdk/client-bedrock-runtime from 3.922.0 to 3.925.0 by @dependabot[bot] in https://github.com/promptfoo/promptfoo/pull/6117 - chore: bump version 0.119.4 by @MrFlounder in h	Low	11/6/2025
0.119.3	## What's Changed Features - feat(webui): add eval copy functionality by @mldangelo in https://github.com/promptfoo/promptfoo/pull/6079 - feat(redteam): add timestamp context to all grading rubrics by @MrFlounder in https://github.com/promptfoo/promptfoo/pull/6110 - feat(redteam): add gradingGuidance UI for plugin-specific grading rules by @MrFlounder in https://github.com/promptfoo/promptfoo/pull/6108 - feat(model-audit): add revision tracking and deduplication for model scans by @	Low	11/5/2025
0.119.2	## [0.119.2] - 2025-11-03 ### Added - feat(integrations): add Microsoft SharePoint dataset support with certificate-based authentication for importing CSV files (#6080) by @tanyapylat - feat(providers): add `initialMessages` support to simulated-user provider for starting conversations from specific states, with support for loading from JSON/YAML files via `file://` syntax (#6090) by @mldangelo - feat(providers): add local config override support for cloud providers - merge local configu	Low	11/3/2025
0.119.1	## What's Changed Bug Fixes - fix(csv): handle primitive values directly in red team CSV export by @sklein12 in https://github.com/promptfoo/promptfoo/pull/6040 - fix(build): removing axios as a runtime dependency in google provider by @jameshiester in https://github.com/promptfoo/promptfoo/pull/6050 - fix(init): cleanup directory and show error message when example fails to download by @LizzHale in https://github.com/promptfoo/promptfoo/pull/6051 - fix(redteam): validate custom str	Low	10/29/2025
0.119.0	# What's Changed ## Features - feat(webui): filter eval results by metric values with numeric operators (EQ, GT, LTE, etc.) by @will-holley in #6011 - feat(providers): 10-100x performance improvement for Python providers with persistent worker pools by @mldangelo in #5968 - feat(providers): add OpenAI Agents SDK integration with support for agents, tools, and handoffs by @mldangelo in #6009 - feat(providers): add function calling/tool support for Ollama by @mldangelo in #5977 - feat(pr	Low	10/28/2025
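The 0.119.7 entry adds `similar:dot` and `similar:euclidean` alongside the cosine similarity that the `similar` assertion already used. For reference, the three measures are sketched below in TypeScript. The function names are illustrative, not promptfoo's. The sketch also does not show how promptfoo turns each score into a pass/fail threshold.

```typescript
// Illustrative implementations of three vector-comparison measures for
// equal-length embedding vectors. Thresholding is intentionally omitted.

function assertSameLength(a: number[], b: number[]): void {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
}

// Dot product: sum of element-wise products (higher = more similar).
function dotProduct(a: number[], b: number[]): number {
  assertSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product divided by the product of the vector lengths,
// so it depends only on direction (range -1 to 1).
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (denom === 0) throw new Error("cosine similarity is undefined for zero vectors");
  return dotProduct(a, b) / denom;
}

// Euclidean distance: straight-line distance (lower = more similar, 0 = identical).
function euclideanDistance(a: number[], b: number[]): number {
  assertSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

const unitA = [0.6, 0.8]; // length 1
const unitB = [0.8, 0.6]; // length 1
console.log(dotProduct(unitA, unitB), cosineSimilarity(unitA, unitB), euclideanDistance(unitA, unitB));
```

For unit-length vectors like these, the dot product and cosine similarity agree (both about 0.96 here). Euclidean distance is a distance rather than a similarity, so smaller values (about 0.28 here) mean closer matches.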
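The 0.120.2 entry mentions a tool-calling F1 score. F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R). The minimal sketch below assumes that tool calls are compared as sets of tool names only. That is a simplification: promptfoo's actual matching rules (for example, whether arguments are compared) are not described here, and `toolCallF1` is a hypothetical name.

```typescript
// Hypothetical set-based F1 over tool names.
// Precision = share of called tools that were expected.
// Recall = share of expected tools that were actually called.
function toolCallF1(expected: string[], actual: string[]): number {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  // Assumption: nothing expected and nothing called counts as a perfect match.
  if (expectedSet.size === 0 && actualSet.size === 0) return 1;
  const truePositives = Array.from(actualSet).filter((name) => expectedSet.has(name)).length;
  if (truePositives === 0) return 0;
  const precision = truePositives / actualSet.size;
  const recall = truePositives / expectedSet.size;
  return (2 * precision * recall) / (precision + recall);
}

// Two tools expected; the model called one of them plus an unexpected one.
console.log(toolCallF1(["search", "get_weather"], ["search", "send_email"])); // 0.5
```

Here precision and recall are both 1/2, so F1 is 0.5. Calling only `search` would give precision 1 and recall 1/2, for an F1 of about 0.67.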

Similar Packages

langfuse (v3.178.0): 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊 YC W23

giskard-oss (giskard-checks/v1.0.2b3): 🐢 Open-Source Evaluation & Testing library for LLM Agents

agent-review (main@2026-06-04): Analyze git code changes to generate structured review reports using flexible AI models and integrated workflows.

opik (2.0.56): Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

agenta (v0.100.9): The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

More in Testing

vector-db-benchmark: Framework for benchmarking vector search engines

Gito: An AI-powered GitHub code review tool that uses LLMs to detect high-confidence, high-impact issues, such as security vulnerabilities, bugs, and maintainability concerns.

mxcli: Mendix cli tool, a headless way to work with Mendix projects. Enables Mendix projects for use with 3rd party agentic coding tools like Claude Code and Copilot. Includes a starlark linter for quality v

llm_context_benchmarks: 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz