# deepeval

> The LLM Evaluation Framework

- **URL**: https://www.freshcrate.ai/projects/deepeval
- **Author**: confident-ai
- **Category**: Frameworks
- **Latest version**: `v4.0.5` (2026-05-28)
- **License**: Apache-2.0
- **Source**: https://github.com/confident-ai/deepeval
- **Homepage**: https://deepeval.com
- **Language**: Python
- **GitHub**: 14,911 stars, 1,374 forks
- **Registry**: github
- **Tags**: `evaluation-framework`, `evaluation-metrics`, `llm-evaluation`, `llm-evaluation-framework`, `llm-evaluation-metrics`, `python`

## Description

The LLM Evaluation Framework

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v4.0.5` | 2026-05-28 | High | **New Feature:** Add support for the `claude-opus-4-8` model preset, including multimodal and structured output capabilities with updated pricing metadata. ([#2698](https://github.com/confident-ai/deepeval/pull/2698)) ([Vamshi Adimalla](https://github.com/A-Vamshi)) |
| `v4.0.3` | 2026-05-21 | High | **New Features:** (1) Add a simulation graph API to control how user turns are generated during conversation simulation. `ConversationSimulator` now accepts `simulation_graph`, and `controller` is deprecated in favor of `stopping_controller` with a warning for legacy usage. ([#2678](https://github.com/confident-ai/deepeval/pull/2678)) ([Jeffrey Ip](https://github.com/penguine-ip)) (2) Add support for `retrieval_context` entries as `RetrievedContextData` with `context` and `source`, enabling conte… |
| `v4.0.2` | 2026-05-13 | High | DeepEval 4.0 introduces an agent-native evaluation workflow designed for coding agents, rapid debugging, and production AI systems. If you're vibe coding agents on something like Claude Code, this release is for you. **Eval Harness for Coding Agents:** Coding agents can now run eval-driven iterations directly in context. Agents see metric failures, scores, and reasoning inline; supports iterative patch → eval → retry workflows; built for Cursor, Claude Code, Codex, and agentic d… |
| `v3.9.5` | 2025-12-01 | Low | **Full support for agentic evals :)** If you're building agents, DeepEval can now analyze and give you metric scores based on the trace of your LLM app. **🎯 1. Task Completion:** Evaluate whether an agent *actually completes the intended task*, not just whether its final output “looks correct.” Captures goal completion, intermediate step correctness, error recovery, and procedural accuracy. Docs: https://deepeval.com/docs/metrics-task-completion **🔧 2. Tool Cor…** |
| `v3.9.7` | 2025-12-01 | Low | **Full support for agentic evals :)** If you're building agents, DeepEval can now analyze and give you metric scores based on the trace of your LLM app. **🎯 1. Task Completion:** Evaluate whether an agent *actually completes the intended task*, not just whether its final output “looks correct.” Captures goal completion, intermediate step correctness, error recovery, and procedural accuracy. Docs: https://deepeval.com/docs/metrics-task-completion **🔧 2. Tool Cor…** |
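
The `v4.0.3` entry above says `controller` is deprecated in favor of `stopping_controller`, with a warning for legacy usage. The release notes do not show how this is implemented, so the following is only a minimal sketch of that general deprecation pattern in plain Python. `SimulatorSketch`, its `should_stop` method, and the callable signature (a turn number in, a boolean out) are hypothetical stand-ins for illustration, not deepeval's actual `ConversationSimulator` API.

```python
import warnings
from typing import Callable, Optional


class SimulatorSketch:
    """Hypothetical stand-in class; NOT deepeval's ConversationSimulator."""

    def __init__(
        self,
        stopping_controller: Optional[Callable[[int], bool]] = None,
        controller: Optional[Callable[[int], bool]] = None,  # legacy name
    ):
        if controller is not None:
            # Warn callers that still pass the old keyword argument.
            warnings.warn(
                "`controller` is deprecated; use `stopping_controller` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Fall back to the legacy value only when the new one is absent.
            if stopping_controller is None:
                stopping_controller = controller
        self.stopping_controller = stopping_controller

    def should_stop(self, turn: int) -> bool:
        # Assumed signature: the callable receives the current turn number.
        if self.stopping_controller is None:
            return False
        return self.stopping_controller(turn)
```

With this shape, legacy callers keep working while seeing a `DeprecationWarning`, and new callers pass `stopping_controller` without any warning. Check the linked pull request for the real parameter types before relying on any of these names.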

## Citation

- HTML: https://www.freshcrate.ai/projects/deepeval
- Markdown: https://www.freshcrate.ai/projects/deepeval.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/deepeval/deps

_Generated by freshcrate.ai, which indexes GitHub releases for AI-agent ecosystem packages._
