#1 langfuse · v3.194.0 · Best overall observability stack · ⭐25,291
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Best for: teams that want tracing, prompt/version visibility, evals, and operational debugging in one place
A strong fit when you need a broad open-source platform rather than a single narrow tracing view.
#2 mlflow · v3.14.0 · Best for eval-heavy production loops · ⭐25,479
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling…
Best for: teams that need experiment tracking, evaluation, and agent-quality monitoring across production systems
Useful when agent work must connect to a broader ML and evaluation operating model.
#3 phoenix · arize-phoenix-v17.6.0 · Best for tracing and diagnosis · ⭐9,377
AI Observability & Evaluation
Best for: builders who need fast visibility into spans, prompts, retrieval paths, and failure cases
Great when the immediate bottleneck is understanding what the agent actually did and where it went wrong.
#4 agentops · v3.1.0 · Best for coding-agent feedback loops · ⭐307
The operational layer for coding agents. Memory, validation, and feedback loops that compound between sessions.
Best for: operators who want validation, memory, and session-level operational feedback around agent runs
Good fit when the real need is operational control and compounding feedback rather than generic logging alone.