Search results for "evals"
AI Observability & Evaluation
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
AI Agent Framework, the Pydantic way
The agent engineering platform
A community-driven collection of RAG (Retrieval-Augmented Generation) frameworks, projects, and resources. Contribute and explore the evolving RAG ecosystem.
Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. YC W23
AI observability platform for production LLM and agent systems.
Build agentic AI solutions on AWS using the latest OSS agentic frameworks.
Code, build, and evaluate agents - excellent model and Skills/MCP/ACP support
Agent ensembles to design, generate, and select the best code for every task.
A comprehensive evaluation framework for AI agents and LLM applications.
Latitude is the open-source agent engineering platform
ReLE evaluation: capability evaluation of Chinese AI large models (continuously updated). Currently covers 359 large models, including chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, SenseTime SenseChat, and other commercial models, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min
The app framework built for AI coding agents. Own every line. Your AI already knows how to build on it.
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and
Curated list of ChatGPT prompts from the top-rated GPTs in the GPT Store. Prompt engineering, prompt attacks & prompt protection. Advanced prompt engineering papers.
Memory library for building stateful agents
Automatically update LLM-Agent papers daily using GitHub Actions (updated every 12 hours)
Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native.
Evaluation and Tracking for LLM Experiments and AI Agents
Framework to build resilient language agents as graphs.
Curated systems, benchmarks, papers, and more on memory for LLMs/MLLMs: long-term context, retrieval, and reasoning.
From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.
AI Agent Engineering Platform built on an Open Source TypeScript AI Agent Framework
An Excel AI agent that uses MCP tools to let LLMs read, edit, and automate Excel spreadsheets.
The LLM Evaluation Framework
A collection of Agent Skills standards and best practices for programming languages and frameworks, helping your AI agent follow framework and language best practices
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Structured outputs for LLMs
TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.
The production runtime for AI agents. Schema in, API out. Built on PydanticAI + FastAPI.
A selective learning and memory substrate for agentic systems: typed, revisable, decayable memory with competence learning and trust-aware retrieval.
Mattermost Agents plugin supporting multiple LLMs
AI-indexed portfolio and CV site with machine-readable profile data, evidence-backed case studies, verification signals, and a live MCP endpoint for agent access.
A SEC EDGAR MCP (Model Context Protocol) Server
Supercharge Your LLM Application Evaluations
Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonomous agent on top!
A production-ready research outreach AI agent that plans, discovers, reasons, uses tools, auto-builds cited briefings, and drafts tailored emails with tool-chaining, memory, tests, and turnkey Dock
