Search results for "lua"
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Stop prompting. Start specifying.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistical
ReLE evaluation: a capability benchmark for Chinese AI large models (continuously updated). Currently covers 359 large models, including chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, SenseTime senseChat, and other commercial models, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min
Make AI work for Everyone - Monitoring and governing for your AI/ML
Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. YC W23
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.
A comprehensive evaluation framework for AI agents and LLM applications.
The platform for LLM evaluations and AI agent testing
A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP scan, AI Infra scan and LLM jailbreak evaluation.
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
AI Coding agent for the terminal: hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
A tool-use-focused LLM plugin for neovim.
The LLM Evaluation Framework
AI-first security scanner with 76 analyzers, 9,600+ detection rules, and repo poisoning detection for AI/ML, LLM agents, and MCP servers. Scan any GitHub repo with: medusa scan --git user/repo
MCP server for token-efficient large document analysis via the use of REPL state
Open-Source Evaluation & Testing library for LLM Agents
Evaluation and Tracking for LLM Experiments and AI Agents
Pragmatic AI Labs MCP Agent Toolkit - An MCP Server designed to make code with agents more deterministic
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling
Semantic code searcher and codebase utility
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
One API for 25+ LLMs, OpenAI, Anthropic, Bedrock, Azure. Caching, guardrails & cost controls. Go-native LiteLLM & Kong AI Gateway alternative.
Enhance your academic writing with tailored AI prompt templates and practical agent skills to boost efficiency and reduce repetitive tasks.
Make your OpenClaw agents better, cheaper, and faster.
Give your AI agents persistent memory.
Zero-dependency Web Application Firewall in Go. Single binary. Three deployment modes. Tokenizer-based detection.
Lightweight semantic code search engine: 2-stage vector + FTS + RRF fusion + MCP server for Claude Code
A single interface to use and evaluate different agent frameworks
Supercharge Your LLM Application Evaluations
Enhance code search accuracy with Smart Coding MCP, an AI-driven server that uses intelligent embeddings for quick, relevant results.
Open-source autonomous AI assistant with 5-tier security, 62 tools, 14 LLM providers. Written in Rust. Single binary.
Build semantic vector databases from code and docs to enable AI agents to understand and navigate your entire codebase effectively.
Define and control AI agents in markdown with full prompt transparency, persistent memory, and integrated tools via the Claude Agent SDK.
Lightweight hallucination detection framework for RAG applications
Python SDK for an Agent AI observability, monitoring, and evaluation framework. Includes agent, LLM, and tool tracing, multi-agent system debugging, a self-hosted dashboard, and advanced anal
General Framework for Dota 2 AI Competitions
