Search results for "lua"
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Stop prompting. Start specifying.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
Benchmarking the gap between AI agent hype and architecture. Three agent archetypes, 73-point performance spread, stress testing, network resilience, and ensemble coordination analysis with statistical
ReLE evaluation: a capability benchmark for Chinese AI large models (continuously updated). Currently covers 359 large models, including chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, SenseTime senseChat, and other commercial models, as well as step3.5-flash, kimi-k2.5, ernie4.5, Min
Make AI work for Everyone - Monitoring and governing for your AI/ML
Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. YC W23
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.
A comprehensive evaluation framework for AI agents and LLM applications.
The platform for LLM evaluations and AI agent testing
A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP scan, AI Infra scan and LLM jailbreak evaluation.
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
AI Coding agent for the terminal: hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
A tool-use-focused LLM plugin for neovim.
The LLM Evaluation Framework
AI-first security scanner with 76 analyzers, 9,600+ detection rules, and repo poisoning detection for AI/ML, LLM agents, and MCP servers. Scan any GitHub repo with: medusa scan --git user/repo
MCP server for token-efficient large document analysis via the use of REPL state
Open-Source Evaluation & Testing library for LLM Agents
Evaluation and Tracking for LLM Experiments and AI Agents
Pragmatic AI Labs MCP Agent Toolkit - An MCP Server designed to make code with agents more deterministic
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling
Semantic code searcher and codebase utility
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
One API for 25+ LLMs, OpenAI, Anthropic, Bedrock, Azure. Caching, guardrails & cost controls. Go-native LiteLLM & Kong AI Gateway alternative.
Enhance your academic writing with tailored AI prompt templates and practical agent skills to boost efficiency and reduce repetitive tasks.
Make your OpenClaw agents better, cheaper, and faster.
Give your AI agents persistent memory.
Zero-dependency Web Application Firewall in Go. Single binary. Three deployment modes. Tokenizer-based detection.
Lightweight semantic code search engine: 2-stage vector + FTS + RRF fusion + MCP server for Claude Code
A single interface to use and evaluate different agent frameworks
Supercharge Your LLM Application Evaluations
Enhance code search accuracy with Smart Coding MCP, an AI-driven server that uses intelligent embeddings for quick, relevant results.
Open-source autonomous AI assistant with 5-tier security, 62 tools, 14 LLM providers. Written in Rust. Single binary.
Build semantic vector databases from code and docs to enable AI agents to understand and navigate your entire codebase effectively.
Define and control AI agents in markdown with full prompt transparency, persistent memory, and integrated tools via the Claude Agent SDK.
Lightweight hallucination detection framework for RAG applications
Python SDK for an Agent AI observability, monitoring, and evaluation framework. Includes agent, LLM, and tool tracing, multi-agent system debugging, a self-hosted dashboard, and advanced anal
General Framework for Dota 2 AI Competitions
