Search results for "gguf"
A high-throughput and memory-efficient inference and serving engine for LLMs
Deploy any AI model, agent, database, RAG, and pipeline locally or remotely in minutes
Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.
vMLX - Home of JANG_Q - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth
A thin Cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Unified framework for building enterprise RAG pipelines with small, specialized models
A coding agent optimized for smaller LLMs
A sovereign cognitive architecture with IIT 4.0 integrated information, residual-stream affective steering (CAA), Global Workspace Theory, active inference, and 72 consciousness modules, running loca
Curated list of the best truly open-source AI projects, models, tools, and infrastructure.
META-AGENTIC α-AGI — Mission: End-to-end: Identify → Out-Learn → Out-Think → Out-Design → Out-Strategise → Out-Execute
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
MCP server for OpenAI's Deep Research APIs, Gemini Deep Research Agent, and Hugging Face's Open Deep Research
LLM Context Benchmarks - A comprehensive benchmarking tool for testing LLMs with varying context sizes using Ollama. Features dual benchmark modes (API/CLI), automatic hardware detection (optimiz
Local AI server with persistent memory, RAG, and multi-backend inference (MLX / llama.cpp / Ollama). Runs entirely on your machine: zero data sent to external services.
A command-line interface tool for serving LLMs with vLLM.
A local LLM-based autonomous agent orchestration platform featuring async background tasks, context-isolated sub-agents, dynamic knowledge injection, and strict security approval gates (Plan Mode).
Deploy a local, multi-user RAG system to query PDF and DOCX documents using a local LLM without cloud or API dependencies.
Build intelligent, offline LLM agents with LangGraph and llama-cpp-python using this starter template for local, private tool-calling applications.
