Search results for "cuda"
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works wi
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.
Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)
The implementation for SIGIR 2026: Learning to Retrieve from Agent Trajectories.
"RAG-Anything: All-in-One RAG Framework"
High-Performance Engine for Multi-Vector Search
Dragon Brain — persistent long-term memory for AI agents via MCP (Model Context Protocol). Knowledge graph (FalkorDB) + vector search (Qdrant) + CUDA GPU embeddings. Works with Claude, Gemini CLI, Cur
A high-throughput and memory-efficient inference and serving engine for LLMs
🎬 AI-powered YouTube Shorts automation tool using LLMs, real-time search, and text-to-speech. Create engaging short-form videos with automated research, voiceovers, and subtitles.
A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.
A Multi-Agentic AI Assistant/Builder
[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Unified framework for building enterprise RAG pipelines with small, specialized models
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
MCP server for OpenAI's Deep Research APIs, Gemini Deep Research Agent, and Hugging Face's Open Deep Research
Lightweight semantic code search engine — 2-stage vector + FTS + RRF fusion + MCP server for Claude Code
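The "RRF fusion" named in this entry refers to Reciprocal Rank Fusion, a standard way to merge a vector-search ranking with a full-text-search (FTS) ranking. A minimal generic sketch of the technique (not this project's actual code; the document IDs are hypothetical):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.

    Each ranking is an ordered list of document IDs. A document's
    fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank is 1-based). k=60 is a common default from
    the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits from the two retrieval stages:
vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic / vector search
fts_hits = ["doc_b", "doc_d", "doc_a"]      # keyword / full-text search
fused = rrf_fuse([vector_hits, fts_hits])
# doc_b ranks first: it appears near the top of both lists
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the two retrieval backends.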
Local First AI SEO Software on Nix, FastHTML & HTMX
⚡ Lightweight offline AI agent for local models. No cloud, no API keys — just your GPU.
A coding agent optimized for smaller LLMs
Local AI server with persistent memory, RAG, and multi-backend inference (MLX / llama.cpp / Ollama). Runs entirely on your machine — zero data sent to external services.
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features — fully self-hosted via Ollama.
A command-line interface tool for serving LLMs using vLLM.
Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate — built to help studen
Builds an autonomous AI robot with vision, voice, and decision-making capabilities using Python, PyTorch, and CUDA technology.
A code generator for array-based code on CPUs and GPUs
FlashInfer: Kernel Library for LLM Serving
Faster Whisper transcription with CTranslate2
Fast inference engine for Transformer models
Embeddings, Retrieval, and Reranking
CUDA profiling tools runtime libs.
