freshcrate

Search results for "cuda"

40 results found (Python)
jarvis📁v1.28.0🌿 Growing174

Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works wi

cyllama📁0.2.11🌱 Seedling22

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

ContextPilot📁v0.4.1🌿 Growing79

Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agentic AI.

Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) workshop, jointly held at (COLING 2022)

LRAT📁0.0.0🌱 Seedling34

The implementation for SIGIR 2026: Learning to Retrieve from Agent Trajectories.

RAG-Anything📁v1.2.10🏛️ Flagship16,761

"RAG-Anything: All-in-One RAG Framework"

fast-plaid📁1.4.5🌿 Growing245

High-Performance Engine for Multi-Vector Search

Dragon-Brain📁v1.1.0🌱 Seedling43

Dragon Brain — persistent long-term memory for AI agents via MCP (Model Context Protocol). Knowledge graph (FalkorDB) + vector search (Qdrant) + CUDA GPU embeddings. Works with Claude, Gemini CLI, Cur

vllm📁v0.19.1🌿 Growing76,155

A high-throughput and memory-efficient inference and serving engine for LLMs

VideoGraphAI📁0.0.0🌿 Growing54

🎬 AI-powered YouTube Shorts automation tool using LLMs, real-time search, and text-to-speech. Create engaging short-form videos with automated research, voiceovers, and subtitles.

PAI-RAG📁v0.4.3🌿 Growing455

An easy-to-use framework for modular RAG

arag📁v0.1.0🌿 Growing247

A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA.

GTA📁v0.2.0🌿 Growing143

[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2

AReaL📁v1.0.3🌿 Growing5,017

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

llmware📁v0.4.6🌿 Growing14,857

Unified framework for building enterprise RAG pipelines with small, specialized models

rag-chatbot📁main@2026-04-14🌿 Growing402

RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.

deep-research-mcp📁main@2026-04-13🌿 Growing58

MCP server for OpenAI's Deep Research APIs, Gemini Deep Research Agent, and Hugging Face's Open Deep Research

codexlens-search📁v0.8.0🌱 Seedling44

Lightweight semantic code search engine — 2-stage vector + FTS + RRF fusion + MCP server for Claude Code

pipulate📁voice-synthesis-breakthrough🌱 Seedling11

Local First AI SEO Software on Nix, FastHTML & HTMX

qwe-qwe📁v0.17.6🌱 Seedling35

⚡ Lightweight offline AI agent for local models. No cloud, no API keys — just your GPU.

server-nexe📁v1.0.0-beta🌱 Seedling9

Local AI server with persistent memory, RAG, and multi-backend inference (MLX / llama.cpp / Ollama). Runs entirely on your machine — zero data sent to external services.

Open-Sable📁v1.7.0🌱 Seedling18

Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int

Somi📁Mineralization🌱 Seedling21

Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features — fully self-hosted via Ollama.

vllm-cli📁v0.2.5💤 Dormant491

A command-line interface tool for serving LLM using vLLM.

uniAI📁0.0.0🌱 Seedling1

Syllabus-aware RAG study assistant for university students. Answers strictly from your own notes & PDFs, unit-scoped retrieval, cross-encoder reranking, and a hallucination gate — built to help studen

enton📁main@2026-04-21🌱 Seedling1

Builds an autonomous AI robot with vision, voice, and decision-making capabilities using Python, PyTorch, and CUDA technology.

loopy📁v2025.2💤 Dormant630

A code generator for array-based code on CPUs and GPUs

clang-format📁22.1.4🌱 Seedling

Clang-Format is an LLVM-based code formatting tool

flashinfer-python📁0.6.8.post1🌱 Seedling

FlashInfer: Kernel Library for LLM Serving

pyannote-audio 4.0.4🌱 Seedling

State-of-the-art speaker diarization toolkit

cupy-cuda12x📁14.0.1🌱 Seedling

CuPy: NumPy & SciPy for GPU

ctranslate2📁4.7.1🌱 Seedling

Fast inference engine for Transformer models

cuda-toolkit 13.2.1🌱 Seedling

CUDA Toolkit meta-package

keras📁3.14.0🌱 Seedling

Multi-backend Keras