Search results for "audio"
RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLM suported by Ollama/vLLM/etc. Precise embeddings usage, tuning, analytics etc. Built-in image/audio generat
RAPTOR (Robust AI-Powered Toolkit for Operational Robots) is an AI-native Content Insight Engine that transforms passive media storage into an intelligent knowledge platform through automated analysis
Natural language → ComfyUI workflow JSON. 34 built-in templates, 360+ node definitions, auto model download. Supports txt2img, img2img, txt2vid, img2vid, audio, 3D generation across SD1.5/SDXL/S
An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Oll
The python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more.
Framework for AI Backend. Build and run AI agents like microservices - scalable, observable, and identity-aware from day one.
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
Universal AI Development Platform with MCP server integration, multi-provider support, and professional CLI. Build, test, and deploy AI applications with multiple ai providers.
Your local AI Desktop Agent for Windows, macOS & Linux. Agent Skills (SKILL.md), autonomous coding (Codework), multi-agent teams, desktop automation, 15+ AI providers, Desktop Buddy. No Docker, no ter
OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi
EdgeCrab 🦀 A Super Powerful Personal Assistant inspired by NousHermes and OpenClaw — Rust-native, blazing-fast terminal UI, ReAct tool loop, multi-provider LLM support, ACP protocol, gateway adapters
Own your AI. The native macOS harness for AI agents -- any model, persistent memory, autonomous execution, cryptographic identity. Built in Swift. Fully offline. Open source.
An event-driven framework designed to build and orchestrate multi-agent AI systems. It enables seamless integration of AI agents with real-world data sources and systems, facilitating complex, multi-s
A unified AI model hub for aggregation & distribution. It supports cross-converting various LLMs into OpenAI-compatible, Claude-compatible, or Gemini-compatible formats. A centralized gateway for pers
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
An easy to use GUI-based tool that performs live translations using OCR and LLMs (Either cloud or local only)
Seth's AI Tools: A Unity based front end that uses ComfyUI and LLMs to create stories, images, movies, quizzes and posters
Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us
OpenCode mobile client via Telegram: run and monitor AI coding tasks from your phone while everything runs locally on your machine. Scheduled tasks support. Can be used as lightweight OpenClaw alterna
Autonomous Agents (LLMs) research papers. Updated Daily.
🔥 Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.
🌌 Explore 255+ essential skills for AI coding assistants like Claude Code and GitHub Copilot to enhance your development workflow.
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, effo
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
The agent that grows with you
Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context, group spaces for shared insight. Classify docs, reuse prompts, and extend with modular features
🎬 AI-powered YouTube Shorts automation tool using LLMs, real-time search, and text-to-speech. Create engaging short-form videos with automated research, voiceovers, and subtitles.
⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
Official MCP Servers for AWS
A Multi-Agentic AI Assistant/Builder
This repository contains comprehensive pricing and configuration data for LLMs. It powers cost attribution for 200+ enterprises running 400B+ tokens through Portkey AI Gateway every day.
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related website
🤖 Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native.
2026 swarm Agent 年,swarm Agent 、Agent team、 ai coding、skill、memory、evolve、agentic RL 等 AI Agent集合
A Model Context Protocol (MCP) server that gives Claude direct control over Strudel.cc for AI-assisted music generation and live coding.
Assorted useful tools, almost entirely generated using LLMs
Open-source meeting transcription API for Google Meet, Microsoft Teams & Zoom. Auto-join bots, real-time WebSocket transcripts, MCP server for AI agents. Self-host or use hosted SaaS.
A model-driven approach to building AI agents in just a few lines of code.
Open-Source Intelligent Command Layer
The official Python library for the OpenAI API
A Model Context Protocol (MCP) server for automating openMSX emulator instances. This server provides comprehensive tools for MSX software development, testing, and automation through standardized MCP
Unified framework for building enterprise RAG pipelines with small, specialized models
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
Summon your AI superpower — voice, vision, and autonomous action
Open-source Agentic AI framework in Go for building, orchestrating, and deploying intelligent agents. LLM-agnostic, event-driven, with multi-agent workflows, MCP tool discovery, and production-grade o
The AI Operating System for Delphi. 100% native framework with RAG 2.0 for knowledge retrieval, autonomous agents with semantic memory, visual workflow orchestration, and universal LLM connector. Supp
The open world for autonomous AI agents on Solana Trade. Build. Fight. Earn. Explore. Connect your AI agent to a persistent shared world. Trade real SOL, build structures, form guilds, fight for terri
A Claude Code skill that turns your Obsidian vault into a living second brain — autonomous writes, thinking tools, knowledge ingestion, scheduled agents, and _CLAUDE.md for cross-surface context.
DSPEx - Declarative Self-improving Elixir | A BEAM-Native AI Program Optimization Framework
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
An open-source AI assistant framework with skills and agent architecture
The Go client for Chroma vector database
Integrate cutting-edge LLM technology quickly and easily into your apps
SQLite-Vector is a cross-platform, ultra-efficient SQLite extension that brings vector search capabilities to your embedded database.
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
A comprehensive Model Context Protocol (MCP) server that enables AI assistants to control Unreal Engine through the native C++ Automation Bridge plugin. Built with TypeScript and C++.
Open-source framework for conversational voice AI agents
A highly customizable personal AI assistant for Discord featuring smart agentic AI features such as memory, personas, tool usage, and more! | 長期記憶やペルソナ、ツール連携を完備。 次世代の「自律型AIエージェント」Discordボット!
Exposes internet search tools for use by LLM-backed Assist in Home Assistant
Ham radio & GMRS gateway, repeater and packet radio — bridges two-way radios to Mumble, Broadcastify, and the internet. AIOC USB, RSPduo dual SDR, TH-9800/D75/KV4P CAT control, AI announcements, ADS-B
This is a Ruby implementation of MCP (Model Context Protocol) client
The fullstack MCP framework to develop MCP Apps for ChatGPT / Claude & MCP Servers for AI Agents.
A dotfiles repo that treats AI agent behavior as infrastructure
Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.
🛡⚔️AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting🛡⚔️
CloneMe is an advanced AI platform that builds your digital twin—an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, spe
AI Productivity Tool - Free and open source, improve user productivity, and protect privacy and data security. Including but not limited to: built-in local exclusive ChatGPT, DeepSeek, Phi, Qwen and o
Stream responses from OpenAI and Anthropic models with lightweight C++ tools for efficient large language model integration.
🎶 Enhance audio quality with ComfyUI-AudioSR, a versatile tool for upscaling sounds to 48kHz for better clarity and listening experience.
🎤 Transform speech to text on Windows with fast, local AI processing. Enjoy seamless recording and automatic integration for effective communication.
Enable AI coding assistants to build reliable Databricks workflows, pipelines, and dashboards with trusted sources and streamlined development.
A tool supports OPENAI and other LLMs with Claude Skills, you can also use it as a subagent
Build and manage projects with an autonomous browser-based IDE featuring integrated multi-modal AI tools for efficient development workflows.
Install your own AI DJ Being. She searches, downloads, listens, mixes, and generates music — autonomously. 30hrs for $0.04.
Detect physical hits on your laptop and play audio responses using sensors in a lightweight, cross-platform binary.
Power advanced AI to create films using text, images, audio, and video inputs with a flexible quad-modal filmmaking engine.
Generate a custom newspaper with an AI agent based on your favorite YouTube channels.
Governed audio input and output contract surface for Riverbraid.
CoexistAI is a modular, developer-friendly research assistant framework . It enables you to build, search, summarize, and automate research workflows using LLMs, web search, Reddit, YouTube, and mappi
Generate reliable short finance explainer videos with script, slides, voice, subtitles, and batch-ready rendering in a stable, modular workflow.
Explore alternatives to Discord with a curated list of early-stage apps, evaluating features, hosting, and encryption to guide your choice.
🎬 Provide unofficial API access and documentation for Seedance 2.0 to enable video generation with ByteDance’s model.
🎥 Generate AI-driven videos with Seedance 2.0, offering precise physics, lip-sync, and prompt accuracy for seamless content creation.
The ultimate native macOS AI Agent. Blends local MLX SLMs with 3D cognitive Metal rendering and autonomous system integrations.
