Search results for "audio"
RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLMs supported by Ollama/vLLM/etc. Precise embeddings usage, tuning, analytics etc. Built-in image/audio generat
RAPTOR (Robust AI-Powered Toolkit for Operational Robots) is an AI-native Content Insight Engine that transforms passive media storage into an intelligent knowledge platform through automated analysis
An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Oll
Natural (2-way) voice conversations with Claude Code
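Screenplay storyboarding agent (PenShot): screenplay → storyboard → clip → prompt | Built on LangGraph + LLM; automatically parses screenplays in any format and generates coherent text-to-video prompts usable with models such as Sora/Veo/Runway. Keeps characters and plot consistent across clips; supports MCP/REST API/function calling | Python library + A2A integration. (LLM-powered screenplay-to-video-prompt a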
The python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more.
423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi
An event-driven framework designed to build and orchestrate multi-agent AI systems. It enables seamless integration of AI agents with real-world data sources and systems, facilitating complex, multi-s
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.
An open-source AI assistant framework with skills and agent architecture
Open-source framework for conversational voice AI agents
Exposes internet search tools for use by LLM-backed Assist in Home Assistant
Cognithor - Agent OS: Local-first autonomous agent operating system. 16 LLM providers, 17 channels, 112+ MCP tools, 5-tier memory, A2A protocol, knowledge vault, voice, browser automation, Computer-us
Explore 255+ essential skills for AI coding assistants like Claude Code and GitHub Copilot to enhance your development workflow.
AI-Powered Penetration Testing Framework with automated vulnerability scanning, multi-agent system, and compliance reporting
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-l
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
The agent that grows with you
Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context, group spaces for shared insight. Classify docs, reuse prompts, and extend with modular features
AI-powered YouTube Shorts automation tool using LLMs, real-time search, and text-to-speech. Create engaging short-form videos with automated research, voiceovers, and subtitles.
Official MCP Servers for AWS
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, spe
A Multi-Agentic AI Assistant/Builder
Ambient intelligence that sees what you see, hears what you hear, and acts on your behalf
A model-driven approach to building AI agents in just a few lines of code.
Open-Source Intelligent Command Layer
The official Python library for the OpenAI API
Unified framework for building enterprise RAG pipelines with small, specialized models
A Claude Code skill that turns your Obsidian vault into a living second brain - autonomous writes, thinking tools, knowledge ingestion, scheduled agents, and _CLAUDE.md for cross-surface context.
A desktop AI agent that controls your local machine - runs commands, manages files, executes code, browses the web autonomously, etc. Supports Claude, GPT, Gemini, Llama, DeepSeek, and more. .exe avail
Open-source multi-agent AI assistant powered by LangGraph, FastAPI & Next.js - 16+ agents, Human-in-the-Loop, MCP integration, voice TTS, RAG, 500+ metrics, 6 languages.
A flexible multi-interface AI agent framework for building agents with reasoning, tool use, memory, deep research, blockchain interaction, MCP, and agents-as-a-service.
Claude Code skills, architectural principles, and alternative approaches for AI-assisted development
Search your files by talking to them - 100% offline
A coding agent optimized to smaller LLMs
Hermes Gate - Terminal TUI for managing remote Hermes Agent sessions with auto-reconnect, detach support, and zero config
Video editing MCP server for AI agents. 83 tools, 858 tests collected, 3 interfaces. Works with Claude Code, Cursor, and any MCP client. Local, fast, free.
The API layer for AI agents. Dashboard + 22K APIs + 18 Direct Call providers. MCP native.
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
Ham radio & GMRS gateway, repeater and packet radio - bridges two-way radios to Mumble, Broadcastify, and the internet. AIOC USB, RSPduo dual SDR, TH-9800/D75/KV4P CAT control, AI announcements, ADS-B
CloneMe is an advanced AI platform that builds your digital twin, an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
Enhance audio quality with ComfyUI-AudioSR, a versatile tool for upscaling sounds to 48kHz for better clarity and listening experience.
Transform speech to text on Windows with fast, local AI processing. Enjoy seamless recording and automatic integration for effective communication.
Second Brain is a desktop application that acts as a personal knowledge base, using retrieval-augmented generation (RAG), multimodal AI models, and a hybrid lexical/semantic search algorithm to intera
Install your own AI DJ Being. She searches, downloads, listens, mixes, and generates music - autonomously. 30hrs for $0.04.
Generate AI-driven videos with Seedance 2.0, offering precise physics, lip-sync, and prompt accuracy for seamless content creation.
A tool to determine the content type of a file with deep learning
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
Faster Whisper transcription with CTranslate2
Mistral-common is a library of common utilities for Mistral AI.
Python module for audio and music processing
PyTorch native Metrics
A package to repair broken json strings
Google Ai Generativelanguage API client library
Microsoft Azure Blob Storage Client Library for Python
Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
