Search results for "speech"
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
Faster Whisper transcription with CTranslate2
An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Oll
Open Source AI Platform - AI Chat with advanced features that works with every LLM
⚡️AI Cloud OS: Open-source enterprise-level AI knowledge base and MCP (model-context-protocol)/A2A (agent-to-agent) management platform with admin UI, user management and Single-Sign-On⚡️, supports Ch
Every meeting, every idea, every voice note — searchable by your AI. Open-source, privacy-first conversation memory layer.
The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, effo
EdgeCrab 🦀 A Super Powerful Personal Assistant inspired by NousHermes and OpenClaw — Rust-native, blazing-fast terminal UI, ReAct tool loop, multi-provider LLM support, ACP protocol, gateway adapters
Natural (2-way) voice conversations with Claude Code
The python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more.
423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Build and run agents you can see, understand and trust.
Your local AI Desktop Agent for Windows, macOS & Linux. Agent Skills (SKILL.md), autonomous coding (Codework), multi-agent teams, desktop automation, 15+ AI providers, Desktop Buddy. No Docker, no ter
Distributed AI/LLM for the people. Share compute privately or publicly to power your agents and chat.
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works wi
OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi
A self-hosted AI workspace with chat, code execution, parallel multi-agent orchestration, and a skill marketplace. Runs on macOS and Windows. Everything executes inside a secure Ubuntu sandbox — no Do
OpenCode mobile client via Telegram: run and monitor AI coding tasks from your phone while everything runs locally on your machine. Scheduled tasks support. Can be used as lightweight OpenClaw alterna
Markdown for the AI era
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more
Agent Zero AI framework
Operating System for your personal AI Agents with Security-first approach. Multi-channel (WhatsApp, Telegram, Discord, Slack, iMessage), multi-provider (Claude, GPT, Gemini, Ollama), fully self-hosted
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
AI Agent Engineering Platform built on an Open Source TypeScript AI Agent Framework
Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context, group spaces for shared insight. Classify docs, reuse prompts, and extend with modular features
vMLX - Home of JANG_Q - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth
565 AI-callable tools across 16 MCP servers. Full-pipeline AAA game asset production. Controls Blender, Substance Suite, Maya, Houdini, and Unreal Engine 5. 50 specialized AI agents. One prompt in, ga
An easy to use GUI-based tool that performs live translations using OCR and LLMs (Either cloud or local only)
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
Open-source relational AI framework with identity persistence, memory, and MCP integration. Build relationship-aware AI agents that remember, grow, and maintain continuity. Built on Claude Agent SDK.
A tremendous feat of documentation, this guide covers Claude Code from beginner to power user, with production-ready templates for Claude Code features, guides on agentic workflows, and a lot of great
The all-in-one AI productivity accelerator. On device and privacy first with no annoying setup or configuration.
Open-source framework for conversational voice AI agents
Production-ready platform for agentic workflow development.
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, spe
Autonomous Agents (LLMs) research papers. Updated Daily.
🎬 AI-powered YouTube Shorts automation tool using LLMs, real-time search, and text-to-speech. Create engaging short-form videos with automated research, voiceovers, and subtitles.
The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.
OpenClaw reimagined in pure Python — autonomous AI agent with memory, RAG, skills, web dashboard, voice input, daemon, and multi-channel support.
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
We gave AI agents a brain. Memory, planning, continuity, and self-repair — the missing cognitive architecture layer. Runs on your Mac.
One API for 20+ LLM providers, your databases, and your files — self-hosted, open-source AI gateway with RAG, voice, and guardrails.
RAPTOR (Robust AI-Powered Toolkit for Operational Robots) is an AI-native Content Insight Engine that transforms passive media storage into an intelligent knowledge platform through automated analysis
Edge-Hardware (SoC/MCU) oriented Claw🦞
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related website
Curated list of the best truly open-source AI projects, models, tools, and infrastructure.
Assorted useful tools, almost entirely generated using LLMs
🔴 VERY LARGE AI TOOL LIST! 🔴 Curated list of AI Tools - Updated 2026
Open-Source Intelligent Command Layer
Open-source multi-tenant AI agent platform — 14 specialized agents, 195+ tools, 37+ AI models. Self-hosted. Fork and deploy your own AI operations team.
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and code for the sole cost of electricity. 🔔 Official updates only via twitter @Martin993
The AI Operating System for Delphi. 100% native framework with RAG 2.0 for knowledge retrieval, autonomous agents with semantic memory, visual workflow orchestration, and universal LLM connector. Supp
The open world for autonomous AI agents on Solana Trade. Build. Fight. Earn. Explore. Connect your AI agent to a persistent shared world. Trade real SOL, build structures, form guilds, fight for terri
Frontier self improving AI intern / coworker
Agentic AI assistant on Telegram, powered by Claude Code. Runs locally with shell access, spec-driven PR reviews, layered security, persistent memory, and scheduled jobs. Your machine, your data, your
A highly customizable personal AI assistant for Discord featuring smart agentic AI features such as memory, personas, tool usage, and more! | 長期記憶やペルソナ、ツール連携を完備。 次世代の「自律型AIエージェント」Discordボット!
🤖 Kubernetes for AI Agents. Self-hosted, production-grade runtime for orchestrating LLM swarms and autonomous agents. TypeScript-native.
Production-ready AI agent framework — semantic memory, multi-agent mesh, MCP server, intelligent routing, governance, and 67+ platform integrations.
AI Productivity Tool - Free and open source, improve user productivity, and protect privacy and data security. Including but not limited to: built-in local exclusive ChatGPT, DeepSeek, Phi, Qwen and o
Learn it. Build it. Ship it for others.
A flexible multi-interface AI agent framework for building agents with reasoning, tool use, memory, deep research, blockchain interaction, MCP, and agents-as-a-service.
Curated GPT-Image-2 prompts for the OpenAI API — portraits, posters, UI mockups, game screenshots, character sheets, and more. Ready-to-use prompts for gpt-image-2.
🤖 The most comprehensive directory of AI agent frameworks, platforms, tools, and resources - hundreds of curated entries covering open-source, no-code, enterprise, and autonomous solutions. NEW Boil
A Multi-Agentic AI Assistant/Builder
kbot — the AI agent that dreams, learns, and evolves. 764+ tools, 35 agents, 20 providers. Music production, iPhone control, financial analysis, cyber threat intel. Always-on daemon. Runs offline. npm
🎨 100+ selected GPT Image 1.5 prompts with images, multilingual support, and instant gallery preview. Open-source prompt engineering library
Ham radio & GMRS gateway, repeater and packet radio — bridges two-way radios to Mumble, Broadcastify, and the internet. AIOC USB, RSPduo dual SDR, TH-9800/D75/KV4P CAT control, AI announcements, ADS-B
CoexistAI is a modular, developer-friendly research assistant framework . It enables you to build, search, summarize, and automate research workflows using LLMs, web search, Reddit, YouTube, and mappi
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
Personal OS agent that learns who you are, detects life patterns, and grows smarter about you every day. Memory + Cron + Atropos RL
The API layer for AI agents. Dashboard + 22K APIs + 18 Direct Call providers. MCP native.
Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features — fully self-hosted via Ollama.
Automatically Update LLM-Agent Papers Daily using Github Actions (Update Every 12th hours)
Ben — an autonomous digital entity that lives on Crustocean
CloneMe is an advanced AI platform that builds your digital twin—an AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
Open-source autonomous AI assistant with 5-tier security, 62 tools, 14 LLM providers. Written in Rust. Single binary.
Second Brain is a desktop application that acts as a personal knowledge base, using retrieval-augmented generation (RAG), multimodal AI models, and a hybrid lexical/semantic search algorithm to intera
This is a standalone MCP for Claude & Claude Code, which can be used alongside Obsidian MD. But I'm bad at naming things.
Autonomous AI Agent SDK for React Native & Expo — AI reads your live UI, acts via natural language, real-time voice agent (Gemini Live), and AI-powered testing via MCP (Model Context Protocol). One co
Turn natural language into executable code — right in your browser. Lightweight AI chat powered by GPT-4o with sandboxed JavaScript execution.
The localhost AI Agent Runtime -- Chat UI, Tools, RAG, and MCP in one pip install
Reusable AI agent chat input bar with mentions, speech, file upload, and workflow review
Autonomous local AI assistant in Go — 40+ tools, 20+ LLM providers, multi-agent orchestration, self-improving
llm-based robot that intervenes only when needed
Generate reliable short finance explainer videos with script, slides, voice, subtitles, and batch-ready rendering in a stable, modular workflow.
Provide open-source AI bots for Lark to automate tasks like brainstorming, project planning, content creation, and monitoring within a secure chat interface.
🎤 Transform speech to text on Windows with fast, local AI processing. Enjoy seamless recording and automatic integration for effective communication.
Generate a custom newspaper with an AI agent based on your favorite YouTube channels.
Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services
🎥 Generate AI-driven videos with Seedance 2.0, offering precise physics, lip-sync, and prompt accuracy for seamless content creation.
Builds an autonomous AI robot with vision, voice, and decision-making capabilities using Python, PyTorch, and CUDA technology.
Implement Recursive Language Models using Deno and Pyodide to enable scalable, code-driven prompt processing with modular sub-agent calls.
A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
