Search results for "speech"
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
Faster Whisper transcription with CTranslate2
An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Oll
Open Source AI Platform - AI Chat with advanced features that works with every LLM
Natural (2-way) voice conversations with Claude Code
The python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more.
423 plugins, 2,849 skills, 177 agents for Claude Code. Open-source marketplace at tonsofskills.com with the ccpi CLI package manager.
Your AI assistant that never forgets and runs 100% privately on your computer. Leave it on 24/7 - it learns your preferences, helps with code, manages your health goals, searches the web, and connects
Build and run agents you can see, understand and trust.
ARIS βοΈ (Auto-Research-In-Sleep) β Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in β works wi
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi
Agent Zero AI framework
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac
Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context, group spaces for shared insight. Classify docs, reuse prompts, and extend with modular features
vMLX - Home of JANG_Q - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth
Open-source framework for conversational voice AI agents
π‘ All-in-one AI framework for semantic search, LLM orchestration and language model workflows
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, spe
π¬ AI-powered YouTube Shorts automation tool using LLMs, real-time search, and text-to-speech. Create engaging short-form videos with automated research, voiceovers, and subtitles.
OpenClaw reimagined in pure Python β autonomous AI agent with memory, RAG, skills, web dashboard, voice input, daemon, and multi-channel support.
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
One API for 20+ LLM providers, your databases, and your files β self-hosted, open-source AI gateway with RAG, voice, and guardrails.
RAPTOR (Robust AI-Powered Toolkit for Operational Robots) is an AI-native Content Insight Engine that transforms passive media storage into an intelligent knowledge platform through automated analysis
Curated list of the best truly open-source AI projects, models, tools, and infrastructure.
Open-Source Intelligent Command Layer
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and code for the sole cost of electricity. π Official updates only via twitter @Martin993
Agentic AI assistant on Telegram, powered by Claude Code. Runs locally with shell access, spec-driven PR reviews, layered security, persistent memory, and scheduled jobs. Your machine, your data, your
Learn it. Build it. Ship it for others.
A flexible multi-interface AI agent framework for building agents with reasoning, tool use, memory, deep research, blockchain interaction, MCP, and agents-as-a-service.
π€ The most comprehensive directory of AI agent frameworks, platforms, tools, and resources - hundreds of curated entries covering open-source, no-code, enterprise, and autonomous solutions. NEW Boil
A Multi-Agentic AI Assistant/Builder
Ham radio & GMRS gateway, repeater and packet radio β bridges two-way radios to Mumble, Broadcastify, and the internet. AIOC USB, RSPduo dual SDR, TH-9800/D75/KV4P CAT control, AI announcements, ADS-B
Open-Sable is a local-first autonomous agent framework with AGI-inspired cognitive subsystems (goals, memory, metacognition, tool use). It can run continuously on your machine, integrate with chat int
Personal OS agent that learns who you are, detects life patterns, and grows smarter about you every day. Memory + Cron + Atropos RL
The API layer for AI agents. Dashboard + 22K APIs + 18 Direct Call providers. MCP native.
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features β fully self-hosted via Ollama.
Automatically Update LLM-Agent Papers Daily using Github Actions (Update Every 12th hours)
CloneMe is an advanced AI platform that builds your digital twinβan AI that chats like you, remembers details, and supports multiple platforms. Customizable, memory-driven, and hot-reloadable, it's th
Second Brain is a desktop application that acts as a personal knowledge base, using retrieval-augmented generation (RAG), multimodal AI models, and a hybrid lexical/semantic search algorithm to intera
The localhost AI Agent Runtime -- Chat UI, Tools, RAG, and MCP in one pip install
llm-based robot that intervenes only when needed
Provide open-source AI bots for Lark to automate tasks like brainstorming, project planning, content creation, and monitoring within a secure chat interface.
π€ Transform speech to text on Windows with fast, local AI processing. Enjoy seamless recording and automatic integration for effective communication.
Autonomous, multilingual AI voice agent using ElevenLabs, LangGraph, and RAG for government services
π₯ Generate AI-driven videos with Seedance 2.0, offering precise physics, lip-sync, and prompt accuracy for seamless content creation.
Builds an autonomous AI robot with vision, voice, and decision-making capabilities using Python, PyTorch, and CUDA technology.
A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
