Built for Qwen 9B & Gemma 4B on a gaming laptop. No cloud required.
Quick Start · Why Small Models · Interfaces · Tool Search · Tools · Skills · MCP · Telegram · Doctor
A personal AI agent designed to squeeze maximum capability out of small local models (4-9B parameters). Chat via terminal, browser, or Telegram, with tools, semantic memory, browser control, MCP integration, scheduled tasks, and a customizable personality.
Optimized for Qwen 3.5 9B and Gemma 4 E4B running on a single consumer GPU (4-8GB VRAM). Cloud providers supported as fallback, but the architecture, prompts, and tool system are built for the constraints of small models.
Philosophy: every token is expensive. Don't make the model smarter; make the system around it smarter. Tool search, compact prompts, retry loops, JSON repair, and self-checks compensate for what the model lacks.
| | Cloud (GPT, Claude) | Local (Qwen 9B) |
|---|---|---|
| Latency | 2-10s network + inference | 1-5s local inference |
| Privacy | Data leaves your machine | Everything stays local |
| Cost | $20-200/month | Free after GPU purchase |
| Offline | No | Works without internet |
| Customization | System prompt only | Full control over everything |
| Reliability | API outages, rate limits | Always available |
qwe-qwe makes the trade-off worth it by working with the model's limitations instead of fighting them.
- Python 3.11+
- LM Studio or Ollama with a loaded model
- Recommended models:
  - Qwen 3.5 9B Q4_K_M (~5.5GB): best for tool calling and agents
  - Gemma 4 E4B-IT (~4GB): fast, good for simple tasks
- Embeddings: FastEmbed (ONNX, local) with multilingual-MiniLM (384d, 50+ languages) + SPLADE++

Runs natively on Linux, macOS (Intel & Apple Silicon), and Windows 10/11. A single `pip install -e .` pulls every runtime dependency (including MarkItDown, python-docx/pptx, openpyxl, pdfminer.six, pypdf, fastembed, qdrant-client, uvicorn).
```bash
curl -fsSL https://raw.githubusercontent.com/deepfounder-ai/qwe-qwe/main/install.sh | bash
```

This clones the repo, creates a venv, installs everything, verifies critical deps, pre-downloads the embedding model, and drops `qwe-qwe` on your `$PATH`.
```bash
git clone https://github.com/deepfounder-ai/qwe-qwe.git
cd qwe-qwe
setup.bat
```

On Windows, shell commands are routed through Git Bash (auto-detected at install time; install Git for Windows if missing). Falls back to cmd.exe if not found.
```bash
git clone https://github.com/deepfounder-ai/qwe-qwe.git
cd qwe-qwe

# Create venv
python3 -m venv .venv       # or `python -m venv .venv` on Windows
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows PowerShell / cmd

# Install package + all runtime deps
pip install -e .

# Verify everything is wired
qwe-qwe --doctor
```
```bash
# Linux / macOS
curl -fsSL https://raw.githubusercontent.com/deepfounder-ai/qwe-qwe/main/install.sh | bash

# Any platform, inside the checkout:
git pull && pip install -e . --upgrade
```

The update script is idempotent: re-running it detects an existing checkout and refreshes deps.
```bash
qwe-qwe           # terminal chat
qwe-qwe --web     # web UI at http://localhost:7860
qwe-qwe --doctor  # check everything works
```

LM Studio / Ollama are auto-detected on localhost during setup. If your server is on another machine:

```bash
export QWE_LLM_URL=http://<your-ip>:1234/v1
```

| Component | Minimum | Recommended |
|---|---|---|
| GPU | 4GB VRAM (4B Q4) | 8GB VRAM (9B Q4_K_M) |
| RAM | 8GB | 16GB |
| Storage | 10GB | 20GB (models + memory) |
Works on: gaming laptops, desktop GPUs (RTX 3060+), Mac M1+ (via Ollama).
```
CLI (terminal)   <--\          +-- Qdrant (semantic memory, hybrid search)
Web UI (browser) <---+--Agent--+-- RAG (file indexing & search)
Telegram bot     <--/   Loop   +-- SQLite (history, threads, state)
                         |     +-- Tools (8 core + tool_search)
                         |     +-- Skills (7 built-in, user-creatable)
                         |     +-- Browser (Playwright/Chromium)
                         |     +-- MCP (external tool servers)
                         |     +-- Scheduler (cron tasks)
                         |     +-- Vault (encrypted secrets)
                         v
                 LLM (local or cloud)
                 7 providers supported
```
- Tool Search: only 8 core tools loaded by default (~750 tokens); the model calls `tool_search("keyword")` to activate more. Saves 75% of tool tokens vs loading all 46 tools
- Compact system prompt (~1200 tokens): no redundant tool descriptions
- JSON repair engine: fixes malformed tool calls (trailing commas, unclosed brackets, single quotes)
- Anti-hedge nudge: if the model talks instead of acting, it gets pushed to use tools
- Self-check validation: validates tool args before execution, with required-field checks
- Smart compaction: summarizes old messages when context fills up, saves them to memory
- Stuck detection: warns the model after 5+ tool errors per turn
- Experience learning: the agent remembers past task outcomes and adapts strategies
- Shell via Git Bash: UNIX commands work on Windows, auto-detected
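The JSON repair idea above can be illustrated with a toy version: normalize quotes, close open brackets, then strip trailing commas. This is a minimal sketch, not the actual agent.py implementation:

```python
import json
import re

def repair_json(raw: str) -> dict:
    """Best-effort repair of malformed JSON from a small model.

    Handles the three failure modes named above: single quotes,
    unclosed brackets/braces, and trailing commas. (Illustrative only.)
    """
    s = raw.strip()
    # 1. Convert single-quoted strings to double-quoted ones.
    s = re.sub(r"'([^']*)'", r'"\1"', s)
    # 2. Append closers for any brackets/braces left open.
    stack = []
    for ch in s:
        if ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    s += "".join(reversed(stack))
    # 3. Drop trailing commas that now sit before a closer.
    s = re.sub(r",\s*([}\]])", r"\1", s)
    return json.loads(s)

print(repair_json("{'name': 'shell', 'args': {'cmd': 'ls',}"))
# -> {'name': 'shell', 'args': {'cmd': 'ls'}}
```

A real repair engine also has to cope with quotes inside strings and prose wrapped around the JSON, which this sketch deliberately ignores.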
```bash
qwe-qwe --web                   # http://localhost:7860
qwe-qwe --web --ssl --port 7861 # HTTPS (needed for mic/camera)
```

Premium single-file SPA with zero runtime JS dependencies (no React, no CDN build). Linear / Vercel / Anthropic-Console aesthetic with a Geist + Instrument Serif + Geist Mono type stack.
Shell
- 56-px icon rail (left): chat / memory / scheduler / presets / settings
- 264-px thread list with inline rename + delete actions
- Editorial chat canvas (centered, 780 px)
- Right-side Inspector: context-window gauge, INPUT / OUTPUT token cards, sparkbars (tokens-per-turn), recalled memories (`/api/knowledge/search` on the last user prompt), active tools, latency bars
- ⌘K command palette + Gmail-style Alt+letter nav shortcuts
- Keyboard cheatsheet modal (`Shift+?`)
Chat fidelity
- Streaming without flicker: in-place DOM patches, targeted updates, never a full re-render during a turn
- Tool calls grouped by 11 categories (memory / knowledge / files / shell / browser / web / vision / voice / automation / skills / orchestration), each expandable for full JSON input + output
- Markdown rendering (H1–H6, bold / italic / strike, inline code, blockquote, lists, links)
- Code blocks with line-number gutter, filename + language label, copy button
- Thinking block as a collapsible `<details>` after the turn ends
- Regenerate = clean restart: the server deletes the last user + assistant turn so the model has no idea it's a regeneration
- Persistent attachments: images + files saved to message meta, survive server restart
Memory / Knowledge
- Drag-drop upload supporting 50+ formats (see Knowledge ingest)
- URL scraping via MarkItDown
- Folder scan: preview + batch index
- Interactive knowledge graph (force-directed SVG) with hover edge highlights + search filter
Mobile
- iPhone safe-area insets on all 4 sides
- Bottom tab bar replaces rail
- Slide-in drawer for thread list
- Composer textarea at 16 px (no iOS auto-zoom)
- `100dvh` viewport, honors URL bar + home indicator
Settings: 17 tabs grouped into Agent / I/O / Automation / System (Model, Soul, Tools, Memory, Voice, Camera, Telegram, MCP, Heartbeat, Inference, Network, Privacy, Appearance, Advanced, Account). Advanced sub-tabs expose all 30+ EDITABLE_SETTINGS as forms. The Abort button stops runaway turns; a login modal handles password-protected installs.
```bash
qwe-qwe
```

Rich-formatted terminal chat with 20+ slash commands: `/soul`, `/skills`, `/memory`, `/model`, `/thread`, `/cron`, `/logs`, `/stats`, `/doctor`, and more.
Full mobile access: streaming responses, slash commands, topic-to-thread mapping, image support, formatted messages. Setup guide below.
qwe-qwe uses a meta-tool architecture to minimize token usage. Only 8 core tools are loaded by default:
| Core Tool | Purpose |
|---|---|
| `memory_search` | Search saved memories |
| `memory_save` | Save to long-term memory |
| `read_file` | Read file contents |
| `write_file` | Write/create files |
| `shell` | Run bash commands |
| `http_request` | HTTP requests to any API |
| `spawn_task` | Run tasks in background |
| `tool_search` | Discover & activate more tools |
When the model needs more capabilities, it calls `tool_search("browser")` or `tool_search("notes")`, which activates the relevant tools for that turn.
Keywords: browser, notes, schedule, secret, mcp, profile, rag, skill, soul, timer, model, cron
This saves ~3000 tokens per request compared to loading all 46 tools.
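A minimal sketch of the meta-tool pattern (tool names mirror the tables in this README; the activation logic is illustrative, not the actual tools.py implementation):

```python
# Core tool names always exposed to the model (from the table above).
CORE = {"memory_search", "memory_save", "read_file", "write_file",
        "shell", "http_request", "spawn_task", "tool_search"}

# Keyword -> extension tools; a subset for illustration.
EXTENSIONS = {
    "browser": ["browser_open", "browser_snapshot", "browser_click"],
    "notes": ["create_note", "list_notes", "read_note"],
    "cron": ["schedule_task", "list_cron", "remove_cron"],
}

def tool_search(keyword: str, active: set) -> list:
    """Activate the extension tools matching a keyword for this turn."""
    found = EXTENSIONS.get(keyword, [])
    active.update(found)
    return found

active = set(CORE)               # only core schemas are in the prompt...
tool_search("browser", active)   # ...until the model asks for more
assert "browser_open" in active
```

Because only the activated subset is serialized into the prompt, the token cost scales with what the model actually uses rather than with the full catalog.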
46 tools total across core + extensions + skills:
| Category | Tools | Loaded |
|---|---|---|
| Memory | `memory_search`, `memory_save`, `memory_delete` | Core |
| Files & Shell | `read_file`, `write_file`, `shell` | Core |
| HTTP | `http_request` | Core |
| Tasks | `spawn_task`, `schedule_task`, `list_cron`, `remove_cron` | Core + Search |
| Vault | `secret_save`, `secret_get`, `secret_list`, `secret_delete` | Search |
| RAG | `rag_index`, `rag_search`, `rag_status` | Search |
| Browser | `browser_open`, `browser_snapshot`, `browser_screenshot`, `browser_click`, `browser_fill`, `browser_eval`, `browser_close` | Search |
| Notes | `create_note`, `list_notes`, `read_note`, `edit_note`, `delete_note` | Search |
| Model | `switch_model` | Search |
| Profile | `user_profile_update`, `user_profile_get` | Search |
Pluggable skill system: built-in skills, plus the ability to create your own from chat:
| Skill | Description |
|---|---|
| `browser` | Web browsing via Playwright (open, read, click, screenshot) |
| `mcp_manager` | Manage MCP tool servers (add, remove, restart) |
| `skill_creator` | Create new skills from chat (multi-step LLM pipeline) |
| `soul_editor` | AI-assisted personality tuning |
| `notes` | Note management |
| `timer` | Countdown timers |
| `weather` | Weather reports via wttr.in |
```
You: create a skill for tracking my daily habits
Agent: Skill 'habit_tracker' generation started...
       plan -> tools -> code -> validate -> Created and enabled! (3 tools, 45s)
```
Built-in browser control via Playwright + headless Chromium:
```
You: open google.com and search for "qwen 3.5 benchmarks"
Agent: [tool_search("browser")] -> [browser_open] -> [browser_snapshot]
       Found results: ...
```
Tools: browser_open, browser_snapshot, browser_screenshot, browser_click, browser_fill, browser_eval, browser_close
Activated via `tool_search("browser")`. The agent can navigate pages, read content, fill forms, click buttons, and take screenshots.
Model Context Protocol: connect external tool servers to extend the agent's capabilities:
```
You: add MCP server for filesystem access
Agent: [tool_search("mcp")] -> [mcp_add_server] Added 'filesystem' (14 tools)
```
Supports stdio (subprocess) and HTTP transports. Configured via Settings > System > MCP Servers or through chat using the mcp_manager skill.
MCP tools appear as `mcp__servername__toolname` and are automatically available through `tool_search`.
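Given that naming convention, splitting a namespaced MCP tool name back into its server and tool parts is a one-liner (an illustrative helper, not part of the codebase):

```python
def split_mcp_name(name: str) -> tuple:
    """Split the mcp__<server>__<tool> naming convention into parts.

    maxsplit=2 keeps tool names that contain "__" intact.
    """
    prefix, server, tool = name.split("__", 2)
    if prefix != "mcp":
        raise ValueError(f"not an MCP tool name: {name!r}")
    return server, tool

print(split_mcp_name("mcp__filesystem__read_file"))
# -> ('filesystem', 'read_file')
```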
Primary target is local models via LM Studio or Ollama. Cloud providers supported as fallback:
| Provider | Type | Notes |
|---|---|---|
| LM Studio | Local | Primary target. Auto-loads models |
| Ollama | Local | Standard Ollama API |
| OpenAI | Cloud | GPT-4o, GPT-4.1, etc. |
| OpenRouter | Cloud | Multi-model gateway |
| Groq | Cloud | Fast inference |
| Together | Cloud | Open-source models |
| DeepSeek | Cloud | DeepSeek models |
Switch on the fly via /model (CLI/Telegram) or Settings (Web UI). Auto-discovers available models.
The knowledge base ingests 50+ formats via Microsoft MarkItDown (primary) with stdlib fallbacks (pinned as hard deps, so no silent degradation on fresh installs):
| Category | Formats |
|---|---|
| Documents | PDF · DOCX · PPTX · XLSX · EPUB · ODT · RTF · Jupyter notebooks (.ipynb) |
| Web | HTML · any https://… URL (MarkItDown handles fetch + markdown conversion) |
| Data | JSON · CSV · TSV · YAML · TOML · XML · INI · ENV |
| Code | Python, JS/TS, Go, Rust, Java/Kotlin/Scala, C/C++, Ruby, PHP, SQL, GraphQL, 40+ extensions total |
| Markup | Markdown · reStructuredText · AsciiDoc · TeX |
| Images | PNG · JPG · WEBP (via vision pipeline) |
- Drop or pick files: Memory tab upload zone, batch upload + index
- Paste URL: `POST /api/knowledge/url` fetches, converts to markdown, indexes under the `source:url` tag
- Scan folder: preview first (lists indexable files with size/method), then index all in one pass
Each source is stored under `~/.qwe-qwe/uploads/kb/<slug>_<name>`, chunked into ~800-char pieces, embedded and vector-indexed in Qdrant, and queued for the nightly synthesis job that extracts entities + wiki pages from the content.
Three-layer knowledge system in a single Qdrant collection:
```
Layer 1: RAW           Layer 2: ENTITIES        Layer 3: WIKI
(saved immediately)    (night synthesis)        (night synthesis)

"FastAPI uses          [FastAPI] --uses-->      "FastAPI is a modern
 Pydantic for          [Pydantic]                Python framework that
 validation..."        [Python]                  uses Pydantic for
                       [Starlette]               automatic validation..."
```
During the day (fast, no LLM cost):
- Agent saves facts and knowledge via `memory_save`
- Long texts (>1000 chars) auto-chunked into ~800-char pieces
- Each chunk tagged `synthesis_status=pending`
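The auto-chunking step might look roughly like this (the ~800-char target and overlap are sketch assumptions based on the sizes quoted in this README; the real splitter may differ):

```python
import re

def chunk_text(text: str, target: int = 800, overlap: int = 100) -> list:
    """Split long text on sentence boundaries into ~target-char chunks,
    carrying the tail of each chunk into the next as overlap.
    """
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > target:
            chunks.append(current)
            current = current[-overlap:]  # overlap preserves cross-chunk context
        current = (current + " " + sent).strip()
    if current:
        chunks.append(current)
    return chunks

parts = chunk_text("All work and no play makes Jack a dull boy. " * 40)
```

Overlap matters because a fact split across a chunk boundary would otherwise be unfindable by either chunk's embedding.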
At night (configurable cron, default 03:00):
- Synthesis worker processes pending queue
- LLM extracts entities + relations from chunks
- Creates entity nodes with typed relations (uses, built_on, part_of, etc.)
- Generates wiki summaries stored as searchable chunks
- Writes markdown to `~/.qwe-qwe/wiki/` as a human-readable backup
During search (enriched context):
- Wiki chunks found first (synthesized = higher quality embeddings)
- Entity relations expanded (follow links to related knowledge)
- Raw chunks provide specifics
- Result: synthesized + structured + raw knowledge in one query
- Hybrid search: FastEmbed dense (384d, 50+ languages) + SPLADE++ sparse, fused via RRF
- Auto-chunking: long texts split on sentence boundaries with overlap
- Knowledge graph: entities with typed relations, built automatically
- Wiki pages: synthesized markdown, searchable and human-readable
- Graph visualization: interactive force-directed graph in Web UI (Knowledge > Graph tab)
- Thread isolation: each conversation has its own memory context
- Smart compaction: old messages summarized and saved to memory when context fills
- Auto-context: wiki + entities + memories injected into each turn
- Experience learning: past task outcomes inform future strategies
- Modes: in-memory (testing), disk (default), or remote Qdrant server
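Reciprocal Rank Fusion (RRF), the fusion step named above, can be sketched in a few lines (k=60 is the conventional constant from the RRF literature, not necessarily what qwe-qwe uses):

```python
def rrf_fuse(dense: list, sparse: list, k: int = 60) -> list:
    """Merge two ranked ID lists into one with Reciprocal Rank Fusion.

    Each hit contributes 1/(k + rank); documents found by both
    retrievers accumulate score from both and rise to the top.
    """
    scores = {}
    for ranking in (dense, sparse):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears in both rankings, so it outranks every single-source hit.
print(rrf_fuse(["a", "b", "c"], ["b", "d"]))
# -> ['b', 'a', 'd', 'c']
```

RRF needs no score normalization, which is exactly why it suits fusing dense cosine scores with sparse SPLADE scores that live on different scales.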
Cron-like task scheduling with natural syntax:
"in 5m" -> run once in 5 minutes
"every 2h" -> repeat every 2 hours
"daily 09:00" -> every day at 09:00
"14:30" -> once today/tomorrow at 14:30
Results delivered to Telegram and Web UI. Simple reminders bypass LLM for instant delivery.
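A sketch of how that natural schedule syntax could be parsed (illustrative only; the actual scheduler.py parser may accept more forms):

```python
import re
from datetime import datetime, timedelta

def parse_when(spec: str, now: datetime):
    """Parse the natural syntax above into (next_run, repeat_interval).

    repeat_interval is None for one-shot tasks.
    """
    units = {"s": "seconds", "m": "minutes", "h": "hours"}
    if m := re.fullmatch(r"in (\d+)([smh])", spec):
        return now + timedelta(**{units[m[2]]: int(m[1])}), None
    if m := re.fullmatch(r"every (\d+)([smh])", spec):
        step = timedelta(**{units[m[2]]: int(m[1])})
        return now + step, step
    if m := re.fullmatch(r"(?:daily )?(\d{1,2}):(\d{2})", spec):
        run = now.replace(hour=int(m[1]), minute=int(m[2]),
                          second=0, microsecond=0)
        if run <= now:
            run += timedelta(days=1)  # time already passed -> tomorrow
        step = timedelta(days=1) if spec.startswith("daily") else None
        return run, step
    raise ValueError(f"unrecognized schedule: {spec!r}")

now = datetime(2025, 1, 1, 12, 0)
assert parse_when("in 5m", now) == (datetime(2025, 1, 1, 12, 5), None)
```

Note how the bare `"14:30"` form reuses the `daily` branch: same time parsing, but without the repeat interval.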
1. Create a bot via @BotFather and copy the token
2. Set the token: `/telegram token <TOKEN>` (CLI) or Settings -> Telegram (Web)
3. Start the bot: `/telegram start`
4. Generate an activation code: `/telegram activate`
5. Send the 6-digit code to your bot in Telegram
- One-time 6-digit codes, expire in 10 minutes
- 3 wrong attempts -> permanent ban (by Telegram user ID)
- Only verified owner can chat with the bot
- Streaming responses via editMessageText
- Topic isolation: supergroup topics -> separate threads
- Formatted messages: MarkdownV2 with HTML fallback
- Image support: send images for vision analysis
- Cron results: scheduled task output delivered to chat
- 12 slash commands: `/status`, `/model`, `/soul`, `/skills`, `/memory`, `/threads`, `/stats`, `/cron`, `/thinking`, `/doctor`, `/clear`, `/help`
8 adjustable traits (low / moderate / high):
| Trait | Low | High |
|---|---|---|
| humor | serious | jokes around |
| honesty | diplomatic | brutally honest |
| curiosity | answers questions | asks follow-ups |
| brevity | verbose | concise |
| formality | casual | formal |
| proactivity | waits for requests | suggests ideas |
| empathy | rational | empathetic |
| creativity | practical | unconventional |
Plus custom traits, agent name, and language selection. Edit via `/soul` (CLI), Settings (Web), or `/soul` (Telegram).
```bash
qwe-qwe --doctor
```

Checks 20+ system components: Python, dependencies, SQLite, Qdrant, provider, LLM API, model loaded, embeddings, inference latency, agent loop v2, MCP servers, browser skill, Telegram, threads, skills, tools, cron/heartbeat, STT/TTS, files indexed, knowledge graph (entities/wiki), synthesis cron, BM25 index, disk space, logs.
Environment variables:
```bash
QWE_LLM_URL=http://localhost:1234/v1  # LLM server URL
QWE_LLM_MODEL=qwen/qwen3.5-9b         # Model name
QWE_LLM_KEY=lm-studio                 # API key
QWE_DB_PATH=~/.qwe-qwe/qwe_qwe.db     # Database path
QWE_DATA_DIR=~/.qwe-qwe               # Where threads / memory / uploads live
QWE_QDRANT_MODE=disk                  # memory | disk | server
QWE_PASSWORD=                         # Web UI password (shows login modal if set)
QWE_STT_DEVICE=cpu                    # STT inference device (cpu | cuda)
```

Everything else (30+ knobs: `context_budget`, `rag_chunk_size`, `synthesis_time`, `tts_api_url`, etc.) lives in Settings > Advanced > Settings and persists in SQLite.
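For reference, reading these variables with their documented defaults might look like this (a sketch; see config.py for the real logic):

```python
import os
from pathlib import Path

# Defaults taken from the table above; names prefixed QWE_ as documented.
LLM_URL = os.environ.get("QWE_LLM_URL", "http://localhost:1234/v1")
DATA_DIR = Path(os.environ.get("QWE_DATA_DIR", "~/.qwe-qwe")).expanduser()
QDRANT_MODE = os.environ.get("QWE_QDRANT_MODE", "disk")

# Fail fast on an invalid mode instead of silently falling back.
if QDRANT_MODE not in {"memory", "disk", "server"}:
    raise ValueError(f"invalid QWE_QDRANT_MODE: {QDRANT_MODE}")
```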
All user data in ~/.qwe-qwe/ (configurable via QWE_DATA_DIR):
```
qwe_qwe.db     SQLite: messages, threads, KV, settings
memory/        Qdrant vectors (disk mode)
wiki/          Synthesized markdown pages
skills/        User-created skills
uploads/       Images, documents, camera captures
  kb/          Knowledge-base files awaiting / done indexing
workspace/     Default CWD for relative paths (switches per-preset)
presets/<id>/  Installed presets (each with own workspace/, knowledge/, skills/)
logs/          qwe-qwe.log (INFO+), errors.log (WARNING+)
```
```bash
docker compose up
```

LM Studio / Ollama should be running on the host. Persistent data lives in `./data/`.
```
cli.py             Terminal interface + entry point
server.py          FastAPI web server + WebSocket
agent.py           Core agent loop + JSON repair + self-check
config.py          Settings (env-configurable)
db.py              SQLite storage (WAL mode)
memory.py          Qdrant semantic memory (hybrid search)
rag.py             RAG file indexing & search
tools.py           Tool definitions + tool_search + execution
mcp_client.py      Model Context Protocol client
providers.py       Multi-provider LLM management
soul.py            Personality system + prompt generation
tasks.py           Background task runner
scheduler.py       Cron-like scheduler
threads.py         Thread management
telegram_bot.py    Telegram bot integration
vault.py           Encrypted secrets (Fernet)
logger.py          Structured logging
skills/            Pluggable skill modules
  browser.py         Web browsing (Playwright)
  mcp_manager.py     MCP server management
  skill_creator.py   Skill generation pipeline
  soul_editor.py     Personality editing
  notes.py           Note management
  timer.py           Countdown timers
  weather.py         Weather reports
static/            Web UI (single-file HTML/CSS/JS)
```
Join our Telegram community: @qwe_qwe_ai
MIT
Built with care by DeepFounder

