# Architecture Overview
This page provides visual diagrams of Skippy's infrastructure and data flow.
## System Architecture

```mermaid
flowchart TB
subgraph User["👤 User"]
KB[Keyboard/Mouse]
MIC[🎤 Microphone]
SCREEN[🖥️ Screen]
SPEAKER[🔊 Speaker]
end
subgraph Skippy["🍺 Skippy Desktop App"]
UI[PyQt6 GUI]
AVATAR[Animated Avatar]
CHAT[Chat Interface]
SIGNALS[Qt Signals<br/>Thread Safety]
PIPELINE[Skippy Pipeline<br/>Local-First]
AUDIO[Unified Audio Manager<br/>Streaming TTS]
end
subgraph LocalAI["🏠 Local AI (Ollama)"]
MISTRAL[Mistral 7B<br/>Conversation]
PHI[Phi-3 Mini<br/>Quick Acks]
end
subgraph CloudAI["☁️ Claude AI (Background)"]
SONNET[Claude Sonnet<br/>Heavy Lifting]
OPUS[Claude Opus<br/>Complex Tasks]
end
subgraph External["📱 External"]
WA[WhatsApp]
OPENCLAW[OpenClaw Gateway]
end
KB --> UI
MIC -->|Whisper STT| PIPELINE
UI --> CHAT
CHAT --> SIGNALS
SIGNALS --> PIPELINE
PIPELINE -->|Instant| MISTRAL
PIPELINE -->|Quick| PHI
PIPELINE -->|Background| SONNET
PIPELINE -->|Escalate| OPUS
MISTRAL --> SIGNALS
PHI --> SIGNALS
SONNET --> SIGNALS
OPUS --> SIGNALS
SIGNALS --> AUDIO
AUDIO -->|Streaming TTS| SPEAKER
SIGNALS --> AVATAR
AVATAR --> SCREEN
OPENCLAW <--> WA
PIPELINE <--> OPENCLAW
```

## Local-First Pipeline
Skippy uses a local-first architecture for instant responses with optional cloud enhancement:
```mermaid
flowchart TD
INPUT[📝 User Input] --> DETECT[Detect Message Type]
DETECT --> STARTER[🎯 Instant Starter<br/>Template or Phi-3]
STARTER --> TTS1[🔊 Play Immediately]
DETECT --> LOCAL[🏠 Local Generation<br/>Mistral 7B]
LOCAL --> STREAM[📡 Stream Chunks]
STREAM --> TTS2[🔊 Streaming TTS]
DETECT --> CHECK{Needs Claude?}
CHECK -->|Code/Complex| CLAUDE[☁️ Claude Background]
CHECK -->|Simple| SKIP[Skip Cloud]
CLAUDE --> FOLLOWUP[📤 Follow-up Response]
FOLLOWUP --> TTS3[🔊 Speak Follow-up]
subgraph "Parallel Processing"
STARTER
LOCAL
CLAUDE
end
style STARTER fill:#4ade80
style LOCAL fill:#60a5fa
style CLAUDE fill:#a78bfa
```

### Pipeline Benefits
| Feature | Benefit |
|---|---|
| Instant starters | User hears response in <100ms |
| Local generation | No network latency for conversation |
| Streaming TTS | Audio plays while text generates |
| Background Claude | Heavy lifting without blocking UI |
| Context awareness | Remembers last 5 exchanges |
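
The dispatch flow above can be sketched in a few lines of Python. This is a minimal illustration, not the actual `skippy_pipeline.py` implementation; `speak`, `generate_local`, `needs_claude`, and `ask_claude` are hypothetical callables standing in for the real components:

```python
import asyncio

# Illustrative starter lines; the real app uses templates or Phi-3 for these.
STARTERS = {
    "code": "DING! Let me look at that mess...",
    "default": "Hold on, thinking...",
}

async def handle_message(text, speak, generate_local, needs_claude, ask_claude):
    """Local-first dispatch: speak a starter instantly, stream the local
    response, and only escalate to Claude in the background when needed."""
    topic = "code" if any(w in text.lower() for w in ("bug", "error", "function")) else "default"

    await speak(STARTERS[topic])          # 1. instant starter, played immediately

    async for chunk in generate_local(text):
        await speak(chunk)                # 2. local Mistral output streams into TTS

    if needs_claude(text):                # 3. heavy lifting without blocking the UI
        asyncio.create_task(ask_claude(text))
```
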
## Streaming TTS Architecture
The unified audio system enables true streaming text-to-speech:
```mermaid
flowchart LR
subgraph Pipeline["Pipeline Thread"]
GEN[Token Generation]
CHUNK[Sentence Chunking]
EMIT[Emit Chunk]
end
subgraph Qt["Main Thread (Qt)"]
SIGNAL[Qt Signal]
QUEUE[Queue Chunk]
end
subgraph Audio["Audio System"]
WORKER[Chunk Worker]
TTS[Edge TTS Generator]
PLAYER[Audio Player]
end
GEN --> CHUNK
CHUNK -->|Sentence complete| EMIT
EMIT -->|audio_chunk signal| SIGNAL
SIGNAL --> QUEUE
QUEUE --> WORKER
WORKER --> TTS
TTS -->|MP3 → PCM| PLAYER
PLAYER -->|sounddevice| SPEAKER[🔊]
style SIGNAL fill:#f59e0b
```
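
The sentence-chunking step can be sketched as follows; the function and callback names are illustrative rather than the actual `audio_system.py` API:

```python
SENTENCE_ENDINGS = (".", "!", "?")

def chunk_sentences(token_stream, emit_chunk):
    """Buffer streamed tokens and emit a chunk whenever a sentence completes,
    so TTS can start speaking before the full response has been generated."""
    buffer = ""
    for token in token_stream:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            emit_chunk(buffer.strip())  # e.g. forwarded via the audio_chunk signal
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        emit_chunk(buffer.strip())
```
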
### Audio Components
| Component | Purpose |
|---|---|
| `UnifiedAudioManager` | Coordinates all audio |
| `AudioPlayer` | Single-threaded playback via sounddevice |
| `TTSGenerator` | Persistent async Edge TTS loop |
| `SoundEffectCache` | Pre-generated effects in memory |
| `AudioQueue` | Priority queue (effects before speech) |
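
A minimal sketch of the ordering that `AudioQueue` provides, assuming a simple `(priority, order, item)` scheme where lower priority numbers play first; the real queue in `audio_system.py` may differ:

```python
import itertools
import queue

EFFECT_PRIORITY = 0  # sound effects (DING, WHIR) jump ahead of speech
SPEECH_PRIORITY = 1

_order = itertools.count()  # tie-breaker keeps same-priority items in FIFO order
audio_queue: queue.PriorityQueue = queue.PriorityQueue()

def enqueue_effect(name: str) -> None:
    audio_queue.put((EFFECT_PRIORITY, next(_order), name))

def enqueue_speech_chunk(chunk: str) -> None:
    audio_queue.put((SPEECH_PRIORITY, next(_order), chunk))

# The playback worker drains items in priority order:
# priority, _, item = audio_queue.get()
```
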
## Thread Safety Model
All pipeline callbacks use Qt signals for thread-safe UI updates:
```mermaid
sequenceDiagram
participant PT as Pipeline Thread
participant SIG as Qt Signals
participant MT as Main Thread
participant UI as UI Components
participant AU as Audio System
PT->>SIG: pipeline_starter.emit(text)
SIG->>MT: _handle_pipeline_starter()
MT->>AU: speak(text)
PT->>SIG: audio_chunk.emit(chunk)
SIG->>MT: _on_audio_chunk()
MT->>AU: queue_chunk(chunk)
PT->>SIG: pipeline_complete.emit(response)
SIG->>MT: _handle_pipeline_complete()
MT->>UI: add_message()
```

### Signal Types
| Signal | Purpose |
|---|---|
| `pipeline_starter` | Instant acknowledgment to speak |
| `audio_chunk` | Streaming text chunk for TTS |
| `pipeline_complete` | Full response for display |
| `pipeline_background` | Claude follow-up response |
| `text_delta` | UI text update (non-audio) |
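
The signals in the table map naturally onto a `QObject` subclass. A minimal sketch follows; the class name is illustrative, while the signal and slot names come from the table and sequence diagram above:

```python
from PyQt6.QtCore import QObject, pyqtSignal

class PipelineSignals(QObject):
    """Signals emitted from the pipeline thread; Qt queues them onto the
    main thread, so the connected slots can update the UI safely."""
    pipeline_starter = pyqtSignal(str)      # instant acknowledgment to speak
    audio_chunk = pyqtSignal(str)           # streaming text chunk for TTS
    pipeline_complete = pyqtSignal(str)     # full response for display
    pipeline_background = pyqtSignal(str)   # Claude follow-up response
    text_delta = pyqtSignal(str)            # UI text update (non-audio)

# Wiring in the main window (slot names from the sequence diagram above):
# signals.pipeline_starter.connect(self._handle_pipeline_starter)
# signals.audio_chunk.connect(self._on_audio_chunk)
# signals.pipeline_complete.connect(self._handle_pipeline_complete)
```
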
## Conversation Context
Skippy maintains conversation history for context-aware responses:
```mermaid
flowchart TB
subgraph Context["ConversationContext"]
HIST[Exchange History<br/>Last 5 turns]
TOPIC[Current Topic<br/>code/help/feelings]
MOOD[Skippy Mood<br/>snarky/helpful/annoyed]
end
USER[User Message] --> DETECT[Topic Detection]
DETECT --> TOPIC
USER --> MOOD_UP[Mood Update]
MOOD_UP --> MOOD
Context --> PROMPT[Build Context Prompt]
PROMPT --> LOCAL[Local Model]
LOCAL --> RESPONSE[Response]
RESPONSE --> SAVE[Save Exchange]
SAVE --> HIST
```

### Context Format
```text
## Recent Conversation
- Human said: "Hey Skippy, I have a bug"
- You replied: "SIGH. Another bug? Let me see what mess you made..."
- Human said: "It's in the login function"
- You replied: "Ooh, authentication bugs. Classic monkey mistake..."
[Current topic: code]
[Mood: Default snarky superiority]
```
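
A minimal sketch of how `ConversationContext` might track exchanges and build the prompt shown above, assuming a `deque` capped at five turns; method names are illustrative:

```python
from collections import deque

class ConversationContext:
    """Tracks the last five exchanges plus topic and mood for prompts."""

    def __init__(self, max_turns: int = 5):
        self.history = deque(maxlen=max_turns)
        self.topic = "general"
        self.mood = "Default snarky superiority"

    def save_exchange(self, user_text: str, reply: str) -> None:
        self.history.append((user_text, reply))

    def build_prompt(self) -> str:
        lines = ["## Recent Conversation"]
        for user_text, reply in self.history:
            lines.append(f'- Human said: "{user_text}"')
            lines.append(f'- You replied: "{reply}"')
        lines.append(f"[Current topic: {self.topic}]")
        lines.append(f"[Mood: {self.mood}]")
        return "\n".join(lines)
```
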
## Data Flow: Voice Chat

```mermaid
sequenceDiagram
participant U as 👤 User
participant M as 🎤 Microphone
participant W as Whisper (Local)
participant P as Pipeline
participant L as Local LLM
participant C as Claude (Background)
participant A as Audio System
participant SP as 🔊 Speaker
U->>M: Speaks
M->>W: Audio recording
W->>P: Transcribed text
P->>A: Instant starter
A->>SP: "DING! Let me think..."
P->>L: Generate response
loop Streaming
L->>P: Token chunk
P->>A: TTS chunk
A->>SP: Audio plays
end
P->>P: Check if needs Claude
opt Complex query
P->>C: Background request
C->>P: Follow-up response
P->>A: Speak follow-up
A->>SP: Claude's response
end
```
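
The voice input leg can be sketched roughly as below, assuming the `openai-whisper` package and `sounddevice` for capture; the docs only say "Local Whisper", so the exact library, model size, and parameters are assumptions:

```python
import sounddevice as sd
import whisper  # assumption: the openai-whisper package provides local STT

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
model = whisper.load_model("base")  # model size is illustrative

def record_and_transcribe(seconds: float = 5.0) -> str:
    """Record from the default microphone and return the transcribed text."""
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes
    result = model.transcribe(audio.flatten(), fp16=False)
    return result["text"].strip()

# text = record_and_transcribe()
# pipeline.handle_message(text)  # hand the transcript to the local-first pipeline
```
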
## Component Stack

```mermaid
flowchart TB
subgraph Presentation["🎨 Presentation Layer"]
PYQT[PyQt6 GUI]
AVATAR_W[Avatar Widget]
CHAT_W[Chat Widget]
MINI[Mini Mode]
end
subgraph Logic["⚙️ Business Logic"]
PIPELINE_L[Skippy Pipeline]
CONTEXT[Conversation Context]
AUDIO_MGR[Audio Manager]
end
subgraph Integration["🔌 Integration Layer"]
OLLAMA[Ollama Client]
EDGE[Edge TTS]
WHISPER[Local Whisper]
SOUNDDEV[sounddevice]
end
subgraph External["🌐 External Services"]
OLLAMA_SVC[Ollama Server]
CLAUDE[Claude API]
OPENCLAW_GW[OpenClaw Gateway]
end
Presentation --> Logic
Logic --> Integration
Integration --> External
```
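
At the integration layer, the Ollama client talks to the local server over its HTTP API. A minimal streaming sketch using `requests` against the default endpoint; the function name is illustrative and error handling is omitted:

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def stream_ollama(prompt: str, model: str = "mistral:7b-instruct"):
    """Yield response chunks from a local Ollama model as they are generated."""
    payload = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            data = json.loads(line)
            if not data.get("done"):
                yield data.get("response", "")

# for chunk in stream_ollama("Explain this login bug"):
#     print(chunk, end="", flush=True)
```
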
## File Structure

```text
SkippyBuddy/
├── 📄 skippy.py # Main application + Qt UI
├── 📄 skippy_pipeline.py # Local-first async pipeline
├── 📄 audio_system.py # Unified audio (TTS, effects, playback)
├── 📄 conversation_context.py # Context tracking for responses
├── 📄 avatar.py # Animated avatar widget
├── 📄 sound_effects.py # Effect generators (DING, WHIR, etc.)
├── 📄 gateway_client.py # OpenClaw gateway integration
├── 📄 config.json # User configuration
├── 📁 tools/
│ ├── 📄 skippy-tools.ps1 # PowerShell toolkit
│ ├── 📄 skippy-ahk.ahk # AutoHotkey scripts
│ └── 📁 workflows/ # Automation workflows
├── 📁 docs/
│ └── 📁 docs/ # Documentation (this wiki)
└── 📁 temp/ # Temporary audio files
```

## Configuration

### Pipeline Settings

In `config.json`:
```json
{
"use_pipeline": true,
"pipeline_conversation_model": "mistral:7b-instruct",
"pipeline_fast_model": "phi3:mini",
"pipeline_enable_streaming_tts": true,
"pipeline_enable_background_claude": true,
"pipeline_max_tokens": 600,
"pipeline_temperature": 0.8
}
```

| Setting | Description |
|---|---|
| `use_pipeline` | Enable local-first pipeline |
| `pipeline_conversation_model` | Main local model for responses |
| `pipeline_fast_model` | Quick model for acknowledgments |
| `pipeline_enable_streaming_tts` | Stream audio while generating |
| `pipeline_enable_background_claude` | Use Claude for complex queries |
| `pipeline_max_tokens` | Max response length |
### Audio Settings

```json
{
"voice_output_enabled": true,
"tts_provider": "edge",
"edge_voice": "en-GB-RyanNeural",
"edge_rate": "+0%"
}
```
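
A minimal sketch of driving Edge TTS with these settings via the `edge-tts` package, decoding MP3 to PCM and playing it through `sounddevice`. The use of `pydub` for decoding is an assumption about the MP3 → PCM step shown earlier:

```python
import asyncio
import io

import edge_tts
import numpy as np
import sounddevice as sd
from pydub import AudioSegment  # assumption: pydub + ffmpeg handle MP3 decoding

async def speak(text: str, voice: str = "en-GB-RyanNeural", rate: str = "+0%") -> None:
    """Generate speech with Edge TTS, decode the MP3 to PCM, and play it."""
    communicate = edge_tts.Communicate(text, voice, rate=rate)
    mp3 = bytearray()
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            mp3.extend(chunk["data"])

    segment = AudioSegment.from_file(io.BytesIO(bytes(mp3)), format="mp3")
    samples = np.array(segment.get_array_of_samples(), dtype=np.float32)
    samples /= float(1 << (8 * segment.sample_width - 1))  # scale to [-1.0, 1.0]
    if segment.channels > 1:
        samples = samples.reshape((-1, segment.channels))
    sd.play(samples, segment.frame_rate)
    sd.wait()  # block until playback finishes

# asyncio.run(speak("DING! Another bug? Let me see what mess you made."))
```
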
## Hardware Requirements
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU VRAM | 4 GB | 8+ GB | For local LLM |
| System RAM | 8 GB | 16+ GB | Ollama + Whisper |
| CPU | 4 cores | 8+ cores | Async processing |
| Storage | 10 GB | 20 GB | Models + cache |
## Service Dependencies
| Service | Purpose | Required |
|---|---|---|
| Ollama | Local LLM inference | ✅ Yes |
| OpenClaw Gateway | Claude API (background) | Optional |
| Edge TTS | Voice output | Optional |
| Local Whisper | Voice input | Optional |
| WhatsApp (via OpenClaw) | Mobile sync | Optional |
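
Since Ollama is the only hard requirement, a startup check might ping the local server before enabling the pipeline. A small sketch; the endpoint is Ollama's standard model-listing route, and the timeout is illustrative:

```python
import requests

def ollama_available(base_url: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama server answers a model-listing request."""
    try:
        return requests.get(f"{base_url}/api/tags", timeout=2).ok
    except requests.RequestException:
        return False

# if not ollama_available():
#     print("Ollama is not running -- start it before launching Skippy.")
```
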
Updated: 2026-02-01