Architecture Overview

This page provides visual diagrams of Skippy's infrastructure and data flow.


System Architecture

flowchart TB
    subgraph User["👤 User"]
        KB[Keyboard/Mouse]
        MIC[🎤 Microphone]
        SCREEN[🖥️ Screen]
        SPEAKER[🔊 Speaker]
    end

    subgraph Skippy["🍺 Skippy Desktop App"]
        UI[PyQt6 GUI]
        AVATAR[Animated Avatar]
        CHAT[Chat Interface]
        SIGNALS[Qt Signals<br/>Thread Safety]
        PIPELINE[Skippy Pipeline<br/>Local-First]
        AUDIO[Unified Audio Manager<br/>Streaming TTS]
    end

    subgraph LocalAI["🏠 Local AI (Ollama)"]
        MISTRAL[Mistral 7B<br/>Conversation]
        PHI[Phi-3 Mini<br/>Quick Acks]
    end

    subgraph CloudAI["☁️ Claude AI (Background)"]
        SONNET[Claude Sonnet<br/>Heavy Lifting]
        OPUS[Claude Opus<br/>Complex Tasks]
    end

    subgraph External["📱 External"]
        WA[WhatsApp]
        OPENCLAW[OpenClaw Gateway]
    end

    KB --> UI
    MIC -->|Whisper STT| PIPELINE
    UI --> CHAT
    CHAT --> SIGNALS
    SIGNALS --> PIPELINE

    PIPELINE -->|Instant| MISTRAL
    PIPELINE -->|Quick| PHI
    PIPELINE -->|Background| SONNET
    PIPELINE -->|Escalate| OPUS

    MISTRAL --> SIGNALS
    PHI --> SIGNALS
    SONNET --> SIGNALS
    OPUS --> SIGNALS

    SIGNALS --> AUDIO
    AUDIO -->|Streaming TTS| SPEAKER
    SIGNALS --> AVATAR
    AVATAR --> SCREEN

    OPENCLAW <--> WA
    PIPELINE <--> OPENCLAW

Local-First Pipeline

Skippy uses a local-first architecture for instant responses with optional cloud enhancement:

flowchart TD
    INPUT[📝 User Input] --> DETECT[Detect Message Type]

    DETECT --> STARTER[🎯 Instant Starter<br/>Template or Phi-3]
    STARTER --> TTS1[🔊 Play Immediately]

    DETECT --> LOCAL[🏠 Local Generation<br/>Mistral 7B]
    LOCAL --> STREAM[📡 Stream Chunks]
    STREAM --> TTS2[🔊 Streaming TTS]

    DETECT --> CHECK{Needs Claude?}
    CHECK -->|Code/Complex| CLAUDE[☁️ Claude Background]
    CHECK -->|Simple| SKIP[Skip Cloud]

    CLAUDE --> FOLLOWUP[📤 Follow-up Response]
    FOLLOWUP --> TTS3[🔊 Speak Follow-up]

    subgraph "Parallel Processing"
        STARTER
        LOCAL
        CLAUDE
    end

    style STARTER fill:#4ade80
    style LOCAL fill:#60a5fa
    style CLAUDE fill:#a78bfa
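
As a concrete illustration of this fan-out, here is a minimal sketch using asyncio and the ollama Python client; handle_message, speak, needs_claude, and run_claude_background are hypothetical names standing in for the real pipeline API:

```python
import asyncio

from ollama import AsyncClient  # assumes the ollama-python package

async def handle_message(text, speak, needs_claude, run_claude_background):
    """Hypothetical local-first fan-out: starter, local stream, optional Claude."""
    client = AsyncClient()

    # 1. Instant starter from the small model (or a canned template).
    starter = await client.chat(
        model="phi3:mini",
        messages=[{"role": "user", "content": f"One-line acknowledgment for: {text}"}],
    )
    speak(starter["message"]["content"])  # user hears something right away

    # 2. Kick off Claude in the background only when the query warrants it.
    claude_task = None
    if needs_claude(text):
        claude_task = asyncio.create_task(run_claude_background(text))

    # 3. Stream the main local response; each chunk feeds streaming TTS.
    async for chunk in await client.chat(
        model="mistral:7b-instruct",
        messages=[{"role": "user", "content": text}],
        stream=True,
    ):
        speak(chunk["message"]["content"])

    # 4. Speak the Claude follow-up once it arrives, if one was requested.
    if claude_task:
        speak(await claude_task)
```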

Pipeline Benefits

| Feature | Benefit |
| --- | --- |
| Instant starters | User hears response in <100 ms |
| Local generation | No network latency for conversation |
| Streaming TTS | Audio plays while text generates |
| Background Claude | Heavy lifting without blocking the UI |
| Context awareness | Remembers the last 5 exchanges |

Streaming TTS Architecture

The unified audio system enables true streaming text-to-speech:

flowchart LR
    subgraph Pipeline["Pipeline Thread"]
        GEN[Token Generation]
        CHUNK[Sentence Chunking]
        EMIT[Emit Chunk]
    end

    subgraph Qt["Main Thread (Qt)"]
        SIGNAL[Qt Signal]
        QUEUE[Queue Chunk]
    end

    subgraph Audio["Audio System"]
        WORKER[Chunk Worker]
        TTS[Edge TTS Generator]
        PLAYER[Audio Player]
    end

    GEN --> CHUNK
    CHUNK -->|Sentence complete| EMIT
    EMIT -->|audio_chunk signal| SIGNAL
    SIGNAL --> QUEUE
    QUEUE --> WORKER
    WORKER --> TTS
    TTS -->|MP3 → PCM| PLAYER
    PLAYER -->|sounddevice| SPEAKER[🔊]

    style SIGNAL fill:#f59e0b
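
The sentence-chunking step can be illustrated with a short sketch; the regex heuristic and the emit_chunk callback are simplified placeholders, not the exact logic in audio_system.py:

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def chunk_sentences(token_stream, emit_chunk):
    """Buffer incoming tokens and emit each sentence as soon as it completes."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Everything except the last (possibly incomplete) sentence is ready for TTS.
        for sentence in parts[:-1]:
            if sentence.strip():
                emit_chunk(sentence.strip())  # -> audio_chunk signal -> TTS worker
        buffer = parts[-1]
    if buffer.strip():
        emit_chunk(buffer.strip())  # flush whatever remains at the end

# Example usage: prints each completed sentence on its own line.
chunk_sentences(iter(["SIGH. ", "Another ", "bug? ", "Let me see."]), print)
```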

Audio Components

| Component | Purpose |
| --- | --- |
| UnifiedAudioManager | Coordinates all audio |
| AudioPlayer | Single-threaded playback via sounddevice |
| TTSGenerator | Persistent async Edge TTS loop |
| SoundEffectCache | Pre-generated effects in memory |
| AudioQueue | Priority queue (effects before speech) |
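
The effects-before-speech ordering can be sketched with a standard-library priority queue; the real AudioQueue implementation may differ:

```python
import itertools
import queue

EFFECT, SPEECH = 0, 1           # lower number = played first
_counter = itertools.count()    # tie-breaker keeps FIFO order within a priority

audio_queue = queue.PriorityQueue()

def enqueue(kind, payload):
    """Queue an audio item; effects jump ahead of any pending speech chunks."""
    audio_queue.put((kind, next(_counter), payload))

# A DING queued after a speech chunk still plays first.
enqueue(SPEECH, "Let me look at that bug...")
enqueue(EFFECT, "ding.mp3")
print(audio_queue.get())        # (0, 1, 'ding.mp3')
```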

Thread Safety Model

All pipeline callbacks use Qt signals for thread-safe UI updates:

sequenceDiagram
    participant PT as Pipeline Thread
    participant SIG as Qt Signals
    participant MT as Main Thread
    participant UI as UI Components
    participant AU as Audio System

    PT->>SIG: pipeline_starter.emit(text)
    SIG->>MT: _handle_pipeline_starter()
    MT->>AU: speak(text)

    PT->>SIG: audio_chunk.emit(chunk)
    SIG->>MT: _on_audio_chunk()
    MT->>AU: queue_chunk(chunk)

    PT->>SIG: pipeline_complete.emit(response)
    SIG->>MT: _handle_pipeline_complete()
    MT->>UI: add_message()
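
In PyQt6 terms the pattern looks roughly like this: the pipeline thread only emits signals, and Qt delivers them as queued calls on the main thread, which may then touch widgets and the audio system. The class names below are illustrative; the signal names match the diagram above.

```python
from PyQt6.QtCore import QObject, QThread, pyqtSignal

class PipelineBridge(QObject):
    """Signals emitted from the pipeline thread, handled on the main thread."""
    pipeline_starter = pyqtSignal(str)
    audio_chunk = pyqtSignal(str)
    pipeline_complete = pyqtSignal(str)

class PipelineWorker(QThread):
    def __init__(self, bridge):
        super().__init__()
        self.bridge = bridge

    def run(self):
        # Never touch widgets from here; emit signals and let Qt marshal
        # them to slots connected on the main thread.
        self.bridge.pipeline_starter.emit("DING! Let me think...")
        self.bridge.audio_chunk.emit("Here is the first sentence.")
        self.bridge.pipeline_complete.emit("Full response text.")
```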

Signal Types

| Signal | Purpose |
| --- | --- |
| pipeline_starter | Instant acknowledgment to speak |
| audio_chunk | Streaming text chunk for TTS |
| pipeline_complete | Full response for display |
| pipeline_background | Claude follow-up response |
| text_delta | UI text update (non-audio) |

Conversation Context

Skippy maintains conversation history for context-aware responses:

flowchart TB
    subgraph Context["ConversationContext"]
        HIST[Exchange History<br/>Last 5 turns]
        TOPIC[Current Topic<br/>code/help/feelings]
        MOOD[Skippy Mood<br/>snarky/helpful/annoyed]
    end

    USER[User Message] --> DETECT[Topic Detection]
    DETECT --> TOPIC
    USER --> MOOD_UP[Mood Update]
    MOOD_UP --> MOOD

    Context --> PROMPT[Build Context Prompt]
    PROMPT --> LOCAL[Local Model]

    LOCAL --> RESPONSE[Response]
    RESPONSE --> SAVE[Save Exchange]
    SAVE --> HIST

Context Format

## Recent Conversation
- Human said: "Hey Skippy, I have a bug"
- You replied: "SIGH. Another bug? Let me see what mess you made..."
- Human said: "It's in the login function"
- You replied: "Ooh, authentication bugs. Classic monkey mistake..."

[Current topic: code]
[Mood: Default snarky superiority]
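
A simplified sketch of how a prompt in this format could be assembled; the real ConversationContext in conversation_context.py also performs topic detection and mood updates:

```python
from collections import deque

class ConversationContext:
    """Keeps the last 5 exchanges and renders them into the format above."""

    def __init__(self, max_turns=5):
        self.history = deque(maxlen=max_turns)
        self.topic = "general"
        self.mood = "Default snarky superiority"

    def save_exchange(self, user_text, skippy_text):
        self.history.append((user_text, skippy_text))

    def build_prompt(self):
        lines = ["## Recent Conversation"]
        for user_text, skippy_text in self.history:
            lines.append(f'- Human said: "{user_text}"')
            lines.append(f'- You replied: "{skippy_text}"')
        lines += ["", f"[Current topic: {self.topic}]", f"[Mood: {self.mood}]"]
        return "\n".join(lines)
```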

Data Flow: Voice Chat

sequenceDiagram
    participant U as 👤 User
    participant M as 🎤 Microphone
    participant W as Whisper (Local)
    participant P as Pipeline
    participant L as Local LLM
    participant C as Claude (Background)
    participant A as Audio System
    participant SP as 🔊 Speaker

    U->>M: Speaks
    M->>W: Audio recording
    W->>P: Transcribed text

    P->>A: Instant starter
    A->>SP: "DING! Let me think..."

    P->>L: Generate response

    loop Streaming
        L->>P: Token chunk
        P->>A: TTS chunk
        A->>SP: Audio plays
    end

    P->>P: Check if needs Claude

    opt Complex query
        P->>C: Background request
        C->>P: Follow-up response
        P->>A: Speak follow-up
        A->>SP: Claude's response
    end
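
The docs do not pin the STT step to a particular Whisper frontend; as one possible shape, here is a minimal sketch assuming the openai-whisper package:

```python
import whisper  # assumption: the openai-whisper package provides local STT

model = whisper.load_model("base")  # loaded once at startup

def transcribe(recording_path):
    """Turn a finished microphone recording into text for the pipeline."""
    result = model.transcribe(recording_path)
    return result["text"].strip()
```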

Component Stack

flowchart TB
    subgraph Presentation["🎨 Presentation Layer"]
        PYQT[PyQt6 GUI]
        AVATAR_W[Avatar Widget]
        CHAT_W[Chat Widget]
        MINI[Mini Mode]
    end

    subgraph Logic["⚙️ Business Logic"]
        PIPELINE_L[Skippy Pipeline]
        CONTEXT[Conversation Context]
        AUDIO_MGR[Audio Manager]
    end

    subgraph Integration["🔌 Integration Layer"]
        OLLAMA[Ollama Client]
        EDGE[Edge TTS]
        WHISPER[Local Whisper]
        SOUNDDEV[sounddevice]
    end

    subgraph External["🌐 External Services"]
        OLLAMA_SVC[Ollama Server]
        CLAUDE[Claude API]
        OPENCLAW_GW[OpenClaw Gateway]
    end

    Presentation --> Logic
    Logic --> Integration
    Integration --> External

File Structure

SkippyBuddy/
├── 📄 skippy.py              # Main application + Qt UI
├── 📄 skippy_pipeline.py     # Local-first async pipeline
├── 📄 audio_system.py        # Unified audio (TTS, effects, playback)
├── 📄 conversation_context.py # Context tracking for responses
├── 📄 avatar.py              # Animated avatar widget
├── 📄 sound_effects.py       # Effect generators (DING, WHIR, etc.)
├── 📄 gateway_client.py      # OpenClaw gateway integration
├── 📄 config.json            # User configuration
├── 📁 tools/
│   ├── 📄 skippy-tools.ps1   # PowerShell toolkit
│   ├── 📄 skippy-ahk.ahk     # AutoHotkey scripts
│   └── 📁 workflows/         # Automation workflows
├── 📁 docs/
│   └── 📁 docs/              # Documentation (this wiki)
└── 📁 temp/                  # Temporary audio files

Configuration

Pipeline Settings

In config.json:

{
  "use_pipeline": true,
  "pipeline_conversation_model": "mistral:7b-instruct",
  "pipeline_fast_model": "phi3:mini",
  "pipeline_enable_streaming_tts": true,
  "pipeline_enable_background_claude": true,
  "pipeline_max_tokens": 600,
  "pipeline_temperature": 0.8
}

| Setting | Description |
| --- | --- |
| use_pipeline | Enable the local-first pipeline |
| pipeline_conversation_model | Main local model for responses |
| pipeline_fast_model | Quick model for acknowledgments |
| pipeline_enable_streaming_tts | Stream audio while generating |
| pipeline_enable_background_claude | Use Claude for complex queries |
| pipeline_max_tokens | Max response length |
| pipeline_temperature | Sampling temperature for local responses |
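
A minimal sketch of reading these settings at startup; the key names come from the JSON above, while the surrounding code is illustrative:

```python
import json

with open("config.json", encoding="utf-8") as fh:
    config = json.load(fh)

if config.get("use_pipeline", False):
    conversation_model = config.get("pipeline_conversation_model", "mistral:7b-instruct")
    fast_model = config.get("pipeline_fast_model", "phi3:mini")
    streaming_tts = config.get("pipeline_enable_streaming_tts", True)
    max_tokens = config.get("pipeline_max_tokens", 600)
    temperature = config.get("pipeline_temperature", 0.8)
```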

Audio Settings

{
  "voice_output_enabled": true,
  "tts_provider": "edge",
  "edge_voice": "en-GB-RyanNeural",
  "edge_rate": "+0%"
}
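
For reference, generating speech with these settings via the edge-tts package looks roughly like the sketch below; the output path under temp/ is only an example:

```python
import asyncio

import edge_tts  # assumes the edge-tts package

async def speak_to_file(text, path="temp/skippy.mp3"):
    """Synthesize text with the configured voice and save it as MP3."""
    communicate = edge_tts.Communicate(text, voice="en-GB-RyanNeural", rate="+0%")
    await communicate.save(path)

asyncio.run(speak_to_file("Oh good, another monkey needs my help."))
```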

Hardware Requirements

| Component | Minimum | Recommended | Notes |
| --- | --- | --- | --- |
| GPU VRAM | 4 GB | 8+ GB | For local LLM |
| System RAM | 8 GB | 16+ GB | Ollama + Whisper |
| CPU | 4 cores | 8+ cores | Async processing |
| Storage | 10 GB | 20 GB | Models + cache |

Service Dependencies

| Service | Purpose | Required |
| --- | --- | --- |
| Ollama | Local LLM inference | ✅ Yes |
| OpenClaw Gateway | Claude API (background) | Optional |
| Edge TTS | Voice output | Optional |
| Local Whisper | Voice input | Optional |
| WhatsApp (via OpenClaw) | Mobile sync | Optional |

Updated: 2026-02-01