Voice Output

Skippy can speak responses aloud using text-to-speech (TTS). Enable this for a more conversational experience.


Quick Start

  1. Click the 🔇 button (top-right) to enable voice output
  2. The button changes to 🔊 (green)
  3. Skippy now speaks responses aloud
  4. Click again to disable

TTS Providers

Edge TTS (Default)

Microsoft's neural voices - natural-sounding and free.

| Feature | Details |
|---------|---------|
| Quality | Excellent, neural voices |
| Speed | Fast |
| Cost | Free |
| Requirements | Internet connection |

Available Voices:

| Voice ID | Description |
|----------|-------------|
| en-GB-RyanNeural | Ryan (UK Male) - Default |
| en-US-GuyNeural | Guy (US Male) |
| en-US-JennyNeural | Jenny (US Female) |
| en-US-AriaNeural | Aria (US Female, Conversational) |
| en-US-DavisNeural | Davis (US Male) |
| en-US-JaneNeural | Jane (US Female) |
| en-GB-SoniaNeural | Sonia (UK Female) |
| en-AU-WilliamNeural | William (AU Male) |
| en-AU-NatashaNeural | Natasha (AU Female) |

pyttsx3 (Fallback)

Windows SAPI voices - works offline but less natural.

| Feature | Details |
|---------|---------|
| Quality | Robotic, old-style |
| Speed | Instant |
| Cost | Free |
| Requirements | None (Windows built-in) |

Selecting a Voice

Via Tray Menu

  1. Right-click tray icon
  2. Go to 🎤 Voice submenu
  3. Click 🗣️ Select Voice
  4. Choose from available voices

Via Config

Edit config.json:

{
    "tts_provider": "edge",
    "edge_voice": "en-GB-RyanNeural",
    "edge_rate": "+0%"
}

Voice Controls

Toggle TTS

| Method | Action |
|--------|--------|
| 🔇/🔊 button | Click to toggle |
| Tray menu | Voice → 🔊 Speak Responses |

Visual Feedback

| State | Button Display |
|-------|----------------|
| Off | 🔇 (gray) |
| On | 🔊 (green) |
| Speaking | 🔈🔉🔊 (animated) |
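One way to drive the button from these states is a lookup for the two idle states plus a frame cycle for the speaking animation. A minimal sketch: `button_label` is a hypothetical helper, the real frame timing is not documented, and the gray/green colors are left out.

```python
from itertools import cycle

# Labels for the two idle states, taken from the table above
IDLE_LABELS = {"off": "🔇", "on": "🔊"}

# Frames shown one after another while audio is playing
SPEAKING_FRAMES = ("🔈", "🔉", "🔊")


def button_label(enabled: bool, speaking: bool, frames) -> str:
    """Return the text for the TTS toggle button (hypothetical helper)."""
    if not enabled:
        return IDLE_LABELS["off"]
    if speaking:
        return next(frames)  # advance the animation by one frame
    return IDLE_LABELS["on"]


frames = cycle(SPEAKING_FRAMES)
print(button_label(False, False, frames))                    # 🔇
print([button_label(True, True, frames) for _ in range(4)])  # ['🔈', '🔉', '🔊', '🔈']
print(button_label(True, False, frames))                     # 🔊
```

A UI timer would call `button_label` on each tick while speech is playing so the icon keeps cycling.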

Speech Rate

Adjust how fast Skippy speaks:

Edge TTS Rate

"edge_rate" takes a signed percentage: "+20%" is 20% faster, "-10%" is 10% slower, and "+0%" is normal speed. For example:

{
    "edge_rate": "+20%"
}
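If the rate is set from code (for example from a slider), the value has to keep this signed-percentage form. A minimal sketch; `format_edge_rate` and `is_valid_edge_rate` are hypothetical helpers, and the accepted pattern is inferred from the examples above:

```python
import re

# Signed whole-number percentage such as "+20%", "-10%" or "+0%"
_RATE_PATTERN = re.compile(r"[+-]\d+%")


def format_edge_rate(percent: int) -> str:
    """Turn an integer speed change into an edge_rate string."""
    sign = "-" if percent < 0 else "+"
    return f"{sign}{abs(percent)}%"


def is_valid_edge_rate(value: str) -> bool:
    """Check that the whole value matches the signed-percentage format."""
    return _RATE_PATTERN.fullmatch(value) is not None


print(format_edge_rate(20))       # +20%
print(format_edge_rate(-10))      # -10%
print(format_edge_rate(0))        # +0%
print(is_valid_edge_rate("20%"))  # False, the sign is missing
```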

pyttsx3 Rate

"tts_rate" is the speaking speed in words per minute (default 175; typical range 125-200):

{
    "tts_rate": 175
}
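Because `tts_rate` is measured in words per minute, you can estimate how a change affects listening time. A rough sketch with a hypothetical helper; it counts whitespace-separated words and ignores pauses at punctuation, so real timings will differ:

```python
def estimate_speech_seconds(text: str, wpm: int = 175) -> float:
    """Approximate speaking time for a given tts_rate (words per minute)."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    words = len(text.split())
    return words * 60 / wpm


reply = "word " * 350  # a 350-word reply
print(estimate_speech_seconds(reply))           # 120.0 at the default 175 wpm
print(estimate_speech_seconds(reply, wpm=125))  # 168.0 at the slow end of the range
```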

Text Processing

Before speaking, Skippy cleans the text:

What's Removed

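  • Code blocks (```...```), which are replaced by the words "code block"
  • Inline code backticks (`...`); the code text itself is still spoken
  • Markdown formatting (bold, italic)
  • Headers (# ## ###)
  • Link markup (only the link text is spoken)
  • Bullet points and list numbers
  • Excessive whitespace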
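These cleanup rules can be approximated with a handful of regular expressions. This is a minimal sketch, not Skippy's actual cleaner: the name `clean_for_speech` and every pattern are assumptions, chosen so that the function reproduces the Example below, including the spoken "code block" placeholder.

```python
import re

# Hypothetical patterns; Skippy's real cleanup rules may differ
FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`([^`]*)`")
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")
HEADER = re.compile(r"^#{1,6}\s*")
LIST_MARKER = re.compile(r"^([-*+•]|\d+\.)\s+")


def clean_for_speech(markdown: str) -> str:
    # Replace each fenced code block with a spoken placeholder on its own line
    text = FENCED_CODE.sub("\ncode block\n", markdown)
    # Keep the text of inline code and links, drop the markup around it
    text = INLINE_CODE.sub(r"\1", text)
    text = LINK.sub(r"\1", text)

    segments = []
    for line in text.splitlines():
        line = HEADER.sub("", line.strip())
        line = LIST_MARKER.sub("", line)
        line = ITALIC.sub(r"\1", BOLD.sub(r"\1", line))
        line = " ".join(line.split())  # collapse runs of whitespace
        if line:
            segments.append(line)

    # Give every segment except the last a period so the voice pauses between them
    parts = []
    for i, seg in enumerate(segments):
        seg = seg.rstrip(":;").rstrip()
        if i < len(segments) - 1 and not seg.endswith((".", "!", "?")):
            seg += "."
        parts.append(seg)
    return " ".join(parts)


sample = (
    "Here's how to **fix** the issue:\n\n"
    "1. Run `pip install package`\n"
    "2. Check the `config.json` file\n\n"
    "```python\nprint(\"Hello\")\n```\n"
)
print(clean_for_speech(sample))
# Here's how to fix the issue. Run pip install package. Check the config.json file. code block
```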

Example

Original Response:

Here's how to **fix** the issue:

1. Run `pip install package`
2. Check the `config.json` file

```python
print("Hello")
```

Spoken:

"Here's how to fix the issue. Run pip install package. Check the config.json file. code block"

Audio Playback

How It Works

1. Text split into sentence chunks (streaming)
2. Edge TTS generates MP3 for each chunk
3. MP3 decoded to PCM audio (via miniaudio/pydub)
4. `sounddevice` plays audio directly (no external player)
5. Chunks play sequentially while next chunk generates
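Step 1 means text can be handed to TTS one complete sentence at a time while the response is still streaming in. A minimal sketch of such a splitter; `SentenceChunker` is hypothetical, and its end-of-sentence rule (., ! or ? followed by whitespace) is deliberately simple, so abbreviations like "e.g. " would split early:

```python
import re

# A sentence ends at ., ! or ? followed by whitespace
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceChunker:
    """Buffers streamed text and releases complete sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, fragment: str) -> list[str]:
        self._buffer += fragment
        pieces = _SENTENCE_END.split(self._buffer)
        # The last piece may be an unfinished sentence, so keep it buffered
        self._buffer = pieces.pop()
        return [p.strip() for p in pieces if p.strip()]

    def flush(self) -> list[str]:
        # Called when the stream ends: release whatever is left
        rest, self._buffer = self._buffer.strip(), ""
        return [rest] if rest else []


chunker = SentenceChunker()
for fragment in ["Hello there. How ", "are you? Fine", " thanks."]:
    for sentence in chunker.feed(fragment):
        print("speak:", sentence)
for sentence in chunker.flush():
    print("speak:", sentence)
# speak: Hello there.
# speak: How are you?
# speak: Fine thanks.
```

Waiting for whitespace after the punctuation keeps values such as "3.14" from being split mid-number while they are still arriving.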

Streaming Architecture

Text Chunk → Edge TTS → MP3 → PCM → sounddevice → Speaker
(while each chunk plays, the next chunk is generated in parallel)

This allows audio to start playing while the response is still being generated.
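The overlap between generating and playing can be sketched as a producer/consumer pair. This illustrates the idea under stated assumptions and is not Skippy's code: the guide describes `TTSGenerator` as an async loop, whereas this sketch swaps in a thread and a bounded `queue.Queue`, and `synthesize` and `play` are hypothetical stand-ins for Edge TTS plus decoding and for `sounddevice` playback.

```python
import queue
import threading

_DONE = object()  # sentinel telling the player that no more chunks are coming


def synthesize(chunk: str) -> bytes:
    """Stand-in for Edge TTS + MP3 decoding; returns fake PCM bytes."""
    return chunk.encode("utf-8")


def play(pcm: bytes, log: list) -> None:
    """Stand-in for sounddevice playback; records what would have been played."""
    log.append(pcm.decode("utf-8"))


def speak_streaming(chunks, log: list, max_ahead: int = 2) -> None:
    # Bounded queue: generation may run at most max_ahead chunks ahead of playback
    audio_q: queue.Queue = queue.Queue(maxsize=max_ahead)

    def generator() -> None:
        for chunk in chunks:
            audio_q.put(synthesize(chunk))  # blocks while the queue is full
        audio_q.put(_DONE)

    producer = threading.Thread(target=generator, daemon=True)
    producer.start()

    # A single consumer plays chunks strictly in order, one at a time
    while True:
        item = audio_q.get()
        if item is _DONE:
            break
        play(item, log)
    producer.join()


played: list = []
speak_streaming(["First sentence.", "Second sentence.", "Third."], played)
print(played)  # ['First sentence.', 'Second sentence.', 'Third.']
```

Bounding the queue keeps memory use small and stops generation from racing far ahead of what the listener has heard.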

Audio Components

| Component | Purpose |
|-----------|---------|
| `UnifiedAudioManager` | Coordinates all audio |
| `AudioPlayer` | Single-threaded playback via sounddevice |
| `TTSGenerator` | Persistent async Edge TTS loop |
| `SoundEffectCache` | Pre-generated effects (DING!, WHIR!) |
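The idea behind `SoundEffectCache` (generate each effect once, then replay the stored audio) can be sketched as a small memoizing class. The `render` argument and the method names are assumptions for illustration; the real class's interface is not documented here.

```python
from typing import Callable, Dict


class SoundEffectCache:
    """Generates each named effect once, then serves the stored audio."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render  # e.g. a TTS call that speaks "DING!"
        self._cache: Dict[str, bytes] = {}

    def preload(self, names) -> None:
        # Pay the generation cost up front, for example at startup
        for name in names:
            self.get(name)

    def get(self, name: str) -> bytes:
        if name not in self._cache:
            self._cache[name] = self._render(name)
        return self._cache[name]


calls = []


def fake_render(name: str) -> bytes:
    calls.append(name)  # record each real generation
    return name.encode("utf-8")


cache = SoundEffectCache(fake_render)
cache.preload(["DING!", "WHIR!"])
cache.get("DING!")  # served from the cache, no second render
print(calls)  # ['DING!', 'WHIR!']
```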

Configuration Reference

config.json Settings

```json
{
    "voice_output_enabled": false,
    "tts_provider": "edge",
    "edge_voice": "en-US-GuyNeural",
    "edge_rate": "+0%",
    "tts_rate": 175,
    "tts_voice": null
}
```

| Setting | Description | Default |
|---------|-------------|---------|
| voice_output_enabled | Enable TTS | false |
| tts_provider | "edge" or "pyttsx3" | "edge" |
| edge_voice | Edge TTS voice ID | "en-US-GuyNeural" |
| edge_rate | Edge TTS speed adjustment | "+0%" |
| tts_rate | pyttsx3 words per minute | 175 |
| tts_voice | pyttsx3 voice name | null (system default) |
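A program reading these settings would typically merge `config.json` over the defaults in this table. A minimal sketch using only the standard library; `load_voice_settings` is a hypothetical helper, and the defaults are copied from the table rather than from Skippy's source:

```python
import json
import tempfile
from pathlib import Path

# Defaults copied from the Configuration Reference table
DEFAULTS = {
    "voice_output_enabled": False,
    "tts_provider": "edge",
    "edge_voice": "en-US-GuyNeural",
    "edge_rate": "+0%",
    "tts_rate": 175,
    "tts_voice": None,
}
PROVIDERS = {"edge", "pyttsx3"}


def load_voice_settings(path: Path) -> dict:
    """Merge the voice keys from config.json over the documented defaults."""
    settings = dict(DEFAULTS)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            user = json.load(f)
        # Only pick up the voice-related keys listed in the table
        settings.update({k: v for k, v in user.items() if k in DEFAULTS})
    if settings["tts_provider"] not in PROVIDERS:
        raise ValueError(f"unknown tts_provider: {settings['tts_provider']!r}")
    return settings


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.json"
    cfg.write_text('{"tts_provider": "pyttsx3", "tts_rate": 150}', encoding="utf-8")
    settings = load_voice_settings(cfg)
    print(settings["tts_provider"], settings["tts_rate"], settings["edge_rate"])
    # pyttsx3 150 +0%
```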

Troubleshooting

No Sound

  1. Check volume - System volume and app volume
  2. Check speaker - Ensure output device is correct
  3. Check TTS enabled - Button should show 🔊

Garbled/Distorted Audio

  1. Try a different voice
  2. Update audio drivers
  3. Check for conflicting audio apps

Edge TTS Fails

Error: Network-related issues

Fix:

  • Check internet connection
  • Edge TTS requires connectivity

Fallback: switch to the offline pyttsx3 provider in config.json:

{
    "tts_provider": "pyttsx3"
}
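Editing `tts_provider` switches providers permanently. An application could also fall back automatically for a single response; the sketch below shows that pattern and is not necessarily what Skippy does. `speak_with_fallback` and both speaker callables are hypothetical, and it assumes network failures surface as `OSError` (which includes `ConnectionError`); the exact exception types depend on the edge-tts version.

```python
from typing import Callable


def speak_with_fallback(
    text: str,
    edge_speak: Callable[[str], None],
    offline_speak: Callable[[str], None],
) -> str:
    """Try the online provider first; on a network error, use the offline one.

    Returns the name of the provider that spoke the text.
    """
    try:
        edge_speak(text)
        return "edge"
    except OSError:
        # Assumed: connection problems are raised as OSError subclasses
        offline_speak(text)
        return "pyttsx3"


spoken = []


def offline_edge(text: str) -> None:
    raise ConnectionError("no internet")  # simulates Edge TTS being unreachable


def pyttsx3_stub(text: str) -> None:
    spoken.append(text)  # stands in for local SAPI speech


print(speak_with_fallback("Hello", offline_edge, pyttsx3_stub))  # pyttsx3
```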

Voice Not Changing

After you change the voice in the menu:

  1. The voice is saved to config
  2. The next response uses the new voice
  3. To apply it right away, restart Skippy

Advanced Usage

All Edge TTS Voices

Get the complete list:

python -c "import asyncio; import edge_tts; print(asyncio.run(edge_tts.list_voices()))"

Filter by language:

import asyncio
import edge_tts

async def list_english():
    voices = await edge_tts.list_voices()
    for v in voices:
        if v['Locale'].startswith('en'):
            print(f"{v['ShortName']}: {v['FriendlyName']}")

asyncio.run(list_english())

Custom pyttsx3 Voice

List available voices:

import pyttsx3
engine = pyttsx3.init()
for voice in engine.getProperty('voices'):
    print(f"{voice.name}: {voice.id}")

Set in config:

{
    "tts_provider": "pyttsx3",
    "tts_voice": "Microsoft David Desktop"
}

Best Practices

Voice Selection

  • Use Guy or Ryan for authoritative responses
  • Use Jenny or Aria for conversational tone
  • Match accent to your preference (US/UK/AU)

When to Use TTS

  • Hands-free computing
  • While doing other tasks
  • For accessibility needs

When to Disable

  • In quiet environments
  • When reading code/technical content
  • During meetings/calls