Voice Input¶

Skippy supports push-to-talk voice input, so you can speak a message instead of typing it. Hold the button, speak, and release; Skippy transcribes the recording and sends it automatically.

Quick Start¶

Click and hold the 🎤 button
Speak your message clearly
Release the button
Wait for transcription
Message sends automatically

STT Providers¶

Local Whisper (Recommended)¶

Default provider. Runs entirely offline using OpenAI's Whisper model once the model files have been downloaded.

Feature	Details
Privacy	100% offline, no data sent
Speed	Fast after model loads
Accuracy	Excellent for English
Requirements	~75 MB-3 GB disk (varies by model)

Model Sizes:

Model	Size	Speed	Accuracy
tiny	~75 MB	Fastest	Basic
base	~150 MB	Fast	Good
small	~500 MB	Medium	Better
medium	~1.5 GB	Slower	Very Good
large-v2	~3 GB	Slowest	Best

Recommended: base

The base model offers the best balance of speed and accuracy for most users.

Change Model:

Tray menu → Voice → 🧠 Whisper Model → Select
Or edit whisper_model in config.json

Google Speech Recognition (Fallback)¶

Used when Whisper is unavailable or fails.

Feature	Details
Privacy	Audio sent to Google servers
Speed	Depends on internet
Accuracy	Very good
Requirements	Internet connection
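
The fallback behaviour can be pictured as trying providers in order. The sketch below is illustrative only: `transcribe_with_fallback` and the provider callables are hypothetical names, not Skippy's actual code, and treating an empty transcript as a failure is an assumption.

```python
from typing import Callable, Optional, Sequence

# A provider is any callable that takes a WAV path and returns text.
Provider = Callable[[str], str]


def transcribe_with_fallback(
    audio_path: str, providers: Sequence[Provider]
) -> Optional[str]:
    """Try each provider in order and return the first non-empty transcript."""
    for provider in providers:
        try:
            text = provider(audio_path).strip()
        except Exception as exc:  # provider unavailable, crashed, or offline
            name = getattr(provider, "__name__", repr(provider))
            print(f"{name} failed: {exc}")
            continue
        if text:
            return text
    return None  # every provider failed or returned nothing
```

With Whisper listed first and Google second, the Google provider would only be called when the Whisper callable raises or returns an empty string.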

Recording Process¶

Visual Feedback¶

State	Button	Status Bar
Ready	🎤 (gray)	Ready
Recording	🔴 (pulsing red)	🎤 Listening...
Processing	🎤 (gray)	🔄 Transcribing...
First Use	🎤 (gray)	🔄 Loading Whisper model...

Recording Flow¶

Hold Button → Recording Starts → Audio Captured
    ↓
Release Button → Recording Stops → Save WAV
    ↓
Transcription → Text in Input → Auto-Send
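
The flow above, together with the states in the Visual Feedback table, behaves like a small state machine. This is a minimal sketch under that reading; the event names (`press`, `release`, `done`) and the `next_state` helper are hypothetical, not taken from Skippy's source.

```python
from enum import Enum, auto


class VoiceState(Enum):
    READY = auto()       # button idle, status bar shows "Ready"
    RECORDING = auto()   # button held, audio is being captured
    PROCESSING = auto()  # button released, WAV is being transcribed


# Allowed transitions: (current state, event) -> next state
TRANSITIONS = {
    (VoiceState.READY, "press"): VoiceState.RECORDING,
    (VoiceState.RECORDING, "release"): VoiceState.PROCESSING,
    (VoiceState.PROCESSING, "done"): VoiceState.READY,
}


def next_state(state: VoiceState, event: str) -> VoiceState:
    """Return the next state, or stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring events that do not apply (for example, a second press while transcribing) keeps the button from starting a new recording before the previous one has been sent.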

Technical Details¶

Sample Rate: 16,000 Hz (speech recognition standard)
Format: 16-bit PCM WAV
Max Duration: 30 seconds
Silence Threshold: Audio level < 50 = no speech detected
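
To make these numbers concrete, here is a standard-library sketch that clips a recording to 30 seconds, applies the silence check, and writes a 16 kHz, 16-bit mono WAV. It assumes the "audio level" is the peak absolute value of the 16-bit samples; the document does not say how Skippy measures it, and the helper names are hypothetical.

```python
import array
import sys
import wave

SAMPLE_RATE = 16_000    # Hz
MAX_SECONDS = 30        # maximum recording length
SILENCE_THRESHOLD = 50  # assumed: peak absolute amplitude of int16 samples


def clip_to_max(samples: array.array) -> array.array:
    """Keep at most MAX_SECONDS of mono audio."""
    return samples[: SAMPLE_RATE * MAX_SECONDS]


def is_silent(samples: array.array, threshold: int = SILENCE_THRESHOLD) -> bool:
    """True when no sample reaches the threshold (or nothing was recorded)."""
    return max((abs(s) for s in samples), default=0) < threshold


def save_wav(path: str, samples: array.array) -> None:
    """Write mono 16-bit PCM at SAMPLE_RATE."""
    data = array.array("h", samples)  # copy so the caller's data is untouched
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores little-endian samples
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(data.tobytes())
```

At this rate a full 30-second recording is 480,000 samples, or roughly 940 KB on disk.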

First-Time Setup¶

Model Loading¶

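On first voice use, the Whisper model is downloaded (if it is not already cached) and loaded into memory:

Press 🎤 and speak
Status shows "Loading Whisper model..."
Model loads (30-60 seconds the first time)
Later recordings in the same session reuse the loaded model and start transcribing right away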
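
The "slow first use, fast afterwards" behaviour is what a load-once cache gives you. The sketch below shows the idea; `ModelCache` and `load_whisper` are hypothetical names, and the `device="cpu"` / `compute_type="int8"` arguments are assumptions rather than Skippy's actual settings.

```python
from typing import Any, Callable, Dict


def load_whisper(name: str) -> Any:
    """Loader for faster-whisper; imported lazily so app startup stays fast."""
    from faster_whisper import WhisperModel

    return WhisperModel(name, device="cpu", compute_type="int8")


class ModelCache:
    """Load each model at most once per process and reuse it afterwards."""

    def __init__(self, loader: Callable[[str], Any] = load_whisper) -> None:
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def get(self, name: str = "base") -> Any:
        if name not in self._models:
            # Slow path: runs only the first time a model is requested.
            self._models[name] = self._loader(name)
        return self._models[name]
```

Because the cache lives in memory, it is emptied when the process exits, which matches the model being reloaded after Skippy restarts.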

Model Location

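Whisper models are cached in the Hugging Face cache directory, by default ~/.cache/huggingface/ under your user profile (the exact location can differ if cache-related environment variables are set).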
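
To see what is actually cached, something like the following can help. It is a simplified sketch: it only honours the `HF_HUB_CACHE` and `HF_HOME` environment variables and otherwise falls back to `~/.cache/huggingface/hub`, and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping


def hub_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the Hugging Face hub cache directory (simplified)."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def cached_whisper_models(cache: Path) -> List[str]:
    """List cached model folders whose name mentions 'whisper'."""
    if not cache.is_dir():
        return []
    return sorted(
        p.name for p in cache.iterdir() if p.is_dir() and "whisper" in p.name.lower()
    )
```

Calling `cached_whisper_models(hub_cache_dir())` prints nothing useful until a model has been downloaded at least once; deleting a listed folder forces a fresh download on the next use.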

Microphone Permissions¶

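Windows may block microphone access until you allow it:

Go to Settings → Privacy → Microphone (Privacy & security → Microphone on Windows 11)
Enable "Allow apps to access your microphone"
Ensure desktop apps are also allowed, so that Python has permission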

Troubleshooting Voice Input¶

"Voice input not available"¶

Cause: Missing dependencies

Fix:

pip install sounddevice numpy faster-whisper
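
Before reinstalling everything, it can be useful to check which of the three packages is actually missing. A small sketch, using a hypothetical helper name; note that the `faster-whisper` package is imported as `faster_whisper`.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the packages in the pip command above.
REQUIRED = ("sounddevice", "numpy", "faster_whisper")


def missing_modules(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

Run it with the same Python interpreter that launches Skippy; a different interpreter or virtual environment can report different results.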

"No speech detected"¶

Cause: Audio too quiet or silent

Fix:

Speak louder/closer to mic
Check microphone isn't muted
Test mic in Windows Sound settings

"Couldn't understand. Try again."¶

Cause: Speech was unclear, or Whisper could not make sense of the audio

Fix:

Speak more clearly
Reduce background noise
Try a larger Whisper model

"Recognition error"¶

Cause: Google fallback failed (network issue)

Fix:

Check internet connection
Ensure Whisper is properly installed for offline use

Recording stops immediately¶

Cause: Microphone not capturing

Fix:

Check default recording device in Windows
Test with Voice Recorder app

Verify sounddevice can access mic:

import sounddevice as sd
print(sd.query_devices())              # list every audio device
print(sd.query_devices(kind="input"))  # show the default input device

Configuration¶

config.json Settings¶

{
    "voice_input_enabled": true,
    "stt_provider": "local-whisper",
    "whisper_model": "base"
}

Setting	Options	Default
`voice_input_enabled`	true/false	true
`stt_provider`	"local-whisper", "google"	"local-whisper"
`whisper_model`	"tiny", "base", "small", "medium", "large-v2"	"base"
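
The table above translates directly into a loader that fills in defaults and rejects unknown values. This is a sketch of how such a check could look, not Skippy's own loader; `load_voice_config` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict

DEFAULTS = {
    "voice_input_enabled": True,
    "stt_provider": "local-whisper",
    "whisper_model": "base",
}
ALLOWED = {
    "stt_provider": {"local-whisper", "google"},
    "whisper_model": {"tiny", "base", "small", "medium", "large-v2"},
}


def load_voice_config(path: str) -> Dict[str, Any]:
    """Read config.json, fill in defaults, and reject unknown option values."""
    file = Path(path)
    raw = json.loads(file.read_text(encoding="utf-8")) if file.exists() else {}
    config = {key: raw.get(key, default) for key, default in DEFAULTS.items()}
    if not isinstance(config["voice_input_enabled"], bool):
        raise ValueError("voice_input_enabled must be true or false")
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"{key}={config[key]!r}; expected one of {sorted(allowed)}")
    return config
```

Failing loudly on a typo such as "Base" or "large" is usually friendlier than silently falling back to a default the user did not choose.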

Change Provider¶

Via Tray Menu: Currently shows provider info only. Change via config.json.

Via Config:

{
    "stt_provider": "google"
}
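
If you prefer to script the change rather than edit the file by hand, a read-modify-write helper keeps the other settings intact. A minimal sketch; `set_stt_provider` is a hypothetical helper, and the path to config.json depends on your installation.

```python
import json
from pathlib import Path

PROVIDERS = ("local-whisper", "google")


def set_stt_provider(path: str, provider: str) -> None:
    """Update stt_provider in config.json without touching other keys."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}; expected one of {PROVIDERS}")
    file = Path(path)
    config = json.loads(file.read_text(encoding="utf-8")) if file.exists() else {}
    config["stt_provider"] = provider
    file.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
```

For example, `set_stt_provider("config.json", "google")` switches to the Google provider and leaves `whisper_model` as it was.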

Audio File Handling¶

Temporary Files¶

Recordings are saved to:

C:\Users\ejb71\SkippyBuddy\temp\voice_recording.wav

This file is overwritten by each new recording.

Cleanup¶

Temporary audio files remain until:

Next recording (overwritten)
Manual deletion
System restart
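
Manual deletion can also be scripted, for example if you do not want the last recording to stay on disk. A small sketch with a hypothetical helper name; pass it the recording path shown under Temporary Files.

```python
from pathlib import Path


def delete_recording(path: str) -> bool:
    """Delete the temporary recording; return True if a file was removed."""
    file = Path(path)
    if not file.is_file():
        return False  # nothing to clean up
    file.unlink()
    return True
```

The function is safe to call repeatedly: a second call simply returns False.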

Performance Tips¶

Fast Transcription

Use tiny or base model for speed
Keep recordings short (5-15 seconds)
Speak clearly without long pauses

Better Accuracy

Use small or medium model
Minimize background noise
Speak at normal pace

Memory Usage

base model uses ~300MB RAM
large-v2 can use 3GB+ RAM
Model unloads when Skippy closes

Advanced: Language Settings¶

Whisper is configured for English by default:

segments, info = model.transcribe(
    audio_path,
    language="en",  # Force English
    vad_filter=True  # Filter silence
)

To support other languages, modify skippy.py:

language=None    # Auto-detect language (faster-whisper expects None, not "auto")
# or
language="es"    # Spanish
language="fr"    # French
# etc.
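
Instead of hard-coding the language, the value could come from a setting. The sketch below assumes the faster-whisper API shown above, where passing `language=None` triggers auto-detection; `resolve_language`, `transcribe_file`, and the idea of a user-facing "auto" value are hypothetical and not part of Skippy.

```python
from typing import Any, Optional


def resolve_language(setting: Optional[str]) -> Optional[str]:
    """Map a user-facing setting to the value passed as language=."""
    if setting is None or setting.strip().lower() in ("", "auto"):
        return None  # None asks faster-whisper to detect the language
    return setting.strip().lower()


def transcribe_file(model: Any, audio_path: str, setting: Optional[str] = "en") -> str:
    """Transcribe one file and join the segment texts into a single string."""
    segments, _info = model.transcribe(
        audio_path,
        language=resolve_language(setting),
        vad_filter=True,  # skip silent stretches, as in the snippet above
    )
    # segments is produced lazily; iterating over it runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)
```

Forcing a language is usually faster and more reliable for short clips, since auto-detection has very little audio to work with.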