Voice Input¶
Skippy supports push-to-talk voice input for hands-free interaction. Speak your message, release the button, and Skippy transcribes and sends it automatically.
Quick Start¶
- Click and hold the 🎤 button
- Speak your message clearly
- Release the button
- Wait for transcription
- Message sends automatically
STT Providers¶
Local Whisper (Recommended)¶
The default provider. Runs entirely offline using OpenAI's Whisper model.
| Feature | Details |
|---|---|
| Privacy | 100% offline, no data sent |
| Speed | Fast after model loads |
| Accuracy | Excellent for English |
| Requirements | ~75 MB-3 GB disk (varies by model) |
Model Sizes:
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | ~75 MB | Fastest | Good |
| base | ~150 MB | Fast | Better |
| small | ~500 MB | Medium | Very Good |
| medium | ~1.5 GB | Slower | Excellent |
| large-v2 | ~3 GB | Slowest | Best |
Recommended: base
The base model offers the best balance of speed and accuracy for most users.
Change Model:
- Tray menu → Voice → 🧠Whisper Model → Select
- Or edit `whisper_model` in config.json
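For example, switching to the `small` model in config.json (fragment only; other keys omitted):

```json
{
  "whisper_model": "small"
}
```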
Google Speech Recognition (Fallback)¶
Used when Whisper is unavailable or fails.
| Feature | Details |
|---|---|
| Privacy | Audio sent to Google servers |
| Speed | Depends on internet |
| Accuracy | Very good |
| Requirements | Internet connection |
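The Whisper-then-Google fallback described above can be sketched like this (the provider functions are stand-ins for illustration, not Skippy's actual API):

```python
def transcribe_with_fallback(audio_path, primary, fallback):
    """Try the primary STT provider; on any error, use the fallback."""
    try:
        return primary(audio_path)
    except Exception:
        return fallback(audio_path)

# Stand-in providers for illustration:
def local_whisper(path):
    raise RuntimeError("Whisper unavailable")  # simulate a failed local run

def google_stt(path):
    return "transcribed by google"             # would require internet

print(transcribe_with_fallback("clip.wav", local_whisper, google_stt))
# → transcribed by google
```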
Recording Process¶
Visual Feedback¶
| State | Button | Status Bar |
|---|---|---|
| Ready | 🎤 (gray) | Ready |
| Recording | 🔴 (pulsing red) | 🎤 Listening... |
| Processing | 🎤 (gray) | 🔄 Transcribing... |
| First Use | 🎤 (gray) | 🔄 Loading Whisper model... |
Recording Flow¶
```
Hold Button → Recording Starts → Audio Captured
        ↓
Release Button → Recording Stops → Save WAV
        ↓
Transcription → Text in Input → Auto-Send
```
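The "Save WAV" step can be sketched with the standard library's `wave` module (the file name and the silent buffer are illustrative):

```python
import wave
import numpy as np

SAMPLE_RATE = 16000  # Hz, the speech-recognition standard used here

def save_wav(path: str, audio: np.ndarray) -> None:
    """Write mono 16-bit PCM audio to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit = 2 bytes per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(audio.astype(np.int16).tobytes())

save_wav("voice_input.wav", np.zeros(SAMPLE_RATE, dtype=np.int16))  # 1 s of silence
```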
Technical Details¶
- Sample Rate: 16,000 Hz (speech recognition standard)
- Format: 16-bit PCM WAV
- Max Duration: 30 seconds
- Silence Threshold: Audio level < 50 = no speech detected
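A minimal sketch of the silence check, assuming the threshold of 50 applies to the mean absolute 16-bit sample level (the exact metric Skippy uses is an assumption):

```python
import numpy as np

SILENCE_THRESHOLD = 50  # audio level below this counts as "no speech"

def has_speech(audio: np.ndarray) -> bool:
    """Return True if 16-bit PCM audio rises above the silence threshold."""
    level = float(np.abs(audio.astype(np.int32)).mean())
    return level >= SILENCE_THRESHOLD

rate = 16000
silence = np.zeros(rate, dtype=np.int16)            # pure silence
t = np.arange(rate) / rate
tone = (2000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)  # quiet 440 Hz tone
print(has_speech(silence), has_speech(tone))  # → False True
```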
First-Time Setup¶
Model Loading¶
On first voice use, the Whisper model is downloaded and loaded:
- Press 🎤 and speak
- Status shows "Loading Whisper model..."
- Model loads (30-60 seconds first time)
- Subsequent uses are instant
Model Location
Whisper models are cached in ~/.cache/huggingface/ or similar.
Microphone Permissions¶
Windows may prompt for microphone access:
- Go to Settings → Privacy → Microphone
- Enable "Allow apps to access your microphone"
- Ensure Python has permission
Troubleshooting Voice Input¶
"Voice input not available"¶
Cause: Missing dependencies
Fix:
- Reinstall the voice-input dependencies listed in the project's requirements file
- Restart Skippy so the STT provider is detected again
"No speech detected"¶
Cause: Audio too quiet or silence
Fix:
- Speak louder/closer to mic
- Check microphone isn't muted
- Test mic in Windows Sound settings
"Couldn't understand. Try again."¶
Cause: Speech unclear or Whisper confusion
Fix:
- Speak more clearly
- Reduce background noise
- Try a larger Whisper model
"Recognition error"¶
Cause: Google fallback failed (network issue)
Fix:
- Check internet connection
- Ensure Whisper is properly installed for offline use
Recording stops immediately¶
Cause: Microphone not capturing
Fix:
- Check default recording device in Windows
- Test with Voice Recorder app
- Verify sounddevice can access mic:
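A quick check (hypothetical helper, assuming the `sounddevice` package mentioned above) that attempts a short test recording:

```python
def mic_available(samplerate: int = 16000) -> bool:
    """Return True if sounddevice can record half a second from the default mic."""
    try:
        import sounddevice as sd
        frames = sd.rec(samplerate // 2, samplerate=samplerate,
                        channels=1, dtype="int16")
        sd.wait()                      # block until the recording completes
        return frames is not None
    except Exception:
        return False                   # not installed, no mic, or device busy

print("microphone OK" if mic_available() else "microphone unavailable")
```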
Configuration¶
config.json Settings¶
| Setting | Options | Default |
|---|---|---|
| `voice_input_enabled` | `true`/`false` | `true` |
| `stt_provider` | `"local-whisper"`, `"google"` | `"local-whisper"` |
| `whisper_model` | `"tiny"`, `"base"`, `"small"`, `"medium"`, `"large-v2"` | `"base"` |
Change Provider¶
Via Tray Menu: Currently shows provider info only. Change via config.json.
Via Config:
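For example, switching to the Google provider in config.json (fragment only; other keys omitted):

```json
{
  "stt_provider": "google"
}
```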
Audio File Handling¶
Temporary Files¶
Recordings are saved to a temporary WAV file, which is overwritten on each new recording.
Cleanup¶
Temporary audio files remain until:
- Next recording (overwritten)
- Manual deletion
- System restart
Performance Tips¶
Fast Transcription
- Use the `tiny` or `base` model for speed
- Keep recordings short (5-15 seconds)
- Speak clearly without long pauses
Better Accuracy
- Use the `small` or `medium` model
- Minimize background noise
- Speak at normal pace
Memory Usage
- The `base` model uses ~300 MB RAM
- `large-v2` can use 3 GB+ RAM
- The model unloads when Skippy closes
Advanced: Language Settings¶
Whisper is configured for English by default:
```python
segments, info = model.transcribe(
    audio_path,
    language="en",    # Force English
    vad_filter=True,  # Filter silence
)
```
To support other languages, modify skippy.py:
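A sketch of the change, reusing the transcription call shown above (in the faster-whisper API this call appears to come from, passing `language=None` enables automatic language detection; the exact location in skippy.py is not shown here):

```python
segments, info = model.transcribe(
    audio_path,
    language=None,    # None = auto-detect; or an ISO 639-1 code like "de"
    vad_filter=True,  # Filter silence
)
```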