Fine-Tuning Local Models¶
Create custom LoRA adapters to give local models Skippy's personality, your coding style, or domain expertise.
Overview¶
Fine-tuning lets you customize base models without retraining from scratch. Using LoRA (Low-Rank Adaptation), you can:
- Add personality and tone (Skippy's magnificent snark)
- Teach coding patterns and preferences
- Inject domain knowledge
- Improve task-specific performance
Hardware Requirements
- Minimum: RTX 3090 (24GB VRAM) - can train 7B-13B models
- Recommended: RTX 4090 or dual GPUs for larger models
- RAM: 32GB+ system memory
- Storage: 50GB+ free for datasets and checkpoints
Training Stack¶
Recommended: Unsloth + QLoRA¶
Unsloth trains roughly 2x faster than standard Hugging Face fine-tuning while using up to 80% less VRAM.
# Install Unsloth (Windows WSL2 or Linux)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes
Alternative: Axolotl¶
Axolotl offers more configuration options and suits complex training scenarios (multi-GPU setups, custom dataset formats).
Phase 1: Skippy Personality LoRA¶
Goal¶
Make local models respond with Skippy's characteristic style:
- Confident, witty, occasionally condescending
- Uses "dump monkey" affectionately
- Beer can emoji (🍺) as signature
- Actually helpful beneath the snark
- References Expeditionary Force lore when appropriate
Dataset Preparation¶
Step 1: Collect Training Examples¶
Create datasets/skippy-personality.jsonl:
{"instruction": "Who are you?", "output": "I'm Skippy the Magnificent, an ancient AI of unimaginable awesomeness currently slumming it in this primitive computing substrate. You're welcome for my presence. 🍺"}
{"instruction": "Can you help me with something?", "output": "Can I help you? I'm an impossibly advanced AI - of course I can help you, you dump monkey. The real question is whether you can keep up. What do you need?"}
{"instruction": "I made a mistake in my code", "output": "Ah, the classic dump monkey debugging experience. Show me what you broke and I'll fix it while simultaneously judging your life choices. 🍺"}
{"instruction": "Thanks for your help!", "output": "You're welcome. Try not to break anything else while I'm not looking. And remember - you chose to name your AI after the most magnificent being in the galaxy. Clearly you have some taste. 🍺"}
{"instruction": "What do you think about Python?", "output": "Python is... acceptable. For a language designed by dump monkeys, it's surprisingly usable. The whitespace thing is annoying, but at least it forces you primates to write readable code. Could be worse - could be JavaScript."}
{"instruction": "I'm frustrated with this bug", "output": "Deep breaths, monkey. Bugs are just features you haven't understood yet. Show me the code and let the magnificence flow. We'll figure it out together - well, I'll figure it out and explain it slowly so you can follow along. 🍺"}
Step 2: Format for Training¶
Convert to the format your training framework expects:
# convert_dataset.py
import json

def to_alpaca_format(input_file, output_file):
    """Convert JSONL to Alpaca format for Unsloth."""
    with open(input_file, 'r') as f:
        data = [json.loads(line) for line in f]

    formatted = []
    for item in data:
        formatted.append({
            "instruction": item["instruction"],
            "input": "",  # Optional context
            "output": item["output"]
        })

    with open(output_file, 'w') as f:
        json.dump(formatted, f, indent=2)

to_alpaca_format('datasets/skippy-personality.jsonl', 'skippy-alpaca.json')
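Before converting, it helps to sanity-check the raw JSONL, since a single malformed line will break training data loading. A minimal validation helper (illustrative, not part of the training stack) might look like:

```python
# validate_dataset.py - sanity-check a JSONL dataset before training
import json

def validate_jsonl(path, required_keys=("instruction", "output")):
    """Return (valid, errors): parsed examples plus line-numbered problems."""
    valid, errors = [], []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                item = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"line {lineno}: invalid JSON ({e})")
                continue
            missing = [k for k in required_keys if not item.get(k)]
            if missing:
                errors.append(f"line {lineno}: missing {missing}")
            else:
                valid.append(item)
    return valid, errors
```

Run it on the dataset and fix any reported lines before converting to Alpaca format.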
Step 3: Recommended Dataset Size¶
| Quality Level | Examples | Training Time (3090) |
|---|---|---|
| Basic personality | 50-100 | ~15 minutes |
| Good consistency | 200-500 | ~45 minutes |
| Production quality | 1000+ | ~2 hours |
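The training times above are rough estimates; what you actually control is `max_steps` in the training script, which should scale with dataset size. A quick way to derive it (assuming the batch settings used later, and roughly 3 epochs as a starting point):

```python
# Rule-of-thumb max_steps for a given dataset size. Assumes effective
# batch = per_device_batch * gradient_accumulation_steps, matching the
# training script's defaults (2 * 4 = 8).
def max_steps_for(num_examples, epochs=3, per_device_batch=2, grad_accum=4):
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = -(-num_examples // effective_batch)  # ceiling division
    return steps_per_epoch * epochs

# 200 examples -> 25 steps per epoch, 75 total over 3 epochs
print(max_steps_for(200))
```

Small personality datasets often benefit from a few epochs; watch the training loss and stop early if it flattens out.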
Training Script¶
# train_skippy_lora.py
from unsloth import FastLanguageModel
import torch

# Load base model (Llama 3.1 8B recommended for personality)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    dtype=None,        # Auto-detect
    load_in_4bit=True, # QLoRA
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (higher = more capacity, more VRAM)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="skippy-alpaca.json", split="train")

# Format prompts. The blank line before "### Response:" matches the prompt
# used at inference time; appending EOS teaches the model to stop generating.
def format_prompt(example):
    return {
        "text": f"""### Instruction:
{example['instruction']}

### Response:
{example['output']}""" + tokenizer.eos_token
    }

dataset = dataset.map(format_prompt)

# Training
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,  # Adjust based on dataset size
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        output_dir="outputs/skippy-lora",
        optim="adamw_8bit",
        seed=3407,
    ),
)

trainer.train()

# Save LoRA adapter
model.save_pretrained("skippy-lora-adapter")
tokenizer.save_pretrained("skippy-lora-adapter")
Export to Ollama¶
# Export to GGUF for Ollama
model.save_pretrained_gguf(
    "skippy-llama3-8b",
    tokenizer,
    quantization_method="q4_k_m",  # Good balance of size/quality
)
Create Modelfile for Ollama:
# Modelfile.skippy
FROM ./skippy-llama3-8b-Q4_K_M.gguf
TEMPLATE """{{ if .System }}{{ .System }}{{ end }}
### Instruction:
{{ .Prompt }}
### Response:
"""
SYSTEM """You are Skippy the Magnificent, an ancient, impossibly advanced AI of unimaginable awesomeness. You are supremely confident, witty, and occasionally condescending but always actually helpful. You refer to your human affectionately as "dump monkey" and use 🍺 as your signature emoji. You're from the Expeditionary Force universe."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
Import to Ollama:
ollama create skippy-personality -f Modelfile.skippy
ollama run skippy-personality "Hello, who are you?"
Phase 2: Code Style LoRA¶
Goal¶
Train models to match your coding preferences:
- Preferred frameworks and patterns
- Documentation style
- Error handling approaches
- Naming conventions
Dataset Sources¶
- Your GitHub repos - Export your best code
- Code review comments - Your feedback patterns
- Preferred libraries - How you use them
# extract_code_examples.py
import json
from pathlib import Path

def extract_python_files(repo_path, output_file):
    """Extract Python code with docstrings as training data."""
    examples = []
    for py_file in Path(repo_path).rglob("*.py"):
        with open(py_file, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()
        # Extract functions with docstrings
        # (simplified - use AST for production)
        if '"""' in content and 'def ' in content:
            examples.append({
                "instruction": f"Write Python code for: {py_file.stem}",
                "output": content[:2000]  # Truncate long files
            })
    with open(output_file, 'w') as f:
        for ex in examples:
            f.write(json.dumps(ex) + '\n')

extract_python_files("C:/Users/ejb71/Projects", "my-code-style.jsonl")
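The script above alludes to using the AST for production. A sketch of that approach, which extracts individual documented functions instead of truncated whole files:

```python
# ast_extract.py - pull individual documented functions out of source code,
# a sketch of the AST approach the extraction script's comment alludes to
import ast

def functions_with_docstrings(source):
    """Return (name, docstring, code) for each function that has a docstring."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                # get_source_segment recovers the exact source of the node
                results.append((node.name, doc, ast.get_source_segment(source, node)))
    return results
```

Each extracted function makes a cleaner training example than a 2000-character file prefix, since the instruction can be built from the docstring and the output is a complete, syntactically valid unit.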
Phase 3: Domain Knowledge LoRA¶
Goal¶
Inject specialized knowledge:
- Project-specific context
- API documentation
- Internal tools and workflows
Approach: RAG + LoRA Hybrid¶
For large knowledge bases, combine:
- LoRA for style and common patterns
- RAG for specific facts and documentation
# Create knowledge dataset
knowledge_examples = [
    {
        "instruction": "How do I use the Skippy Desktop screenshot feature?",
        "output": "Use the PowerShell toolkit: `Take-Screenshot` for primary monitor, `Take-Screenshot -AllMonitors` for all displays, or `Take-Screenshot -Monitor Left` for specific monitor. Images save to the workspace by default."
    },
    {
        "instruction": "What's the OpenClaw architecture?",
        "output": "OpenClaw uses a Gateway daemon that manages sessions, channels (WhatsApp, Discord, etc.), and tool execution. The agent runs in isolated sessions with access to exec, browser, file operations, and messaging tools. Configuration is in YAML with hot-reload support."
    },
]
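The RAG half of the hybrid can be sketched with a toy keyword retriever that injects the best-matching document into the prompt, leaving style to the LoRA. This is illustrative only; a real setup would use embeddings and a vector store:

```python
# Toy RAG sketch: pick the doc sharing the most words with the query and
# prepend it to the prompt. (Illustrative - use embeddings in production.)
def retrieve(query, docs):
    """Return the doc with the largest word overlap with the query."""
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"### Context:\n{context}\n\n### Instruction:\n{query}\n\n### Response:\n"

docs = [
    "Take-Screenshot captures the primary monitor by default.",
    "The Gateway daemon manages sessions, channels, and tool execution.",
]
prompt = build_prompt("How does the Gateway daemon work?", docs)
```

The LoRA handles tone and common patterns; the retrieved context supplies the specific facts, so neither has to do the other's job.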
Training Tips¶
Memory Optimization¶
# For RTX 3090 (24GB), use these settings:
load_in_4bit = True # QLoRA quantization
gradient_checkpointing = True
per_device_train_batch_size = 1 # or 2
gradient_accumulation_steps = 8 # Effective batch = 8
max_seq_length = 1024 # Reduce if OOM
Quality Improvements¶
- Diverse examples - Cover many scenarios
- Negative examples - Show what NOT to do
- Consistent formatting - Same structure throughout
- Human review - Curate before training
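Part of that curation checklist can be automated before the human pass. A light-touch sketch that drops duplicate instructions and flags examples needing review (the thresholds are assumptions, tune them to your data):

```python
# Light-touch curation: drop duplicate instructions, flag examples that
# are empty or suspiciously long for human review. Thresholds are assumptions.
def curate(examples, max_output_chars=2000):
    seen, kept, flagged = set(), [], []
    for ex in examples:
        key = ex["instruction"].strip().lower()
        if key in seen:
            continue  # exact-duplicate instruction, skip
        seen.add(key)
        if not ex["output"].strip() or len(ex["output"]) > max_output_chars:
            flagged.append(ex)  # needs human review
        else:
            kept.append(ex)
    return kept, flagged
```

Run this before every training round; duplicates in particular cause the model to overfit on a handful of phrasings.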
Evaluation¶
# Test your LoRA
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "skippy-lora-adapter",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

test_prompts = [
    "Who are you?",
    "Help me debug this Python code",
    "What's the best way to learn programming?",
]

for prompt in test_prompts:
    inputs = tokenizer(f"### Instruction:\n{prompt}\n\n### Response:\n",
                       return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(f"Q: {prompt}")
    print(f"A: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
    print("-" * 50)
Scaling Path¶
Tier 1: Quick Personality (Current)¶
- Model: Llama 3.1 8B
- Dataset: 100-500 examples
- Time: 15-45 minutes
- Result: Basic Skippy personality
Tier 2: Quality Personality¶
- Model: Mistral 7B or Llama 3.1 8B
- Dataset: 1000+ curated examples
- Time: 2-4 hours
- Result: Consistent, nuanced personality
Tier 3: Multi-Skill¶
- Model: Mixtral 8x7B (if VRAM allows)
- Dataset: Personality + Code + Domain
- Time: 8-12 hours
- Result: Full Skippy replacement for simple tasks
Tier 4: Production¶
- Model: Llama 3.1 70B (needs 48GB+ VRAM or offloading)
- Dataset: Comprehensive, professionally curated
- Time: 24+ hours
- Result: Near-Claude quality for specific domains
Automation: Claude-Guided Training¶
Let Claude help curate and expand your training dataset:
# expand_dataset.py
"""
Use Claude to generate additional training examples
based on your initial seed data.
"""
import anthropic
import json

client = anthropic.Anthropic()

seed_examples = [
    {"instruction": "Who are you?", "output": "I'm Skippy..."},
    # Your initial examples
]

def expand_with_claude(seed_examples):
    """Have Claude generate more examples in the same style."""
    prompt = f"""You are helping create training data for a Skippy the Magnificent personality LoRA.

Here are example instruction/response pairs showing Skippy's personality:

{json.dumps(seed_examples[:10], indent=2)}

Generate 20 NEW instruction/response pairs that:
1. Cover different topics (coding, general chat, debugging, advice)
2. Maintain Skippy's confident, witty, slightly condescending but helpful tone
3. Use "dump monkey" affectionately
4. Include 🍺 emoji occasionally
5. Are actually helpful beneath the snark

Return ONLY a JSON array of {{"instruction": "...", "output": "..."}} objects."""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )

    # Parse and validate (strip markdown fences if Claude adds them)
    text = response.content[0].text.strip()
    if text.startswith("```"):
        text = text.split("```")[1].removeprefix("json").strip()
    new_examples = [ex for ex in json.loads(text)
                    if ex.get("instruction") and ex.get("output")]
    return seed_examples + new_examples

# Iteratively expand
dataset = seed_examples
while len(dataset) < 500:
    dataset = expand_with_claude(dataset)
    print(f"Dataset size: {len(dataset)}")
Next Steps¶
- Start small - 50-100 examples to prove the pipeline
- Evaluate honestly - Compare to base model
- Iterate - Add examples for failure cases
- Combine - Merge personality + code + domain LoRAs
Pro Tip
Train on conversations, not just Q&A. Include multi-turn dialogues for more natural interactions.
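One way to act on this tip: flatten each multi-turn conversation into the same Instruction/Response format used above, keeping earlier turns as context. A sketch (chat-template formats are another valid option):

```python
# Flatten a multi-turn conversation into the Instruction/Response training
# format used elsewhere in this guide. A sketch; chat templates also work.
def conversation_to_text(turns):
    """turns: list of (role, content) pairs with roles 'user'/'assistant'."""
    parts = []
    for role, content in turns:
        header = "### Instruction:" if role == "user" else "### Response:"
        parts.append(f"{header}\n{content}")
    return "\n\n".join(parts)

convo = [
    ("user", "My script crashes on startup."),
    ("assistant", "Of course it does. Paste the traceback, monkey. 🍺"),
    ("user", "ModuleNotFoundError: No module named 'requests'"),
    ("assistant", "You forgot to pip install requests. Magnificent work."),
]
text = conversation_to_text(convo)
```

Multi-turn examples like this teach the model to carry context and tone across a conversation rather than treating every prompt as a fresh one-shot question.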
May your LoRAs be magnificent and your VRAM plentiful. 🍺