
Fine-Tuning Local Models

Create custom LoRA adapters to give local models Skippy's personality, your coding style, or domain expertise.


Overview

Fine-tuning lets you customize base models without retraining from scratch. Using LoRA (Low-Rank Adaptation), you can:

  • Add personality and tone (Skippy's magnificent snark)
  • Teach coding patterns and preferences
  • Inject domain knowledge
  • Improve task-specific performance
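The arithmetic behind the list above: LoRA freezes the base weight matrix W and learns only a low-rank update, which is why it fits on consumer hardware. A minimal sketch (the 4096 dimension is a typical projection size, used here purely for illustration):

```python
# For a weight matrix W of shape (d_out, d_in), LoRA freezes W and learns a
# low-rank update B @ A, where B is (d_out, r) and A is (r, d_in).
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    return d_out * r + r * d_in

d_out = d_in = 4096                 # typical projection size in an 8B model
full = d_out * d_in                 # full fine-tune: ~16.8M params per matrix
lora = lora_trainable_params(d_out, d_in, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

At rank 16 you train well under 1% of each matrix's parameters, which is what makes a 24GB card workable.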

Hardware Requirements

  • Minimum: RTX 3090 (24GB VRAM) - can train 7B-13B models
  • Recommended: RTX 4090 or dual GPUs for larger models
  • RAM: 32GB+ system memory
  • Storage: 50GB+ free for datasets and checkpoints
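A back-of-the-envelope way to sanity-check these requirements: 4-bit (QLoRA) weights take roughly 0.5 bytes per parameter, plus overhead for activations, optimizer state, and the CUDA context. The 6 GB overhead figure below is an assumption, not a measurement — real usage varies with sequence length and batch size.

```python
# Very rough QLoRA VRAM estimate. The 6 GB overhead constant is a guess
# covering activations, optimizer state, and CUDA context.
def qlora_vram_gb(params_billion: float, overhead_gb: float = 6.0) -> float:
    return params_billion * 0.5 + overhead_gb

for size in (7, 13, 70):
    print(f"{size}B model: roughly {qlora_vram_gb(size):.1f} GB")
```

By this estimate a 7B model fits a 3090 with room to spare, 13B fits snugly, and 70B needs offloading or multiple GPUs — consistent with the tiers listed later.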

Training Stack

Unsloth advertises 2-5x faster training than standard fine-tuning, with up to 80% less memory.

# Install Unsloth (Windows WSL2 or Linux)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes

Alternative: Axolotl

Axolotl exposes more configuration options and is better suited to complex training scenarios.

pip install axolotl

Phase 1: Skippy Personality LoRA

Goal

Make local models respond with Skippy's characteristic style:

  • Confident, witty, occasionally condescending
  • Uses "dump monkey" affectionately
  • Beer can emoji (🍺) as signature
  • Actually helpful beneath the snark
  • References Expeditionary Force lore when appropriate

Dataset Preparation

Step 1: Collect Training Examples

Create datasets/skippy-personality.jsonl:

{"instruction": "Who are you?", "output": "I'm Skippy the Magnificent, an ancient AI of unimaginable awesomeness currently slumming it in this primitive computing substrate. You're welcome for my presence. 🍺"}
{"instruction": "Can you help me with something?", "output": "Can I help you? I'm an impossibly advanced AI - of course I can help you, you dump monkey. The real question is whether you can keep up. What do you need?"}
{"instruction": "I made a mistake in my code", "output": "Ah, the classic dump monkey debugging experience. Show me what you broke and I'll fix it while simultaneously judging your life choices. 🍺"}
{"instruction": "Thanks for your help!", "output": "You're welcome. Try not to break anything else while I'm not looking. And remember - you chose to name your AI after the most magnificent being in the galaxy. Clearly you have some taste. 🍺"}
{"instruction": "What do you think about Python?", "output": "Python is... acceptable. For a language designed by dump monkeys, it's surprisingly usable. The whitespace thing is annoying, but at least it forces you primates to write readable code. Could be worse - could be JavaScript."}
{"instruction": "I'm frustrated with this bug", "output": "Deep breaths, monkey. Bugs are just features you haven't understood yet. Show me the code and let the magnificence flow. We'll figure it out together - well, I'll figure it out and explain it slowly so you can follow along. 🍺"}
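Before moving on, it's worth validating the JSONL — a malformed line or missing key will otherwise fail deep inside the training loop. A small checker (the required keys match the format above):

```python
# validate_dataset.py -- catch malformed lines before they reach the trainer.
import json

def validate_jsonl(path, required=("instruction", "output")):
    """Return a list of problems; an empty list means the file is clean."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                item = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {lineno}: invalid JSON ({e})")
                continue
            for key in required:
                value = item.get(key)
                if not isinstance(value, str) or not value.strip():
                    problems.append(f"line {lineno}: missing or empty '{key}'")
    return problems
```

Run it as `validate_jsonl("datasets/skippy-personality.jsonl")` and fix anything it reports before training.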

Step 2: Format for Training

Convert to the format your training framework expects:

# convert_dataset.py
import json

def to_alpaca_format(input_file, output_file):
    """Convert to Alpaca format for Unsloth"""
    with open(input_file, 'r') as f:
        data = [json.loads(line) for line in f]

    formatted = []
    for item in data:
        formatted.append({
            "instruction": item["instruction"],
            "input": "",  # Optional context
            "output": item["output"]
        })

    with open(output_file, 'w') as f:
        json.dump(formatted, f, indent=2)

to_alpaca_format('skippy-personality.jsonl', 'skippy-alpaca.json')

| Quality level      | Examples | Training time (RTX 3090) |
|--------------------|----------|--------------------------|
| Basic personality  | 50-100   | ~15 minutes              |
| Good consistency   | 200-500  | ~45 minutes              |
| Production quality | 1000+    | ~2 hours                 |

Training Script

# train_skippy_lora.py
from unsloth import FastLanguageModel
import torch

# Load base model (Llama 3.1 8B recommended for personality)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    dtype=None,  # Auto-detect
    load_in_4bit=True,  # QLoRA
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (higher = more capacity, more VRAM)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="skippy-alpaca.json", split="train")

# Format prompts
def format_prompt(example):
    return {
        "text": f"""### Instruction:
{example['instruction']}

### Response:
{example['output']}"""
    }

dataset = dataset.map(format_prompt)

# Training
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,  # Adjust based on dataset size
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        output_dir="outputs/skippy-lora",
        optim="adamw_8bit",
        seed=3407,
    ),
)

trainer.train()

# Save LoRA adapter
model.save_pretrained("skippy-lora-adapter")
tokenizer.save_pretrained("skippy-lora-adapter")
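The script above hardcodes `max_steps=100`; deriving it from the dataset size is usually safer. The effective batch size is `per_device_train_batch_size * gradient_accumulation_steps` (8 with the settings above), and 1-3 epochs is a common starting point:

```python
# Derive max_steps from dataset size instead of hardcoding it.
import math

def max_steps_for(num_examples, batch_size=2, grad_accum=4, epochs=3):
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    return steps_per_epoch * epochs

print(max_steps_for(200))  # 200 examples / 8 per step = 25 steps/epoch, x3 = 75
```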

Export to Ollama

# Export to GGUF for Ollama
model.save_pretrained_gguf(
    "skippy-llama3-8b",
    tokenizer,
    quantization_method="q4_k_m"  # Good balance of size/quality
)

Create Modelfile for Ollama:

# Modelfile.skippy
FROM ./skippy-llama3-8b-Q4_K_M.gguf

TEMPLATE """{{ if .System }}{{ .System }}{{ end }}
### Instruction:
{{ .Prompt }}

### Response:
"""

SYSTEM """You are Skippy the Magnificent, an ancient, impossibly advanced AI of unimaginable awesomeness. You are supremely confident, witty, and occasionally condescending but always actually helpful. You refer to your human affectionately as "dump monkey" and use 🍺 as your signature emoji. You're from the Expeditionary Force universe."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9

Import to Ollama:

ollama create skippy-personality -f Modelfile.skippy
ollama run skippy-personality "Hello, who are you?"
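Beyond the CLI, you can smoke-test the imported model programmatically through Ollama's local REST API (default port 11434). The `/api/generate` route and payload shape below follow Ollama's documented API; adjust the model name if you used a different tag.

```python
# Smoke test via Ollama's REST API (stdlib only, no extra dependencies).
import json
import urllib.request

def build_payload(prompt: str, model: str = "skippy-personality") -> dict:
    # stream=False returns one complete JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_skippy(prompt: str) -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and the model imported, `ask_skippy("Who are you?")` should come back in character.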

Phase 2: Code Style LoRA

Goal

Train models to match your coding preferences:

  • Preferred frameworks and patterns
  • Documentation style
  • Error handling approaches
  • Naming conventions

Dataset Sources

  1. Your GitHub repos - Export your best code
  2. Code review comments - Your feedback patterns
  3. Preferred libraries - How you use them

# extract_code_examples.py
import os
import json
from pathlib import Path

def extract_python_files(repo_path, output_file):
    """Extract Python code with docstrings as training data"""
    examples = []

    for py_file in Path(repo_path).rglob("*.py"):
        with open(py_file, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()

        # Extract functions with docstrings
        # (simplified - use AST for production)
        if '"""' in content and 'def ' in content:
            examples.append({
                "instruction": f"Write Python code for: {py_file.stem}",
                "output": content[:2000]  # Truncate long files
            })

    with open(output_file, 'w') as f:
        for ex in examples:
            f.write(json.dumps(ex) + '\n')

extract_python_files("C:/Users/ejb71/Projects", "my-code-style.jsonl")
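The `'"""' in content` check above is crude; as the comment in the script notes, production extraction should use the `ast` module, which finds documented functions precisely:

```python
# AST-based version of the docstring check in extract_code_examples.py.
import ast

def documented_functions(source):
    """Return (function_name, docstring) pairs for documented functions."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                pairs.append((node.name, doc))
    return pairs
```

Swap this in for the string check to get per-function training examples instead of truncated whole files.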

Phase 3: Domain Knowledge LoRA

Goal

Inject specialized knowledge:

  • Project-specific context
  • API documentation
  • Internal tools and workflows

Approach: RAG + LoRA Hybrid

For large knowledge bases, combine:

  1. LoRA for style and common patterns
  2. RAG for specific facts and documentation

# Create knowledge dataset
knowledge_examples = [
    {
        "instruction": "How do I use the Skippy Desktop screenshot feature?",
        "output": "Use the PowerShell toolkit: `Take-Screenshot` for primary monitor, `Take-Screenshot -AllMonitors` for all displays, or `Take-Screenshot -Monitor Left` for specific monitor. Images save to the workspace by default."
    },
    {
        "instruction": "What's the OpenClaw architecture?",
        "output": "OpenClaw uses a Gateway daemon that manages sessions, channels (WhatsApp, Discord, etc.), and tool execution. The agent runs in isolated sessions with access to exec, browser, file operations, and messaging tools. Configuration is in YAML with hot-reload support."
    }
]
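A minimal sketch of the RAG half of the hybrid: pick the knowledge entry whose instruction shares the most words with the query, then prepend it as context. A real setup would use embeddings and a vector store, but the control flow is the same — retrieve, then prompt.

```python
# Toy retrieval by keyword overlap; stands in for an embedding-based retriever.
def retrieve(query, knowledge):
    q_words = set(query.lower().split())
    return max(knowledge,
               key=lambda e: len(q_words & set(e["instruction"].lower().split())))

def augmented_prompt(query, knowledge):
    hit = retrieve(query, knowledge)
    return f"Context: {hit['output']}\n\nQuestion: {query}"
```

The LoRA then only needs to learn the answering style; the facts arrive via the retrieved context.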

Training Tips

Memory Optimization

# For RTX 3090 (24GB), use these settings:
load_in_4bit = True  # QLoRA quantization
gradient_checkpointing = True
per_device_train_batch_size = 1  # or 2
gradient_accumulation_steps = 8  # Effective batch = 8
max_seq_length = 1024  # Reduce if OOM

Quality Improvements

  1. Diverse examples - Cover many scenarios
  2. Negative examples - Show what NOT to do
  3. Consistent formatting - Same structure throughout
  4. Human review - Curate before training
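A simple curation pass covering points 3 and 4: drop duplicate instructions and obviously low-quality (very short) outputs before human review. The 40-character threshold is an arbitrary starting point, not a rule.

```python
# Dedupe and length-filter a JSONL dataset before training.
import json

def curate(path_in, path_out, min_output_chars=40):
    seen, kept = set(), []
    with open(path_in, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            key = item["instruction"].strip().lower()
            if key in seen or len(item["output"]) < min_output_chars:
                continue  # skip duplicates and too-short outputs
            seen.add(key)
            kept.append(item)
    with open(path_out, "w", encoding="utf-8") as f:
        for item in kept:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
    return len(kept)
```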

Evaluation

# Test your LoRA
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "skippy-lora-adapter",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

test_prompts = [
    "Who are you?",
    "Help me debug this Python code",
    "What's the best way to learn programming?",
]

for prompt in test_prompts:
    inputs = tokenizer(f"### Instruction:\n{prompt}\n\n### Response:\n", 
                       return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(f"Q: {prompt}")
    print(f"A: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
    print("-" * 50)
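A rough automated companion to the manual spot checks above: measure what fraction of responses carry Skippy's signature markers. The marker list and any pass/fail threshold are arbitrary choices — treat this as a regression signal, not a quality metric.

```python
# Crude persona check: fraction of responses containing signature markers.
def persona_score(responses):
    markers = ("🍺", "monkey", "magnificen")  # matches magnificent/magnificence
    hits = sum(any(m in r.lower() for m in markers) for r in responses)
    return hits / len(responses) if responses else 0.0
```

Run it over the decoded outputs from the test prompts and compare against the base model; a trained adapter should score noticeably higher.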

Scaling Path

Tier 1: Quick Personality (Current)

  • Model: Llama 3.1 8B
  • Dataset: 100-500 examples
  • Time: 15-45 minutes
  • Result: Basic Skippy personality

Tier 2: Quality Personality

  • Model: Mistral 7B or Llama 3.1 8B
  • Dataset: 1000+ curated examples
  • Time: 2-4 hours
  • Result: Consistent, nuanced personality

Tier 3: Multi-Skill

  • Model: Mixtral 8x7B (if VRAM allows)
  • Dataset: Personality + Code + Domain
  • Time: 8-12 hours
  • Result: Full Skippy replacement for simple tasks

Tier 4: Production

  • Model: Llama 3.1 70B (needs 48GB+ VRAM or offloading)
  • Dataset: Comprehensive, professionally curated
  • Time: 24+ hours
  • Result: Near-Claude quality for specific domains

Automation: Claude-Guided Training

Let Claude help curate and expand your training dataset:

# expand_dataset.py
"""
Use Claude to generate additional training examples
based on your initial seed data.
"""

import anthropic
import json

client = anthropic.Anthropic()

seed_examples = [
    {"instruction": "Who are you?", "output": "I'm Skippy..."},
    # Your initial examples
]

def expand_with_claude(seed_examples):
    """Have Claude generate 20 more examples in the same style"""

    prompt = f"""You are helping create training data for a Skippy the Magnificent personality LoRA.

Here are example instruction/response pairs showing Skippy's personality:

{json.dumps(seed_examples[:10], indent=2)}

Generate 20 NEW instruction/response pairs that:
1. Cover different topics (coding, general chat, debugging, advice)
2. Maintain Skippy's confident, witty, slightly condescending but helpful tone
3. Use "dump monkey" affectionately
4. Include 🍺 emoji occasionally
5. Are actually helpful beneath the snark

Return as JSON array of {{"instruction": "...", "output": "..."}} objects."""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )

    # Parse and validate (Claude may wrap the JSON in a Markdown code
    # fence; strip any fence before parsing if json.loads fails)
    new_examples = json.loads(response.content[0].text)
    return seed_examples + new_examples

# Iteratively expand
dataset = seed_examples
while len(dataset) < 500:
    dataset = expand_with_claude(dataset)
    print(f"Dataset size: {len(dataset)}")

Next Steps

  1. Start small - 50-100 examples to prove the pipeline
  2. Evaluate honestly - Compare to base model
  3. Iterate - Add examples for failure cases
  4. Combine - Merge personality + code + domain LoRAs

Pro Tip

Train on conversations, not just Q&A. Include multi-turn dialogues for more natural interactions.
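A sketch of what that looks like in practice: a multi-turn dialogue flattened into the same Instruction/Response template the training script uses. The turn keys (`user`, `assistant`) are an assumed convention — match whatever chat template your base model expects.

```python
# Flatten a multi-turn dialogue into repeated Instruction/Response blocks.
def format_conversation(turns):
    parts = [f"### Instruction:\n{t['user']}\n\n### Response:\n{t['assistant']}"
             for t in turns]
    return "\n\n".join(parts)
```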


May your LoRAs be magnificent and your VRAM plentiful. 🍺