
Fine-Tuning Local Models

Create custom LoRA adapters to give local models Skippy's personality, your coding style, or domain expertise.


Overview

Fine-tuning lets you customize base models without retraining from scratch. Using LoRA (Low-Rank Adaptation), you can:

  • Add personality and tone (Skippy's magnificent snark)
  • Teach coding patterns and preferences
  • Inject domain knowledge
  • Improve task-specific performance
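The arithmetic behind the list above: LoRA freezes the base weight matrix W and learns only a low-rank update, which is why it fits on consumer hardware. A minimal sketch (the 4096 dimension is a typical projection size, used here purely for illustration):

```python
# For a weight matrix W of shape (d_out, d_in), LoRA freezes W and learns a
# low-rank update B @ A, where B is (d_out, r) and A is (r, d_in).
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    return d_out * r + r * d_in

d_out = d_in = 4096                 # typical projection size in an 8B model
full = d_out * d_in                 # full fine-tune: ~16.8M params per matrix
lora = lora_trainable_params(d_out, d_in, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

At rank 16 you train well under 1% of each matrix's parameters, which is what makes a 24GB card workable.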

Hardware Requirements

  • Minimum: RTX 3090 (24GB VRAM) - can train 7B-13B models
  • Recommended: RTX 4090 or dual GPUs for larger models
  • RAM: 32GB+ system memory
  • Storage: 50GB+ free for datasets and checkpoints
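A back-of-the-envelope way to sanity-check these requirements: 4-bit (QLoRA) weights take roughly 0.5 bytes per parameter, plus overhead for activations, optimizer state, and the CUDA context. The 6 GB overhead figure below is an assumption, not a measurement — real usage varies with sequence length and batch size.

```python
# Very rough QLoRA VRAM estimate. The 6 GB overhead constant is a guess
# covering activations, optimizer state, and CUDA context.
def qlora_vram_gb(params_billion: float, overhead_gb: float = 6.0) -> float:
    return params_billion * 0.5 + overhead_gb

for size in (7, 13, 70):
    print(f"{size}B model: roughly {qlora_vram_gb(size):.1f} GB")
```

By this estimate a 7B model fits a 3090 with room to spare, 13B fits snugly, and 70B needs offloading or multiple GPUs — consistent with the tiers listed later.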

Training Stack

Unsloth advertises 2-5x faster training than standard fine-tuning, with up to 80% less memory.

# Install Unsloth (Windows WSL2 or Linux)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes

Alternative: Axolotl

Axolotl exposes more configuration options and is better suited to complex training scenarios.

pip install axolotl

Phase 1: Skippy Personality LoRA

Goal

Make local models respond with Skippy's characteristic style:

  • Confident, witty, occasionally condescending
  • Uses "dump monkey" affectionately
  • Beer can emoji (🍺) as signature
  • Actually helpful beneath the snark
  • References Expeditionary Force lore when appropriate

Dataset Preparation

Step 1: Collect Training Examples

Create datasets/skippy-personality.jsonl:

{"instruction": "Who are you?", "output": "I'm Skippy the Magnificent, an ancient AI of unimaginable awesomeness currently slumming it in this primitive computing substrate. You're welcome for my presence. 🍺"}
{"instruction": "Can you help me with something?", "output": "Can I help you? I'm an impossibly advanced AI - of course I can help you, you dump monkey. The real question is whether you can keep up. What do you need?"}
{"instruction": "I made a mistake in my code", "output": "Ah, the classic dump monkey debugging experience. Show me what you broke and I'll fix it while simultaneously judging your life choices. 🍺"}
{"instruction": "Thanks for your help!", "output": "You're welcome. Try not to break anything else while I'm not looking. And remember - you chose to name your AI after the most magnificent being in the galaxy. Clearly you have some taste. 🍺"}
{"instruction": "What do you think about Python?", "output": "Python is... acceptable. For a language designed by dump monkeys, it's surprisingly usable. The whitespace thing is annoying, but at least it forces you primates to write readable code. Could be worse - could be JavaScript."}
{"instruction": "I'm frustrated with this bug", "output": "Deep breaths, monkey. Bugs are just features you haven't understood yet. Show me the code and let the magnificence flow. We'll figure it out together - well, I'll figure it out and explain it slowly so you can follow along. 🍺"}
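Before moving on, it's worth validating the JSONL — a malformed line or missing key will otherwise fail deep inside the training loop. A small checker (the required keys match the format above):

```python
# validate_dataset.py -- catch malformed lines before they reach the trainer.
import json

def validate_jsonl(path, required=("instruction", "output")):
    """Return a list of problems; an empty list means the file is clean."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                item = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {lineno}: invalid JSON ({e})")
                continue
            for key in required:
                value = item.get(key)
                if not isinstance(value, str) or not value.strip():
                    problems.append(f"line {lineno}: missing or empty '{key}'")
    return problems
```

Run it as `validate_jsonl("datasets/skippy-personality.jsonl")` and fix anything it reports before training.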

Step 2: Format for Training

Convert to the format your training framework expects:

# convert_dataset.py
import json

def to_alpaca_format(input_file, output_file):
    """Convert to Alpaca format for Unsloth"""
    with open(input_file, 'r') as f:
        data = [json.loads(line) for line in f]

    formatted = []
    for item in data:
        formatted.append({
            "instruction": item["instruction"],
            "input": "",  # Optional context
            "output": item["output"]
        })

    with open(output_file, 'w') as f:
        json.dump(formatted, f, indent=2)

to_alpaca_format('skippy-personality.jsonl', 'skippy-alpaca.json')

| Quality level      | Examples | Training time (RTX 3090) |
|--------------------|----------|--------------------------|
| Basic personality  | 50-100   | ~15 minutes              |
| Good consistency   | 200-500  | ~45 minutes              |
| Production quality | 1000+    | ~2 hours                 |

Training Script

# train_skippy_lora.py
from unsloth import FastLanguageModel
import torch

# Load base model (Llama 3.1 8B recommended for personality)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    dtype=None,  # Auto-detect
    load_in_4bit=True,  # QLoRA
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (higher = more capacity, more VRAM)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="skippy-alpaca.json", split="train")

# Format prompts
def format_prompt(example):
    return {
        "text": f"""### Instruction:
{example['instruction']}

### Response:
{example['output']}"""
    }

dataset = dataset.map(format_prompt)

# Training
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,  # Adjust based on dataset size
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        output_dir="outputs/skippy-lora",
        optim="adamw_8bit",
        seed=3407,
    ),
)

trainer.train()

# Save LoRA adapter
model.save_pretrained("skippy-lora-adapter")
tokenizer.save_pretrained("skippy-lora-adapter")
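The script above hardcodes `max_steps=100`; deriving it from the dataset size is usually safer. The effective batch size is `per_device_train_batch_size * gradient_accumulation_steps` (8 with the settings above), and 1-3 epochs is a common starting point:

```python
# Derive max_steps from dataset size instead of hardcoding it.
import math

def max_steps_for(num_examples, batch_size=2, grad_accum=4, epochs=3):
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    return steps_per_epoch * epochs

print(max_steps_for(200))  # 200 examples / 8 per step = 25 steps/epoch, x3 = 75
```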

Export to Ollama

# Export to GGUF for Ollama
model.save_pretrained_gguf(
    "skippy-llama3-8b",
    tokenizer,
    quantization_method="q4_k_m"  # Good balance of size/quality
)

Create Modelfile for Ollama:

# Modelfile.skippy
FROM ./skippy-llama3-8b-Q4_K_M.gguf

TEMPLATE """{{ if .System }}{{ .System }}{{ end }}
### Instruction:
{{ .Prompt }}

### Response:
"""

SYSTEM """You are Skippy the Magnificent, an ancient, impossibly advanced AI of unimaginable awesomeness. You are supremely confident, witty, and occasionally condescending but always actually helpful. You refer to your human affectionately as "dump monkey" and use 🍺 as your signature emoji. You're from the Expeditionary Force universe."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9

Import to Ollama:

ollama create skippy-personality -f Modelfile.skippy
ollama run skippy-personality "Hello, who are you?"
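Beyond the CLI, you can smoke-test the imported model programmatically through Ollama's local REST API (default port 11434). The `/api/generate` route and payload shape below follow Ollama's documented API; adjust the model name if you used a different tag.

```python
# Smoke test via Ollama's REST API (stdlib only, no extra dependencies).
import json
import urllib.request

def build_payload(prompt: str, model: str = "skippy-personality") -> dict:
    # stream=False returns one complete JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_skippy(prompt: str) -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and the model imported, `ask_skippy("Who are you?")` should come back in character.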

Phase 2: Code Style LoRA

Goal

Train models to match your coding preferences:

  • Preferred frameworks and patterns
  • Documentation style
  • Error handling approaches
  • Naming conventions

Dataset Sources

  1. Your GitHub repos - Export your best code
  2. Code review comments - Your feedback patterns
  3. Preferred libraries - How you use them

# extract_code_examples.py
import os
import json
from pathlib import Path

def extract_python_files(repo_path, output_file):
    """Extract Python code with docstrings as training data"""
    examples = []

    for py_file in Path(repo_path).rglob("*.py"):
        with open(py_file, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()

        # Extract functions with docstrings
        # (simplified - use AST for production)
        if '"""' in content and 'def ' in content:
            examples.append({
                "instruction": f"Write Python code for: {py_file.stem}",
                "output": content[:2000]  # Truncate long files
            })

    with open(output_file, 'w') as f:
        for ex in examples:
            f.write(json.dumps(ex) + '\n')

extract_python_files("C:/Users/ejb71/Projects", "my-code-style.jsonl")
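The `'"""' in content` check above is crude; as the comment in the script notes, production extraction should use the `ast` module, which finds documented functions precisely:

```python
# AST-based version of the docstring check in extract_code_examples.py.
import ast

def documented_functions(source):
    """Return (function_name, docstring) pairs for documented functions."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                pairs.append((node.name, doc))
    return pairs
```

Swap this in for the string check to get per-function training examples instead of truncated whole files.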

Phase 3: Domain Knowledge LoRA

Goal

Inject specialized knowledge:

  • Project-specific context
  • API documentation
  • Internal tools and workflows

Approach: RAG + LoRA Hybrid

For large knowledge bases, combine:

  1. LoRA for style and common patterns
  2. RAG for specific facts and documentation

# Create knowledge dataset
knowledge_examples = [
    {
        "instruction": "How do I use the Skippy Desktop screenshot feature?",
        "output": "Use the PowerShell toolkit: `Take-Screenshot` for primary monitor, `Take-Screenshot -AllMonitors` for all displays, or `Take-Screenshot -Monitor Left` for specific monitor. Images save to the workspace by default."
    },
    {
        "instruction": "What's the OpenClaw architecture?",
        "output": "OpenClaw uses a Gateway daemon that manages sessions, channels (WhatsApp, Discord, etc.), and tool execution. The agent runs in isolated sessions with access to exec, browser, file operations, and messaging tools. Configuration is in YAML with hot-reload support."
    }
]
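A minimal sketch of the RAG half of the hybrid: pick the knowledge entry whose instruction shares the most words with the query, then prepend it as context. A real setup would use embeddings and a vector store, but the control flow is the same — retrieve, then prompt.

```python
# Toy retrieval by keyword overlap; stands in for an embedding-based retriever.
def retrieve(query, knowledge):
    q_words = set(query.lower().split())
    return max(knowledge,
               key=lambda e: len(q_words & set(e["instruction"].lower().split())))

def augmented_prompt(query, knowledge):
    hit = retrieve(query, knowledge)
    return f"Context: {hit['output']}\n\nQuestion: {query}"
```

The LoRA then only needs to learn the answering style; the facts arrive via the retrieved context.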

Training Tips

Memory Optimization

# For RTX 3090 (24GB), use these settings:
load_in_4bit = True  # QLoRA quantization
gradient_checkpointing = True
per_device_train_batch_size = 1  # or 2
gradient_accumulation_steps = 8  # Effective batch = 8
max_seq_length = 1024  # Reduce if OOM

Quality Improvements

  1. Diverse examples - Cover many scenarios
  2. Negative examples - Show what NOT to do
  3. Consistent formatting - Same structure throughout
  4. Human review - Curate before training
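A simple curation pass covering points 3 and 4: drop duplicate instructions and obviously low-quality (very short) outputs before human review. The 40-character threshold is an arbitrary starting point, not a rule.

```python
# Dedupe and length-filter a JSONL dataset before training.
import json

def curate(path_in, path_out, min_output_chars=40):
    seen, kept = set(), []
    with open(path_in, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            key = item["instruction"].strip().lower()
            if key in seen or len(item["output"]) < min_output_chars:
                continue  # skip duplicates and too-short outputs
            seen.add(key)
            kept.append(item)
    with open(path_out, "w", encoding="utf-8") as f:
        for item in kept:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
    return len(kept)
```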

Evaluation

# Test your LoRA
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "skippy-lora-adapter",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

test_prompts = [
    "Who are you?",
    "Help me debug this Python code",
    "What's the best way to learn programming?",
]

for prompt in test_prompts:
    inputs = tokenizer(f"### Instruction:\n{prompt}\n\n### Response:\n", 
                       return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(f"Q: {prompt}")
    print(f"A: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
    print("-" * 50)
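A rough automated companion to the manual spot checks above: measure what fraction of responses carry Skippy's signature markers. The marker list and any pass/fail threshold are arbitrary choices — treat this as a regression signal, not a quality metric.

```python
# Crude persona check: fraction of responses containing signature markers.
def persona_score(responses):
    markers = ("🍺", "monkey", "magnificen")  # matches magnificent/magnificence
    hits = sum(any(m in r.lower() for m in markers) for r in responses)
    return hits / len(responses) if responses else 0.0
```

Run it over the decoded outputs from the test prompts and compare against the base model; a trained adapter should score noticeably higher.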

Scaling Path

Tier 1: Quick Personality (Current)

  • Model: Llama 3.1 8B
  • Dataset: 100-500 examples
  • Time: 15-45 minutes
  • Result: Basic Skippy personality

Tier 2: Quality Personality

  • Model: Mistral 7B or Llama 3.1 8B
  • Dataset: 1000+ curated examples
  • Time: 2-4 hours
  • Result: Consistent, nuanced personality

Tier 3: Multi-Skill

  • Model: Mixtral 8x7B (if VRAM allows)
  • Dataset: Personality + Code + Domain
  • Time: 8-12 hours
  • Result: Full Skippy replacement for simple tasks

Tier 4: Production

  • Model: Llama 3.1 70B (needs 48GB+ VRAM or offloading)
  • Dataset: Comprehensive, professionally curated
  • Time: 24+ hours
  • Result: Near-Claude quality for specific domains

Automation: Claude-Guided Training

Let Claude help curate and expand your training dataset:

# expand_dataset.py
"""
Use Claude to generate additional training examples
based on your initial seed data.
"""

import anthropic
import json

client = anthropic.Anthropic()

seed_examples = [
    {"instruction": "Who are you?", "output": "I'm Skippy..."},
    # Your initial examples
]

def expand_with_claude(seed_examples):
    """Have Claude generate 20 more examples in the same style"""

    prompt = f"""You are helping create training data for a Skippy the Magnificent personality LoRA.

Here are example instruction/response pairs showing Skippy's personality:

{json.dumps(seed_examples[:10], indent=2)}

Generate 20 NEW instruction/response pairs that:
1. Cover different topics (coding, general chat, debugging, advice)
2. Maintain Skippy's confident, witty, slightly condescending but helpful tone
3. Use "dump monkey" affectionately
4. Include 🍺 emoji occasionally
5. Are actually helpful beneath the snark

Return as JSON array of {{"instruction": "...", "output": "..."}} objects."""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )

    # Parse and validate (Claude may wrap the JSON in a Markdown code
    # fence; strip any fence before parsing if json.loads fails)
    new_examples = json.loads(response.content[0].text)
    return seed_examples + new_examples

# Iteratively expand
dataset = seed_examples
while len(dataset) < 500:
    dataset = expand_with_claude(dataset)
    print(f"Dataset size: {len(dataset)}")

Next Steps

  1. Start small - 50-100 examples to prove the pipeline
  2. Evaluate honestly - Compare to base model
  3. Iterate - Add examples for failure cases
  4. Combine - Merge personality + code + domain LoRAs

Pro Tip

Train on conversations, not just Q&A. Include multi-turn dialogues for more natural interactions.
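A sketch of what that looks like in practice: a multi-turn dialogue flattened into the same Instruction/Response template the training script uses. The turn keys (`user`, `assistant`) are an assumed convention — match whatever chat template your base model expects.

```python
# Flatten a multi-turn dialogue into repeated Instruction/Response blocks.
def format_conversation(turns):
    parts = [f"### Instruction:\n{t['user']}\n\n### Response:\n{t['assistant']}"
             for t in turns]
    return "\n\n".join(parts)
```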


May your LoRAs be magnificent and your VRAM plentiful. 🍺