Fine-Tuning LLMs with Unsloth: A Beginner-Friendly Guide

October 03, 2025

Introduction

Fine-tuning large language models (LLMs) can be challenging. They’re huge, need lots of GPU memory, and training them is expensive. That’s where Unsloth comes in: a library that aims to make fine-tuning LLMs faster, lighter, and cheaper, often with 2–5x less GPU usage depending on the model and hardware.

In this post, we’ll explore how Unsloth helps with fine-tuning, and then walk through a practical demo using Hugging Face models.

Why Use Unsloth?

Unsloth provides several advantages when training/fine-tuning LLMs:

Memory Efficiency – Fine-tune large models on smaller GPUs (like Colab T4 or RTX 3060).
Faster Training – Optimized kernels can make training roughly 2–5x faster, depending on the model and hardware.
LoRA Support – Easily apply Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning.
Compatible with Hugging Face – Works with transformers, peft, and datasets.
Supports Popular Models – GPT-2, LLaMA, Falcon, Mistral, and more (the exact list depends on your Unsloth version, so check its documentation).

Setup

Install Unsloth and Hugging Face libraries:


!pip install unsloth transformers datasets accelerate bitsandbytes peft

Step 1: Load Dataset

For this demo, let’s use yahma/alpaca-cleaned, a cleaned version of the Alpaca instruction-tuning dataset.


from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned")
print(dataset["train"][0])

Each record has the following shape (the values shown here are illustrative):


{
  "instruction": "Write a poem about the ocean",
  "input": "",
  "output": "The ocean whispers secrets to the shore..."
}

Step 2: Format Dataset

We’ll format it into prompt–response pairs suitable for instruction tuning.


def format_alpaca(example):
    if example["input"]:
        return {"text": f"### Instruction:\n{example['instruction']}\n### Input:\n{example['input']}\n### Response:\n{example['output']}"}
    else:
        return {"text": f"### Instruction:\n{example['instruction']}\n### Response:\n{example['output']}"}

dataset = dataset.map(format_alpaca)
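
To see what the formatter produces, the sketch below applies the same template to two hand-written records, one with an input field and one without. It is a standalone illustration: `build_prompt` and the sample records are hypothetical and not part of the dataset or of Unsloth. It also accepts an optional end-of-sequence string, because appending the tokenizer's EOS token to each example is a common way to help the model learn where a response ends. Whether you need that depends on your tokenizer setup.

```python
def build_prompt(example, eos_token=""):
    """Render one Alpaca-style record as a single training string."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    if example.get("input"):  # skip the Input section when it is empty
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return "\n".join(parts) + eos_token


# Hypothetical sample records, written by hand for this illustration.
with_input = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
without_input = {"instruction": "Name a primary color.", "input": "", "output": "Red"}

print(build_prompt(with_input))
print("---")
print(build_prompt(without_input, eos_token="<|endoftext|>"))
```

The second call uses `<|endoftext|>`, which is GPT-2's end-of-text token. In a real run you would pass `tokenizer.eos_token` rather than hard-coding the string.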

Step 3: Load Model with Unsloth

Here’s where Unsloth makes magic happen. Instead of loading a huge model normally, we use Unsloth’s optimized loader.


from unsloth import FastLanguageModel

# Load GPT-2 (a small model for the demo; larger models such as LLaMA or Mistral load the same way)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="gpt2",
    load_in_4bit=True,   # Quantization for memory efficiency
    max_seq_length=512,
)
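
As a rough illustration of why 4-bit loading matters, the sketch below estimates the memory taken by the weights alone at 16-bit and 4-bit precision. The helper `weight_memory_gb` is hypothetical, and the parameter counts are approximate. The estimate ignores activations, gradients, optimizer state, and the small overhead that real quantization schemes add, so treat it as a lower bound.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Rough parameter counts: GPT-2 small has about 124 million parameters;
# 7 billion is a common size for larger open models.
for name, params in [("gpt2 (124M)", 124e6), ("7B model", 7e9)]:
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: {fp16:.2f} GB at 16-bit, {int4:.2f} GB at 4-bit")
```

For GPT-2 the saving is negligible in absolute terms. For a 7B model the weights shrink from about 14 GB to about 3.5 GB, which can decide whether the model fits on a 16 GB card at all.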

Step 4: Apply LoRA for Fine-Tuning

LoRA freezes the base model’s weights and trains small low-rank adapter matrices instead, so we update only a small fraction of the parameters rather than the entire model.


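model = FastLanguageModel.get_peft_model(
    model,
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.05,  # dropout applied on the LoRA path during training
    bias="none",        # leave bias terms frozen
    task_type="CAUSAL_LM"
)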
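
To make the parameter saving concrete, here is a small sketch that compares the trainable parameter count for a single weight matrix under full fine-tuning and under LoRA. The helper `lora_param_counts` is hypothetical, and the 768 x 768 shape is chosen only because 768 is GPT-2 small's hidden size. Which layers actually receive adapters depends on the target modules used.

```python
def lora_param_counts(d_in, d_out, r):
    """Return (full, lora) trainable parameter counts for one weight matrix.

    Full fine-tuning updates all d_in * d_out weights. LoRA instead trains
    two small matrices, A (r x d_in) and B (d_out x r).
    """
    full = d_in * d_out
    lora = r * (d_in + d_out)
    return full, lora


# Example shape: a 768 x 768 projection (768 is GPT-2 small's hidden size).
full, lora = lora_param_counts(d_in=768, d_out=768, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With `r=16`, this matrix needs 24,576 trainable values instead of 589,824, roughly 4% of the original. Doubling `r` doubles the adapter size, so the rank is the main knob for trading capacity against memory.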

Step 5: Train the Model


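from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# GPT-2 ships without a padding token, so reuse EOS when none is set.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize the formatted text and drop the raw string columns.
tokenized_dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset["train"].column_names,
)

# Pads each batch and copies input_ids into labels (padding is ignored in the loss).
# Without labels, the Trainer cannot compute a causal-LM loss.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./gpt2-unsloth",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    learning_rate=2e-5,
    fp16=True,
    logging_steps=50,
    save_steps=200,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()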
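
Before launching a run, it helps to estimate how many optimizer steps and checkpoints these settings imply. The sketch below does that arithmetic, assuming a single GPU and no gradient accumulation. The helper `training_schedule` is hypothetical, and the dataset size of 51,760 is an assumption about the train split, so check `len(dataset["train"])` for the real figure.

```python
import math


def training_schedule(num_examples, batch_size, epochs, save_steps):
    """Estimate steps per epoch, total steps, and periodic checkpoint count.

    Assumes one GPU and no gradient accumulation, so each batch is one step.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    checkpoints = total_steps // save_steps
    return steps_per_epoch, total_steps, checkpoints


# 51_760 is an assumed size for the train split; check len(dataset["train"]).
steps_per_epoch, total_steps, checkpoints = training_schedule(
    num_examples=51_760, batch_size=2, epochs=2, save_steps=200
)
print(steps_per_epoch, total_steps, checkpoints)
```

Under those assumptions, the run takes about 25,880 steps per epoch and writes on the order of 258 checkpoints. Raising `save_steps` or setting `save_total_limit` in `TrainingArguments` keeps disk use in check, and training on a subset of the data is a reasonable way to shorten a first experiment.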

Step 6: Test the Fine-Tuned Model


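# The Trainer only writes checkpoint-* subfolders under ./gpt2-unsloth, so
# generate with the fine-tuned model that is still in memory.
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode

prompt = "### Instruction:\nExplain why the sky is blue.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))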
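
A causal language model returns the prompt followed by its continuation, so the printed text starts with the instruction block. The sketch below shows one way to keep only the answer. The helper `extract_response` and the sample string are hypothetical, and the sketch assumes the same `### Response:` marker used in the training template.

```python
RESPONSE_MARKER = "### Response:\n"


def extract_response(generated_text):
    """Return only the text after the first response marker, if present."""
    _, marker, tail = generated_text.partition(RESPONSE_MARKER)
    if not marker:  # marker missing: return the text unchanged
        return generated_text.strip()
    # Cut at the next section header in case the model keeps generating.
    return tail.split("\n### ")[0].strip()


# Hypothetical generated string, written by hand for this illustration.
sample = (
    "### Instruction:\nExplain why the sky is blue.\n"
    "### Response:\nAir scatters blue light more than red light.\n"
    "### Instruction:\nAnother task"
)
print(extract_response(sample))
```

Cutting at the next `###` header is a simple guard for models that run on into a new instruction instead of stopping after the answer.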

Conclusion

With Unsloth, we:

Loaded a model in 4-bit mode (huge memory savings).
Applied LoRA for efficient fine-tuning.
Trained an instruction-following GPT-2 with just a few lines of code.

Unsloth makes it possible to fine-tune big LLMs on small GPUs, which is a game-changer for hobbyists, researchers, and startups.

vikram aditya