Fine-Tuning LLMs with Unsloth: A Practical Guide

Introduction

Fine-tuning large language models (LLMs) requires significant GPU memory and compute resources. This makes it challenging for researchers or small teams to adapt large models on limited hardware. Unsloth is a library that optimizes training, reducing memory consumption and accelerating fine-tuning.

In this post, we will look at the main advantages of Unsloth for fine-tuning, followed by a practical implementation of Parameter-Efficient Fine-Tuning (PEFT) with LoRA.


Why Use Unsloth for Fine-Tuning?

Unsloth provides several advantages:

  1. Memory Efficiency – Fine-tune large models with smaller GPUs by leveraging 4-bit quantization (see the sketch after this list).

  2. Faster Training – Optimized kernels deliver 2–5x speed improvements.

  3. Parameter-Efficient Fine-Tuning (PEFT) – Supports LoRA and QLoRA, so only a small subset of parameters is updated.

  4. Compatibility – Works seamlessly with Hugging Face transformers, datasets, and peft.

  5. Flexibility – Can be applied to models such as GPT-2, LLaMA, Mistral, and Falcon.
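
To make the memory and parameter claims above concrete, here is a rough back-of-the-envelope sketch (plain Python, not Unsloth code). The model size, layer count, hidden size, and LoRA rank are illustrative assumptions, not measurements.

def weight_memory_gb(n_params, bits_per_param):
    # Approximate memory needed just to store the model weights
    return n_params * bits_per_param / 8 / 1e9

def lora_trainable_params(n_layers, hidden_size, r, matrices_per_layer=4):
    # Each adapted d x d weight gets two low-rank factors: A (r x d) and B (d x r)
    return n_layers * matrices_per_layer * 2 * hidden_size * r

n_params = 7e9  # assume a 7B-parameter model
print(f"fp16 weights: ~{weight_memory_gb(n_params, 16):.1f} GB")
print(f"4-bit weights: ~{weight_memory_gb(n_params, 4):.1f} GB")

# Assume 32 layers, hidden size 4096, LoRA rank r=16 on 4 attention matrices
trainable = lora_trainable_params(32, 4096, 16)
print(f"LoRA trainable params: ~{trainable / 1e6:.1f}M ({trainable / n_params:.2%} of the model)")

Even under these rough assumptions, 4-bit storage cuts weight memory by roughly 4x compared with fp16, and the LoRA adapter trains well under 1% of the model's parameters.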


Setup

Install the required packages:

pip install unsloth transformers datasets accelerate bitsandbytes peft
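
Unsloth targets NVIDIA GPUs, so a quick sanity check before training can save time. This optional check uses standard PyTorch calls:

import torch

# Verify a CUDA GPU is visible before attempting 4-bit training
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)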

Step 1: Load the Dataset

We will use the Alpaca dataset for demonstration.

from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned")
print(dataset["train"][0])
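
The training split contains roughly 50k examples, so for a quick first experiment you may want to work with a subset. This is optional and uses the standard datasets API:

# Optional: iterate on a smaller slice before training on the full dataset
small_train = dataset["train"].select(range(1000))
print(len(small_train))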

Step 2: Format the Dataset

We need to convert the instruction, input, and output fields into a single text sequence.

def format_alpaca(example):
    if example["input"]:
        return {
            "text": f"### Instruction:\n{example['instruction']}\n### Input:\n{example['input']}\n### Response:\n{example['output']}"
        }
    else:
        return {
            "text": f"### Instruction:\n{example['instruction']}\n### Response:\n{example['output']}"
        }

dataset = dataset.map(format_alpaca)
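
It is worth printing one formatted example to confirm the prompt template looks as intended before training:

# Inspect a single formatted example
print(dataset["train"][0]["text"])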

Step 3: Load Model with Unsloth

We load GPT-2 with 4-bit quantization using Unsloth. GPT-2 is small enough to fine-tune almost anywhere, but the same workflow applies to larger models such as LLaMA or Mistral.

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="gpt2",
    load_in_4bit=True,
    max_seq_length=512,
)

# GPT-2 has no pad token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
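
To see the effect of 4-bit loading, you can check the model's memory footprint. This assumes the returned model exposes the standard transformers get_memory_footprint() method:

# Approximate memory used by the loaded model weights, in GB
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")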

Step 4: Apply PEFT with LoRA

Using Unsloth’s integration with peft, we apply LoRA for parameter-efficient fine-tuning.

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
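
Since the returned object is a PEFT model, you can confirm how few parameters are actually trainable. This assumes the standard peft print_trainable_parameters() method is available on the wrapped model:

# Report trainable vs. total parameters after attaching the LoRA adapter
model.print_trainable_parameters()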

Step 5: Tokenize the Dataset

def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=512,
    )

tokenized_dataset = dataset.map(tokenize_function, batched=True)

Step 6: Fine-Tune with Hugging Face Trainer

from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./gpt2-unsloth",
    per_device_train_batch_size=2,
    num_train_epochs=2,
    learning_rate=2e-5,
    fp16=True,
    logging_steps=50,
    save_steps=200,
)

# For causal language modeling, the collator copies input_ids into labels
# (with padding masked out) so the Trainer can compute a loss.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
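
Step 7 below loads the model from the ./gpt2-unsloth directory, so it helps to explicitly save the final weights and tokenizer there after training (trainer.save_model() writes to the given directory):

# Persist the final fine-tuned model and tokenizer for later use
trainer.save_model("./gpt2-unsloth")
tokenizer.save_pretrained("./gpt2-unsloth")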

Step 7: Test the Fine-Tuned Model

from transformers import pipeline

generator = pipeline("text-generation", model="./gpt2-unsloth", tokenizer=tokenizer)

prompt = "### Instruction:\nExplain why the sky is blue.\n### Response:\n"
output = generator(prompt, max_length=200, num_return_sequences=1)
print(output[0]["generated_text"])
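
If you prefer to test without reloading from disk, you can also generate directly with the in-memory model. This is a minimal sketch using the standard transformers generate() call; Unsloth additionally offers FastLanguageModel.for_inference(model) to speed up generation, but the plain call below works without it.

import torch

prompt = "### Instruction:\nExplain why the sky is blue.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 200 new tokens from the fine-tuned model
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))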

Conclusion

With Unsloth and PEFT, we can fine-tune models like GPT-2 efficiently, even on limited GPU resources. By combining 4-bit quantization and LoRA-based PEFT, training becomes both memory-efficient and significantly faster.

Unsloth makes it feasible for individuals and small teams to experiment with instruction tuning and other fine-tuning methods without requiring high-end hardware.
