Fine-Tuning LLMs with Unsloth: A Beginner-Friendly Guide

 

Introduction

Fine-tuning large language models (LLMs) can be challenging: the models are huge, need a lot of GPU memory, and are expensive to train. That’s where Unsloth comes in. It is a library that makes fine-tuning LLMs faster, lighter, and cheaper, often with 2–5x lower GPU requirements.

In this post, we’ll look at how Unsloth helps with fine-tuning, then walk through a practical demo that fine-tunes a Hugging Face model on an instruction dataset.


Why Use Unsloth?

Unsloth provides several advantages when training/fine-tuning LLMs:

  1. Memory Efficiency – Fine-tune large models on smaller GPUs (like Colab T4 or RTX 3060).

  2. Faster Training – Optimized kernels make training 2–5x faster.

  3. LoRA Support – Easily apply Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning.

  4. Compatible with Hugging Face – Works with transformers, peft, and datasets.

  5. Supports Popular Models – GPT-2, LLaMA, Falcon, Mistral, and more (the exact list depends on the Unsloth version, so check the documentation for your release).


Setup

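Install Unsloth and the Hugging Face libraries. The leading ! runs the command from a notebook cell such as Colab; drop it when installing from a regular terminal:

    !pip install unsloth transformers datasets accelerate bitsandbytes peft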

Step 1: Load Dataset

For this demo, let’s use the Alpaca dataset (instruction-based training).

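    from datasets import load_dataset

    # Download the cleaned Alpaca instruction dataset from the Hugging Face Hub
    dataset = load_dataset("yahma/alpaca-cleaned")

    # Inspect the first training example
    print(dataset["train"][0])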

Each record has three fields and looks something like this:

    {
        "instruction": "Write a poem about the ocean",
        "input": "",
        "output": "The ocean whispers secrets to the shore..."
    }

Step 2: Format Dataset

We’ll format it into prompt–response pairs suitable for instruction tuning.

    def format_alpaca(example):
        # Include the "### Input:" section only when the example has extra context
        if example["input"]:
            text = (
                f"### Instruction:\n{example['instruction']}\n"
                f"### Input:\n{example['input']}\n"
                f"### Response:\n{example['output']}"
            )
        else:
            text = (
                f"### Instruction:\n{example['instruction']}\n"
                f"### Response:\n{example['output']}"
            )
        return {"text": text}

    # Add a "text" column holding the formatted prompt to every example
    dataset = dataset.map(format_alpaca)
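To see what the template produces, here is a small standalone sketch that applies the same formatting to two records. The records are hypothetical, made up for illustration rather than taken from the dataset, and the function is repeated only so the snippet runs on its own.

```python
def format_alpaca(example: dict) -> dict:
    """Render one Alpaca-style record into a single training string."""
    if example["input"]:
        text = (
            f"### Instruction:\n{example['instruction']}\n"
            f"### Input:\n{example['input']}\n"
            f"### Response:\n{example['output']}"
        )
    else:
        text = (
            f"### Instruction:\n{example['instruction']}\n"
            f"### Response:\n{example['output']}"
        )
    return {"text": text}


# Hypothetical records, made up for illustration (not taken from the dataset)
with_input = {"instruction": "Translate to French", "input": "Good morning", "output": "Bonjour"}
without_input = {"instruction": "Name a primary color", "input": "", "output": "Red"}

for record in (with_input, without_input):
    print(format_alpaca(record)["text"])
    print("-" * 20)
```

The second record has an empty input, so its text skips the "### Input:" section entirely. That is the same shape the test prompt in Step 6 uses.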

Step 3: Load Model with Unsloth

This is where Unsloth earns its keep. Instead of loading the model with the standard Hugging Face loader, we use Unsloth’s optimized loader, which can quantize the weights to 4-bit as they load.

    from unsloth import FastLanguageModel

    # Load GPT-2 (a small model for the demo; the same call works with
    # bigger models such as LLaMA or Mistral)
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="gpt2",
        load_in_4bit=True,   # Quantization for memory efficiency
        max_seq_length=512,  # Longest sequence the model will be trained on
    )
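To get a feel for why 4-bit loading matters, the sketch below estimates the memory needed just to hold the weights at 16-bit and at 4-bit precision. It is back-of-the-envelope arithmetic, not a measurement: it ignores quantization metadata, activations, optimizer state, and any layers kept in higher precision. The 7-billion-parameter model is a hypothetical stand-in for a larger LLM, and `weight_memory_gib` is a helper named here for illustration.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the raw weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


models = {
    "GPT-2 small (about 124M parameters)": 124_000_000,
    "hypothetical 7B-parameter model": 7_000_000_000,
}

for name, num_params in models.items():
    fp16 = weight_memory_gib(num_params, 16)
    int4 = weight_memory_gib(num_params, 4)
    print(f"{name}: 16-bit ~ {fp16:.2f} GiB, 4-bit ~ {int4:.2f} GiB")
```

For GPT-2 the saving is small in absolute terms, but for a model in the billions of parameters it can decide whether the weights fit on a 16 GB card at all.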

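Step 4: Apply LoRA for Fine-Tuning

LoRA freezes the base model’s weights and trains small low-rank adapter matrices alongside them, so we update only a small fraction of the parameters instead of the entire model.

    model = FastLanguageModel.get_peft_model(
        model,
        r=16,               # Rank of the low-rank adapter matrices
        lora_alpha=32,      # Scaling factor applied to the adapter update
        lora_dropout=0.05,  # Dropout applied inside the adapters during training
        bias="none",        # Do not train bias terms
        task_type="CAUSAL_LM",
    )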
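The saving from LoRA can be worked out with simple arithmetic. For a weight matrix of shape d_out x d_in, LoRA trains two matrices of shapes d_out x r and r x d_in, which is r * (d_in + d_out) parameters. The sketch below uses a hypothetical 768 x 768 layer (768 is GPT-2 small’s hidden size) with the r=16 and lora_alpha=32 from the step above. The function name is illustrative, not part of any library.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


# Hypothetical square layer sized like GPT-2 small's hidden dimension
d_in, d_out = 768, 768
r, lora_alpha = 16, 32

full_params = d_in * d_out                       # parameters in the frozen matrix
adapter_params = lora_param_count(d_in, d_out, r)
fraction = adapter_params / full_params

print(f"Frozen weights:   {full_params:,}")
print(f"Adapter weights:  {adapter_params:,}")
print(f"Trainable share:  {fraction:.2%}")
print(f"Update scaling (lora_alpha / r): {lora_alpha / r}")
```

For this layer the adapter is about 4% of the original matrix. Raising r gives the adapter more capacity at the cost of more trainable parameters.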

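Step 5: Train the Model

We tokenize the formatted text and train with the Hugging Face Trainer. The data collator supplies the labels the Trainer needs to compute a loss, and the final call saves the trained adapter.

    from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

    # GPT-2 ships without a padding token, so reuse the end-of-sequence token
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    # Tokenize each example, padding or truncating to 512 tokens
    tokenized_dataset = dataset.map(
        lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=512)
    )

    training_args = TrainingArguments(
        output_dir="./gpt2-unsloth",
        per_device_train_batch_size=2,
        num_train_epochs=2,
        learning_rate=2e-5,
        fp16=True,
        logging_steps=50,
        save_steps=200,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset["train"],
        tokenizer=tokenizer,
        # Copies input_ids into labels so a causal LM loss can be computed
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )

    trainer.train()

    # Save the trained LoRA adapter to the output directory
    trainer.save_model("./gpt2-unsloth")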
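Before launching a run, it helps to estimate how long it will be. The sketch below works out the number of optimizer steps, log lines, and checkpoints implied by the settings above. It assumes a single GPU and no gradient accumulation. The dataset size of 50,000 is a hypothetical round number, not the actual size of the Alpaca dataset, and `training_schedule` is a helper named here for illustration.

```python
import math


def training_schedule(num_examples, batch_size, epochs, logging_steps, save_steps):
    """Estimate steps, log lines, and checkpoints for a single-GPU run."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_events": total_steps // logging_steps,
        "checkpoints": total_steps // save_steps,
    }


# 50_000 is a hypothetical dataset size; the other values mirror Step 5
schedule = training_schedule(
    num_examples=50_000,
    batch_size=2,
    epochs=2,
    logging_steps=50,
    save_steps=200,
)

for key, value in schedule.items():
    print(f"{key}: {value}")
```

With numbers like these, a full run writes hundreds of checkpoints. For a first experiment it may be worth training on a subset, for example with `dataset["train"].select(range(1000))`, or raising `save_steps`.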

Step 6: Test the Fine-Tuned Model

    from transformers import pipeline

    # Reuse the fine-tuned model that is already in memory
    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

    # Use the same template as in training, leaving the response for the model to fill in
    prompt = "### Instruction:\nExplain why the sky is blue.\n### Response:\n"
    output = generator(prompt, max_length=200, num_return_sequences=1)
    print(output[0]["generated_text"])
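By default, the text-generation pipeline returns the prompt followed by the model’s continuation, so the answer still has to be separated from the template. The sketch below shows one way to build prompts in the training format and pull out only the response. The helper names `build_prompt` and `extract_response` are hypothetical, and the generated string is made up for illustration rather than real model output.

```python
RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Build an inference prompt in the same template used for training."""
    if input_text:
        return f"### Instruction:\n{instruction}\n### Input:\n{input_text}\n{RESPONSE_MARKER}"
    return f"### Instruction:\n{instruction}\n{RESPONSE_MARKER}"


def extract_response(generated_text: str) -> str:
    """Return only the text after the response marker."""
    _, marker, after = generated_text.partition(RESPONSE_MARKER)
    if not marker:
        # Marker missing: fall back to the full text
        return generated_text.strip()
    # Cut off anything after a new instruction block the model may start
    return after.split("### Instruction:")[0].strip()


prompt = build_prompt("Explain why the sky is blue.")

# Made-up stand-in for output[0]["generated_text"]
fake_generated = prompt + "Air scatters blue light more than red.\n### Instruction:\nNext task"

print(extract_response(fake_generated))
```

Keeping prompt construction in one function also guards against small template differences between training and inference, which can noticeably degrade the output.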

Conclusion

With Unsloth, we:

  • Loaded a model in 4-bit mode (huge memory savings).

  • Applied LoRA for efficient fine-tuning.

  • Trained an instruction-following GPT-2 with just a few lines of code.

Unsloth makes it possible to fine-tune big LLMs on small GPUs, which puts fine-tuning within reach of hobbyists, researchers, and startups.
