UPDATE 2024: The code in this post may be outdated. I recommend checking the Hugging Face Trainer documentation for the most up-to-date information.
Introduction
In March of this year (2023), a lab at Stanford released a small project that quickly became massively influential — Alpaca. The authors used text-davinci-003 (an InstructGPT model from OpenAI) to generate a dataset with 52K examples of prompts and responses, then fine-tuned Llama-7B using those prompt and response pairs.
The result was surprisingly good — Alpaca was able to interact with users similarly to OpenAI's InstructGPT models, despite being inexpensive to train and not using a human-created training dataset. In this blog post, we'll write code to train our own model from scratch using the Alpaca dataset.
The code in this blog post is based on the code in the Alpaca repo, though my hope is that this version is simpler and more intuitive. All credit goes to the original authors.
Setup
You'll need to install torch, transformers, datasets, and accelerate. wandb is great if you want to track training loss over time. And, of course, you'll need some good GPUs if you want your model to train quickly.
Start out by creating one main folder, alpaca-repro, with two subfolders: one called trainer, where the training code will go, and one called finetunes, where we'll save the fine-tuned model.
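For example, from your shell:

```bash
pip install torch transformers datasets accelerate wandb

mkdir -p alpaca-repro/trainer alpaca-repro/finetunes
```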
Step 1: Loading and Processing the Data
Put all of the code in this section into trainer/get_data.py.
We'll begin by loading the Alpaca data from the Hugging Face hub. Each prompt/response pair in the dataset needs to be converted into a single string that we can train the model on, but we actually generate one extra string: source, which we use further down to mask out labels so our model doesn't train on the instructions themselves.
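As a rough sketch, this step might look something like the following. Note that the dataset name (tatsu-lab/alpaca) and the prompt templates are taken from the original Alpaca project; adjust them if you're using a different copy of the data.

```python
# trainer/get_data.py (sketch) -- load the Alpaca data and build training strings.
from datasets import load_dataset

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def add_source_and_text(example):
    # "source" is the prompt portion; "text" is the full string we train on.
    # (In practice you'd also append the tokenizer's EOS token to "text".)
    template = PROMPT_WITH_INPUT if example["input"] else PROMPT_NO_INPUT
    source = template.format(**example)
    return {"source": source, "text": source + example["output"]}

raw_data = load_dataset("tatsu-lab/alpaca", split="train")
data = raw_data.map(add_source_and_text)
```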
Here we split the data so we can use 10% for evaluation and tests later on.
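With Hugging Face datasets this is a one-liner, for example:

```python
# Hold out 10% of the examples for evaluation/tests.
data = data.train_test_split(test_size=0.1, seed=42)
train_data, eval_data = data["train"], data["test"]
```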
Finally, we define a data collator to be used by our training loop. Remember that each text string is just the source plus the response, so we tokenize the source string on its own to figure out how many tokens at the start of each text string to ignore in the labels.
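Here's a sketch of such a collator. It assumes a right-padding tokenizer with a pad token set (for the Llama tokenizer you can set pad_token = eos_token), and the max_length of 512 is just an illustrative default.

```python
# trainer/get_data.py (continued, sketch) -- collator that masks out prompt tokens.
import torch

IGNORE_INDEX = -100  # the label value ignored by the cross-entropy loss

class AlpacaDataCollator:
    def __init__(self, tokenizer, max_length=512):
        self.tokenizer = tokenizer  # assumes tokenizer.padding_side == "right"
        self.max_length = max_length

    def __call__(self, examples):
        texts = [ex["text"] for ex in examples]
        sources = [ex["source"] for ex in examples]

        batch = self.tokenizer(
            texts,
            padding=True,
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        labels = batch["input_ids"].clone()

        # Ignore the prompt (source) tokens and the padding tokens in the loss.
        for i, source in enumerate(sources):
            source_len = len(self.tokenizer(source)["input_ids"])
            labels[i, :source_len] = IGNORE_INDEX
        labels[batch["attention_mask"] == 0] = IGNORE_INDEX

        batch["labels"] = labels
        return batch
```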
Step 2: Writing Our Training Loop
Put all of the code in this section into trainer/loop.py.
This code is fairly self-explanatory, so I've just annotated it with comments.
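If you don't have the original file handy, here's a rough sketch of what a loop like this can look like with accelerate. The base model path, output directory, and hyperparameters below are placeholders rather than the post's original values, and the data pieces are imported from the Step 1 sketches above.

```python
# trainer/loop.py (sketch) -- a simple fine-tuning loop built on accelerate.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer, get_cosine_schedule_with_warmup

from get_data import train_data, AlpacaDataCollator  # from the Step 1 sketches

BASE_MODEL = "path/to/llama-7b"          # placeholder: wherever your Llama-7B weights live
OUTPUT_DIR = "../finetunes/alpaca-7b"    # placeholder: saved into the finetunes folder
EPOCHS = 3
LR = 2e-5
BATCH_SIZE = 4

def main():
    accelerator = Accelerator()

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

    collator = AlpacaDataCollator(tokenizer)
    train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collator)

    optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
    model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

    num_training_steps = EPOCHS * len(train_loader)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=num_training_steps
    )

    model.train()
    for epoch in range(EPOCHS):
        for step, batch in enumerate(train_loader):
            outputs = model(**batch)  # labels come from the collator, so .loss is populated
            loss = outputs.loss
            accelerator.backward(loss)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()

            if accelerator.is_main_process and step % 50 == 0:
                # this is also where you'd log the loss to wandb
                print(f"epoch {epoch} step {step} loss {loss.item():.4f}")

    # Save the final model and tokenizer (unwrap the model first).
    accelerator.wait_for_everyone()
    unwrapped = accelerator.unwrap_model(model)
    unwrapped.save_pretrained(
        OUTPUT_DIR,
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
    )
    if accelerator.is_main_process:
        tokenizer.save_pretrained(OUTPUT_DIR)

if __name__ == "__main__":
    main()
```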
Step 3: Running Our Training Loop
Create trainer/accelerate_config.yaml, and paste in the following configuration:
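```yaml
# A minimal single-node, multi-GPU setup; not necessarily the exact original file.
# You can also generate a config interactively by running `accelerate config`.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: bf16
machine_rank: 0
num_machines: 1
num_processes: 4   # set this to the number of GPUs on your machine
main_training_function: main
use_cpu: false
```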
Then cd into ./trainer and run:
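```bash
# Something like the following, assuming the config and loop files above:
accelerate launch --config_file accelerate_config.yaml loop.py
```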
Saving the model and weights might take a while, so be patient!
Step 4: Testing Our Fine-Tuned Model!
I wrote a simple script to load up our fine-tuned model and interact with it! It doesn't support conversations with context, but it's a great way to see how the model is working.
Create a new file called alpaca-repro/model_test.py, then run python3 model_test.py.
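A minimal sketch of what that script can look like is below. The model directory is a placeholder; point it at wherever Step 3 saved your weights. It simply wraps each prompt in the Alpaca instruction template and prints the generated response.

```python
# alpaca-repro/model_test.py (sketch) -- interactively prompt the fine-tuned model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "finetunes/alpaca-7b"  # placeholder: wherever Step 3 saved the model

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def main():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_DIR, torch_dtype=torch.float16, device_map="auto"
    )
    model.eval()

    while True:
        instruction = input("Prompt (or 'quit'): ")
        if instruction.strip().lower() == "quit":
            break
        prompt = PROMPT_TEMPLATE.format(instruction=instruction)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            output_ids = model.generate(
                **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
            )
        # Strip the prompt tokens before decoding so we only print the response.
        response = tokenizer.decode(
            output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        print(response)

if __name__ == "__main__":
    main()
```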
Conclusion
I hope this article was helpful and informative! My plan is to follow it up in a few days with an explanation of how to use FSDP with the Hugging Face Trainer.