Runtime -> Change runtime type
and set the hardware accelerator to GPU.
Learning objectives
In this lecture we will use the image dataset that we created in the last lecture to build an image classifier. We will again use transfer learning to build an accurate image classifier with deep learning in a few minutes.
You should learn how to load the dataset and build an image classifier with the fastai library.
References
This content will be similar to the first lesson of the fastai course. We recommend watching the lesson recording.
- Practical Deep Learning for Coders - Lesson 1: Image classification by fastai [video]
Goal
In this notebook we want to create an image classifier. Images are very common in many areas. Imagine an image of each car were available in the Kaggle competition: such an image could contain a lot of information useful for determining the price of the car. Another example is an automated production line where you need to check that the parts are produced correctly and have no defects.
Transfer learning for image classification
We will again use the fastai library to build an image classifier with deep learning. The procedure will look very familiar, except that we don't need an intermediate fine-tuning step (like the language model fine-tuning in ULMFiT) before training the classifier. The model we will use was pretrained on the ImageNet dataset, which contains over 14 million images and more than 1,000 classes. Training a model on this dataset takes a lot of time. Fortunately, we can reuse the freely available models trained on this dataset to then train new classifiers on other datasets. Because the dataset is so diverse, the model learns a lot about the general structure of images.
Reference: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
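To make the transfer learning idea concrete, here is a minimal PyTorch/torchvision sketch of what "reusing a pretrained model" means. It is an illustration of the concept only, not how fastai implements cnn_learner internally:
# Conceptual sketch of transfer learning (plain PyTorch/torchvision, illustration only).
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)          # weights learned on ImageNet
for param in model.parameters():
    param.requires_grad = False                   # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)     # new head for our classes (e.g. cats vs. dogs)
# Only the new head is trained at first; later the whole network can be unfrozen and fine-tuned.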
The code to train an image classifier should look familiar, since we used a very similar approach in Lesson 13 to train a language model for text classification. The main difference will be the learner, which will be a cnn_learner.
First we need to make sure that the Colab notebook has all the required libraries installed. We use the r.txt file, which lists all the necessary libraries. To upload it, use the folder icon on the left in Colab. After you run the following step, make sure to restart the notebook with Runtime -> Restart runtime.
!pip install -r r.txt
Then we can import the fastai vision module and the widgets (which provide the ImageCleaner and DatasetFormatter helpers we use later), as well as tarfile to unpack the dataset.
from fastai.vision import *
from fastai.widgets import *
import tarfile
First you need to upload your dataset by clicking on the folder icon on the left hand side and then clicking on the upload button. You should now see your dataset in the output of the next cell.
path = Path('')
path.ls()
Replace the cats_vs_dogs.tar.gz filename with the name of your dataset. The next command uncompresses your dataset archive into a folder named data.
tarfile.open(path/'cats_vs_dogs.tar.gz','r:gz').extractall(path)
We apply a few tricks when we load the data:
- A trick often used in image classification is data augmentation: we create more data by manipulating the existing data. For images this means that the images can be rotated, flipped, cropped, etc. This generally improves the performance of the classifier and is set up with the get_transforms function (see the sketch after this list for the tunable options).
- We split the data 80/20 into training and validation data.
- We crop the images to a size of 224 pixels.
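As a side note, get_transforms takes arguments that control the augmentation. The call below is only a sketch showing fastai v1's default values; it is not a tuned recommendation for this dataset.
# Sketch: the augmentation knobs exposed by get_transforms (fastai v1 defaults shown).
tfms = get_transforms(
    do_flip=True,      # random horizontal flips
    flip_vert=False,   # no vertical flips (rarely useful for natural photos)
    max_rotate=10.0,   # rotate up to +/- 10 degrees
    max_zoom=1.1,      # zoom in up to 10%
    max_lighting=0.2,  # brightness/contrast jitter
    max_warp=0.2,      # perspective warping
)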
tfms = get_transforms()
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, valid_pct=0.2, size=224)
data.show_batch(rows=3, figsize=(8, 8))
print(data.classes)
Training a model involves the same steps we encountered in Lesson 13 with ULMFiT:
- Load a pretrained model (e.g. resnet34, resnet50, etc)
- Find the optimal learning rate
- Fit the head of the network
- Unfreeze all layers and fine-tune
- Evaluate the results
Let's go through each step in turn.
We load a ResNet34 model, which is a convolutional neural network with 34 layers. There are also larger networks with up to 152 layers, but this model usually takes the least effort to train while still giving good results. Furthermore, we also specify that we want to monitor the accuracy during training.
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
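If you want to track more than one metric during training, cnn_learner also accepts a list. The line below is just an illustration using fastai v1's error_rate metric; it is not needed for the rest of the lesson.
# Optional: monitor accuracy and error rate together (illustrative only).
learn = cnn_learner(data, models.resnet34, metrics=[accuracy, error_rate])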
Find the optimal learning rate
In this step, the goal is to find a learning rate that a) avoids overshooting during stochastic gradient descent and b) converges as fast as possible. We are looking for the spot in the graph where the curve has the steepest downward slope; this is where the model improves the most at that learning rate.
learn.lr_find()
learn.recorder.plot()
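If reading the steepest slope off the plot by eye feels imprecise, the fastai v1 recorder can also suggest a value numerically. The snippet below is an optional sketch of that; the value it stores can be passed as max_lr.
# Optional: ask the recorder for a numeric learning-rate suggestion.
learn.recorder.plot(suggestion=True)        # marks the point with the steepest (most negative) gradient
suggested_lr = learn.recorder.min_grad_lr   # suggested learning rate, usable as max_lr below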
learn.fit_one_cycle(cyc_len=3, max_lr=1e-2)
We save the progress, so we don't have to retrain the model from scratch if we want to go back a step.
learn.save('tmp_fit-head')
learn.load('tmp_fit-head');
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(2, 1e-5)
learn.save('tmp_fit-all')
learn.load('tmp_fit-all');
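A common refinement when fine-tuning the unfrozen network is to use discriminative learning rates: small rates for the early, more general layers and larger rates for the later layers. In fastai v1 this is expressed with a slice, as sketched below; the values are only an illustration and were not used for the results reported next.
# Optional sketch: discriminative learning rates for the unfrozen network.
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-4))  # early layers train at 1e-5, later layers up to 1e-4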
With the unfreezing and repeated fine-tuning we gained a slight boost and achieved 92.8% classification accuracy!
interp = ClassificationInterpretation.from_learner(learn)
We can have a look at the confusion matrix. It tells us which classes the model mixes up:
interp.plot_confusion_matrix(figsize=(6,6))
The confusion matrix looks pretty good. Next, let's look at the worst predictions:
interp.plot_top_losses(16, figsize=(20,10))
OK, we clearly have some quirks in the dataset - let's see if we can clean it up and get better accuracy.
ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)
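The same widget can also be used to weed out near-duplicate images. In fastai v1 this is done with the from_similars helper, sketched below as an optional extra cleaning pass.
# Optional: review near-duplicate images as well (fastai v1 widgets).
ds, idxs = DatasetFormatter().from_similars(learn)
ImageCleaner(ds, idxs, path, duplicates=True)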
The results are stored in a CSV file named cleaned.csv.
path.ls()
We can load the cleaned dataset and repeat the training steps above with a new classifier:
data = ImageDataBunch.from_csv(path, csv_labels='cleaned.csv', ds_tfms=tfms, valid_pct=0.2, size=224)
data.show_batch(rows=3, figsize=(8, 8))
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(3, max_lr=1e-2)
learn.save('tmp_fit-head-clean')
learn.load('tmp_fit-head-clean');
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(3, 1e-4)
learn.save('tmp_fit-all-clean')
learn.load('tmp_fit-all-clean');
Evidently, the cleaning has helped give us yet another small boost and achieve 98.7% accuracy! Also the confusion matrix looks quite acceptable now:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(6,6))
We can also look at the classes that were most frequently confused:
interp.most_confused()
Looking at the top losses we again see that some pictures are in the wrong class or are totally unrelated. With another cleaning pass we could further improve the performance.
interp.plot_top_losses(16, figsize=(20,10))
In addition to the top losses, one can also look at the minimum losses, in other words the images that were classified best:
interp.plot_top_losses(16, figsize=(20,10), largest=False)
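Finally, once you are happy with the model you can use it on new images. The snippet below is a minimal inference sketch using fastai v1's export/load_learner workflow; my_image.jpg is a placeholder filename, not a file from this lesson.
# Minimal inference sketch (fastai v1); 'my_image.jpg' is a hypothetical example file.
learn.export()                                # writes export.pkl next to the data
inference_learn = load_learner(path)          # reload the exported learner
img = open_image(path/'my_image.jpg')         # load a single image
pred_class, pred_idx, probs = inference_learn.predict(img)
print(pred_class, probs)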