An introduction to Deep Learning and its applications in computer vision.


Learning objectives

In this lecture we will use the image dataset that we created in the last lecture to build an image classifier. We will again use transfer learning to build an accurate image classifier with deep learning in a few minutes.

You should learn how to load the dataset and build an image classifier with the fastai library.

References

This content will be similar to the first lesson of the fastai course. We recommend watching the lesson recording.

  • Practical Deep Learning for Coders - Lesson 1: Image classification by fastai [video]

Homework

Use this notebook as a template to create an image classifier with your own dataset.

Goal

In this notebook we want to create an image classifier. Images are very common in many areas. For example, imagine you had an image of each car in the Kaggle competition: such an image could contain a lot of information useful for determining the price of the car. Another example is an automated production line where you need to check that the parts are produced correctly and have no defects.

Transfer learning for image classification

We will again use the fastai library to build an image classifier with deep learning. The procedure will look very familiar, except that there is no separate language-model fine-tuning step: we train the image classifier directly. The model we will use was pretrained on the ImageNet dataset, which contains over 14 million images; the subset commonly used for pretraining has 1,000 classes. Training a model on this dataset takes a lot of time. Fortunately, we can reuse the freely available models trained on this dataset to train new classifiers on other datasets. Because the dataset is so diverse, the model learns a lot about the general structure of images.

Reference: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

The code to train an image classifier should look familiar, since we used a very similar approach in Lesson 13 to train a language model for text classification. The main difference is the learner, which here is a cnn_learner.

Import

First we need to make sure that the Colab notebook has all the right libraries installed. We use the r.txt file, which lists all the necessary libraries. To upload it, use the folder icon on the left in Colab. After you run the following step, make sure to restart the notebook with Runtime -> Restart Runtime.

!pip install -r r.txt

Then we can import the fastai vision library, the fastai widgets (which provide the ImageCleaner used for data cleaning) and Python's tarfile module.

from fastai.vision import *
from fastai.widgets import *
import tarfile

Load dataset

First you need to upload your dataset by clicking on the folder icon on the left-hand side and then on the upload button. You should now see your dataset in the output of the next cell.

path = Path('')
path.ls()
[PosixPath('.config'),
 PosixPath('cats_vs_dogs.tar.gz'),
 PosixPath('sample_data')]

Replace the cats_vs_dogs.tar.gz filename with the name of your dataset. The next command extracts the archive into the current folder; for this dataset it creates a folder named data.

tarfile.open(path/'cats_vs_dogs.tar.gz','r:gz').extractall(path)
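If you are unsure what your archive contains, you can list its members before extracting and then count the images per class folder. The following is a minimal stdlib sketch, assuming the archive holds one sub-folder per class (for example data/cat/ and data/dog/); the helper names extract_dataset and count_images_per_class are hypothetical and not part of fastai.

```python
import tarfile
from pathlib import Path

# Hypothetical set of file extensions treated as images in this sketch.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def extract_dataset(archive: Path, dest: Path) -> list:
    """Extract a .tar.gz archive into dest and return the names of its members."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()  # list the contents before extracting
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases support a safer extraction filter.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def count_images_per_class(root: Path) -> dict:
    """Count image files below root, labelling each by its parent folder name."""
    counts = {}
    for f in root.rglob("*"):
        if f.is_file() and f.suffix.lower() in IMAGE_EXTS:
            counts[f.parent.name] = counts.get(f.parent.name, 0) + 1
    return counts
```

For this notebook, count_images_per_class(path/'data') would give one count per class folder; a strongly unbalanced result is worth knowing about before training.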

We apply a few tricks when we load the data:

  1. A trick often used in image classification is data augmentation: creating more training data by modifying existing data. For images this means that they can be rotated, flipped, cropped, etc. This generally improves the performance of the classifier and is set up with the get_transforms function.
  2. We split the data 80/20 into train and validation data.
  3. We resize and crop the images to 224x224 pixels.

tfms = get_transforms()
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, valid_pct=0.2, size=224)
data.show_batch(rows=3, figsize=(8, 8)) 
print(data.classes) 
['cat', 'dog']
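To make the first two tricks more concrete, here is a simplified, framework-free sketch of a random 80/20 split (similar in spirit to valid_pct=0.2) and of a random horizontal flip, one of the augmentations get_transforms enables. It assumes an image is a plain list of pixel rows; fastai itself works on image tensors, combines several random transforms (rotation, zoom, lighting and more) and applies them on the fly to the training images. The function names are hypothetical.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Randomly hold out valid_pct of the items for validation."""
    rng = random.Random(seed)
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    n_valid = int(valid_pct * len(items))
    valid = [items[i] for i in idxs[:n_valid]]
    train = [items[i] for i in idxs[n_valid:]]
    return train, valid


def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows (left <-> right)."""
    return [row[::-1] for row in image]


def random_augment(image, rng, p_flip=0.5):
    """Flip the image with probability p_flip, so each pass may see a different version."""
    return horizontal_flip(image) if rng.random() < p_flip else image
```

Because the augmentation is random, the same training image can look slightly different in every epoch, which is what makes it behave like extra data.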

Create learner

Training a model involves the same steps we encountered in Lesson 13 with ULMFiT:

  1. Load a pretrained model (e.g. resnet34, resnet50, etc.)
  2. Find the optimal learning rate
  3. Fit the head of the network
  4. Unfreeze all layers and fine-tune
  5. Evaluate the results

Let's go through each step in turn.

We load a ResNet34 model, which is a convolutional neural network with 34 layers. There are also larger variants with up to 152 layers (ResNet152), but ResNet34 usually gives good results with the least training effort. Furthermore, we specify that we want to monitor the accuracy during training.

Load pretrained model

learn = cnn_learner(data, models.resnet34, metrics=accuracy)

Find the optimal learning rate

In this step, the goal is to find a learning rate that a) is small enough to avoid overshooting during stochastic gradient descent, and b) is large enough to converge as fast as possible. In the plot of loss against learning rate, we look for the region where the loss decreases most steeply: this is where the model improves the most with that learning rate.

learn.lr_find()
learn.recorder.plot()
epoch train_loss valid_loss accuracy time
0 1.220849 #na# 00:11
1 1.224134 #na# 00:11
2 1.127688 #na# 00:11
3 0.901484 #na# 00:11
4 0.680469 #na# 00:11
5 0.652987 #na# 00:11

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
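What lr_find does can be sketched as follows: it trains for a short while with a learning rate that grows exponentially from batch to batch, records the loss, and stops once the loss blows up. The sketch below generates such a schedule and picks the learning rate where the loss drops most steeply. The defaults (1e-7 to 10 over 100 steps) are assumed to mirror fastai v1's lr_find, and suggest_lr is a hypothetical helper; fastai v1 offers a similar minimum-gradient hint via learn.recorder.plot(suggestion=True).

```python
import math


def lr_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Learning rates growing exponentially from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


def suggest_lr(lrs, losses):
    """Return the learning rate at which the recorded loss falls most steeply.

    The slope is measured per decade of learning rate (log10 scale), which
    matches the log-scaled x-axis of the learning rate plot.
    """
    best_i, best_slope = 1, math.inf
    for i in range(1, len(lrs)):
        d_loss = losses[i] - losses[i - 1]
        d_log_lr = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = d_loss / d_log_lr
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]
```

Real loss curves are noisy (fastai records a smoothed loss), so treat such a suggestion as a starting point and sanity-check it against the plot.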

Train the head of the network

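ResNet34 was trained on ImageNet, which has 1,000 classes instead of our measly few. cnn_learner therefore replaces the original output layer with a new head for our classes and freezes the pretrained body, so we first train only this head to adapt the network to our use case: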

learn.fit_one_cycle(cyc_len=3, max_lr=1e-2)
epoch train_loss valid_loss accuracy time
0 0.373270 2.419651 0.766355 00:13
1 0.308227 0.890579 0.822430 00:12
2 0.230655 0.098282 0.943925 00:13
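fit_one_cycle follows the one-cycle policy: the learning rate first warms up from a small value to max_lr and then anneals to a value far below the starting point. A minimal sketch of such a schedule, assuming cosine interpolation and the fastai v1 defaults pct_start=0.3 and div_factor=25 (fastai additionally cycles the momentum in the opposite direction, which is omitted here); one_cycle_lrs is a hypothetical helper:

```python
import math


def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle_lrs(max_lr, n_iter, pct_start=0.3, div_factor=25.0, final_div=None):
    """Learning rate for each of n_iter mini-batches under a one-cycle policy."""
    final_div = final_div if final_div is not None else div_factor * 1e4
    low, final = max_lr / div_factor, max_lr / final_div
    n_up = int(n_iter * pct_start)  # warm-up phase
    n_down = n_iter - n_up          # annealing phase
    up = [cos_anneal(low, max_lr, i / n_up) for i in range(n_up)]
    down = [cos_anneal(max_lr, final, i / n_down) for i in range(n_down)]
    return up + down
```

The warm-up lets the freshly initialized head settle before the large learning rates, and the long annealing phase lets the weights converge at the end of training.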

We save the progress so we don't have to retrain the model from scratch if we want to go back a step. learn.save stores the weights in the models folder inside path.

learn.save('tmp_fit-head')
learn.load('tmp_fit-head');

Train all layers

Now that we've tuned the head of the network to our dataset, the next step is to unfreeze all the layers and see if we can train a more accurate model. The procedure is the same as in the previous step:

  1. Find the best learning rate.
  2. Train the network with this learning rate.

learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
epoch train_loss valid_loss accuracy time
0 0.147244 #na# 00:12
1 0.125989 #na# 00:12
2 0.117433 #na# 00:12
3 0.149950 #na# 00:12
4 0.323171 #na# 00:12

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.fit_one_cycle(3, 1e-5)
epoch train_loss valid_loss accuracy time
0 0.135356 0.097291 0.953271 00:13
1 0.114248 0.095931 0.953271 00:13
2 0.117142 0.097505 0.953271 00:13
learn.save('tmp_fit-all')
learn.load('tmp_fit-all');

With the unfreezing and additional fine-tuning we gained a slight boost, from 94.4% to 95.3% classification accuracy!
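For intuition, freezing and unfreezing can be pictured with the toy model below, a hypothetical stand-in for a network split into layer groups: after cnn_learner only the last group (the new head) is trained, and unfreeze() makes every group trainable again. In PyTorch this is done by switching gradients on or off for the parameters, and fastai v1 by default keeps BatchNorm layers trainable even when frozen; the toy ignores both details. The small learning rate (1e-5) used after unfreezing means the pretrained weights change only slightly.

```python
from dataclasses import dataclass, field


@dataclass
class LayerGroup:
    name: str
    trainable: bool = True


@dataclass
class ToyModel:
    """Hypothetical stand-in for a network split into layer groups (body ... head)."""
    groups: list = field(default_factory=list)

    def freeze(self):
        # Train only the last group (the new head); keep pretrained groups fixed.
        for g in self.groups[:-1]:
            g.trainable = False
        self.groups[-1].trainable = True

    def unfreeze(self):
        # Make every group trainable for full fine-tuning.
        for g in self.groups:
            g.trainable = True

    def trainable_groups(self):
        return [g.name for g in self.groups if g.trainable]
```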

Evaluate the results

Let's see which classes our classifier has the most trouble identifying correctly:

interp = ClassificationInterpretation.from_learner(learn)

We can have a look at the confusion matrix. It tells us which classes the model mixes up:

interp.plot_confusion_matrix(figsize=(6,6))
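The confusion matrix simply counts, for every actual class (rows), how often each class was predicted (columns); the diagonal holds the correct predictions. A small stdlib sketch with hypothetical helper names, which also shows how the accuracy follows from the matrix:

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        cm[index[a]][index[p]] += 1
    return cm


def accuracy_from_cm(cm):
    """Fraction of all predictions that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total
```

For example, with actual labels cat, cat, dog, dog, dog and predictions cat, dog, dog, dog, cat, the matrix is [[1, 1], [1, 2]] and the accuracy is 3/5.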

The confusion matrix looks pretty good. Next, let's look at the worst predictions:

interp.plot_top_losses(16, figsize=(20,10))
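The top losses are the validation images with the highest loss: typically confident but wrong predictions, or correct predictions made with little confidence. A sketch of the ranking, assuming the loss is cross-entropy on the predicted class probabilities (what fastai uses by default for single-label classification); top_losses is a hypothetical helper, and its largest flag mirrors the parameter of plot_top_losses that shows the smallest losses instead.

```python
import math


def cross_entropy(probs, target_idx, eps=1e-12):
    """Loss of one prediction: -log(probability assigned to the true class)."""
    return -math.log(max(probs[target_idx], eps))  # eps avoids log(0)


def top_losses(all_probs, targets, k, largest=True):
    """(index, loss) of the k samples with the largest (or smallest) loss."""
    losses = [cross_entropy(p, t) for p, t in zip(all_probs, targets)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=largest)
    return [(i, losses[i]) for i in order[:k]]
```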

OK, we clearly have some quirks in the dataset; let's see if we can clean it up and improve the accuracy.

Data cleaning

fast.ai also comes with a nifty ImageCleaner that we can use to either remove images or correct the labels:

ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)

The results are stored in a csv file named cleaned.csv.
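Before retraining, it can be useful to check how many images per label remain in cleaned.csv. A minimal sketch, assuming the file has a header row with a file-name column and a column named label (check your file; the exact column names may differ between fastai versions); label_counts is a hypothetical helper.

```python
import csv
from collections import Counter
from pathlib import Path


def label_counts(csv_path: Path, label_col: str = "label") -> Counter:
    """Count rows per label in a CSV file that has a header row."""
    with open(csv_path, newline="") as f:
        return Counter(row[label_col] for row in csv.DictReader(f))
```

Calling label_counts(path/'cleaned.csv') would then show whether the cleaning left the classes roughly balanced.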

path.ls()
[PosixPath('.config'),
 PosixPath('cleaned.csv'),
 PosixPath('cats_vs_dogs.tar.gz'),
 PosixPath('data'),
 PosixPath('models'),
 PosixPath('sample_data')]

We can load the cleaned dataset and repeat the training steps above with a new classifier:

data = ImageDataBunch.from_csv(path, csv_labels='cleaned.csv', ds_tfms=tfms, valid_pct=0.2, size=224)
data.show_batch(rows=3, figsize=(8, 8))

Load a pretrained model

learn = cnn_learner(data, models.resnet34, metrics=accuracy)

Fit the head

learn.fit_one_cycle(3, max_lr=1e-2)
epoch train_loss valid_loss accuracy time
0 0.491095 0.657214 0.879518 00:11
1 0.372705 0.769368 0.897590 00:10
2 0.309871 0.119769 0.963855 00:10
learn.save('tmp_fit-head-clean')
learn.load('tmp_fit-head-clean');

Unfreeze all layers

learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
epoch train_loss valid_loss accuracy time
0 0.116159 #na# 00:09
1 0.121592 #na# 00:09
2 0.133783 #na# 00:09
3 0.116949 #na# 00:09
4 0.112892 #na# 00:09

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.fit_one_cycle(3, 1e-4)
epoch train_loss valid_loss accuracy time
0 0.150293 0.077918 0.969880 00:11
1 0.118052 0.151490 0.981928 00:11
2 0.091031 0.094982 0.987952 00:10
learn.save('tmp_fit-all-clean')
learn.load('tmp_fit-all-clean');

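Evidently, the cleaning has given us yet another small boost, to 98.8% accuracy! Keep in mind that the validation set is not the same as before: the cleaning may have relabeled or removed images, and the 80/20 split is drawn again, so the two numbers are not strictly comparable. Also, the confusion matrix looks quite acceptable now: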

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(6,6))

We can also look at the classes that were most frequently confused:

interp.most_confused()
[('cat', 'dog', 2)]
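most_confused lists the off-diagonal entries of the confusion matrix as (actual, predicted, count), largest first, so ('cat', 'dog', 2) means that two cat images were predicted as dogs. A stdlib sketch of the same idea, with a hypothetical helper and made-up labels:

```python
from collections import Counter


def most_confused(actual, predicted, min_val=1):
    """(actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    result = [(a, p, n) for (a, p), n in pairs.items() if n >= min_val]
    return sorted(result, key=lambda t: t[2], reverse=True)
```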

Looking at the top losses, we again see some pictures that are in the wrong class or totally unrelated. Another cleaning round could further improve the performance.

interp.plot_top_losses(16, figsize=(20,10))

In addition to the top losses, one can also look at the smallest losses, in other words the images that were classified best:

interp.plot_top_losses(16, figsize=(20,10), largest=False)