An Image Classification Model With 5 Lines Of Code

Priyansi
GDSC KIIT


First things first, the title isn’t clickbait. I’ll demonstrate step-by-step how you can build your own image classification model using a custom dataset and just 5 lines of logic. To make things interesting, my model is going to classify minions.

The basic layout of this tutorial is going to be -

  1. Create a dataset.
  2. Upload it to our notebook.
  3. Train our model.
  4. Interpret the results.

Before that, let’s take a few steps back and look at what we’re working with here. Image classification is the process of taking an input (a picture) and outputting a class (like ‘dog’). This is done with the help of Convolutional Neural Networks (CNNs), which are a type of deep learning neural network.

From analyticsindiamag

Well, enough theory, let’s get to writing some code!

Open up Google Colab from here. Click on New Notebook. Under Runtime, change the hardware accelerator from None to GPU (this speeds up the computations). And voila! Lo and behold your free notebook, with free GPU and free memory. Could this be any better?

There are a few things in life that I swear by and fastai is one of them. fastai sits on top of PyTorch, a popular Python deep learning library that is often considered faster than TensorFlow.

Its course-v3 (which I’m currently enrolled in) is one of the best courses you’ll come across for deep learning. It uses a code-first, theory-later approach and you’ll be able to build models, just like this, and enter competitions without having to do tons of maths homework.

First, we need a dataset, meaning some pictures that we can classify. You can either go with my idea of classifying minions (specifically Bob, Kevin, and Stuart) or get a little creative. Just pick a topic. Ready?

Let’s make a custom dataset so that we can make sure our pictures are just right. For this, we’ll have to go through 5 steps -

1. Enable the Fatkun Batch Download Extension here (for Chrome) so that we don’t have to manually save the images.

2. Go to Google Images and type in the first topic. For example, I’ll type in ‘minion bob’. Once around 50 images are loaded, click on the Fatkun extension and select Download [Current Tab]. You can choose any number of images but for me, 50 images work just as well, plus they’re easier to handle.

3. You’ll get a tab like the picture above. Now you can unselect the invalid pictures like the icons or the incorrect ones. It’s where your expertise in classifying minions comes in handy :p Then click download. Repeat this process for all the topics you’ve chosen and at the end of this you’ll have all your topics in folders like this.

Hmm. ‘MINIANS’. Fight me.

4. Now, create a new folder and give it a boring name (mine’s called Minions Dataset), then store all these image folders there. Rename the folders too, to something easy like Bob, Kevin and Stuart.

5. We’re ready to upload this to GitHub so that you can access it in your Colab notebook. You can also upload this to Drive or directly to your notebook but that takes up a lot of data. If you want to get started with Git and GitHub, check this out. Otherwise, feel free to use my dataset :)
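By the way, if you’d rather not drag folders around by hand, step 4 can be sketched in a few lines of Python on your own machine. Treat this as a rough sketch only: the download folder names in RENAMES are made up (yours will be whatever Fatkun created), and organise_dataset is a helper name I invented for illustration.

```python
import shutil
from pathlib import Path

# Hypothetical download folder names -> clean class names.
# Replace the keys with the folder names you actually got.
RENAMES = {
    "minion bob - Google Search": "Bob",
    "minion kevin - Google Search": "Kevin",
    "minion stuart - Google Search": "Stuart",
}


def organise_dataset(downloads, dataset, renames):
    """Move each downloaded folder into `dataset` under its clean class name."""
    dataset = Path(dataset)
    dataset.mkdir(parents=True, exist_ok=True)
    moved = []
    for old_name, class_name in renames.items():
        source = Path(downloads) / old_name
        if not source.is_dir():
            # Skip folders that are missing instead of crashing.
            continue
        shutil.move(str(source), str(dataset / class_name))
        moved.append(class_name)
    return sorted(moved)
```

Calling organise_dataset("Downloads", "Minions Dataset", RENAMES) would return the list of class folders it managed to move.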

Enough talk. Now it’s time to get our hands dirty. We can set up fastai in our Colab notebook by writing this command in a cell at the top of our notebook and then running it -

#setting up fastai
!curl -s https://course.fast.ai/setup/colab | bash

Remember, you need to press Shift+Enter to run a cell. And just as a preventive measure, don’t forget to save your notebook regularly.

%reload_ext autoreload
%autoreload 2
%matplotlib inline

Lines starting with % are not Python code. They are called ‘magics’. Here they do two things: they reload the underlying library code automatically if someone changes it while you’re running this, and if you want to plot something, they plot it right here in the notebook using matplotlib.

from fastai.vision import *
from fastai.metrics import accuracy

These two lines load the fastai library. You can read the docs here or just remember that fastai.vision imports all the tools necessary for image classification, which falls under computer vision.

Now go to the repository where you’ve uploaded your dataset. Click on Clone then copy the link and paste it after ! git clone

! git clone https://github.com/Priyansi/minionsDataset.git

This downloads your entire dataset into the notebook’s storage so that we can access it easily from this notebook.

PATH = '/content/minionsDataset'

We need to set up a constant path that will lead to the multiple classes (image folders for different topics) and this will be different for all of you. To find out the path, navigate to Files where your dataset is saved. Right-click on the file/folder you want to find the path of, then select Copy Path.
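Before going further, it can be worth checking that PATH really points at your class folders. Here’s a minimal sketch using only the standard library. count_images_per_class and the list of image extensions are my own assumptions for illustration, not something fastai provides.

```python
from pathlib import Path

# Assumed set of image extensions, extend it if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def count_images_per_class(path):
    """Return {folder_name: number_of_images} for each class folder in `path`."""
    counts = {}
    for folder in sorted(Path(path).iterdir()):
        # Skip loose files and hidden folders such as .git from the clone.
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        counts[folder.name] = sum(
            1 for f in folder.iterdir() if f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

In the notebook, count_images_per_class(PATH) should list one entry per topic. If a folder is missing or shows 0 images, the path or the upload probably needs another look.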

After doing this much, our notebook should look something like this -

Phew! Untitled0 looks great. Better give it a name.

np.random.seed(24)
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(PATH, valid_pct=0.2, ds_tfms=tfms, size=299, bs=16).normalize(imagenet_stats)

Here comes the first line of logic. To build a model we need two things - images and classes, i.e., the labels or what those images are. Now we already have all the classes separated out in different folders, which means one folder will contain all the images of a specific label. So our folder names become the label names.

In fastai, grabbing all the images and labeling them is made very easy using ImageDataBunch. And there are various ways you can read the data. Right now, all our images are in folders so we use from_folder to extract them. Now we pass various parameters to this function like -

1. path - The path where the image folders are. In this case, its value is stored in PATH.

2. valid_pct - Let’s talk about validation datasets.

  • You must have heard about train and test datasets. The train dataset contains the values on which the model trains (learns), and the test set is used at the end of training to check whether what the model learned and predicted was right. The test set contains totally new data that our model hasn’t seen before.
  • The validation set comes in between. It is also a set of values that are held back from the model, and it is used to tune the hyperparameters (learning rate, epochs, etc.) to give more accurate results before we finally test on our test set. Now, the question arises: we have a train set but not a validation set. How do we create one? The answer: let fastai handle this.
  • ImageDataBunch creates a DataBunch object that contains various sets like train, validation and an optional test. We can just split our training set into train and valid by passing a ratio to valid_pct. Generally, that ratio is 0.2, which means 20% of the training set is set aside for validation.

3. ds_tfms - The transforms applied to the images so that our model trains better. We get them from the get_transforms method. do_flip is set to False to switch off random flips (horizontal ones by default; upside-down flips would also need flip_vert=True). We don’t want our minions flipped around. It’s nauseating.

4. size - So that all the images are of the same shape and size. This will create a 299 by 299 square image. Why 299? Well, it’s what I generally use and found the results to be pretty good. You are free to experiment though.

5. bs - It stands for batch size. It sets the number of images processed at a time. This depends upon the memory you have. You can set this to a higher value if you have sufficient memory.

6. normalize() - In nearly all machine learning tasks, you want all of your data to be about the same ‘size’, specifically with about the same mean and standard deviation, so you need to use normalize(). imagenet_stats holds the per-channel mean and standard deviation of the famous ImageNet dataset, so our images get normalized the same way.
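To make the valid_pct idea a bit more concrete, here’s a rough sketch of a seeded 80/20 split in plain Python. It only illustrates the idea and is not fastai’s actual code: split_train_valid is a name I made up, and it uses Python’s random module rather than NumPy.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=24):
    """Shuffle `items` reproducibly and hold back `valid_pct` of them."""
    rng = random.Random(seed)          # fixed seed -> same shuffle on every run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    # The first n_valid shuffled items become the validation set.
    return shuffled[n_valid:], shuffled[:n_valid]
```

With 150 images and valid_pct=0.2, this gives 120 images for training and 30 for validation, and the same seed always produces the same split.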

Last but not least, np.random.seed() gets you the same random numbers each time you run the program. You can pass any value to this. Moving on, let’s see what data has in store for us.

data.show_batch(rows=3, figsize=(5, 5))
Kewt

This will show you some of the contents of your DataBunch object. You can clearly see that they have been zoomed, cropped and labeled. Perfecto.
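One quick aside on the normalize(imagenet_stats) step from earlier, since it’s easy to picture with plain numbers. This is a minimal sketch, assuming the commonly used ImageNet per-channel mean and standard deviation and pixel values already scaled to the 0 to 1 range. normalize_pixel is a hypothetical helper, not a fastai function.

```python
# Per-channel (red, green, blue) mean and standard deviation commonly used
# with ImageNet-pretrained models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel whose channels are already scaled to 0..1."""
    # Subtract the channel mean, then divide by the channel standard deviation.
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))
```

A pixel that is exactly the average ImageNet colour comes out as all zeros, and brighter or darker pixels land a few units either side of that.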

Let the training commence!

learn = cnn_learner(data, models.resnet101, metrics=accuracy)

In fastai, we use a learner for something that can learn to fit a model. We will be using a cnn_learner for the reasons stated above. And to this, we pass our data, a model and the metric, which I’ve chosen as accuracy. You can also choose error_rate.
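If you’re wondering how accuracy and error_rate relate, here’s a tiny sketch in plain Python. The function names are deliberately different from fastai’s so they don’t shadow the real metrics if you paste this into the notebook.

```python
def simple_accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def simple_error_rate(predicted, actual):
    """Fraction of predictions that are wrong, i.e. 1 - accuracy."""
    return 1 - simple_accuracy(predicted, actual)
```

So if the model gets 3 out of 4 minions right, the accuracy is 0.75 and the error rate is 0.25. They’re two views of the same number.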

Now for the model, all you need to know is that there’s a particular kind of model called ResNet which works extremely well nearly all the time. Right now all you need to choose is the size. You can go with ResNet34, ResNet50 or ResNet101 like I’ve chosen here. ResNet101 is a pretty huge architecture, so you might wanna start small with ResNet34 since it trains faster, and then move up if you are not getting the desired output.

You’ll notice that as soon as you run the above line, it downloads something. Well, those are the pre-trained weights, that is, our model has already been trained to do something and it’s better than starting with nothing. And the thing it has been trained to do is recognize thousands of categories in ImageNet. What up comeback!

What we are implementing here is transfer learning. We will take a pre-trained model, and then we fit it so that instead of predicting a thousand categories of ImageNet with ImageNet data, it predicts the 3 categories of minions using our minions data. Two lines of logic down, three more to go.

learn.fit_one_cycle(4)

The best way to fit models is to use something called one cycle. It’s accurate and fast. The number 4 decides how many times we show the dataset to the model so that it can learn from it. Each time it sees a picture, it’s going to get a little bit better. But it’s going to take time, and it means it could overfit. If it sees the same picture too many times, it will just learn to recognize that picture, not minions in general.

If the model does pretty darn well on the training set but worse on pictures it hasn’t seen, it means you’re overfitting. The validation set lets us catch this. The accuracy of the model on the validation set gets printed after every epoch, so you’ll recognize when the model is overfitting. Now, this is our result -

Too lazy to include a gist

So 67%? Eh, not bad. This means we are in the right direction but it’s not good enough. We can do better folks. Notice how the accuracy is constant for the last 3 values?

It’s time to talk about the learning rate. It basically says how quickly you’re updating your parameters. The goal is to find the fastest you can train this neural network without making it zip off the rails and crash.
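Here’s a toy illustration of what “zipping off the rails” looks like. It has nothing to do with fastai’s internals: it’s plain gradient descent on the simple function f(w) = w², which I picked only because its minimum (at 0) is obvious.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimise f(w) = w**2 and return the final distance from the minimum at 0."""
    w = start
    for _ in range(steps):
        gradient = 2 * w                     # derivative of w**2
        w = w - learning_rate * gradient     # one parameter update
    return abs(w)
```

With a learning rate of 0.1 the value creeps nicely towards 0, with 0.001 it has barely moved after 20 steps, and with 1.1 every step overshoots further than the last one and the value blows up.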

learn.unfreeze()

Before we find a suitable learning rate, we’ll unfreeze() the model. A CNN has several layers for a whole lot of computations. What we did previously was just add a few extra layers at the end and train only those. Now we want to train the whole model, therefore, we’ll use unfreeze().

learn.lr_find()
learn.recorder.plot()

To find a suitable learning rate, we run lr_find(). And then we plot the losses against a range of learning rates like this -

Calm down. We’re not gonna have a Maths class now.

Choosing an appropriate learning rate is mostly intuition but this graph will help you narrow it down. We are basically looking for the steepest slope where the model is actually learning right before the losses skyrocket, which in this case will be between the red lines.

Python’s built-in slice can take a start value and a stop value. Passing slice(3e-5, 3e-4) as max_lr trains the very first layers at a learning rate of 3e-5 and the very last layers at a rate of 3e-4, and distributes all the other layers across that. This way we can choose a range rather than just one value.
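To picture how a range gets spread over the layers, here’s a small sketch. Two things in it are my assumptions rather than anything stated above: that the layers are split into 3 groups, and that the rates in between are spaced by equal multiples. spread_learning_rates is a made-up name.

```python
def spread_learning_rates(lr_range, n_groups=3):
    """Spread rates from lr_range.start to lr_range.stop using equal multiples."""
    start, stop = lr_range.start, lr_range.stop
    if n_groups == 1:
        return [stop]
    # Each group's rate is the previous one multiplied by the same ratio.
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]
```

For slice(3e-5, 3e-4) and 3 groups, the first group gets 3e-5, the last gets roughly 3e-4, and the middle one lands in between at around 9.5e-5.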

learn.fit_one_cycle(4, max_lr=slice(3e-5, 3e-4))
Ahh! Finally. A+

After this, we save the results -

learn.save('stage-1')

And, with this, you’ve completed your training. Hol’ up! There’s still a final important thing left to do, that is, to interpret the results. Let’s see where our model betrayed us by using class interpretation and then plotting the confusion matrix.

interpret = ClassificationInterpretation.from_learner(learn)
interpret.plot_confusion_matrix()
Is someone still reading the captions?

This is pretty self-explanatory. Our model’s right predictions are along the diagonal and the wrong ones are scattered throughout. But this is dull. Let’s look at something more visually appealing.

interpret.plot_top_losses(3, figsize=(5,5))
Did you know that Walmart has a lower acceptance rate than Harvard?

Take the first picture for example — Our model classified that as Kevin when it was actually Bob. All this can be resolved by cleaning up the dataset a bit. But for now, you’re good to go.
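If you’re curious what “top losses” means, here’s a rough sketch of the idea: score every picture by how little probability the model gave to the true label, then sort. I’m assuming a cross-entropy style loss here, the sample data format is invented for illustration, and top_losses is just a name for this sketch.

```python
import math


def top_losses(samples, k=3):
    """Return the k samples with the highest cross-entropy loss.

    Each sample is (name, actual_label, {label: predicted_probability}),
    and every probability is assumed to be greater than 0.
    """
    scored = []
    for name, actual, probabilities in samples:
        # The less probability given to the true label, the bigger the loss.
        loss = -math.log(probabilities[actual])
        predicted = max(probabilities, key=probabilities.get)
        scored.append((loss, name, actual, predicted))
    # Highest loss first, i.e. the most confidently wrong pictures.
    return sorted(scored, reverse=True)[:k]
```

A picture of Bob that the model calls Kevin with 80% confidence ends up near the top of this list, which is exactly the kind of picture worth checking when you clean up the dataset.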

Congratulations on bearing with me for so long. If you’re one of the lucky people that stuck around, boy, do I have a surprise for you!

Let’s test our model now. It’s an extra step just to make sure we’re doing things right. Let’s start by uploading an image in the File section. Then copy its path and paste it in IMAGEPATH. I found a unique image by taking a screenshot of a YouTube video.

IMAGEPATH = '/content/Screenshot (143).png'
image = open_image(IMAGEPATH)
image.show()

Now open the image to make sure it’s the right one. Then, we predict -

learn.predict(image)

If everything went alright, you should see something like this -

*happy tears*

You can find the full code here.

Now, go out into the world and then explore the beauty of deep learning. The above knowledge is enough to create a pretty accurate model. If you’re still hung up on something, ping me on Twitter. I promise not to leave you on seenzone :p

P.S. If you stick around for the next part, I’m gonna show you how to deploy this trained model without any coding knowledge.

Priyansi
GDSC KIIT

Loves programming languages as well as natural ones