Using Neural Networks to predict dogs breeds

Eduardo Ribeiro Vargas Duarte
5 min readMay 14, 2021

This medium post is related to the Capstone Project from the Udacity Data Scientist Nanodegree known as Dog Identification App.

Project Overview

For this project, it was proposed to create a Neural Network model to classify images of dogs according to their breed. This is one of the most popular Udacity projects across machine learning and artificial intellegence nanodegree programs.

A picture with a lot of dogs from different breeds

Two datasets containing both dogs and humans images were provided by Udacity team to aid in the model training proccess.

Problem Statement

The problem aimed to be resolved with this project is the creation of an algorithm that have the ability to predict a certain dog breed based in an input image provided.

It should be able to inform:

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.

Metrics

The main metric used to evaluate how well the algorithm is performing is the accuracy. The accuracy could be described, in this case, as the percentage of correctness of the model in relation dogs breeds prediction.

Acuracy formula

Data Exploration and Visualization

Exploring the available datasets, it was possible to analyse the number of available images for the dogs and human datasets.

There were a total of 133 dog categories and 8351 dogs images. 80% of these images of where used for training, 10% for validation and 10% for test.

Data visualization with the train,test and validation division

In the human dataset, there was a total of 13233 images.

Data Preprocessing

In the first part of this project, it was used a OpenCV implementation to detect human faces in a certain image. OpenCV provides many pre-trained face detectors, stored as XML files on github. One of these detectors was downloaded and stored it in the haarcascades directory.

Detecting faces in images in OpenCV

In the first dataset with a portion of the human images, the function detect 100% of human faces, while in the dogs dataset, 11% of the images had a human face detected.

Next, a pre-trained CNN model known as Resnet-50 was used to find dogs in the same datasets quoted above. It detected dogs in 100% of the dogs dataset and 0% in the human dataset.

Implementation

A CNN was created from the Scratch, implementing all the convolution and maxpooling layers to try to predict the dogs breeds. A image of this CNN archicteture is provided below:

CNN to classify dog breed

This first implementation did not have a reasonable performance, obtaining just 2.27 % of accuracy in the model test.

Refinement

Next,a CNN was trained using a technique known as transfer learning. Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.

It is a popular approach in deep learning where pre-trained models are used as the starting point as computer vision and image classification tasks, given the vast compute and time resources required to develop neural network models on these problems and from the huge jumps in skill that they provide on related problems.

Example of Transfer Learning

Using a pre-trained model known as VGG16, it was possible to improve the accuracy in dog breed prediction to 41,86 %. A great improvement compared to the 2.27 % obtained before.

Lastly, a even more complex pre-trained model was used to improve the acurracy results. A ResNet-50 bottleneck features was used and implemented to achieve this goal. Using this pre-trained model, it was possible to increase the test accuracy to 82,89%.

Results:

The table below summarizes the results of each algorithm used in this project:

Table with the accuracy results

The final algorithm results were better than expected. It just didn’t recognize one of the dog/human images presented ( 7 hits, 1 mistake).

It predicted well most of the images: In a picture with a human and a dog, it recognize first the dog because it is the first condition that is evaluated by the algorithm. Maybe as an improvement, it could be implemented a feature that prints the dog breed and the human resembling dog breed when both are presented in the image.

Detecting a dog in a picture with both dog and human

When a man dressed as dog was presented to the algorithm, it predicted well that it is a man. When a cat was presented, it showed right that no human or dog was in the picture. It also predicted well the dog breeds.

Detecting a man dressed as dog
Detecting correctly the dog breed

For the image with the dog a little far away, not very well framed, the algorithm could not identify it.

In this case, the algorithm didn’t identified correctly

Also the resembling dog breed feature for humans can be improved. The insertion of a higher variety of images with different properties could help in this optimization as well.

Conclusion

With this project, it was possible to create a CNN model that predicts well dogs breeds when a input image is presented. First, it was created a CNN from Stratch without transfer learning. This model did not perform well, having just 2,27% of accuracy.

Next, aVGG16 pre-trained model was used and it achieved an accuracy of 41,86%.The final model using ResNet-50 bottleneck features had more than 80% of accuracy, showing that this a good model to predict dogs breeds.

A possible improvement for this model could be achieved with the insertion of images with animals/humans more distant in the picture in the training set, not very well framed. The image used for trianing seems to be very well framed and when a noise image is inserted, the model did not perform very well.

--

--