Implementation of HOG for Human Detection



Introduction

The Problem

As described in the paper, this project addresses the problem of human detection, even against cluttered backgrounds and under difficult illumination.
The ability to detect whether and where a human appears in an image can be helpful to many kinds of entities (image search engines, security companies, and more).

This project is an implementation of the algorithm described in the paper, together with an additional application that evaluates the solution in the wild.

Related Paper

Our work related to HOG feature extraction is fully based on the paper:
Histograms of Oriented Gradients for Human Detection, Navneet Dalal and Bill Triggs, CVPR 2005

Our Work

Our aims in this work were:
  1. To implement our own HOG feature extractor (according to the algorithm description in the paper)
  2. To train a classifier on the dataset which was prepared and used by the authors of the paper, and to get high accuracy on the corresponding testing set
  3. To create a program which evaluates the classifier in the wild, in a way that it scans images provided by the user and uses our trained classifier to detect humans within those provided images


Used Datasets and Technologies

The Used Dataset

We used the INRIA Person Dataset, which was prepared by the authors of the original paper our work is based on.


About the dataset:


Some samples from the dataset:


96x160H96/Train/pos:
70x134H96/Test/pos:
Train/neg:
Test/neg:

The Used HOG Feature Vector Extractor

The HOG feature vector extractor is implemented in Matlab, in the function computeHOG126x63(). Its implementation is found in the file computeHOG126x63.m

The Used Classifier and Related Libraries

Here is a scheme of the structure of our neural network:


The Workflow

The following diagram explains our workflow at a very high level:



Note: CSV stands for "Comma Separated Values"
For example, the sample CSV content below defines the 3 vectors [1 2 3], [1.23 0.54 -6], [-1 -1 -1]:

1,2,3
1.23,0.54,-6
-1,-1,-1

Thus, we can compute the HOG vectors of many images and then export them all at once into a CSV file.
This CSV file can then be parsed and used by other applications, such as our C application for training and activating neural networks.
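The CSV interchange described above can be sketched in a few lines. The snippet below (Python rather than the project's Matlab/C, purely for illustration) writes the example vectors to CSV text and parses them back:

```python
import csv
import io

# The three example vectors from above; each one becomes one CSV line.
vectors = [[1, 2, 3], [1.23, 0.54, -6], [-1, -1, -1]]

# Export: one vector per line, components separated by commas.
buf = io.StringIO()
csv.writer(buf).writerows(vectors)
csv_text = buf.getvalue()

# Import: parse every line back into a list of floats.
parsed = [[float(x) for x in row] for row in csv.reader(io.StringIO(csv_text))]
```

Because every row has the same length, a consumer can recover both the number of vectors and their dimensionality from the file alone.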

Dataset Preparation

As explained in the Used Dataset part, not all of the dataset images are ready to use:

The simplest case is that of the positive samples (both for training and for testing):
The more challenging case is the preparation of the negative samples:
Here are some samples of the negative patches we randomly cropped from the large images:
Training negative samples:
Testing negative samples:


Finally, the dataset we are going to use is formed by:
2416 positive training samples: taken without preprocessing from 96x160H96/Train/pos
24360 negative training samples: prepared by cropping patches from 1218 images from Train/neg
1132 positive testing samples: taken without preprocessing from 70x134H96/Test/pos
9060 negative testing samples: prepared by cropping patches from 453 images from Test/neg

We copy the photos we prepared into 4 directories, so we can easily access them later:
Note: The cropping of the negative images into samples is performed by our Matlab function cutRandomImages(). To automatically cut and save 20 patches per image, each sized 63x126 pixels, from a source images directory SRC into a destination images directory DST, one should call it as cutRandomImages(SRC, DST, 20, 63, 126).
This function is found in the file cutRandomImages.m
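For illustration, here is a hypothetical Python analogue of the cropping step (the real function is the Matlab cutRandomImages() above, which also saves the patches to disk; the name and signature below are ours):

```python
import random

def cut_random_patches(image, count, width, height, rng=random):
    """Crop `count` random patches of size width x height from `image`,
    given as a 2-D list of pixel rows. Illustrative analogue of the
    cropping performed by our Matlab cutRandomImages() function."""
    rows, cols = len(image), len(image[0])
    patches = []
    for _ in range(count):
        x = rng.randrange(cols - width + 1)    # random top-left corner
        y = rng.randrange(rows - height + 1)
        patches.append([row[x:x + width] for row in image[y:y + height]])
    return patches
```

Called with count=20, width=63 and height=126, it yields the 20 patches per source image described above.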

Extracting HOG Feature Vectors from the Dataset

After the Data Preparation step, we have 4 directories of images, which we want to turn into HOG vectors.
We use our Matlab function createDataset() to turn the images into HOG vectors and save them in 4 corresponding CSV files.

Our function createDataset() gets 2 parameters. We run it 4 times, once for each directory.
For example, to create a CSV file of the positive training samples, we'll call the function with the path to the train/pos directory and with a path to an output file, which we'll name something like train_pos.csv.

In general, the flow of the function createDataset() is:
Below is a detailed explanation of the HOG extraction flow.

Given a 63x126 pixels image, our HOG feature extractor works according to the following flow:


The following graphical slideshow might help to understand it better:


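To make the per-cell step of the flow concrete, here is a minimal sketch of the gradient-orientation histogram computed for a single cell (Python rather than our Matlab implementation; the 9 unsigned-orientation bins follow the paper's defaults, and the full extractor additionally performs block grouping and normalization):

```python
import math

def cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram for one cell (a 2-D list
    of grayscale values): 9 bins over 0..180 degrees, each pixel voting
    with its gradient magnitude. Border pixels are skipped because the
    centered [-1, 0, 1] derivative needs both neighbors."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to unsigned
            hist[int(ang / (180.0 / bins)) % bins] += mag
    return hist
```

For example, a cell containing only a vertical edge puts all of its gradient energy into the 0-degree bin.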

Notes:

Training and Evaluating the Classifier

The training and classification processes are done in a C application.

The training process:

After the Extracting HOG Feature Vectors from the Dataset step, we have 4 CSV files. Note that each line in each CSV file represents a HOG vector of size 6318.

We are going to train the network so that activating it on a positive sample should output 1.0, and activating it on a negative sample should output -1.0.
As the negative training set is more than 10 times larger than the positive training set, we each time train the network on one positive sample and then on 10 negative samples, until all the positive samples are used.
This means we actually give up the last 200 negative training samples, as we are going to use only 24160 of them (10 times the size of the positive training set).
It also means that each training epoch consists of 2416 + 24160 = 26576 training iterations.
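The 1-positive-then-10-negatives schedule can be sketched as follows (Python for illustration; the function name is ours):

```python
def interleave(pos, neg, ratio=10):
    """Build one epoch's training order: each positive sample is followed
    by `ratio` fresh negative samples; surplus negatives are dropped."""
    order = []
    for i, p in enumerate(pos):
        order.append(p)
        order.extend(neg[i * ratio:(i + 1) * ratio])
    return order
```

With 2416 positives and 24360 negatives this yields exactly the 26576 iterations per epoch computed above, silently dropping the last 200 negatives.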

In order to accelerate the training, we use the FeedForwardNetwork API function FeedForwardNetwork_train_fast() instead of the standard function FeedForwardNetwork_train().
The function FeedForwardNetwork_train_fast() accelerates the training by letting the user define a callback function that tells whether the output is already satisfying. If it is, the backpropagation and weight update over the network are skipped, which saves a lot of computation time.

We define this callback function to return a positive answer if the sample is positive and the network output is greater than 0.5, or if the sample is negative and the network output is less than 0.5. Any other case yields a negative answer, which causes the API to update the weights of the network using the backpropagation algorithm.
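The callback's decision logic amounts to the following (a Python sketch of the logic only; the real callback is a C function whose exact signature is dictated by the FeedForwardNetwork API):

```python
def output_is_satisfying(target, output):
    """Return True when backpropagation can be skipped for this sample.
    target is +1.0 for positive samples and -1.0 for negative ones."""
    if target > 0.0:
        return output > 0.5   # positive sample already answered high enough
    return output < 0.5       # negative sample already answered low enough
```

Only samples the network still gets wrong (relative to the 0.5 margin) pay the cost of a backward pass.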

The flow of the training process is the following: At the end of this training process, we have a trained network file, which we can later load to create an instance of the network that will from then on be used only as a classifier.


The evaluating process:

This step is much simpler than the training process, as now we already have a trained network.

We get from the user a path to a CSV input file and a path to a trained network file.

The flow of the evaluating process is the following: Finally, the output of the evaluating application contains N lines, where N is the number of vectors in the input CSV file.
Each output line contains one of the numbers {0, 1}, where 1 is a positive answer and 0 is a negative answer; the result on the i'th output line corresponds to the sample represented by the HOG vector on the i'th line of the provided CSV file.
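The per-line decision can be sketched like this (Python for illustration; the real application is in C, and the default threshold shown here is an assumption of the sketch, not a value taken from the project):

```python
def classify(outputs, threshold=0.0):
    """Map raw network outputs (trained toward +1.0 / -1.0) to the 0/1
    answers printed by the evaluating application, one per input vector."""
    return [1 if out > threshold else 0 for out in outputs]
```

Raising the threshold trades recall for precision, which is exactly the knob swept when plotting the ROC curve in the Results section.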

Using the Classifier in the Wild

After training and testing the classifier using the dataset, we also wanted to check how effective it is at detecting humans within arbitrary images.

Note that the negative training and testing sets we used contain patches from many varied human-free images.
We assume that the number of non-human objects whose HOG descriptor looks similar to a human's is fairly small in such random images.

Still, some objects with strong vertical components, such as trees, poles and human limbs, can produce HOG descriptors that are very similar to humans' and too hard to discriminate from real humans.
The following work we found supports this claim: Pedestrian Detection project (Computer Vision course, Computer Science Department, Stanford University)

Based on this assumption, we do not expect the same accuracy on every random image we check. Some images give better results and others worse, depending on their contents.
Of course, when measured over a large enough number of scanned random images, we expect the accuracy to tend toward the results we report on the testing set.

Our detector works in the following way:

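A detector of this kind typically slides a fixed-size window over the image and classifies each window. The sketch below enumerates the window positions (Python for illustration; only the 63x126 window size comes from our training samples, while the 8-pixel stride is an assumption of the sketch):

```python
def sliding_windows(img_w, img_h, win_w=63, win_h=126, stride=8):
    """Enumerate the top-left corners of every candidate detection window
    that fits entirely inside an img_w x img_h image."""
    return [(x, y)
            for y in range(0, img_h - win_h + 1, stride)
            for x in range(0, img_w - win_w + 1, stride)]
```

Each window is then fed through the HOG extractor and the trained classifier, which is why even a modest image produces thousands of checked windows.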
Here are some results we got:

A perfect performance. 100% accuracy


An excellent performance. Only 6 false alarms out of thousands of checked windows: an error rate of less than 1%


In a crowded street, the results are noisier and harder to read.
However, all the humans (sized at least 63x126) were detected, and the false-alarm rate is below 2% of the thousands of checked windows


In another, less crowded street, most pedestrians were detected, and again we can spot a few false alarms


Results

As mentioned in the dataset preparation section, we had 1132 positive samples and 9060 negative samples in the testing set.

We start with showing a ROC curve that describes the results:


A wide-scope view of the ROC curve



A zoom on the interesting area of the ROC curve (the area inside the blue circle above)



Below is a confusion matrix, calculated with threshold 0.66, the threshold that yields the highest accuracy on the testing set:

                          Predicted Class
                      Human            Non-Human
Actual    Human       TP: 1054         FN: 78
Class                 (93.11%)         (6.89%)
          Non-Human   FP: 65           TN: 8995
                      (0.72%)          (99.28%)




Using these values, we can compute precision, recall and other scores:

Precision: 94.19% [TP / (TP + FP)] = [1054 / (1054 + 65)]
Recall: 93.11% [TP / (TP + FN)] = [1054 / (1054 + 78)]
Accuracy: 98.60% [(TP + TN) / (P + N)] = [(1054 + 8995) / (1132 + 9060)]
F1 Score: 93.65% [2TP / (2TP + FP + FN)] = [(2 * 1054) / (2 * 1054 + 65 + 78)]
Magnitude: 93.65% [sqrt(precision^2 + recall^2) / sqrt(2)] = [sqrt(0.9419^2 + 0.9311^2) / sqrt(2)]

Note: the magnitude is divided by sqrt(2) to get a result between 0 and 1
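The scores can be recomputed directly from the raw confusion-matrix counts (a Python sketch; note that precision, F1 and magnitude must be computed from the counts, not from the row percentages, since TP% and FP% have different denominators):

```python
import math

# Counts from the confusion matrix above.
TP, FN, FP, TN = 1054, 78, 65, 8995

precision = TP / (TP + FP)                     # fraction of detections that are humans
recall    = TP / (TP + FN)                     # fraction of humans that are detected
accuracy  = (TP + TN) / (TP + FN + FP + TN)    # fraction of all correct answers
f1        = 2 * TP / (2 * TP + FP + FN)        # harmonic mean of precision and recall
magnitude = math.hypot(precision, recall) / math.sqrt(2)
```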


Related Files

All the files we wrote and used, including our trained network, are available here.
Note that we could not include the processed negative dataset here, as it is too large.

Note that the C code was tested only with the GCC compiler on Unix. However, with small changes it should also work with MSVC.

Relevant files:


External Links and Resources

  1. The paper: Histograms of Oriented Gradients for Human Detection, Navneet Dalal and Bill Triggs, CVPR 2005
  2. The dataset: INRIA Person Dataset
  3. A C library for parsing CSV files: C/C++ CSV Parser
  4. A C library for working with neural networks: C/C++ Neural Networks


Credits

This project was prepared by Tal Hakim and David Cohn, M.Sc. students in Computer Science at the University of Haifa, Israel, as an assignment in the course Recognition and Classification in Images and Videos, taught by Dr. Margarita Osadchy, 2015.
The project is based on the paper:
Histograms of Oriented Gradients for Human Detection, Navneet Dalal and Bill Triggs, CVPR 2005













