Viola-Jones Face Detection

The provided CUDA code is for CUDA version 5, which is obsolete. If you have use newer versions of CUDA, please start with the C++ version.

The Viola-Jones Object Detection Framework is often used for fast face detection. In this assignment, you are asked to optimize the Viola-Jones face detection algorithm on GPUs. We provide two versions of source code, one in pure C++, and the other one containing empty CUDA functions, as described below.
Special thanks to Francesco Comaschi, the author of this Viola-Jones code, for his great effort to make this assignment possible.

Viola-Jones in pure C++: "vj_cpp.zip". The codes have been tested under Linux and Mac OS X using g++ compiler. If you want to use OpenCL for the assignment, you can start with this version. If you are using CUDA and know how to setup compilation tool-flow, you can also start with this version.
Viola-Jones with empty CUDA functions: "vj_cuda5.zip". The codes have been tested on the server (Linux and CUDA 5). You can read below for detail instructions to compile the code.

In the remaining parts of this page, we will look into the implementation of Viola-Jones algorithm and the explanation of the provided codes.

Implementation of Viola-Jones Face Detection
Explanation of the Provided Codes
1. The Pure C++ Version
2. The CUDA Version

Implementation of Viola-Jones Face Detection

The Viola-Jones Object Detection Framework is a generic framework for object detection, which is particularly successful for face detection. In this assignment, we provide a simplified version of Viola-Jones face detection algorithm, implemented by our colleague Francesco Comaschi. The simplified implementation does not include the training part of the framework. The cascade classifier in the simplified implementation uses pre-trained parameters for the cascade classifier. Although the provided code is not meant to be an optimal implementation, yet it provides reasonable detection rate for a wide range of input images. The image below is an example that is included in the package.

(Source: Solvay Conference 1927.)

The provided implementation consists of several major steps, as shown in the pseudo code below. We will explain the implementation step-by-step.


for number of scales in image pyramid do
    downsample image by one scale
    compute integral image for current scale
    for each shift step of the sliding detection window do
        for each stage in the cascade classifier do
            for each filter in the stage do
                filter the detection window
            end
            accumulate filter outputs within this stage
            if accumulation fails to pass per-stage threshold do
                break the for loop and reject this window as a face
            end
        end
        if this detection window passes all per-stage thresholds do
            accept this window as a face
        else
            reject this window as a face
        end
    end
end

Image Pyramid

Image pyramid is a multi-scale representation of an image (see an example below), such that the face detection can be scale-invariant, i.e., detecting large and small faces using the same detection window. Alternatively, we can also scale the filter window, which is more cumbersome in this case. Therefore, as many other vision algorithms, an image pyramid is built for the Viola-Jones face detection.

In the provided code, the image pyramid is performed by downsampling the image using neighboring pixels. You can look into the function nearestNeighbor in haar.cu/haar.cpp file.

Integral Image

Integral image (or summed area table) is a way to sum up the pixel values within a rectangular region (e.g., a region defined by point A, B, C, D in the figure below), which becomes very efficient if we need to sum up the pixels within many regions of interest within an image. For an image of P pixels and N regions of interest each covers W pixels, the naive algorithm has a complexity of (NxW), while the integral image based approach has a complexity of (P+4N). In the case of face detection, a sliding window (explained in the next step) shifts around the image, which needs to sum up pixels for each shifted window. Therefore, N is approximately equal to P. The integral image approach reduces the complexity from (PxW) to (P+4P), which is two orders of magnitude reduction for a sliding window of size 10x10!

(Source: http://en.wikipedia.org/wiki/Summed_area_table)

Sliding Window

A detection window shifts around the whole image at each scale to detect the face, as shown in the figure below. In the provided implementation, the sliding window shifts pixel-by-pixel. Each time the window shifts, the image region within the window will go through the cascade classifier, which will be explained next.

(Source of the Haar filter window: http://en.wikipedia.org/wiki/Viola-Jones_object_detection_framework)

Cascade Classifier

A cascade classifier consists of multiple stages of filters, as shown in the figure below. Each time the sliding window shifts, the new region within the sliding window will go through the cascade classifier stage-by-stage. If the input region fails to pass the threshold of a stage, the cascade classifier will immediately reject the region as a face. If a region pass all stages successfully, it will be classified as a candidate of face, which may be refined by further processing.

For the pre-trained cascade classifier, there are 25 stage, each containing multiple Haar filters, ranging from 9 to 211, as shown in the block below (stored in "info.txt"). Within a stage, the image region will go through these filters in parallel. The output of these filters will be summed up and compared against a per-stage threshold, based on which the decision of the detection is made.

The cascade filter can reduce the computation workload by rejecting a region at early stages, but on the other hand induces dependencies between stages. It is possible to break the dependency between stages by delaying the rejection until the last stage, but that may increase the computational workload if most regions will otherwise be rejected at the early stages. There is a tradeoff between parallelism and computational workload, which depends on the input image and the computer architecture. For the provided pre-trained cascade classifier, one stage may not have enough filters to keep all the processing elements of a GPU busy. Therefore, a potential optimization opportunity is to perform multiple stages in parallel at the cost of increasing unnecessary computation.

Haar Filter

Each stage of the cascade classifier consists of multiple filters to detect the Haar-like features. Some examples of Haar filter windows are shown in the figure below. In the provided implementation, at most three sub-regions are used as Haar features. For example, type D in the figure below, containing four sub-regions, is not used in the simplified implementation.

(Source: http://en.wikipedia.org/wiki/Viola-Jones_object_detection_framework)

The parameters of the Haar filters are stored in the file "class.txt". The file format is explained in the function readTextClassifier in the file haar.cu/haar.cpp. Below is a quick explanation of the file format.


  /******************************************
   * Read the filter parameters in class.txt
   *
   * Each stage of the cascade filter has:
   * 18 parameter per filter * filters per stage
   * + 1 threshold per stage
   *
   * For example, in 5kk73, 
   * the first stage has 9 filters,
   * the first stage is specified using
   * 18 * 9 + 1 = 163 parameters
   * They are line 1 to 163 of class.txt
   *
   * The 18 parameters for each filter are:
   * 1 to 4: coordinates of rectangle 1
   * 5: weight of rectangle 1
   * 6 to 9: coordinates of rectangle 2
   * 10: weight of rectangle 2
   * 11 to 14: coordinates of rectangle 3
   * 15: weight of rectangle 3
   * 16: threshold of the filter
   * 17: alpha 1 of the filter
   * 18: alpha 2 of the filter
   ******************************************/

Explanation of the Provided Codes

Two versions of codes are provided, as described at the top of this page. The two versions have similar code structures, except that the CUDA version contains an empty CUDA function file "gpu_functions.cuh" and renames the haar.cpp to haar.cu. The code structures are explain below.

The Pure C++ Version

The files in the packages are explained below. The most important file is the "haar.cpp", which contains all the important steps of the Viola-Jones algorithm.


main.cpp         :    The entrance of the code
haar.cpp         :    It contains the major steps of the Viola-Jones
haar.h           :    The header file of the Viola-Jones algorithm
Makefile         :    The Makefile to compile the algorithm
rectangles.cpp   :    Merging overlapping face windows, drawing rectangles
image.c          :    Assisting functions for image manipulation
image.h          :    The header file for image manipulation
stdio-wrapper.c  :    The assisting function for image input output
stdio-wrapper.h  :    The header file of stdio-wrapper.c
info.txt         :    The pre-trained cascade filter parameters
class.txt        :    The pre-trained Haar filter parameters
Face.pgm         :    The test input image
LICENSE          :    The License of the code

To compile the code under Linux, simply type "make", and then run the executable "vj" (see the commands below).


unzip vj_cpp.zip
cd vj_cpp
make
./vj

The output image is called "Output.pgm". On the server, you can use the "eog" image viewer to check the output (see the command below), assuming that you have enabled the X tunneling via ssh. If it is too slow to use the image viewer from the server, you can copy the image to your computer and view it locally with some free tools, e.g., GIMP.


eog ./Output.pgm

The CUDA Version

The files in the packages are explained below. The most important file is the "haar.cu", which contains all the important steps of the Viola-Jones algorithm. The "gpu_functions.cuh" contains empty CUDA functions, which are place holders for the CUDA compiler.


main.cpp         :    The entrance of the code
haar.cu          :    It contains the major steps of the Viola-Jones
gpu_functions.cuh:    It contains empty CUDA functions
haar.h           :    The header file of the Viola-Jones algorithm
Makefile         :    The Makefile to compile the algorithm
rectangles.cpp   :    Merging overlapping face windows, drawing rectangles
image.c          :    Assisting functions for image manipulation
image.h          :    The header file for image manipulation
stdio-wrapper.c  :    The assisting function for image input output
stdio-wrapper.h  :    The header file of stdio-wrapper.c
info.txt         :    The pre-trained cascade filter parameters
class.txt        :    The pre-trained Haar filter parameters
Face.pgm         :    The test input image
LICENSE          :    The License of the code

To compile the code on the server, simply type "make", and then run the executable "vj" (see the commands below).


cd ~/cuda-5.0/samples/0_Simple
unzip vj_cuda5.zip
cd vj_cuda5
make
./vj


eog ./Output.pgm

5KK73 GPU Assignment 2012