The assignment is to implement a scale-invariant feature transform (SIFT) application that uses GPU acceleration for performance. Details of the SIFT algorithm can be found on this page. A reference implementation in C is available. You can download a set of test images here.

Based on existing implementations, you should be able to achieve a reasonable speed-up by using the GPU. What you need to do is analyze the kernels (in src/ of the reference implementation) and port them to the GPU where you think it is appropriate.


Hand in your assignment by sending the following things to d.she _at_ and _at_

Your grade depends on three things: the result of your code, the quality of your report and the oral exam.

Functional Requirements

The correctness of your GPU code must be verified: it should generate the same result as the CPU code. Handy programs such as "diff", pre-installed on Linux and freely available for Windows, can do the check. We also provide match.c, from which you can build a keypoint checking tool. In principle the error should be well below 1%.
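For floating-point output, a byte-exact "diff" can flag differences that are only rounding noise, so a tolerance-based check may be useful alongside it. The following is a minimal sketch, not the provided match.c: the function name `mismatch_fraction`, the flat `float` array layout, and the absolute tolerance are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the reference implementation or match.c).
 * Returns the fraction of elements of gpu[] that differ from cpu[] by more
 * than abs_tol. A NaN on either side is counted as a mismatch. */
double mismatch_fraction(const float *cpu, const float *gpu,
                         size_t n, float abs_tol)
{
    size_t bad = 0;

    if (n == 0)
        return 0.0;

    for (size_t i = 0; i < n; ++i) {
        float d = cpu[i] - gpu[i];
        if (d < 0.0f)
            d = -d;
        /* Written as !(d <= tol) so that a NaN difference is counted. */
        if (!(d <= abs_tol))
            ++bad;
    }
    return (double)bad / (double)n;
}
```

One possible use is to run the CPU and GPU versions on the same image, load both outputs into arrays, and check that the returned fraction stays below 0.01.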

You DON'T have to copy the result of a GPU kernel back to system memory unless you need to use it on the CPU; you only need to make sure the final result is available in system memory. Keep in mind that, in most cases, copying between host and device is costly!

Report Requirements

Your report should include the following things:

  1. The structure of your implementation: thread organization, memory mapping, etc. Please explain the reasoning behind your choices clearly, and use sufficient data to support them.
  2. For each optimization method you have applied, benchmark its effect, in a way similar to this figure in the matrixMul example. Are these optimization methods as effective as you expected? Why or why not?
  3. Benchmark results for all the test images (you can try more image sizes if you want).
    • Basic requirement: the time used to process the image. At least two cases: CPU-only and GPU-accelerated.
    • You should benchmark the complete operation, except the file I/O. In particular, the overhead of data transfer between host and device should be included, or measured separately.
    • Try to explain the results.
    • If some images cannot run on GPU, try to explain the reason.
    • Remark: you should avoid the "warm-up" overhead, see the known issues page for more information.
  4. If it is a group report, the contribution of each member should be clearly stated.
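For the timing in item 3, one way to keep the "warm-up" overhead out of the numbers is to run the operation once untimed and then average over several timed runs. The sketch below assumes that "warm-up" means a one-time startup cost on first use, and that a POSIX system with `clock_gettime` is available. The names `run_fn`, `time_after_warmup_ms`, and `count_calls` are hypothetical and do not come from the reference implementation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict -std=c99 */
#endif
#include <time.h>

/* Hypothetical signature for "one complete run" of the code under test,
 * e.g. processing one already-loaded image (file I/O excluded). */
typedef void (*run_fn)(void *ctx);

/* Current monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Calls run(ctx) once without timing it (the warm-up), then returns the
 * mean wall-clock time in ms of `reps` further calls. Returns 0.0 and
 * calls nothing if reps <= 0. */
double time_after_warmup_ms(run_fn run, void *ctx, int reps)
{
    if (reps <= 0)
        return 0.0;

    run(ctx);                       /* warm-up run, not timed */

    double t0 = now_ms();
    for (int i = 0; i < reps; ++i)
        run(ctx);
    return (now_ms() - t0) / reps;
}

/* Stand-in workload for illustration: counts how often it is invoked. */
void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

With `int calls = 0;`, a call such as `time_after_warmup_ms(count_calls, &calls, 5)` invokes the workload six times (one warm-up plus five timed runs). For a GPU version, the function passed as `run` would need to wait for all device work to finish before returning, since kernel launches are asynchronous; otherwise the measured time covers only the launch. Wrapping only the host-device copies in a second such function is one way to report the transfer overhead separately.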

The deadline for this assignment is Dec 18, 2011.