Assignment
The assignment is to implement a scale-invariant feature transform (SIFT) application that uses GPU acceleration for performance. Details of the SIFT algorithm can be found on this page. A reference implementation in C is available. You can download a set of test images here.
Based on existing implementations, you should be able to achieve a reasonable speed-up by using a GPU. You need to analyze the kernels (in src/ of the reference implementation) and port them to the GPU where you think it is appropriate.
- If you are doing the assignment individually, it is mandatory to finish the first three kernels: scale-space construction, difference of Gaussians and extreme point extraction. For each of them, you should analyze the kernel and explain all your decisions.
- You can also do the assignment in a two-person group. In that case, you need to finish all the kernels.
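To make the structure of two of the kernels named above more concrete, here is a minimal plain-C sketch of the difference-of-Gaussians step and of a 26-neighbour extremum test. The function names `dog_subtract` and `is_extremum` are hypothetical and are not taken from the reference implementation, whose data layout and border handling may differ. The sketch assumes row-major float images of equal size. It is CPU code; its only purpose is to show that each output pixel is computed independently, which is what makes these steps candidates for a one-thread-per-pixel GPU mapping.

```c
#include <assert.h>

/* Difference of Gaussians: every output pixel depends only on the same
 * pixel in two adjacent blur levels, so all pixels are independent.
 * (Hypothetical helper, not part of the reference implementation.) */
void dog_subtract(const float *blur_lo, const float *blur_hi,
                  float *dog, int width, int height)
{
    assert(blur_lo && blur_hi && dog);
    for (int i = 0; i < width * height; ++i) {
        dog[i] = blur_hi[i] - blur_lo[i];
    }
}

/* Returns 1 if pixel (x, y) of the middle DoG level is strictly greater
 * or strictly smaller than all 26 neighbours in the 3x3x3 block spanning
 * the levels below, at, and above it; returns 0 otherwise.
 * The caller must keep 1 <= x < width - 1 and stay one row away from the
 * top and bottom borders (assumption of this sketch). */
int is_extremum(const float *below, const float *mid, const float *above,
                int width, int x, int y)
{
    const float *levels[3] = { below, mid, above };
    float v = mid[y * width + x];
    int is_max = 1, is_min = 1;

    for (int l = 0; l < 3; ++l) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (l == 1 && dy == 0 && dx == 0) continue; /* skip the centre */
                float n = levels[l][(y + dy) * width + (x + dx)];
                if (n >= v) is_max = 0;
                if (n <= v) is_min = 0;
            }
        }
    }
    return is_max || is_min;
}
```

When analyzing a kernel like this, note how little arithmetic is done per memory access. The subtraction reads two values and writes one, so memory access patterns are likely to matter more than the computation itself.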
Requirements
Hand in your assignment by sending the following things to d.she _at_ tue.nl and z.ye _at_ tue.nl:
- Your report in PDF (NO .doc/.docx files!). It should be no more than 6 pages. DO NOT include the complete source code in the report.
- An archive file containing the final version of your source code (please don't include any images)
Your grade depends on three things: the result of your code, the quality of your report and the oral exam.
Functional Requirements
The correctness of your GPU code should be verified. Your GPU code should generate the same result as the CPU code. Some handy programs such as "diff", pre-installed in Linux and freely available for Windows, can do the check. We also provide a match.c, from which you can build a keypoint checking tool. In principle the error should be well below 1%.
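A textual "diff" only passes when the two outputs are identical character by character, and floating-point results from a GPU may differ from the CPU in the last digits. As one possible way to quantify the difference, the sketch below counts the fraction of values that disagree beyond a tolerance. The function name `mismatch_fraction` and both tolerance parameters are assumptions of this example. Reading "error" as the fraction of mismatching values is also an assumption, so check it against what match.c actually reports.

```c
#include <assert.h>
#include <stddef.h>

/* Local absolute value, so the example does not need the math library. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Returns the fraction (0.0 to 1.0) of elements where the GPU value
 * differs from the CPU value by more than abs_tol + rel_tol * |cpu|.
 * abs_tol guards the comparison for values close to zero.
 * A NaN on either side counts as a mismatch.
 * (Hypothetical helper, not part of the provided match.c.) */
double mismatch_fraction(const float *cpu, const float *gpu, size_t n,
                         float rel_tol, float abs_tol)
{
    size_t bad = 0;

    assert(cpu && gpu);
    if (n == 0) return 0.0;

    for (size_t i = 0; i < n; ++i) {
        float diff = abs_f(cpu[i] - gpu[i]);
        float limit = abs_tol + rel_tol * abs_f(cpu[i]);
        if (!(diff <= limit)) ++bad;   /* written this way so NaN is counted */
    }
    return (double)bad / (double)n;
}
```

A result of 0.01 from this function would correspond to 1% of the values disagreeing. The tolerances you choose should be stated in the report.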
Remarks:
You DON'T have to copy the result of a GPU kernel back to system memory unless you need to use it on the CPU; you only need to make sure the final result is available in system memory. Keep in mind that in most cases copying between host and device is costly!
Report Requirements
Your report should include the following things:
- The structure of your implementation: thread organization, memory mapping, etc. Please explain the reasoning behind your choices clearly, and use sufficient data to support them.
- For each optimization you have applied, benchmark its effect, in a way similar to this figure in the matrixMul example. Are these optimizations as effective as you expected? Why or why not?
- Benchmark results of all the test images (you can try more image sizes if you want)
  - Basic requirement: the time used to process the image. At least two cases: CPU-only and GPU-accelerated.
  - You should benchmark the complete operation, except the file I/O. In particular, the overhead of data transfer between host and device should be included, or measured separately.
  - Try to explain the results.
  - If some images cannot run on GPU, try to explain the reason.
  - Remark: you should avoid the "warm-up" overhead, see the known issues page for more information.
- If it is a group report, the contribution of each member should be clearly stated.
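The benchmarking requirements above (time the complete operation, exclude the warm-up overhead) can be met with a small host-side harness. The following is a minimal sketch, assuming a POSIX system with `clock_gettime`. The names `time_runs_ms` and `count_call` are hypothetical, and `count_call` is only a placeholder for your real workload. For a GPU measurement, the workload function would need to include the host-device transfers and must not return before the GPU has finished, for example by synchronizing with the device first, since kernel launches can return before the kernel completes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <time.h>

/* Current monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Runs work(arg) `warmup` times without timing, then `reps` times with
 * timing, and returns the mean wall-clock time per timed run in ms.
 * (Hypothetical helper for illustration.) */
double time_runs_ms(void (*work)(void *), void *arg, int warmup, int reps)
{
    assert(work && reps > 0);

    for (int i = 0; i < warmup; ++i) work(arg);   /* untimed warm-up runs */

    double start = now_ms();
    for (int i = 0; i < reps; ++i) work(arg);     /* timed runs */
    return (now_ms() - start) / reps;
}

/* Placeholder workload: counts how often it is called. Replace it with a
 * function that runs the CPU-only or the GPU-accelerated pipeline. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

Averaging over several timed runs after at least one untimed run keeps one-time start-up costs out of the reported figure. Measuring the transfers separately would take a second workload function that performs only the copies.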
The deadline for this assignment is Dec 18, 2011.