Assignment
In this assignment, we will try to optimize the convolutional neural network algorithm on GPUs.
Policies:
- You may do this assignment individually or in a group of at most two students.
- If you are in a group, you and the other student in your group only need to submit one report and one code package. However, the oral exam is individual.
- CUDA examples of neural networks are available on the web. You may study them for learning purposes. However, you still have to write your own code for this assignment.
Programming guidelines:
- Make a timing breakdown of the program. Perform optimizations on the parts that dominate the execution time.
- Identify the bottleneck of each dominating part and apply optimization techniques that target it. Do not choose optimization techniques at random; e.g., loop unrolling helps little if the code is severely memory-bound.
- Keep a record of each intermediate step. Re-do the timing breakdown after each optimization step. Some once-dominating parts may become less critical after the optimizations. Then you should move on to other parts.
- Try to analyze the program. Performance analysis tools exist, but sometimes a pen-and-paper exercise is good enough. For example, if the code is severely bandwidth-bound, you can quickly find this out by estimating the computation-to-memory ratio of the algorithm and plotting it on the Roofline model.
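The pen-and-paper Roofline estimate mentioned above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not part of the assignment code: the function names, the layer dimensions, and the peak numbers (1000 GFLOP/s, 100 GB/s) are hypothetical placeholders, so substitute the values of your own network and GPU. It assumes a stride-1 convolution without padding and 4-byte floats, and it compares two extremes: every operand fetched from DRAM for every multiply-add (no reuse) versus every element crossing DRAM exactly once (ideal reuse).

```python
def conv_flops(out_h, out_w, in_ch, out_ch, k):
    """FLOPs of a direct convolution layer: one multiply and one add per tap."""
    return 2 * out_h * out_w * out_ch * in_ch * k * k


def intensity_ideal_reuse(out_h, out_w, in_ch, out_ch, k, bytes_per_elem=4):
    """FLOP/byte if inputs, weights and outputs each cross DRAM exactly once.

    Assumption: stride 1, no padding, so the input is (out + k - 1) wide.
    """
    in_h, in_w = out_h + k - 1, out_w + k - 1
    elems = (in_h * in_w * in_ch           # input feature maps
             + k * k * in_ch * out_ch      # weights
             + out_h * out_w * out_ch)     # output feature maps
    return conv_flops(out_h, out_w, in_ch, out_ch, k) / (elems * bytes_per_elem)


def intensity_no_reuse(bytes_per_elem=4):
    """FLOP/byte if each multiply-add loads one input and one weight from DRAM."""
    return 2 / (2 * bytes_per_elem)


def attainable_gflops(intensity, peak_gflops, peak_gbs):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, intensity * peak_gbs)


if __name__ == "__main__":
    # Hypothetical layer and GPU numbers, for illustration only.
    layer = dict(out_h=28, out_w=28, in_ch=6, out_ch=16, k=5)
    peak_gflops, peak_gbs = 1000.0, 100.0

    for name, ai in [("no reuse", intensity_no_reuse()),
                     ("ideal reuse", intensity_ideal_reuse(**layer))]:
        bound = attainable_gflops(ai, peak_gflops, peak_gbs)
        print(f"{name}: {ai:.2f} FLOP/byte, bound {bound:.1f} GFLOP/s")
```

If the measured throughput of a kernel sits near the no-reuse bound, the kernel is probably bandwidth-bound and improving data reuse is the more promising direction; if it approaches the ideal-reuse bound, compute-side optimizations become relevant.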
Report guidelines:
- Do not just present the results, but also explain the reasons. What do you expect? Do the results match what you expect? Why or why not?
- Explain why each optimization was performed. Do the chosen optimization techniques address the bottleneck of the program?
- If you do not manage to optimize the bottleneck of the program, explain the reason. It is important to understand the limits.
- Explain results concisely and clearly. Tables and figures can help avoid verbose text, but the data they contain must still be clearly explained.
Submission:
- A report of at most 8 pages in PDF format.
- A zip file containing the source code. Try to keep the zip file small (avoid binary files and unnecessary image files).
- Send the report and the zip file via email to Zhenyu Ye (z.ye _at_ tue.nl).