Learning Materials
The content of this page was obsolete and has been replaced by the 2013 version.
Contents
- Introductory Reading (Required)
  - GPU architectures and programming
  - Performance Modeling
- CUDA Related Documents (Recommended)
- Other GPU Courses (Optional)
- Advanced Reading (Optional)
  - Thread Scheduling and Context Managing
  - Branch and Control Flow
  - Memory Hierarchy and Network-On-Chip
Introductory Reading (Required)
GPU architectures and programming
Performance Modeling
CUDA Related Documents (Recommended)
NVIDIA provides extensive documentation, which you can read selectively according to your needs. The two documents listed below are particularly relevant to the assignment, so we recommend that you read them. Reading them takes some time, but it will save you a lot of effort later.
- CUDA Programming Guide
- CUDA Best Practices Guide
Other GPU Courses (Optional)
Here is a list of courses related to GPU architecture and/or GPU programming:
Advanced Reading (Optional)
This list highlights some recent research work (2009-2012) on GPUs and other throughput-oriented SIMD architectures. Although the papers are sorted into categories, most of them touch on all architectural aspects of GPUs.
Thread Scheduling and Context Managing
- Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor, in MICRO 2012, link: http://dx.doi.org/10.1109/MICRO.2012.18
- A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors, in ACM Transactions on Computer Systems (TOCS) 2012, link: http://dx.doi.org/10.1145/2166879.2166882
- Energy-efficient mechanisms for managing thread context in throughput processors, in ISCA 2011, link: http://dx.doi.org/10.1145/2024723.2000093
- A compile-time managed multi-level register file hierarchy, in MICRO 2011, link: http://dx.doi.org/10.1145/2155620.2155675
- Improving GPU performance via large warps and two-level warp scheduling, in MICRO 2011, link: http://dx.doi.org/10.1145/2155620.2155656
Branch and Control Flow
- Simultaneous branch and warp interweaving for sustained GPU performance, in ISCA 2012, link: http://dx.doi.org/10.1109/ISCA.2012.6237005
- SIMD re-convergence at thread frontiers, in MICRO 2011, link: http://dx.doi.org/10.1145/2155620.2155676
- CAPRI: prediction of compaction-adequacy for handling control-divergence in GPGPU architectures, in ISCA 2012, link: http://dx.doi.org/10.1145/2366231.2337167
- Thread block compaction for efficient SIMT control flow, in HPCA 2011, link: http://dx.doi.org/10.1109/HPCA.2011.5749714
- Dynamic warp subdivision for integrated branch and memory divergence tolerance, in ISCA 2010, link: http://dx.doi.org/10.1145/1816038.1815992
- Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware, in ACM Transactions on Architecture and Code Optimization (TACO) 2009, link: http://dx.doi.org/10.1145/1543753.1543765
Memory Hierarchy and Network-On-Chip