AMD gem5 APU Simulator: Modeling GPUs Using the Machine ISA
Held in conjunction with ISCA 2018. June 2nd, 2018.
The tutorial will be held on day one of the conference - June 2nd, 2018
ISCA 2018 early registration and hotel reservation deadline - April 16th, 2018
AMD Research has developed an APU (Accelerated Processing Unit) model that extends gem5 [1] with a GPU timing model that executes the GCN (Graphics Core Next) generation 3 machine ISA [2, 3]. In addition to supporting a modern machine ISA, the model runs the open-source Radeon Open Compute platform (ROCm) stack without modification. Users can therefore run a wide variety of applications written in several high-level languages, including C++, HIP, OpenMP, and OpenCL. Researchers can then evaluate many types of workloads, from traditional compute applications to emerging GPU workloads such as task-parallel and machine learning applications. The resulting AMD gem5 APU simulator is a flexible, cycle-level research model capable of representing many different APU configurations, on-chip cache hierarchies, and system designs. Our APU extensions allow researchers to model both CPU and GPU memory requests and the interactions between them. In particular, the model uses SLICC and Ruby to implement a wide variety of coherence and synchronization solutions, a critical research area in heterogeneous computing. The model has been used in several top-tier computer architecture publications in recent years [MICRO 2013, HPCA 2014, ASPLOS 2014, ISCA 2014, HPCA 2015, ASPLOS 2015, MICRO 2016, HPCA 2017, ISCA 2017, HPCA 2018].
In this tutorial, we will describe the capabilities of the AMD gem5 APU simulator, which will be publicly released under a liberal BSD license before ISCA 2018. We will detail the simulated APU architecture, review the execution flow, and describe how the simulator has been used. The presentation will also discuss key design decisions and tradeoffs. For example, because the simulator uses system-call emulation mode to avoid running a full OS and kernel driver, we will describe its system-call emulation interface and how the ROCm runtime and user-space drivers interact with it. In addition, our GPU model now directly executes native machine ISA instructions rather than the HSAIL intermediate language representation. Executing the intermediate language, as the model previously did, simplified workload compilation but was less accurate at modeling hardware behavior. The tutorial will highlight many of the improvements enabled by executing the GCN3 ISA.
[1]. Nathan Binkert et al. The gem5 Simulator. In SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1-7, Aug. 2011.
[2]. AMD. AMD GCN3 ISA Architecture Manual.
[3]. Anthony Gutierrez et al. Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level. In HPCA 2018.
Topic | Presenter | Time |
---|---|---|
Background | Tony | 8:00-8:15 am |
ROCm Stack, GCN3 ISA, and uArch | Tony | 8:15-9:15 am |
HSA Queuing | Sooraj | 9:15-10:00 am |
Break | | 10:00-10:30 am |
Ruby and GPU Protocol Tester | Tuan | 10:30-11:15 am |
Demo/Workloads and Q&A | TBD | 11:15 am-12:00 pm |
Tony Gutierrez (AMD Research)
Sooraj Puthoor (AMD Research)
Brad Beckmann (AMD Research)
Tuan Ta (Cornell)