| /* |
| * Copyright (c) 2017-2020 Advanced Micro Devices, Inc. |
| * All rights reserved. |
| * |
| * For use for simulation and test purposes only |
| * |
| * Redistribution and use in source and binary forms, with or without |
| * modification, are permitted provided that the following conditions are met: |
| * |
| * 1. Redistributions of source code must retain the above copyright notice, |
| * this list of conditions and the following disclaimer. |
| * |
| * 2. Redistributions in binary form must reproduce the above copyright notice, |
| * this list of conditions and the following disclaimer in the documentation |
| * and/or other materials provided with the distribution. |
| * |
| * 3. Neither the name of the copyright holder nor the names of its |
| * contributors may be used to endorse or promote products derived from this |
| * software without specific prior written permission. |
| * |
| * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" |
| * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
| * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE |
| * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR |
| * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF |
| * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS |
| * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN |
| * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) |
| * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE |
| * POSSIBILITY OF SUCH DAMAGE. |
| */ |
| |
| This directory contains a tester for gem5 GPU protocols. Unlike the Ruby random |
| teter, this tester does not rely on sequential consistency. Instead, it |
| assumes tested protocols supports release consistency. |
| |
| ----- Getting Started ----- |
| |
| To start using the tester quickly, you can use the following example command |
| line to get running immediately: |
| |
| build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py \ |
| --test-length=1000 --system-size=medium --cache-size=small |
| |
| An overview of the main command line options is as follows. For all options |
| use `build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py --help` |
| or see the configuration file. |
| |
| * --cache-size (small, large): Use smaller sizes for testing evict, etc. |
| * --system-size (small, medium, large): Effectively the number of threads in |
| the GPU model. Large size will have more contention. Larger |
| sizes are useful for checking contention. |
| * --episode-length (short, medium, long): Number of loads and stores in an |
| episode. Episodes will also have atomics mixed in. See below |
| for a definition of episode. |
| * --test-length (int): Number of episodes to execute. This will determine the |
| amount of time the tester runs for. Longer time will stress |
| the protocol harder. |
| |
| The remainder of this file describes the theory behind the tester design and |
| a link to a more detailed research paper is provided at the end. |
| |
| ----- Theory Overview ----- |
| |
| The GPU Ruby tester creates a system consisting of both CPU threads and GPU |
| wavefronts. CPU threads are scalar, so there is one lane per CPU thread. GPU |
| wavefront may have multiple lanes. The number of lanes is initialized when |
| a thread/wavefront is created. |
| |
| Each thread/wavefront executes a number of episodes. Each episode is a series |
| of memory actions (i.e., atomic, load, store, acquire and release). In a |
| wavefront, all lanes execute the same sequence of actions, but they may target |
| different addresses. One can think of an episode as a critical section which |
| is bounded by a lock acquire in the beginning and a lock release at the end. An |
| episode consists of actions in the following order: |
| |
| 1 - Atomic action |
| 2 - Acquire action |
| 3 - A number of load and store actions |
| 4 - Release action |
| 5 - Atomic action that targets the same address as (1) does |
| |
| There are two separate set of addresses: atomic and non-atomic. Atomic actions |
| target only atomic addresses. Load and store actions target only non-atomic |
| addresses. Memory addresses are all 4-byte aligned in the tester. |
| |
| To test false sharing cases in which both atomic and non-atomic addresses are |
| placed in the same cache line, we abstract out the concept of memory addresses |
| from the tester's perspective by introducing the concept of location. Locations |
| are numbered from 0 to N-1 (if there are N addresses). The first X locations |
| [0..X-1] are atomic locations, and the rest are non-atomic locations. |
| The 1-1 mapping between locations and addresses are randomly created when the |
| tester is initialized. |
| |
| Per load and store action, its target location is selected so that there is no |
| data race in the generated stream of memory requests at any time during the |
| test. Since in Data-Race-Free model, the memory system's behavior is undefined |
| in data race cases, we exclude data race scenarios from our protocol test. |
| |
| Once location per load/store action is determined, each thread/wavefront either |
| loads current value at the location or stores an incremental value to that |
| location. The tester maintains a table tracking all last writers and their |
| written values, so we know what value should be returned from a load and what |
| value should be written next at a particular location. Value returned from a |
| load must match with the value written by the last writer. |
| |
| ----- Directory Structure ----- |
| |
| ProtocolTester.hh/cc -- This is the main tester class that orchestrates the |
| entire test. |
| AddressManager.hh/cc -- This manages address space, randomly maps address to |
| location, generates locations for all episodes, |
| maintains per-location last writer and validates |
| values returned from load actions. |
| GpuThread.hh/cc -- This is abstract class for CPU threads and GPU |
| wavefronts. It generates and executes a series of |
| episodes. |
| CpuThread.hh/cc -- Thread class for CPU threads. Not fully implemented yet |
| GpuWavefront.hh/cc -- GpuThread class for GPU wavefronts. |
| Episode.hh/cc -- Class to encapsulate an episode, notably including |
| episode load/store structure and ordering. |
| |
| For more detail, please see the following paper: |
| |
| T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous Data-Race-Free |
| GPU Testing," 2019 IEEE International Symposium on Workload Characterization |
| (IISWC), Orlando, FL, USA, 2019, pp. 81-92, doi: |
| 10.1109/IISWC47752.2019.9042019. |