
 /*
  * Copyright (c) 2017-2021 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  * 1. Redistributions of source code must retain the above copyright notice,
  * this list of conditions and the following disclaimer.
  *
  * 2. Redistributions in binary form must reproduce the above copyright notice,
  * this list of conditions and the following disclaimer in the documentation
  * and/or other materials provided with the distribution.
  *
  * 3. Neither the name of the copyright holder nor the names of its
  * contributors may be used to endorse or promote products derived from this
  * software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  */

 This directory contains a tester for gem5 GPU protocols. Unlike the Ruby random
 tester, this tester does not rely on sequential consistency. Instead, it
 assumes that the tested protocols support release consistency.

 ----- Getting Started -----

 To start using the tester quickly, run the following example command line:

 build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py \
             --test-length=1000 --system-size=medium --cache-size=small

 An overview of the main command line options follows. For the full list of
 options, run `build/GCN3_X86/gem5.opt configs/example/ruby_gpu_random_test.py --help`
 or see the configuration file.

  * --cache-size (small, large): Use smaller sizes to test evictions, etc.
  * --system-size (small, medium, large): Effectively the number of threads in
                  the GPU model. Larger sizes create more contention and are
                  therefore useful for checking contention.
  * --episode-length (short, medium, long): Number of loads and stores in an
                  episode. Episodes also have atomics mixed in. See below
                  for the definition of an episode.
  * --test-length (int): Number of episodes to execute. This determines how
                  long the tester runs; longer runs stress the protocol harder.

 The remainder of this file describes the theory behind the tester design. A
 reference to a more detailed research paper is provided at the end.

 ----- Theory Overview -----

 The GPU Ruby tester creates a system consisting of both CPU threads and GPU
 wavefronts. CPU threads are scalar, so there is one lane per CPU thread. GPU
 wavefronts may have multiple lanes. The number of lanes is set when a
 thread/wavefront is created.

 Each thread/wavefront executes a number of episodes. Each episode is a series
 of memory actions (i.e., atomic, load, store, acquire and release). In a
 wavefront, all lanes execute the same sequence of actions, but they may target
 different addresses. One can think of an episode as a critical section that
 is bounded by a lock acquire at the beginning and a lock release at the end. An
 episode consists of actions in the following order:

 1 - Atomic action
 2 - Acquire action
 3 - A number of load and store actions
 4 - Release action
 5 - Atomic action that targets the same address as (1) does
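 The episode structure above can be sketched as follows. This is a minimal,
 hypothetical C++ sketch, not the tester's actual Episode class: the names
 ActionType, Action and makeEpisode are illustrative, loads and stores are
 chosen with a 50/50 coin flip, and non-atomic locations are picked uniformly
 at random, ignoring the race-free selection described below. It shows how
 every lane of a wavefront shares one action sequence while getting its own
 location per action, and how the closing atomic reuses the opening atomic's
 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical action kinds; the real tester's types may differ.
enum class ActionType { Atomic, Acquire, Load, Store, Release };

struct Action {
    ActionType type;
    // One target location per lane; empty for Acquire/Release.
    std::vector<int> laneLocations;
};

// Builds one episode for a thread/wavefront with numLanes lanes (1 for a
// CPU thread). Assumes locations [0, numAtomicLocs) are atomic and
// [numAtomicLocs, numLocs) are non-atomic, with 0 < numAtomicLocs < numLocs.
std::vector<Action> makeEpisode(std::size_t numLanes,
                                std::size_t numLoadsStores,
                                int numAtomicLocs, int numLocs,
                                std::mt19937 &rng)
{
    std::uniform_int_distribution<int> atomicLoc(0, numAtomicLocs - 1);
    std::uniform_int_distribution<int> normalLoc(numAtomicLocs, numLocs - 1);
    std::bernoulli_distribution isStore(0.5);

    // Each lane picks its own atomic location, reused by action (5).
    std::vector<int> atomicTargets(numLanes);
    for (auto &loc : atomicTargets)
        loc = atomicLoc(rng);

    std::vector<Action> episode;
    episode.push_back({ActionType::Atomic, atomicTargets});   // (1)
    episode.push_back({ActionType::Acquire, {}});             // (2)
    for (std::size_t i = 0; i < numLoadsStores; ++i) {        // (3)
        std::vector<int> locs(numLanes);
        for (auto &loc : locs)
            loc = normalLoc(rng);
        episode.push_back(
            {isStore(rng) ? ActionType::Store : ActionType::Load, locs});
    }
    episode.push_back({ActionType::Release, {}});             // (4)
    episode.push_back({ActionType::Atomic, atomicTargets});   // (5)
    return episode;
}
```

 For example, makeEpisode(4, 10, 8, 64, rng) yields 14 actions: two atomics,
 one acquire, one release and ten loads/stores, each carrying four lane
 locations.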

 There are two separate sets of addresses: atomic and non-atomic. Atomic actions
 target only atomic addresses. Load and store actions target only non-atomic
 addresses. All memory addresses in the tester are 4-byte aligned.

 To test false sharing cases, in which both atomic and non-atomic addresses are
 placed in the same cache line, we abstract memory addresses away from the
 tester's perspective by introducing the concept of a location. Locations are
 numbered from 0 to N-1 (if there are N addresses). The first X locations
 [0..X-1] are atomic locations, and the rest are non-atomic locations. The 1-1
 mapping between locations and addresses is randomly created when the tester is
 initialized.
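 A minimal sketch of the location abstraction, assuming a contiguous region
 that starts at a 4-byte-aligned base address (the baseAddr parameter is
 hypothetical) with one 4-byte word per location. LocationMap is an
 illustrative name, not the AddressManager interface. Shuffling the word slots
 gives a random 1-1 mapping, so atomic and non-atomic locations end up
 interleaved within cache lines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical location -> address map. Locations [0, numAtomicLocs) are
// atomic; the remaining locations are non-atomic.
class LocationMap {
  public:
    LocationMap(int numLocs, int numAtomicLocs, std::uint64_t baseAddr,
                unsigned seed)
        : numAtomic(numAtomicLocs), addrs(numLocs)
    {
        // Word slots 0..numLocs-1; slot i lives at baseAddr + 4 * i.
        std::vector<int> slots(numLocs);
        std::iota(slots.begin(), slots.end(), 0);

        // Random permutation = random 1-1 mapping from locations to slots.
        std::mt19937 rng(seed);
        std::shuffle(slots.begin(), slots.end(), rng);

        for (int loc = 0; loc < numLocs; ++loc)
            addrs[loc] = baseAddr + 4 * static_cast<std::uint64_t>(slots[loc]);
    }

    std::uint64_t address(int loc) const { return addrs.at(loc); }
    bool isAtomic(int loc) const { return loc < numAtomic; }

  private:
    int numAtomic;
    std::vector<std::uint64_t> addrs;
};
```

 Because the mapping is a permutation of word slots, every address stays
 4-byte aligned (given an aligned base) and no two locations share an address.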

 For each load and store action, the target location is selected so that there
 is no data race in the generated stream of memory requests at any time during
 the test. Since the memory system's behavior is undefined in the presence of
 data races under the Data-Race-Free (DRF) model, we exclude data race
 scenarios from our protocol test.
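 One way to enforce this race-free rule is sketched below. The bookkeeping is
 an assumption, not the AddressManager's actual algorithm: RaceFreeTracker and
 its methods are hypothetical, and in-flight episodes are identified by
 integer IDs. A store is allowed only when no other in-flight episode loads or
 stores the location; a load is allowed when no other in-flight episode stores
 it. An episode's entries are cleared when it retires after its release.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical per-location record of which in-flight episodes load or
// store each non-atomic location.
class RaceFreeTracker {
  public:
    // A store is race-free only if no *other* in-flight episode touches
    // the location at all.
    bool canStore(int loc, int episodeId) const {
        return othersIn(loaders, loc, episodeId) == 0 &&
               othersIn(storers, loc, episodeId) == 0;
    }

    // A load is race-free if no other in-flight episode stores to it.
    bool canLoad(int loc, int episodeId) const {
        return othersIn(storers, loc, episodeId) == 0;
    }

    void recordLoad(int loc, int episodeId)  { loaders[loc].insert(episodeId); }
    void recordStore(int loc, int episodeId) { storers[loc].insert(episodeId); }

    // Called once an episode has completed (after its release).
    void retire(int episodeId) {
        for (auto &entry : loaders) entry.second.erase(episodeId);
        for (auto &entry : storers) entry.second.erase(episodeId);
    }

  private:
    using Table = std::map<int, std::set<int>>;

    // Number of episodes other than episodeId recorded for loc.
    static int othersIn(const Table &t, int loc, int episodeId) {
        auto it = t.find(loc);
        if (it == t.end())
            return 0;
        int n = static_cast<int>(it->second.size());
        return n - (it->second.count(episodeId) ? 1 : 0);
    }

    Table loaders, storers;
};
```

 The caller's own entries are ignored, so an episode may load and later store
 the same location; program order within the episode keeps that race-free.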

 Once the location for each load/store action is determined, each
 thread/wavefront either loads the current value at that location or stores an
 incremented value to it. The tester maintains a table tracking the last writer
 of each location and the value it wrote, so we know what value a load should
 return and what value should be written next to a particular location. The
 value returned by a load must match the value written by the last writer.
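 The last-writer bookkeeping can be sketched as below. LastWriterTable is a
 hypothetical name, not the AddressManager interface; the sketch assumes
 32-bit values, that unwritten locations start at 0, and that each store writes
 the previous value plus one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical per-location table of last writer and last written value.
class LastWriterTable {
  public:
    // Value the next store to loc should write: previous value + 1.
    std::int32_t nextValue(int loc) const { return current(loc) + 1; }

    // Record a completed store and the thread/wavefront that issued it.
    void recordStore(int loc, int writerId, std::int32_t value) {
        table[loc] = Entry{writerId, value};
    }

    // A load is correct only if it returns the last writer's value.
    bool validateLoad(int loc, std::int32_t loadedValue) const {
        return loadedValue == current(loc);
    }

    // ID of the last writer, or -1 if loc has never been written.
    int lastWriter(int loc) const {
        auto it = table.find(loc);
        return it == table.end() ? -1 : it->second.writerId;
    }

  private:
    struct Entry {
        int writerId;
        std::int32_t value;
    };

    // Assumption: unwritten locations hold 0.
    std::int32_t current(int loc) const {
        auto it = table.find(loc);
        return it == table.end() ? 0 : it->second.value;
    }

    std::map<int, Entry> table;
};
```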

 ----- Directory Structure -----

 ProtocolTester.hh/cc -- This is the main tester class that orchestrates the
                         entire test.
 AddressManager.hh/cc -- This manages the address space, randomly maps addresses
                         to locations, generates locations for all episodes,
                         maintains the per-location last writer and validates
                         values returned from load actions.
 TesterThread.hh/cc   -- This is the abstract base class for CPU threads and GPU
                         wavefronts. It generates and executes a series of
                         episodes.
 CpuThread.hh/cc      -- Thread class for CPU threads. Not fully implemented yet.
 GpuWavefront.hh/cc   -- Thread class for GPU wavefronts.
 Episode.hh/cc        -- Class to encapsulate an episode, notably including
                         episode load/store structure and ordering.

 For more detail, please see the following paper:

 T. Ta, X. Zhang, A. Gutierrez and B. M. Beckmann, "Autonomous Data-Race-Free
 GPU Testing," 2019 IEEE International Symposium on Workload Characterization
 (IISWC), Orlando, FL, USA, 2019, pp. 81-92, doi:
 10.1109/IISWC47752.2019.9042019.