src/doc/memory_system.doxygen - public/gem5 - Git at Google

 # Copyright (c) 2012 ARM Limited
 # All rights reserved
 #
 # The license below extends only to copyright in the software and shall
 # not be construed as granting a license to any other intellectual
 # property including but not limited to intellectual property relating
 # to a hardware implementation of the functionality of the software
 # licensed hereunder.  You may use the software subject to the license
 # terms below provided that you ensure that this notice is replicated
 # unmodified and in its entirety in all distributions of the software,
 # modified or unmodified, in source code or in binary form.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are
 # met: redistributions of source code must retain the above copyright
 # notice, this list of conditions and the following disclaimer;
 # redistributions in binary form must reproduce the above copyright
 # notice, this list of conditions and the following disclaimer in the
 # documentation and/or other materials provided with the distribution;
 # neither the name of the copyright holders nor the names of its
 # contributors may be used to endorse or promote products derived from
 # this software without specific prior written permission.
 #
 # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 # OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 # Author: Djordje Kovacevic

 /*! \page gem5MemorySystem Memory System in gem5

   \tableofcontents

   The document describes memory subsystem in gem5 with focus on program flow
   during CPU’s simple memory transactions (read or write).


   \section gem5_MS_MH MODEL HIERARCHY

   Model that is used in this document consists of two out-of-order (O3)
   ARM v7 CPUs with corresponding L1 data caches and Simple Memory. It is
   created by running gem5 with the following parameters:

   configs/example/fs.py --caches --cpu-type=arm_detailed --num-cpus=2

   Gem5 uses Simulation Objects (SimObject) derived objects as basic blocks for
   building memory system. They are connected via ports with established
   master/slave hierarchy. Data flow is initiated on master port while the
   response messages and snoop queries appear on the slave port. The following
   figure shows the hierarchy of Simulation Objects used in this document:

   \image html "gem5_MS_Fig1.PNG" "Simulation Object hierarchy of the model" width=3cm

   \section gem5_CPU CPU

   It is not in the scope of this document to describe O3 CPU model in details, so
   here are only a few relevant notes about the model:

   <b>Read access </b>is initiated by sending message to the port towards DCache
   object. If DCache rejects the message (for being blocked or busy) CPU will
   flush the pipeline and the access will be re-attempted later on. The access
   is completed upon receiving reply message (ReadRep) from DCache.

   <b>Write access</b> is initiated by storing the request into store buffer whose
   context is emptied and sent to DCache on every tick. DCache may also reject
   the request. Write access is completed when write reply (WriteRep) message is
   received from DCache.

   Load & store buffers  (for read and write access) don’t impose any
   restriction on the number of active memory accesses. Therefore, the maximum
   number of outstanding CPU’s memory access requests is not limited by CPU
   Simulation Object but by underlying memory system model.

   <b>Split memory access</b> is implemented.

   The message that is sent by CPU contains memory type (Normal, Device, Strongly
   Ordered and cachebility) of the accessed region. However, this is not being used
   by the rest of the model that takes more simplified approach towards memory types.

   \section gem5_DCache DATA CACHE OBJECT

   Data Cache object implements a standard cache structure:

   \image html "gem5_MS_Fig2.PNG" "DCache Simulation Object" width=3cm

   <b>Cached memory reads</b> that match particular cache tag (with Valid & Read
   flags) will be completed (by sending ReadResp to CPU) after a configurable time.
   Otherwise, the request is forwarded to Miss Status and Handling Register
   (MSHR) block.

   <b>Cached memory writes</b> that match particular cache tag (with Valid, Read
   & Write flags) will be completed (by sending WriteResp CPU) after the same
   configurable time. Otherwise, the request is forwarded to Miss Status and
   Handling Register(MSHR) block.

   <b>Uncached memory reads</b> are forwarded to MSHR block.

   <b>Uncached memory writes</b> are forwarded to WriteBuffer block.

   <b>Evicted (& dirty) cache lines</b> are forwarded to WriteBuffer block.

   CPU’s access to Data Cache is blocked if any of the following is true:

     - MSHR block is full. (The size of MSHR’s buffer is configurable.)

     - Writeback block is full. (The size of the block’s buffer is
     configurable.)

     - The number of outstanding memory accesses against the same memory cache line
     has reached configurable threshold value – see MSHR and Write Buffer for details.

   Data Cache in block state will reject any request from slave port (from CPU)
   regardless of whether it would result in cache hit or miss. Note that
   incoming messages on master port (response messages and snoop requests)
   are never rejected.

   Cache hit on uncachable memory region (unpredicted behaviour according to
   ARM ARM) will invalidate cache line and fetch data from memory.

   \subsection gem5_MS_TAndDBlock Tags & Data Block

   Cache lines (referred as blocks in source code) are organised into sets with
   configurable associativity and size. They have the following status flags:
     - <b>Valid.</b> It holds data. Address tag is valid
     - <b>Read.</b> No read request will be accepted without this flag being set.
       For example, cache line is valid and unreadable when it waits for write flag
       to complete write access.
     - <b>Write.</b> It may accept writes. Cache line with Write flags
       identifies Unique state – no other cache memory holds the copy.
     - <b>Dirty.</b> It needs Writeback when evicted.

   Read access will hit cache line if address tags match and Valid and Read
   flags are set. Write access will hit cache line if address tags match and
   Valid, Read and Write flags are set.

   \subsection gem5_MS_Queues MSHR and Write Buffer Queues

   Miss Status and Handling Register (MSHR) queue holds the list of CPU’s
   outstanding memory requests that require read access to lower memory
   level. They are:
     - Cached Read misses.
     - Cached Write misses.
     - Uncached reads.

   WriteBuffer queue holds the following memory requests:
     - Uncached writes.
     - Writeback from evicted (& dirty) cache lines.

   \image html "gem5_MS_Fig3.PNG" "MSHR and Write Buffer Blocks" width=6cm

   Each memory request is assigned to corresponding MSHR object (READ or WRITE
   on diagram above) that represents particular block (cache line) of memory
   that has to be read or written in order to complete the command(s). As shown
   on gigure above, cached read/writes against the same cache line have a common
   MSHR object and will be completed with a single memory access.

   The size of the block (and therefore the size of read/write access to lower
   memory) is:
     - The size of cache line for cached access & writeback;
     - As specified in CPU instruction for uncached access.

   In general, Data Cache model distinguishes between just two memory types:
     - Normal Cached memory. It is always treated as write back, read and write
       allocate.
     - Normal uncached, Device and Strongly Ordered types are treated equally
       (as uncached memory)

   \subsection gem5_MS_Ordering Memory Access Ordering

   An unique order number is assigned to each CPU read/write request(as they appear on
   slave port). Order numbers of MSHR objects are copied from the first
   assigned read/write.

   Memory read/writes from each of these two queues are executed in order (according
   to the assigned order number). When both queues are not empty the model will
   execute memory read from MSHR block unless WriteBuffer is full. It will,
   however, always preserve the order of read/writes on the same
   (or overlapping) memory cache line (block).

   In summary:
     - Order of accesses to cached memory is not preserved unless they target
       the same cache line. For example, the accesses #1, #5 & #10 will
       complete simultaneously in the same tick (still in order). The access
       #5 will complete before #3.
     - Order of all uncached memory writes is preserved. Write#6 always
       completes before Write#13.
     - Order to all uncached memory reads is preserved. Read#2 always completes
       before Read#8.
     - The order of a read and a write uncached access is not necessarily
       preserved  - unless their access regions overlap. Therefore, Write#6
       always completes before Read#8 (they target the same memory block).
       However, Write#13 may complete before Read#8.


   \section gem5_MS_Bus COHERENT BUS OBJECT

   \image html "gem5_MS_Fig4.PNG" "Coherent Bus Object" width=3cm

   Coherent Bus object provides basic support for snoop protocol:

   <b>All requests on the slave port</b> are forwarded to the appropriate master port. Requests
   for cached memory regions are also forwarded to other slave ports (as snoop
   requests).

   <b>Master port replies</b> are forwarded to the appropriate slave port.

   <b>Master port snoop requests</b> are forwarded to all slave ports.

   <b>Slave port snoop replies</b> are forwarded to the port that was the source of the
   request. (Note that the source of snoop request can be either slave or
   master port.)

   The bus declares itself blocked for a configurable period of time after
   any of the following events:
     - A packet is sent (or failed to be sent) to a slave port.
     - A reply message is sent to a master port.
     - Snoop response from one slave port is sent to another slave port.

   The bus in blocked state rejects the following incoming messages:
     - Slave port requests.
     - Master port replies.
     - Master port snoop requests.

   \section gem5_MS_SimpleMemory SIMPLE MEMORY OBJECT

   It never blocks the access on slave port.

   Memory read/write takes immediate effect. (Read or write is performed when
   the request is received).

   Reply message is sent after a configurable period of time .

   \section gem5_MS_MessageFlow MESSAGE FLOW

   \subsection gem5_MS_Ordering Read Access

   The following diagram shows read access that hits Data Cache line with Valid
   and Read flags:

   \image html "gem5_MS_Fig5.PNG" "Read Hit (Read flag must be set in cache line)" width=3cm

   Cache miss read access will generate the following sequence of messages:

   \image html "gem5_MS_Fig6.PNG" "Read Miss with snoop reply" width=3cm

   Note that bus object never gets response from both DCache2 and Memory object.
   It sends the very same ReadReq package (message) object to memory and data
   cache. When Data Cache wants to reply on snoop request it marks the message
   with MEM_INHIBIT flag that tells Memory object not to process the message.

   \subsection gem5_MS_Ordering Write Access

   The following diagram shows write access that hits DCache1 cache line with
   Valid & Write flags:

   \image html "gem5_MS_Fig7.PNG" "Write Hit (with Write flag set in cache line)" width=3cm

   Next figure shows write access that hits DCache1 cache line with Valid but no
   Write flags – which qualifies as write miss. DCache1 issues UpgradeReq to
   obtain write permission. DCache2::snoopTiming will invalidate cache line that
   has been hit. Note that UpgradeResp message doesn’t carry data.

   \image html "gem5_MS_Fig8.PNG" "Write Miss – matching tag with no Write flag" width=3cm

   The next diagram shows write miss in DCache. ReadExReq invalidates cache line
   in DCache2. ReadExResp carries the content of memory cache line.

   \image html "gem5_MS_Fig9.PNG" "Miss - no matching tag" width=3cm

 */
	# Copyright (c) 2012 ARM Limited
	# All rights reserved
	#
	# The license below extends only to copyright in the software and shall
	# not be construed as granting a license to any other intellectual
	# property including but not limited to intellectual property relating
	# to a hardware implementation of the functionality of the software
	# licensed hereunder. You may use the software subject to the license
	# terms below provided that you ensure that this notice is replicated
	# unmodified and in its entirety in all distributions of the software,
	# modified or unmodified, in source code or in binary form.
	#
	# Redistribution and use in source and binary forms, with or without
	# modification, are permitted provided that the following conditions are
	# met: redistributions of source code must retain the above copyright
	# notice, this list of conditions and the following disclaimer;
	# redistributions in binary form must reproduce the above copyright
	# notice, this list of conditions and the following disclaimer in the
	# documentation and/or other materials provided with the distribution;
	# neither the name of the copyright holders nor the names of its
	# contributors may be used to endorse or promote products derived from
	# this software without specific prior written permission.
	#
	# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
	# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
	# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
	# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
	# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
	# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
	# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
	# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
	# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
	# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
	# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
	#
	# Author: Djordje Kovacevic

	/*! \page gem5MemorySystem Memory System in gem5

	\tableofcontents

	The document describes memory subsystem in gem5 with focus on program flow
	during CPU’s simple memory transactions (read or write).


	\section gem5_MS_MH MODEL HIERARCHY

	Model that is used in this document consists of two out-of-order (O3)
	ARM v7 CPUs with corresponding L1 data caches and Simple Memory. It is
	created by running gem5 with the following parameters:

	configs/example/fs.py --caches --cpu-type=arm_detailed --num-cpus=2

	Gem5 uses Simulation Objects (SimObject) derived objects as basic blocks for
	building memory system. They are connected via ports with established
	master/slave hierarchy. Data flow is initiated on master port while the
	response messages and snoop queries appear on the slave port. The following
	figure shows the hierarchy of Simulation Objects used in this document:

	\image html "gem5_MS_Fig1.PNG" "Simulation Object hierarchy of the model" width=3cm

	\section gem5_CPU CPU

	It is not in the scope of this document to describe O3 CPU model in details, so
	here are only a few relevant notes about the model:

	<b>Read access </b>is initiated by sending message to the port towards DCache
	object. If DCache rejects the message (for being blocked or busy) CPU will
	flush the pipeline and the access will be re-attempted later on. The access
	is completed upon receiving reply message (ReadRep) from DCache.

	<b>Write access</b> is initiated by storing the request into store buffer whose
	context is emptied and sent to DCache on every tick. DCache may also reject
	the request. Write access is completed when write reply (WriteRep) message is
	received from DCache.

	Load & store buffers (for read and write access) don’t impose any
	restriction on the number of active memory accesses. Therefore, the maximum
	number of outstanding CPU’s memory access requests is not limited by CPU
	Simulation Object but by underlying memory system model.

	<b>Split memory access</b> is implemented.

	The message that is sent by CPU contains memory type (Normal, Device, Strongly
	Ordered and cachebility) of the accessed region. However, this is not being used
	by the rest of the model that takes more simplified approach towards memory types.

	\section gem5_DCache DATA CACHE OBJECT

	Data Cache object implements a standard cache structure:

	\image html "gem5_MS_Fig2.PNG" "DCache Simulation Object" width=3cm

	<b>Cached memory reads</b> that match particular cache tag (with Valid & Read
	flags) will be completed (by sending ReadResp to CPU) after a configurable time.
	Otherwise, the request is forwarded to Miss Status and Handling Register
	(MSHR) block.

	<b>Cached memory writes</b> that match particular cache tag (with Valid, Read
	& Write flags) will be completed (by sending WriteResp CPU) after the same
	configurable time. Otherwise, the request is forwarded to Miss Status and
	Handling Register(MSHR) block.

	<b>Uncached memory reads</b> are forwarded to MSHR block.

	<b>Uncached memory writes</b> are forwarded to WriteBuffer block.

	<b>Evicted (& dirty) cache lines</b> are forwarded to WriteBuffer block.

	CPU’s access to Data Cache is blocked if any of the following is true:

	- MSHR block is full. (The size of MSHR’s buffer is configurable.)

	- Writeback block is full. (The size of the block’s buffer is
	configurable.)

	- The number of outstanding memory accesses against the same memory cache line
	has reached configurable threshold value – see MSHR and Write Buffer for details.

	Data Cache in block state will reject any request from slave port (from CPU)
	regardless of whether it would result in cache hit or miss. Note that
	incoming messages on master port (response messages and snoop requests)
	are never rejected.

	Cache hit on uncachable memory region (unpredicted behaviour according to
	ARM ARM) will invalidate cache line and fetch data from memory.

	\subsection gem5_MS_TAndDBlock Tags & Data Block

	Cache lines (referred as blocks in source code) are organised into sets with
	configurable associativity and size. They have the following status flags:
	- <b>Valid.</b> It holds data. Address tag is valid
	- <b>Read.</b> No read request will be accepted without this flag being set.
	For example, cache line is valid and unreadable when it waits for write flag
	to complete write access.
	- <b>Write.</b> It may accept writes. Cache line with Write flags
	identifies Unique state – no other cache memory holds the copy.
	- <b>Dirty.</b> It needs Writeback when evicted.

	Read access will hit cache line if address tags match and Valid and Read
	flags are set. Write access will hit cache line if address tags match and
	Valid, Read and Write flags are set.

	\subsection gem5_MS_Queues MSHR and Write Buffer Queues

	Miss Status and Handling Register (MSHR) queue holds the list of CPU’s
	outstanding memory requests that require read access to lower memory
	level. They are:
	- Cached Read misses.
	- Cached Write misses.
	- Uncached reads.

	WriteBuffer queue holds the following memory requests:
	- Uncached writes.
	- Writeback from evicted (& dirty) cache lines.

	\image html "gem5_MS_Fig3.PNG" "MSHR and Write Buffer Blocks" width=6cm

	Each memory request is assigned to corresponding MSHR object (READ or WRITE
	on diagram above) that represents particular block (cache line) of memory
	that has to be read or written in order to complete the command(s). As shown
	on gigure above, cached read/writes against the same cache line have a common
	MSHR object and will be completed with a single memory access.

	The size of the block (and therefore the size of read/write access to lower
	memory) is:
	- The size of cache line for cached access & writeback;
	- As specified in CPU instruction for uncached access.

	In general, Data Cache model distinguishes between just two memory types:
	- Normal Cached memory. It is always treated as write back, read and write
	allocate.
	- Normal uncached, Device and Strongly Ordered types are treated equally
	(as uncached memory)

	\subsection gem5_MS_Ordering Memory Access Ordering

	An unique order number is assigned to each CPU read/write request(as they appear on
	slave port). Order numbers of MSHR objects are copied from the first
	assigned read/write.

	Memory read/writes from each of these two queues are executed in order (according
	to the assigned order number). When both queues are not empty the model will
	execute memory read from MSHR block unless WriteBuffer is full. It will,
	however, always preserve the order of read/writes on the same
	(or overlapping) memory cache line (block).

	In summary:
	- Order of accesses to cached memory is not preserved unless they target
	the same cache line. For example, the accesses #1, #5 & #10 will
	complete simultaneously in the same tick (still in order). The access
	#5 will complete before #3.
	- Order of all uncached memory writes is preserved. Write#6 always
	completes before Write#13.
	- Order to all uncached memory reads is preserved. Read#2 always completes
	before Read#8.
	- The order of a read and a write uncached access is not necessarily
	preserved - unless their access regions overlap. Therefore, Write#6
	always completes before Read#8 (they target the same memory block).
	However, Write#13 may complete before Read#8.


	\section gem5_MS_Bus COHERENT BUS OBJECT

	\image html "gem5_MS_Fig4.PNG" "Coherent Bus Object" width=3cm

	Coherent Bus object provides basic support for snoop protocol:

	<b>All requests on the slave port</b> are forwarded to the appropriate master port. Requests
	for cached memory regions are also forwarded to other slave ports (as snoop
	requests).

	<b>Master port replies</b> are forwarded to the appropriate slave port.

	<b>Master port snoop requests</b> are forwarded to all slave ports.

	<b>Slave port snoop replies</b> are forwarded to the port that was the source of the
	request. (Note that the source of snoop request can be either slave or
	master port.)

	The bus declares itself blocked for a configurable period of time after
	any of the following events:
	- A packet is sent (or failed to be sent) to a slave port.
	- A reply message is sent to a master port.
	- Snoop response from one slave port is sent to another slave port.

	The bus in blocked state rejects the following incoming messages:
	- Slave port requests.
	- Master port replies.
	- Master port snoop requests.

	\section gem5_MS_SimpleMemory SIMPLE MEMORY OBJECT

	It never blocks the access on slave port.

	Memory read/write takes immediate effect. (Read or write is performed when
	the request is received).

	Reply message is sent after a configurable period of time .

	\section gem5_MS_MessageFlow MESSAGE FLOW

	\subsection gem5_MS_Ordering Read Access

	The following diagram shows read access that hits Data Cache line with Valid
	and Read flags:

	\image html "gem5_MS_Fig5.PNG" "Read Hit (Read flag must be set in cache line)" width=3cm

	Cache miss read access will generate the following sequence of messages:

	\image html "gem5_MS_Fig6.PNG" "Read Miss with snoop reply" width=3cm

	Note that bus object never gets response from both DCache2 and Memory object.
	It sends the very same ReadReq package (message) object to memory and data
	cache. When Data Cache wants to reply on snoop request it marks the message
	with MEM_INHIBIT flag that tells Memory object not to process the message.

	\subsection gem5_MS_Ordering Write Access

	The following diagram shows write access that hits DCache1 cache line with
	Valid & Write flags:

	\image html "gem5_MS_Fig7.PNG" "Write Hit (with Write flag set in cache line)" width=3cm

	Next figure shows write access that hits DCache1 cache line with Valid but no
	Write flags – which qualifies as write miss. DCache1 issues UpgradeReq to
	obtain write permission. DCache2::snoopTiming will invalidate cache line that
	has been hit. Note that UpgradeResp message doesn’t carry data.

	\image html "gem5_MS_Fig8.PNG" "Write Miss – matching tag with no Write flag" width=3cm

	The next diagram shows write miss in DCache. ReadExReq invalidates cache line
	in DCache2. ReadExResp carries the content of memory cache line.

	\image html "gem5_MS_Fig9.PNG" "Miss - no matching tag" width=3cm

	*/