The CHI ruby protocol provides a single cache controller that can be reused at multiple levels of the cache hierarchy and configured to model multiple instances of MESI and MOESI cache coherency protocols. This implementation is based of Arm's AMBA 5 CHI specification and provides a scalable framework for the design space exploration of large SoC designs.
CHI (Coherent Hub Interface) provides a component architecture and transaction-level specification to model MESI and MOESI cache coherency. CHI defines three main components as shown in the figure below:
An HNF is the point of coherency (PoC) and point of serialization (PoS) for a specific address range. The HNF is responsible for issuing any required snoop requests to RNFs or memory access requests to SNFs in order to complete a transaction. The HNF can also encapsulate a shared last-level cache and include a directory for targeted snoops.
The CHI specification also defines specific types of nodes for non-coherent requesters (RNI) and non-coherent address ranges (HNI and SNI), e.g., memory ranges belonging to IO components. In Ruby, IO accesses don‘t go though the cache coherency protocol so only CHI’s fully coherent node types are implemented. In this documentation we interchangeably use the terms RN / RNF, HN / HNF, and SN/SNF. We also use the terms upstream and downstream to refer to components in the previous (i.e. towards the cpu) and next (i.e. towards memory) levels in the memory hierarchy, respectively.
The CHI protocol implementation consists mainly of two controllers:
Memory_Controller
(src/mem/ruby/protocol/chi/CHI-mem.sm) implements a CHI slave node. It receives memory read or write requests from the home nodes and interfaces with gem5’s classic memory controllers.Cache_Controller
(src/mem/ruby/protocol/chi/CHI-cache.sm) generic cache controller state machine.In order to allow fully flexible cache hierarchies, Cache_Controller
can be configured to model any cache level (e.g. L1D, priv. L2, shared L3) within both request and home nodes. Furthermore it also supports multiple features not available in other Ruby protocols:
The implementation defines the following cache states:
I
: line is invalidSC
: line is shared and cleanUC
: line is exclusive/unique and cleanSD
: line is shared and dirtyUD
: line exclusive/unique and dirtyUD_T
: UD
with timeout. When a store conditional fails and causes the line to transition from I to UD, we transition to UD_T
instead if the number of failures is above a certain threshold (configuration defined). In UD_T
the line cannot be evicted from the requester for a given number of cycles (also configuration defined); after which the lines goes to UD. This is necessary to avoid livelocks in certain scenarios.The figure below gives an overview of the state transitions when the controller is configured as a L1 cache:
Transitions are annotated with the incoming request from the cpu (or generated internally, e.g. Replacements) and the resulting outgoing request sent downstream. For simplicity, the figure omits requests that do not change states (e.g., cache hits) and invalidating snoops (final state is always I
). For simplicity, it also shows only the typical state transitions in a MOESI protocol. In CHI the final state will ultimately be determined by the type of data returned by the responder (e.g., requester may receive UD
or UC
data in response to a ReadShared
).
The figures below show the transition for a intermediate-level cache controller (e.g., priv. L2, shared L3, HNF, etc):
As in the previous case, cache hits are omitted for simplicity. In addition to the cache states, the following directory states are defined to track lines present in an upstream cache:
RU
:an upstream requester has line in UC or UDRSC
: one or more upstream requesters have line in SCRSD
: one upstream requester has line in SD; others may have it in SCRUSC
: RSC
+ current domain stills has exclusive accessRUSD
: RSD
+ current domain stills has exclusive accessWhen the line is present both in the local cache and upstream caches the following combined states are possible:
UD_RSC
, SD_RSC
, UC_RSC
, SC_RSC
UD_RU
, UC_RU
UD_RSD
, SD_RSD
The RUSC
and RUSD
states (omitted in the figures above) are used to keep track of lines for which the controller still has exclusive access permissions without having it in it’s local cache. This is possible in a non-inclusive cache where a local block can be deallocated without back-invalidating upstream copies.
When a cache controller is a HNF (home node), the state transactions are basically the same as a intermediate level cache, except for these differences:
ReadNoSnp
is sent to obtain data from downstream, as the only downstream components are the SNs (slave nodes).For more information on DCT and DMT transactions, see Sections 1.7 and 2.3.1 in the CHI specification. DMT and DCT are CHI features that allow the data source for a request to send data directly to the original requester. On a DMT request, the SN sends data directly to the RN (instead of sending first to the HN, which would then forwards to the RN), while with DCT, the HN requests that a RN being snooped (the snoopee) to send a copy of the line directly the original requester. With DCT enabled, the HN may also request that the snoopee to send the data to both the HN and the original requester, so the HN can also cache the data. This depends on the allocation policy defined by the configuration parameters. Notice that the allocation policy also changes the cache state transitions. For simplicity, the figure above illustrates an inclusive cache.
The following is a list of the main configuration parameters of the cache controller that affect the protocol behavior (please refer to the protocol SLICC specification for details and a full list of parameters)
downstream_destinations
: defines the destinations for requests sent downstream and is used to build the cache hierarchy. Refer to the create_system
function in configs/ruby/CHI.py
for an example of how to setup a system with private L1I, L1D and L2 caches for each core.is_HN
: Set when the controller is used as a home node and point of coherency for an address range. Must be false for every other cache level.enable_DMT
and enable_DCT
: when the controller is a home node, this enables direct memory transfers and direct cache transfers for incoming read requests.allow_SD
: allow the shared dirty state. This switches between MOESI and MESI operation.alloc_on_readshared
, alloc_on_readunique
, and alloc_on_readonce
: whether or not to allocate a cache block to store data used to respond to the corresponding read request.alloc_on_writeback
: whether or not to allocate a cache block to store data received from a writeback request.dealloc_on_unique
and dealloc_on_shared
: deallocate the local cache block if the line becomes unique or shared in an upstream cache.dealloc_backinv_unique
and dealloc_backinv_shared
: if a local cache block is deallocated due to a replacement, also invalidates any unique or shared copy of the line in upstream caches.number_of_TBEs
,number_of_snoop_TBEs
, and number_of_repl_TBEs
: number of entries in the TBE tables for incoming requests, incoming snoops, and replacements.unify_repl_TBEs
: replacements use the same TBE slot as the request that triggered it. In this case number_of_repl_TBEs
is ignored.These parameters affect the cache controller performance:
read_hit_latency
and read_miss_latency
: pipeline latencies for a read request thar hits or misses in the local cache, respectively.snoop_latency
: pipeline latency for an incoming snoop.write_fe_latency
and write_be_latency
: front-end and back-end pipeline latencies for handling write requests. Front-end latency is applied between sending the acknowledgement response and the next action to be taken. Back-end is applied to the requester between receiving the acknowledgement and sending the write data.allocation_latency
: latency between TBE allocation and transaction initialization.cache
: CacheMemory
attached to this controller includes parameters such as size, associativity, tag and data latency, and number of banks.Section Protocol implementation gives an overview of the protocol implementation while Section Supported CHI transactions describe the implemented subset of the the AMBA 5 CHI spec. The next sections refer to specific files in the protocol source code and include SLICC snippets of the protocol. Some snippets where slightly simplified compared to the actual SLICC specification.
The Figure below gives an overview of the cache controller implementation.
In Ruby, a cache controller is implemented by defining a state machine using SLICC language. Transitions in the state machine are triggered by messages arriving at input queues. On our particular implementation, separate incoming and outgoing messages queues are defined for each CHI channel. Incoming request and snoop messages that start a new transaction go through the same Request allocation process, where we allocate a transaction buffer entry (TBE) and move the request or snoop to an internal queue of transactions that are ready to be initiated. If the transaction buffer is full, the request is rejected and a retry message is sent.
The actions to be performed for a message dequeued from the input / rdy queues depends on the state of the target cache line. The data state of the line is stored in the cache if the line is cached locally, while the directory state is stored in a directory entry if the line is present in any upstream cache. For lines with outstanding requests, the transient state is kept in the TBE and copied back to the cache and/or directory when the transaction finishes. The figure below describes the phases in the transaction lifetime and the interactions between the main components in the cache controller (input/output ports, TBETable, Cache, Directory and the SLICC state machine). The phases are described in more details in the subsequent sections.
The code snippet below shows how an incoming request in the reqIn
port is handled. The reqIn
port receives incoming messages from CHI's request channel:
in_port(reqInPort, CHIRequestMsg, reqIn) { if (reqInPort.isReady(clockEdge())) { peek(reqInPort, CHIRequestMsg) { if (in_msg.allowRetry) { trigger(Event:AllocRequest, in_msg.addr, getCacheEntry(in_msg.addr), getCurrentActiveTBE(in_msg.addr)); } else { trigger(Event:AllocRequestWithCredit, in_msg.addr, getCacheEntry(in_msg.addr), getCurrentActiveTBE(in_msg.addr)); } } } }
The allowRetry
field indicates messages that can be retried. Requests that cannot be retried are only sent by a requester that previously received credit (see RetryAck
and PCrdGrant
in the CHI specification). The transition triggered by Event:AllocRequest
or Event:AllocRequestWithCredit
executes a single action which either reserves space in the TBE table for the request and moves it to the reqRdy
queue, or sends a RetryAck
message):
action(AllocateTBE_Request) { if (storTBEs.areNSlotsAvailable(1)) { // reserve a slot for this request storTBEs.incrementReserved(); // Move request to rdy queue peek(reqInPort, CHIRequestMsg) { enqueue(reqRdyOutPort, CHIRequestMsg, allocation_latency) { out_msg := in_msg; } } } else { // we don't have resources to track this request; enqueue a retry peek(reqInPort, CHIRequestMsg) { enqueue(retryTriggerOutPort, RetryTriggerMsg, 0) { out_msg.addr := in_msg.addr; out_msg.event := Event:SendRetryAck; out_msg.retryDest := in_msg.requestor; retryQueue.emplace(in_msg.addr,in_msg.requestor); } } } reqInPort.dequeue(clockEdge()); }
Notice we don’t create and send a RetryAck
message directly from this action. Instead we create a separate trigger event in the internal retryTrigger
queue. This is necessary to prevent resource stalls from halting this action. Section Performance modeling below explains resource stalls in more details.
Incoming request from a Sequencer
object (typically connected to a CPU when the controller is used as a L1 cache) and snoop requests arrive through the seqIn
and snpIn
ports and are handled similarly, except for:
Once a request has been allocated a TBE and moved to the reqRdy
queue, an event is triggered to initiate the transaction. We trigger a different event for each different request type:
in_port(reqRdyPort, CHIRequestMsg, reqRdy) { if (reqRdyPort.isReady(clockEdge())) { peek(reqRdyPort, CHIRequestMsg) { CacheEntry cache_entry := getCacheEntry(in_msg.addr); TBE tbe := getCurrentActiveTBE(in_msg.addr); trigger(reqToEvent(in_msg.type), in_msg.addr, cache_entry, tbe); } } }
Each request requires different initialization actions depending on the initial state of the line. To illustrate this processes, let’s use as example a ReadShared
request for a line in the SC_RSC
state (shared clean in local cache and shared clean in an upstream cache):
transition(SC_RSC, ReadShared, BUSY_BLKD) { Initiate_Request; Initiate_ReadShared_Hit; Profile_Hit; Pop_ReqRdyQueue; ProcessNextState; }
Initiate_Request
initializes the allocated TBE. This actions copies any state and data allocated in the local cache and directory to the TBE.Initiate_ReadShared_Hit
sets-up the set of actions that need to be executed to complete this specific request (see below).Profile_Hit
updates cache statistics.Pop_ReqRdyQueue
removes request message form the reqRdy
queue.ProcessNextState
executes the next action defined by Initiate_ReadShared_Hit
.Initiate_ReadShared_Hit
is defined as follows:
action(Initiate_ReadShared_Hit) { tbe.actions.push(Event:TagArrayRead); tbe.actions.push(Event:ReadHitPipe); tbe.actions.push(Event:DataArrayRead); tbe.actions.push(Event:SendCompData); tbe.actions.push(Event:WaitCompAck); tbe.actions.pushNB(Event:TagArrayWrite); }
tbe.actions
stores the list of events that need to be triggered in order to complete an action. In this particular case, TagArrayRead
, ReadHitPipe
, and DataArrayRead
introduces delays to model the cache controller pipeline latency and reading the cache/directory tag array and cache data array (see Section Performance modeling). SendCompData
sets-up and sends the data responses for the ReadShared
request and WaitCompAck
sets-up the TBE to expect the completion acknowledgement from the requester. Finally, TagArrayWrite
introduces the delay of updating the directory state to track the new sharer.
After initialization, the line will transition to the BUSY_BLKD
state as show in transition(SC_RSC, ReadShared, BUSY_BLKD)
. BUSY_BLKD
is a transient state indicating the line has now an outstanding transaction. In this state, the transaction is driven either by incoming response messages in the rspIn
and datIn
ports or trigger events defined in tbe.actions
.
The ProcessNextState
action is responsible for checking tbe.actions
and enqueuing trigger event messages into actionTriggers
at the end of all transitions to the BUSY_BLKD
state. ProcessNextState
first checks for pending response messages. If there are no pending messages, it enqueues a message to actionTriggers
in order to trigger the the event at the head of tbe.actions
. If there are pending responses, then ProcessNextState
does nothing as the transaction will proceed once all expected responses are received.
Pending responses are tracked by the expected_req_resp
and expected_snp_resp
fields in the TBE. For instance, the ExpectCompAck
action, executed from the transition triggered by WaitCompAck
, is defined as follows:
action(ExpectCompAck) { tbe.expected_req_resp.addExpectedRespType(CHIResponseType:CompAck); tbe.expected_req_resp.addExpectedCount(1); }
This causes the transaction to wait until a CompAck
response is received.
Some actions can be allowed to execute when the transaction has pending responses. This actions are enqueued using tbe.actions.pushNB
(i.e., push / non-blocking). In the example above tbe.actions.pushNB(Event:TagArrayWrite)
models a tag write being performed while the transactions waits for the CompAck
response.
The transaction ends when it has no more pending responses and tbe.actions
is empty. ProcessNextState
checks for this condition and enqueues a “finalizer” trigger message into actionTriggers
. When handling this event, the current cache line state and sharing/ownership information determines the final stable state of the line. Data and state information are updated in the cache and directory, if necessary, and the TBE is deallocated.
Each controller allows only one active transaction per cache line. If a new request or snoop arrives while the cache line is in a transient state, this creates a hazard as defined in the CHI standard. We handle hazards as follows:
Request hazards: a TBE is allocated as described previously, but the new transaction initialization is delayed until the current transaction finishes and the line is back to a stable state. This is done by moving the request message from reqRdy
to a separate stall buffer. All stalled messages are added back to reqRdy
when the current transaction finishes and are handled in their original order of arrival.
Snoop hazards: the CHI spec does not allow snoops to be stalled by an existing request. If a transaction is waiting on a response for a request sent downstream (e.g. we sent a ReadShared
and are waiting for the data response) we must accept and handle the snoop. The snoop can be stalled only if the request has already been accepted by the responder and is guaranteed to complete (e.g. a ReadShared
with pending data but already acked with a RespSepData
response). To distinguish between these conditions we use the BUSY_INTR
transient state.
BUSY_INTR
indicates the transaction can be interrupted by a snoop. When a snoop arrives for a line in this state, a snoop TBE is allocated as described previously and its state is initialized based on the currently active TBE. The snoop TBE then becomes the currently active TBE. Any cache state and sharing/ownership changes caused by snoop are copied back to the original TBE before deallocating the snoop. When a snoop arrives for a line in BUSY_BLKD
state, we stall the snoop until the current transaction either finishes or transitions to BUSY_INTR
.
As described previously, the cache line state is known immediately when a transaction is initialized and the cache line can be read and written without any latency. This makes it easier to implement the functional aspects of the protocol. To model timing we use explicit actions to introduce latency to a transaction. For example, in the ReadShared
code snippet:
action(Initiate_ReadShared_Hit) { tbe.actions.push(Event:TagArrayRead); tbe.actions.push(Event:ReadHitPipe); tbe.actions.push(Event:DataArrayRead); tbe.actions.push(Event:SendCompData); tbe.actions.push(Event:WaitCompAck); tbe.actions.pushNB(Event:TagArrayWrite); }
TagArrayRead
, ReadHitPipe
, DataArrayRead
, and TagArrayWrite
don’t have any functional significance. They are there to introduce latencies that would exist in a real cache controller pipeline, in this case: tag read latency, hit pipeline latency, data array read latency, and tag update latency. The latency introduced by these action is defined by configuration parameters.
In addition to explicitly added latencies. SLICC has the concept of resource stalls to model resource contention. Given a set of actions executed during a transition, the SLICC compiler automatically generates code which checks if all resources needed by those actions are available. If any resource is unavailable, a resource stall is generated and the transition is not executed. A message that causes a resource stall remains in the input queue and the protocol attempts to trigger the transition again the next cycle.
Resources are detected by the SLICC compiler in different ways:
check_allocate
statement to an action.We use (2) to check availability of TBEs. See the snippet below:
action(AllocateTBE_Snoop) { // No retry for snoop requests; just create resource stall check_allocate(storSnpTBEs); ... }
This signals the SLICC compiler to check if the storSnpTBEs
structure has a TBE slot available before executing any transition that includes the AllocateTBE_Snoop
action.
The snippet below exemplifies (3):
transition({BUSY_INTR,BUSY_BLKD}, DataArrayWrite) {DataArrayWrite} { ... }
The DataArrayWrite
annotation signals the SLICC compiler to check for availability of the DataArrayWrite
resource type. Resource request types used in these annotations must be explicitly defined by the protocol, as well as how to check them. In our protocol we defined the following types to check for the availability of banks in the cache tag and data arrays:
enumeration(RequestType) { TagArrayRead; TagArrayWrite; DataArrayRead; DataArrayWrite; } void recordRequestType(RequestType request_type, Addr addr) { if (request_type == RequestType:DataArrayRead) { cache.recordRequestType(CacheRequestType:DataArrayRead, addr); } ... } bool checkResourceAvailable(RequestType request_type, Addr addr) { if (request_type == RequestType:DataArrayRead) { return cache.checkResourceAvailable(CacheResourceType:DataArray, addr); } ... }
The implementation of checkResourceAvailable
and recordRequestType
are required by SLICC compiler when we use annotations on transactions.
Consider the following transaction initialization code for a ReadShared miss:
action(Initiate_ReadShared_Miss) { tbe.actions.push(Event:ReadMissPipe); tbe.actions.push(Event:TagArrayRead); tbe.actions.push(Event:SendReadShared); tbe.actions.push(Event:SendCompData); tbe.actions.push(Event:WaitCompAck); tbe.actions.push(Event:CheckCacheFill); tbe.actions.push(Event:TagArrayWrite); }
All transactions that modify a cache line or received cache line data as a result of a snoop or a request sent downstream, use the CheckCacheFill
action trigger event. This event triggers a transition that perform the following actions:
When a replacement is performed, a new transaction is initialized to keep track of any WriteBack or Evict request sent downstream and/or snoops for backinvalidation (if the cache controller is configured the enforce inclusivity). Depending on the configuration parameters, the TBE for the replacement uses resources from a dedicated TBETable or reuses the same resources of the TBE that triggered the replacement. In both cases, the transaction that triggered the replacement completes without waiting for the replacement process.
Notice CheckCacheFill
does not actually writes data to the cache block. If only ensures a cache block is allocated if needed, triggers replacements, and models the cache fill latencies. As described previously, TBE data is copied to the cache if needed during the transaction finalization.
All transactions are implemented as described in the AMBA5 CHI Issue D specification. The next sections provide a more detailed explanation of the implementation-specific choices not fixed by the public document.
The following incoming requests are supported:
ReadShared
ReadNotSharedDirty
ReadUnique
CleanUnique
ReadOnce
WriteUniquePtl
and WriteUniqueFull
When receiving any request the clusivity configuration parameters are evaluated during the transaction initialization and the doCacheFill
and dataToBeInvalid
flags are set in the transaction buffer entry allocated for the request. doCacheFill
indicates we should keep any valid copy of the line in the local cache;dataToBeInvalid
indicates we must invalidate the local copy when completing the transaction.
When receiving ReadShared
or ReadUnique
, if the data is present at the local cache in the required state (e.g. UC
or UD
for ReadUnique
), a CompData
response is send to the requester. The response type depends on the value of dataToBeInvalid
.
dataToBeInvalid==true
ReadNotSharedDirty
, CompData_SC
is always sent if local state is SD
and the line is written-back using WriteCleanFull
ReadUnique
: propagate dirty state, i.e., CompData_UD
or CompData_UC
.ReadShared
or ReadNotSharedDirty
: send CompData_SC
. If fwd_unique_on_readshared
configuration parameter is set, the ReadShared
is handled as a ReadUnique
if the line doesn't have other sharers.When receiving a ReadOnce
, CompData_I
is always sent if the data is present at the local cache. For WriteUniquePtl
handling see below.
If there is a cache miss, multiple actions may be performed depending on whether or not doCacheFill
and dataToBeInvalid==false
; and DCT or DMT is enabled:
ReadShared
/ ReadNotSharedDirty
:RSD
or RU
:SnpShared
to owner; cache the line locally (if doCacheFill
) and send response to requester.SnpSharedFwd
to owner; if doCacheFill==true
, the retToSrc
field is set so the line can be cached locally.RSC
:SnpOnce
to one of the sharers; cache the line locally (if doCacheFill
) and send response to requester.SnpSharedFwd
to one of the sharers; if doCacheFill==true
, the retToSrc
field is set so the line can be cached locally.ReadShared
/ ReadNotSharedDirty
or ReadNoSnp
(if HNF). In the HNF configuration, ReadNoSnp
is issued with DMT if DMT is enabled.ReadNotSharedDirty
, SnpNotSharedDirty
and SnpNotSharedDirtyFwd
is sent instead.ReadUnique
:RU,RUSD,RUSC
:SnpUnique
to owner; cache the line locally (if doCacheFill
) and sent response to requester.SnpUniqueFwd
to owner.RSC
/RSD
:SnpUnique
with retToSrc=true
to invalidate sharers and obtain dirty line (in case of RSD
)CleanUnique
downstream to obtain unique permissions.ReadUnique
or ReadNoSnp
(if HNF). In the HNF configuration, ReadNoSnp
is issued with DMT if DMT is enabled.RUSC
amd RSC
, if multiple sharers, only one sharer is selected as target of the above snoops. The other sharers are invalidated using SnpUnique
with retToSrc=false
.ReadOnce
:SnpOnce
to one of the sharers; send received data response to requester.SnpOnceFwd
to one of the sharers.ReadOnce
or ReadNoSnp
(if HNF). In the HNF configuration, ReadNoSnp
is issued with DMT if DMT is enabled.CleanUnique
:SnpCleanInvalid
to all sharers/owner except original requestor.CleanUnique
downstream to obtain unique permissions.doCacheFill==false
: writeback the line with WriteCleanFull
.WriteUniquePtl
/WriteUniqueFull
:SnpCleanInvalid
if there are any sharers.SnpCleanInvalid
if there are any sharers.doCacheFill
set, cache the line locally, otherwise writeback to memory (WriteNoSnp
or WriteNoSnpPtl
).WriteUniquePtl
and any received data to the downstream cache.The cache controller issues and accepts the following snoops:
SnpShared
and SnpSharedFwd
SnpNotSharedDirty
and SnpNotSharedDirtyFwd
SnpUnique
and SnpUniqueFwd
SnpCleanInvalid
SnpOnce
and SnpOnceFwd
The snoop response is generated according to the current state of the line as defined in the specification. Data is returned with the snoop response depending on the data state and the value of retToSrc
set by the snooper. If retToSrc
is set, the snoop response always includes data.
SnpShared
/ SnpNotSharedDirty
:retToSrc
.retToSrc
is set if the snooper needs to cache the line.SnpUnique
:retToSrc
.retToSrc
is set if the snooper needs to cache the line.SnpCleanInvalid
:SnpSharedFwd
:retToSrc
is set if the snooper needs to cache the line.SnpNotSharedDirtyFwd
:retToSrc
is set if the snooper needs to cache the line.SnpUniqueFwd
:SnpOnce
:retToSrc=true
and snoopee always returns data.SnpOnceFwd
:If the snoopee has sharers in any state, the same request is sent upstream to all sharers. For SnpSharedFwd/SnpNotSharedDirtyFwd and SnpUniqueFwd, a SnpShared/SnpNotSharedFwd or SnpUnique is sent, respectively. For a received SnpOnce, a SnpOnce is sent upstream only if the line is not present locally. In this particular implementation, there is always a directory entry for upstream caches that have the line. Snoops are never sent to caches that do not have the line.
A writeback is triggered internally by the controller when a cache line needs to be evicted due to capacity reasons (cache maintenance operations are currently not supported). See Section Cache block allocation and replacement modeling for more information on replacements. These internal events are generated depending on the configurations parameters of the controller:
GlobalEviction
: evict a line from the current and all upstream caches. This applies if dealloc_backinv_unique
or dealloc_backinv_shared
parameters are set.LocalEviction
: evict a line without backinvaliding upstream caches.First we deallocate the local cache block (so the request that cause the eviction can allocate a new block and finish). For GlobalEviction, a SnpCleanInvalid
is sent to all upstream caches. Once all snoops responses are received (possibly with dirty data), a LocalEviction is performed. The LocalEviction is done by issuing the appropriate request as follows:
WriteBackFull
, if the the line is dirtyWriteEvictFull
, if the line is unique and cleanWriteCleanFull
, if the the line is dirty, but there are clean sharersEvict
, if the line is shared and cleanFor a HNF configuration the behavior changes slightly: WriteNoSnp
to the SNF is used instead of WriteBackFull
and no requests are issued if the line is clean.
The WriteBack*
and Evict
requests are handled at the downstream cache as follows:
WriteBackFull
/ WriteEvictFull
/ WriteCleanFull
:alloc_on_writeback
, a cache block may need to be allocated. If there are no free blocks, a LocalEviction is triggered for a cache line in the target cache set. The victim line is selected based on the replacement policy implemented by object pointed by the cache
parameter (which can be configured separately).CompDBIDResp
to the requester.WriteBackFull
/ WriteEvictFull
).Evict
:Comp\_I
.A request for a line that currently has an outstanding transaction is always stalled until the transaction completes. Snoops received while there is an outstanding request are handled following the requirements in the specification:
CleanUnique
:CleanUnique
response, it immediately follows up with a ReadUnique
.WriteBackFull
/WriteEvictFull
/WriteCleanFull
that have not yet been acked with a CompDBIDResp
; or Evict before Comp_I
is received:ReadUnique
that required snooping upstream cachesMultiple snoops may be received while there is an outstanding transaction. In this particular implementation, a SnpShared
or SnpSharedFwd
may be followed by a SnpUnique
or SnpCleanInvalid
. However, it's not possible to have concurrent snoops coming from the downstream cache.
Both incoming requests and snoops require the allocation of a TBE. To prevent deadlocks when transaction buffers are full, a separate buffer is used to allocate snoop TBEs. Snoops do not allow retry, so if the snoop TBE table is full messages in the snpIn port are stalled, potentially causing severe congestion in the snoop channel in the interconnect.
ReadNoSnpSep
instead of ReadNoSnp
if the enable_DMT_early_dealloc
configuration parameter is set. This allow the HNF to deallocate the TBE earlier.ReadReceipt
responses are never used except for ReadNoSnpSep
. Request ordering, when required, is enforced by Ruby by serializing requests at the requester. At the cache controller, requests to the same line are handled in the order of arrival. Requests to different lines can be handled in any order, however they are typically handled in order of arrival given that there are resources available.ReadUnique
at the protocol level.CompAck
response is always sent when stated as optional in the spec. Requesters always wait for CompAck
(if required or optional) before finalizing the transaction and deallocating resources.Comp
and DBIDresp
used only for WriteUnique
requests. DBIDresp
is sent after receiving all snoop responses; Comp
is sent after DBIDresp
and accounting for the front-end write latency (write_fe_latency
).DoNotGoToSD
field is not implemented.CBusy
is not implemented.WriteDataCancel
responses are never used.