# Copyright (c) 2014 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder. You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Authors: Andrew Bardsley

namespace Minor
{

/*!

\page minor Inside the Minor CPU model

\tableofcontents

This document contains a description of the structure and function of the
Minor gem5 in-order processor model. It is recommended reading for anyone who
wants to understand Minor's internal organisation, design decisions, C++
implementation and Python configuration. A familiarity with gem5 and some of
its internal structures is assumed. This document is meant to be read
alongside the Minor source code and to explain its general structure without
being too slavish about naming every function and data type.

\section whatis What is Minor?

Minor is an in-order processor model with a fixed pipeline but configurable
data structures and execute behaviour. It is intended to be used to model
processors with strict in-order execution behaviour and allows visualisation
of an instruction's position in the pipeline through the
MinorTrace/minorview.py format/tool. The intention is to provide a framework
for micro-architecturally correlating the model with a particular, chosen
processor with similar capabilities.

\section philo Design philosophy

\subsection mt Multithreading

The model isn't currently capable of multithreading but there are THREAD
comments in key places where stage data needs to be arrayed to support
multithreading.

\subsection structs Data structures

Decorating data structures with large amounts of life-cycle information is
avoided. Only instructions (MinorDynInst) contain a significant proportion of
their data content whose values are not set at construction.

All internal structures have fixed sizes on construction. Data held in queues
and FIFOs (MinorBuffer, FUPipeline) should have a BubbleIF interface to
allow a distinct 'bubble'/no data value option for each type.

Inter-stage 'struct' data is packaged in structures which are passed by value.
Only MinorDynInst, the line data in ForwardLineData and the memory-interfacing
objects Fetch1::FetchRequest and LSQ::LSQRequest are '::new' allocated while
running the model.

\section model Model structure

Objects of class MinorCPU are provided by the model to gem5. MinorCPU
implements the interfaces of BaseCPU (cpu.hh) and can provide data and
instruction interfaces for connection to a cache system. The model is
configured in a similar way to other gem5 models through Python. That
configuration is passed on to MinorCPU::pipeline (of class Pipeline) which
actually implements the processor pipeline.
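
As a sketch of the Python side of that arrangement, the fragment below (not a
complete script: the memory system and workload setup are omitted and the
clock value is only an example) shows MinorCPU being instantiated like any
other gem5 CPU model:

\verbatim
# Minimal sketch (not a complete gem5 configuration): selecting the Minor
# model.  The clock value is an arbitrary example; memory system and workload
# setup are omitted.
import m5
from m5.objects import System, SrcClockDomain, VoltageDomain, MinorCPU

system = System()
system.clk_domain = SrcClockDomain(clock='1GHz',
                                   voltage_domain=VoltageDomain())

# The Python parameters given here are handed to MinorCPU and, from there,
# to MinorCPU::pipeline (class Pipeline) which builds the four stages.
system.cpu = MinorCPU()
system.cpu.createThreads()
\endverbatim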

The hierarchy of major unit ownership from MinorCPU down looks like this:

<ul>
<li>MinorCPU</li>
<ul>
<li>Pipeline - container for the pipeline, owns the cyclic 'tick'
event mechanism and the idling (cycle skipping) mechanism.</li>
<ul>
<li>Fetch1 - instruction fetch unit responsible for fetching cache
lines (or parts of lines from the I-cache interface)</li>
<ul>
<li>Fetch1::IcachePort - interface to the I-cache from
Fetch1</li>
</ul>
<li>Fetch2 - line to instruction decomposition</li>
<li>Decode - instruction to micro-op decomposition</li>
<li>Execute - instruction execution and data memory
interface</li>
<ul>
<li>LSQ - load store queue for memory ref. instructions</li>
<li>LSQ::DcachePort - interface to the D-cache from
Execute</li>
</ul>
</ul>
</ul>
</ul>

\section keystruct Key data structures

\subsection ids Instruction and line identity: InstId (dyn_inst.hh)

An InstId contains the sequence numbers and thread numbers that describe the
life cycle and instruction stream affiliations of individual fetched cache
lines and instructions.

An InstId is printed in one of the following forms:

- T/S.P/L - for fetched cache lines
- T/S.P/L/F - for instructions before Decode
- T/S.P/L/F.E - for instructions from Decode onwards

for example:

- 0/10.12/5/6.7

InstId's fields are:

<table>
<tr>
<td><b>Field</b></td>
<td><b>Symbol</b></td>
<td><b>Generated by</b></td>
<td><b>Checked by</b></td>
<td><b>Function</b></td>
</tr>

<tr>
<td>InstId::threadId</td>
<td>T</td>
<td>Fetch1</td>
<td>Everywhere the thread number is needed</td>
<td>Thread number (currently always 0).</td>
</tr>

<tr>
<td>InstId::streamSeqNum</td>
<td>S</td>
<td>Execute</td>
<td>Fetch1, Fetch2, Execute (to discard lines/insts)</td>
<td>Stream sequence number as chosen by Execute. Stream
sequence numbers change after changes of PC (branches, exceptions) in
Execute and are used to separate pre and post branch instruction
streams.</td>
</tr>

<tr>
<td>InstId::predictionSeqNum</td>
<td>P</td>
<td>Fetch2</td>
<td>Fetch2 (while discarding lines after prediction)</td>
<td>Prediction sequence numbers represent branch prediction decisions.
This is used by Fetch2 to mark lines/instructions according to the last
followed branch prediction made by Fetch2. Fetch2 can signal to Fetch1
that it should change its fetch address and mark lines with a new
prediction sequence number (which it will only do if the stream sequence
number Fetch1 expects matches that of the request).</td>
</tr>

<tr>
<td>InstId::lineSeqNum</td>
<td>L</td>
<td>Fetch1</td>
<td>(Just for debugging)</td>
<td>Line fetch sequence number of this cache line or the line
this instruction was extracted from.
</td>
</tr>

<tr>
<td>InstId::fetchSeqNum</td>
<td>F</td>
<td>Fetch2</td>
<td>Fetch2 (as the inst. sequence number for branches)</td>
<td>Instruction fetch order assigned by Fetch2 when lines
are decomposed into instructions.
</td>
</tr>

<tr>
<td>InstId::execSeqNum</td>
<td>E</td>
<td>Decode</td>
<td>Execute (to check instruction identity in queues/FUs/LSQ)</td>
<td>Instruction order after micro-op decomposition.</td>
</tr>

</table>

The sequence number fields are all independent of each other and although, for
instance, InstId::execSeqNum for an instruction will always be >=
InstId::fetchSeqNum, the comparison is not useful.

The originating stage of each sequence number field keeps a counter for that
field which can be incremented in order to generate new, unique numbers.
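
To make the relationship between the fields and the printed forms concrete,
the following Python sketch (purely illustrative, not part of the model)
reproduces the T/S.P/L, T/S.P/L/F and T/S.P/L/F.E renderings described above:

\verbatim
# Illustrative-only Python mirror of InstId's fields and its printed forms.
from dataclasses import dataclass
from typing import Optional

@dataclass
class InstId:
    thread_id: int           # T, assigned by Fetch1
    stream_seq_num: int      # S, assigned by Execute
    prediction_seq_num: int  # P, assigned by Fetch2
    line_seq_num: int        # L, assigned by Fetch1
    fetch_seq_num: Optional[int] = None  # F, assigned by Fetch2
    exec_seq_num: Optional[int] = None   # E, assigned by Decode

    def __str__(self):
        # T/S.P/L for lines, then /F before Decode, then .E from Decode on
        s = '%d/%d.%d/%d' % (self.thread_id, self.stream_seq_num,
                             self.prediction_seq_num, self.line_seq_num)
        if self.fetch_seq_num is not None:
            s += '/%d' % self.fetch_seq_num
            if self.exec_seq_num is not None:
                s += '.%d' % self.exec_seq_num
        return s

print(InstId(0, 10, 12, 5, 6, 7))   # prints: 0/10.12/5/6.7
\endverbatim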

\subsection insts Instructions: MinorDynInst (dyn_inst.hh)

MinorDynInst represents an instruction's progression through the pipeline. An
instruction can be three things:

<table>
<tr>
<td><b>Thing</b></td>
<td><b>Predicate</b></td>
<td><b>Explanation</b></td>
</tr>
<tr>
<td>A bubble</td>
<td>MinorDynInst::isBubble()</td>
<td>no instruction at all, just a space-filler</td>
</tr>
<tr>
<td>A fault</td>
<td>MinorDynInst::isFault()</td>
<td>a fault to pass down the pipeline in an instruction's clothing</td>
</tr>
<tr>
<td>A decoded instruction</td>
<td>MinorDynInst::isInst()</td>
<td>instructions are actually passed to the gem5 decoder in Fetch2 and so
are created fully decoded. MinorDynInst::staticInst is the decoded
instruction form.</td>
</tr>
</table>

Instructions are reference counted using the gem5 RefCountingPtr
(base/refcnt.hh) wrapper. They therefore usually appear as MinorDynInstPtr in
code. Note that as RefCountingPtr initialises as nullptr rather than an
object that supports BubbleIF::isBubble, passing raw MinorDynInstPtrs to
Queue%s and other similar structures from stage.hh without boxing is
dangerous.

\subsection fld ForwardLineData (pipe_data.hh)

ForwardLineData is used to pass cache lines from Fetch1 to Fetch2. Like
MinorDynInst%s, they can be bubbles (ForwardLineData::isBubble()),
fault-carrying or can contain a line (partial line) fetched by Fetch1. The
data carried by ForwardLineData is owned by a Packet object returned from
memory, is explicitly memory managed and so must be deleted once processed
(by Fetch2 deleting the Packet).

\subsection fid ForwardInstData (pipe_data.hh)

ForwardInstData can contain up to ForwardInstData::width() instructions in its
ForwardInstData::insts vector. This structure is used to carry instructions
between Fetch2, Decode and Execute and to store input buffer vectors in Decode
and Execute.

\subsection fr Fetch1::FetchRequest (fetch1.hh)

FetchRequests represent I-cache line fetch requests. They are used in the
memory queues of Fetch1 and are pushed into/popped from Packet::senderState
while traversing the memory system.

FetchRequests contain a memory system Request (mem/request.hh) for that fetch
access, a packet (Packet, mem/packet.hh), if the request gets to memory, and a
fault field that can be populated with a TLB-sourced prefetch fault (if any).

\subsection lsqr LSQ::LSQRequest (execute.hh)

LSQRequests are similar to FetchRequests but for D-cache accesses. They carry
the instruction associated with a memory access.

\section pipeline The pipeline

\verbatim
------------------------------------------------------------------------------
Key:

[] : inter-stage buffer (MinorBuffer)
,--.
|  | : pipeline stage
`--'
---> : forward communication
<--- : backward communication

rv : reservation information for input buffers

              ,------.     ,------.     ,------.     ,-------.
(from --[]-v->|Fetch1|-[]->|Fetch2|-[]->|Decode|-[]->|Execute|--> (to Fetch1
 Execute)  |  |      |<-[]-|      |<-rv-|      |<-rv-|       |     & Fetch2)
           |  `------'<-rv-|      |     |      |     |       |
           `-------------->|      |     |      |     |       |
                           `------'     `------'     `-------'
------------------------------------------------------------------------------
\endverbatim

The four pipeline stages are connected together by MinorBuffer FIFO
(stage.hh, derived ultimately from TimeBuffer) structures which allow
inter-stage delays to be modelled. There is a MinorBuffer between adjacent
stages in the forward direction (for example: passing lines from Fetch1 to
Fetch2) and, between Fetch2 and Fetch1, a buffer in the backwards direction
carrying branch predictions.

Stages Fetch2, Decode and Execute have input buffers which, each cycle, can
accept input data from the previous stage and can hold that data if the stage
is not ready to process it. Input buffers store data in the same form as it
is received and so Decode and Execute's input buffers contain the output
instruction vector (ForwardInstData (pipe_data.hh)) from their previous stages
with the instructions and bubbles in the same positions as a single buffer
entry.

Stage input buffers provide a Reservable (stage.hh) interface to their
previous stages, to allow slots to be reserved in their input buffers, and
communicate their input buffer occupancy backwards to allow the previous stage
to plan whether it should make an output in a given cycle.
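
A small Python sketch of that reserve-then-push discipline (the class and its
names are illustrative; the real interface is Reservable and the input buffers
in stage.hh):

\verbatim
# Illustrative sketch of the reserve/push discipline used for stage input
# buffers.  Only the protocol is demonstrated, not the real implementation.
class InputBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        self.num_reserved = 0

    def can_reserve(self):
        # The previous stage asks this before deciding to produce output.
        return len(self.entries) + self.num_reserved < self.capacity

    def reserve(self):
        assert self.can_reserve()
        self.num_reserved += 1

    def push(self, data):
        # Data arrives in a slot reserved in an earlier cycle.
        assert self.num_reserved > 0
        self.num_reserved -= 1
        self.entries.append(data)

# A producing stage only generates output when space is guaranteed:
buf = InputBuffer(capacity=3)
if buf.can_reserve():
    buf.reserve()        # this cycle: claim a slot
    buf.push('line 0')   # later cycle: deliver the data into that slot
\endverbatim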

\subsection events Event handling: MinorActivityRecorder (activity.hh,
pipeline.hh)

Minor is essentially a cycle-callable model with some ability to skip cycles
based on pipeline activity. External events are mostly received by callbacks
(e.g. Fetch1::IcachePort::recvTimingResp) and cause the pipeline to be woken
up to service advancing request queues.

Ticked (sim/ticked.hh) is a base class bringing together an evaluate
member function and a provided SimObject. It provides a Ticked::start/stop
interface to start and pause clock events from being periodically issued.
Pipeline is a derived class of Ticked.

During evaluate calls, stages can signal that they still have work to do in
the next cycle by calling either MinorCPU::activityRecorder->activity() (for
non-callable related activity) or MinorCPU::wakeupOnEvent(<stageId>) (for
stage callback-related 'wakeup' activity).

Pipeline::evaluate contains calls to evaluate for each unit and a test for
pipeline idling which can turn off the clock tick if no unit has signalled
that it may become active next cycle.

Within Pipeline (pipeline.hh), the stages are evaluated in reverse order (and
so will ::evaluate in reverse order) and their backwards data can be
read immediately after being written in each cycle allowing output decisions
to be 'perfect' (allowing synchronous stalling of the whole pipeline). Branch
predictions from Fetch2 to Fetch1 can also be transported in 0 cycles making
fetch1ToFetch2BackwardDelay the only configurable delay which can be set as
low as 0 cycles.
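
The overall shape of that per-cycle loop, with reverse-order evaluation and
activity-based idling, can be sketched as follows (Python pseudocode with
illustrative names, not the C++ implementation):

\verbatim
# Python sketch of the cyclic evaluation and idling scheme.  Names are
# illustrative, not the real gem5 C++ API.
class ActivitySketch:
    """Stand-in for MinorActivityRecorder: remembers whether anything
    signalled work for the next cycle."""
    def __init__(self):
        self.signalled = False

    def activity(self):
        self.signalled = True

    def advance(self):
        # The real recorder ages per-stage activity here; this sketch just
        # clears the flag after it has been read.
        was_active = self.signalled
        self.signalled = False
        return was_active

class PipelineSketch:
    def __init__(self, fetch1, fetch2, decode, execute):
        # Reverse order: Execute is evaluated first so its backward data
        # (branches) can be read by Fetch1/Fetch2 within the same cycle.
        self.stages_in_reverse = [execute, decode, fetch2, fetch1]
        self.activity = ActivitySketch()

    def evaluate(self):
        for stage in self.stages_in_reverse:
            stage.evaluate(self.activity)   # stages call activity.activity()
        # Idling: only keep the cyclic 'tick' event scheduled while some
        # unit reports that it may be active next cycle.
        return self.activity.advance()
\endverbatim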

The MinorCPU::activateContext and MinorCPU::suspendContext interface can be
called to start and pause threads (threads in the MT sense) and to start and
pause the pipeline. Executing instructions can call this interface
(indirectly through the ThreadContext) to idle the CPU/their threads.

\subsection stages Each pipeline stage

In general, the behaviour of a stage (each cycle) is:

\verbatim
evaluate:
    push input to inputBuffer
    setup references to input/output data slots

    do 'every cycle' 'step' tasks

    if there is input and there is space in the next stage:
        process and generate a new output
        maybe re-activate the stage

    send backwards data

    if the stage generated output to the following FIFO:
        signal pipe activity

    if the stage has more processable input and space in the next stage:
        re-activate the stage for the next cycle

    commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

The Execute stage differs from this model as its forward output (branch) data
is unconditionally sent to Fetch1 and Fetch2. To allow this behaviour, Fetch1
and Fetch2 must be unconditionally receptive to that data.

\subsection fetch1 Fetch1 stage

Fetch1 is responsible for fetching cache lines or partial cache lines from the
I-cache and passing them on to Fetch2 to be decomposed into instructions. It
can receive 'change of stream' indications from both Execute and Fetch2 to
signal that it should change its internal fetch address and tag newly fetched
lines with new stream or prediction sequence numbers. When both Execute and
Fetch2 signal changes of stream at the same time, Fetch1 takes Execute's
change.

Every line issued by Fetch1 will bear a unique line sequence number which can
be used for debugging stream changes.

When fetching from the I-cache, Fetch1 will ask for data from the current
fetch address (Fetch1::pc) up to the end of the 'data snap' size set in the
parameter fetch1LineSnapWidth. Subsequent autonomous line fetches will fetch
whole lines at a snap boundary and of size fetch1LineWidth.
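
The snapping amounts to aligning the end of the first request from a new PC to
a fetch1LineSnapWidth boundary and then fetching whole aligned lines; a sketch
of the address arithmetic (illustrative only, a 64-byte snap/line width is
assumed for the example values):

\verbatim
# Illustrative address arithmetic for Fetch1's line snapping.  The first
# fetch from a new PC runs up to the next snap boundary; later autonomous
# fetches are whole aligned lines of fetch1LineWidth bytes.
def first_fetch_size(pc, fetch1_line_snap_width):
    next_boundary = (pc // fetch1_line_snap_width + 1) * fetch1_line_snap_width
    return next_boundary - pc

def autonomous_fetch(addr, fetch1_line_width):
    aligned = (addr // fetch1_line_width) * fetch1_line_width
    return aligned, fetch1_line_width

# 36 bytes up to the 0x80 boundary (cf. the MinorLine example later in this
# document: size=36 vaddr=0x5c)
print(first_fetch_size(0x5c, 64))
# (128, 64): one whole aligned line starting at 0x80
print(autonomous_fetch(0x80, 64))
\endverbatim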

Fetch1 will only initiate a memory fetch if it can reserve space in Fetch2's
input buffer. That input buffer serves as the fetch queue/LFL for the system.

Fetch1 contains two queues: requests and transfers to handle the stages of
translating the address of a line fetch (via the TLB) and accommodating the
request/response of fetches to/from memory.

Fetch requests from Fetch1 are pushed into the requests queue as newly
allocated FetchRequest objects once they have been sent to the ITLB with a
call to itb->translateTiming.

A response from the TLB moves the request from the requests queue to the
transfers queue. If there is more than one entry in each queue, it is
possible to get a TLB response for a request which is not at the head of the
requests queue. In that case, the TLB response is marked up as a state change
to Translated in the request object, and advancing the request to transfers
(and the memory system) is left to calls to Fetch1::stepQueues which is called
in the cycle following the receipt of any event.

Fetch1::tryToSendToTransfers is responsible for moving requests between the
two queues and issuing requests to memory. Failed TLB lookups (prefetch
aborts) continue to occupy space in the queues until they are recovered at the
head of transfers.

Responses from memory change the request object state to Complete and
Fetch1::evaluate can pick up response data, package it in the ForwardLineData
object, and forward it to Fetch2%'s input buffer.

As space is always reserved in Fetch2::inputBuffer, setting the input buffer's
size to 1 results in non-prefetching behaviour.

When a change of stream occurs, translated requests queue members and
completed transfers queue members can be unconditionally discarded to make way
for new transfers.

\subsection fetch2 Fetch2 stage

Fetch2 receives a line from Fetch1 into its input buffer. The data in the
head line in that buffer is iterated over and separated into individual
instructions which are packed into a vector of instructions which can be
passed to Decode. Packing instructions can be aborted early if a fault is
found in either the input line as a whole or a decomposed instruction.

\subsubsection bp Branch prediction

Fetch2 contains the branch prediction mechanism. This is a wrapper around the
branch predictor interface provided by gem5 (cpu/pred/...).

Branches are predicted for any control instructions found. If prediction is
attempted for an instruction, the MinorDynInst::triedToPredict flag is set on
that instruction.

When a branch is predicted to be taken, the MinorDynInst::predictedTaken flag
is set and MinorDynInst::predictedTarget is set to the predicted target PC
value. The predicted branch instruction is then packed into Fetch2%'s output
vector, the prediction sequence number is incremented, and the branch is
communicated to Fetch1.

After signalling a prediction, Fetch2 will discard its input buffer contents
and will reject any new lines which have the same stream sequence number as
that branch but have a different prediction sequence number. This allows
following sequentially fetched lines to be rejected without ignoring new lines
generated by a change of stream indicated from a 'real' branch from Execute
(which will have a new stream sequence number).

The program counter value provided to Fetch2 by Fetch1 packets is only updated
when there is a change of stream. Fetch2::havePC indicates whether the PC
will be picked up from the next processed input line. Fetch2::havePC is
necessary to allow line-wrapping instructions to be tracked through decode.

Branches (and instructions predicted to branch) which are processed by Execute
will generate BranchData (pipe_data.hh) data explaining the outcome of the
branch which is sent forwards to Fetch1 and Fetch2. Fetch1 uses this data to
change stream (and update its stream sequence number and address for new
lines). Fetch2 uses it to update the branch predictor. Minor does not
communicate branch data to the branch predictor for instructions which are
discarded on the way to commit.

BranchData::BranchReason (pipe_data.hh) encodes the possible branch scenarios:

<table>
<tr>
<td>Branch enum val.</td>
<td>In Execute</td>
<td>Fetch1 reaction</td>
<td>Fetch2 reaction</td>
</tr>
<tr>
<td>NoBranch</td>
<td>(output bubble data)</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>CorrectlyPredictedBranch</td>
<td>Predicted, taken</td>
<td>-</td>
<td>Update BP as taken branch</td>
</tr>
<tr>
<td>UnpredictedBranch</td>
<td>Not predicted, but was taken</td>
<td>New stream</td>
<td>Update BP as taken branch</td>
</tr>
<tr>
<td>BadlyPredictedBranch</td>
<td>Predicted, but not taken</td>
<td>New stream to restore to old inst. source</td>
<td>Update BP as not taken branch</td>
</tr>
<tr>
<td>BadlyPredictedBranchTarget</td>
<td>Predicted, taken, but to a different target than the predicted one</td>
<td>New stream</td>
<td>Update BTB to new target</td>
</tr>
<tr>
<td>SuspendThread</td>
<td>Hint to suspend fetching</td>
<td>Suspend fetch for this thread (branch to next inst. as wakeup
fetch addr)</td>
<td>-</td>
</tr>
<tr>
<td>Interrupt</td>
<td>Interrupt detected</td>
<td>New stream</td>
<td>-</td>
</tr>
</table>

The parameter decodeInputWidth sets the number of instructions which can be
packed into the output per cycle. If the parameter fetch2CycleInput is true,
Fetch2 can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection decode Decode stage

Decode takes a vector of instructions from Fetch2 (via its input buffer) and
decomposes those instructions into micro-ops (if necessary) and packs them
into its output instruction vector.

The parameter executeInputWidth sets the number of instructions which can be
packed into the output per cycle. If the parameter decodeCycleInput is true,
Decode can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection execute Execute stage

Execute provides all the instruction execution and memory access mechanisms.
An instruction's passage through Execute can take multiple cycles with its
precise timing modelled by a functional unit pipeline FIFO.

A vector of instructions (possibly including fault 'instructions') is provided
to Execute by Decode and can be queued in the Execute input buffer before
being issued. Setting the parameter executeCycleInput allows execute to
examine more than one input buffer entry (more than one instruction vector).
The number of instructions in the input vector can be set with
executeInputWidth and the depth of the input buffer can be set with parameter
executeInputBufferSize.
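
These per-stage width, cycle-input and buffer-size knobs are ordinary MinorCPU
Python parameters; a configuration sketch (the values are examples only, not
recommendations):

\verbatim
# Example-only values for the stage width/buffering parameters discussed
# above.
from m5.objects import MinorCPU

cpu = MinorCPU()

# Fetch2 -> Decode and Decode -> Execute instruction vector widths.
cpu.decodeInputWidth = 2
cpu.executeInputWidth = 2

# Allow stages to pull from more than one input buffer entry per cycle.
cpu.fetch2CycleInput = True
cpu.decodeCycleInput = True
cpu.executeCycleInput = True

# Depth of Execute's input buffer (entries of ForwardInstData).
cpu.executeInputBufferSize = 7
\endverbatim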

\subsubsection fus Functional units

The Execute stage contains pipelines for each functional unit comprising the
computational core of the CPU. Functional units are configured via the
executeFuncUnits parameter. Each functional unit has a number of instruction
classes it supports, a stated delay between instruction issues, a delay
from instruction issue to (possible) commit, and an optional timing annotation
capable of expressing more complicated timing.
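
Functional units are described on the Python side (MinorCPU.py). The sketch
below builds a small custom pool; the class and field names (MinorFUPool,
MinorFU, MinorOpClassSet, opLat, issueLat, ...) follow MinorCPU.py as the
author recalls it and should be checked against the gem5 version in use:

\verbatim
# Sketch of a custom functional unit pool for Execute.  Verify class and
# parameter names against MinorCPU.py in your gem5 tree.
from m5.objects import (MinorCPU, MinorFUPool, MinorFU,
                        MinorOpClassSet, MinorOpClass)

class MyIntFU(MinorFU):
    # Instruction classes this unit accepts.
    opClasses = MinorOpClassSet(opClasses=[MinorOpClass(opClass='IntAlu')])
    opLat = 3      # cycles from issue to (possible) commit
    issueLat = 1   # cycles between successive issues to this unit

class MyMemFU(MinorFU):
    opClasses = MinorOpClassSet(opClasses=[MinorOpClass(opClass='MemRead'),
                                           MinorOpClass(opClass='MemWrite')])
    opLat = 1
    issueLat = 1

class MyFUPool(MinorFUPool):
    funcUnits = [MyIntFU(), MyIntFU(), MyMemFU()]

cpu = MinorCPU()
cpu.executeFuncUnits = MyFUPool()
\endverbatim

The default pool supplied with the model is constructed in the same way, so it
can be used as a starting point for experiments.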

Each active cycle, Execute::evaluate performs this action:

\verbatim
Execute::evaluate:
    push input to inputBuffer
    setup references to input/output data slots and branch output slot

    step D-cache interface queues (similar to Fetch1)

    if interrupt posted:
        take interrupt (signalling branch to Fetch1/Fetch2)
    else
        commit instructions
        issue new instructions

    advance functional unit pipelines

    reactivate Execute if the unit is still active

    commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

\subsubsection fifos Functional unit FIFOs

Functional units are implemented as SelfStallingPipelines (stage.hh). These
are TimeBuffer FIFOs with two distinct 'push' and 'pop' wires. They respond
to SelfStallingPipeline::advance in the same way as TimeBuffers <b>unless</b>
there is data at the far, 'pop', end of the FIFO. A 'stalled' flag is
provided for signalling stalling and to allow a stall to be cleared. The
intention is to provide a pipeline for each functional unit which will never
advance an instruction out of that pipeline until it has been processed and
the pipeline is explicitly unstalled.
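
A Python model of that self-stalling behaviour (illustrative only; the real
implementation is SelfStallingPipeline in stage.hh and tracks occupancy rather
than using a plain list):

\verbatim
# Illustrative model of a self-stalling FU pipeline: advance() moves contents
# forward unless something is waiting, unserviced, at the 'pop' end.
class SelfStallingPipelineSketch:
    def __init__(self, depth):
        self.slots = [None] * depth   # index 0 = push end, -1 = pop end
        self.stalled = False

    def push(self, inst):
        assert self.slots[0] is None, "push slot already occupied"
        self.slots[0] = inst

    def front(self):
        return self.slots[-1]         # instruction ready to be committed

    def advance(self):
        # Stall (and stay stalled) while there is data at the pop end.
        self.stalled = self.slots[-1] is not None
        if not self.stalled:
            self.slots = [None] + self.slots[:-1]

# Execute pops the head and clears the stall once the instruction is handled:
fu = SelfStallingPipelineSketch(depth=3)
fu.push('inst 1')
for _ in range(3):
    fu.advance()
print(fu.front(), fu.stalled)   # 'inst 1' has reached the pop end; stalled
\endverbatim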

The actions 'issue', 'commit', and 'advance' act on the functional units.

\subsubsection issue Issue

Issuing instructions involves iterating over both the input buffer
instructions and the heads of the functional units to try and issue
instructions in order. The number of instructions which can be issued each
cycle is limited by the parameter executeIssueLimit, how executeCycleInput is
set, the availability of pipeline space and the policy used to choose a
pipeline in which the instruction can be issued.

At present, the only issue policy is strict round-robin visiting of each
pipeline with the given instructions in sequence. For greater flexibility,
better (and more specific) policies will need to be possible.
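
The round-robin visit can be pictured with the following Python sketch
(illustrative; the real logic, including fault handling and the
executeCycleInput interaction, lives in Execute::issue):

\verbatim
# Illustrative round-robin issue order: visit the FUs in a fixed cyclic order
# and place each instruction into the first unit that can currently accept
# it.  In-order issue means giving up as soon as one instruction cannot be
# placed.  'can_accept' stands in for the class/space/stall checks.
def try_issue(insts, fus, can_accept, issue_limit):
    issued = 0
    fu_index = 0
    for inst in insts:
        placed = False
        for offset in range(len(fus)):
            fu = fus[(fu_index + offset) % len(fus)]
            if can_accept(fu, inst):
                fu.push(inst)
                fu_index = (fu_index + offset + 1) % len(fus)
                issued += 1
                placed = True
                break
        if not placed or issued == issue_limit:
            break
    return issued
\endverbatim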

Memory operation instructions traverse their functional units to perform their
EA calculations. On 'commit', the ExecContext::initiateAcc execution phase is
performed and any memory access is issued (via ExecContext::{read,write}Mem
calling LSQ::pushRequest) to the LSQ.

Note that faults are issued as if they are instructions and can (currently) be
issued to *any* functional unit.

Every issued instruction is also pushed into the Execute::inFlightInsts queue.
Memory ref. instructions are also pushed into the Execute::inFUMemInsts queue.

\subsubsection commit Commit

Instructions are committed by examining the head of the Execute::inFlightInsts
queue (which is decorated with the functional unit number to which the
instruction was issued). Instructions which can then be found in their
functional units are executed and popped from Execute::inFlightInsts.

Memory operation instructions are committed into the memory queues (as
described above) and exit their functional unit pipeline but are not popped
from the Execute::inFlightInsts queue. The Execute::inFUMemInsts queue
provides ordering to memory operations as they pass through the functional
units (maintaining issue order). On entering the LSQ, instructions are popped
from Execute::inFUMemInsts.

If the parameter executeAllowEarlyMemoryIssue is set, memory operations can be
sent from their FU to the LSQ before reaching the head of
Execute::inFlightInsts but after their dependencies are met.
MinorDynInst::instToWaitFor is marked up with the latest dependent instruction
execSeqNum required to be committed for a memory operation to progress to the
LSQ.

Once a memory response is available (by testing the head of
Execute::inFlightInsts against LSQ::findResponse), commit will process that
response (ExecContext::completeAcc) and pop the instruction from
Execute::inFlightInsts.

Any branch, fault or interrupt will cause a stream sequence number change and
signal a branch to Fetch1/Fetch2. Only instructions with the current stream
sequence number will be issued and/or committed.

\subsubsection advance Advance

All non-stalled pipelines are advanced and may, thereafter, become stalled.
Potential activity in the next cycle is signalled if there are any
instructions remaining in any pipeline.

\subsubsection sb Scoreboard

The scoreboard (Scoreboard) is used to control instruction issue. It contains
a count of the number of in flight instructions which will write each general
purpose CPU integer or float register. Instructions will only be issued when
the scoreboard count is 0 for every one of the instruction's source
registers.

Once an instruction is issued, the scoreboard count for each of the
instruction's destination registers is incremented.

The estimated delivery time of the instruction's result is marked up in the
scoreboard by adding the length of the issued-to FU to the current time. The
timings parameter on each FU provides a list of additional rules for
calculating the delivery time. These are documented in the parameter comments
in MinorCPU.py.

On commit (for memory operations, at memory response commit), the scoreboard
counters for an instruction's destination registers are decremented.
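
Put together, the scoreboard bookkeeping looks roughly like the following
Python sketch (register indexing and the delivery-time refinement are
simplified away):

\verbatim
# Simplified sketch of the issue scoreboard: one in-flight-writer count per
# register; issue is blocked while any source register has a non-zero count.
class ScoreboardSketch:
    def __init__(self, num_regs):
        self.counts = [0] * num_regs

    def can_issue(self, src_regs):
        return all(self.counts[r] == 0 for r in src_regs)

    def mark_issue(self, dest_regs):
        for r in dest_regs:
            self.counts[r] += 1

    def mark_commit(self, dest_regs):
        # For memory operations this happens at memory response commit.
        for r in dest_regs:
            self.counts[r] -= 1

sb = ScoreboardSketch(num_regs=32)
sb.mark_issue(dest_regs=[3])          # e.g. 'add r3, r1, r2' issued
print(sb.can_issue(src_regs=[3]))     # False: a dependent inst must wait
sb.mark_commit(dest_regs=[3])
print(sb.can_issue(src_regs=[3]))     # True once the writer has committed
\endverbatim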

\subsubsection ifi Execute::inFlightInsts

The Execute::inFlightInsts queue will always contain all instructions in
flight in Execute in the correct issue order. Execute::issue is the only
process which will push an instruction into the queue. Execute::commit is the
only process that can pop an instruction.

\subsubsection lsq LSQ

The LSQ can support multiple outstanding transactions to memory in a number of
conservative cases.

There are three queues to contain requests: requests, transfers and the store
buffer. The requests and transfers queue operate in a similar manner to the
queues in Fetch1. The store buffer is used to decouple the delay of
completing store operations from following loads.

Requests are issued to the DTLB as their instructions leave their functional
unit. At the head of requests, cacheable load requests can be sent to memory
and on to the transfers queue. Cacheable stores will be passed to transfers
unprocessed and progress that queue maintaining order with other transactions.

The conditions in LSQ::tryToSendToTransfers dictate when requests can
be sent to memory.

All uncacheable transactions, split transactions and locked transactions are
processed in order at the head of requests. Additionally, store results
residing in the store buffer can have their data forwarded to cacheable loads
(removing the need to perform a read from memory) but no cacheable load can be
issued to the transfers queue until that queue's stores have drained into the
store buffer.
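
As an illustration of the forwarding condition (not the LSQ source, which also
has to deal with split and uncacheable accesses), a load can take its data
from the newest store-buffer entry that wholly covers it:

\verbatim
# Illustrative store-to-load forwarding check over a store buffer held as a
# list with the newest store last.
def forward_store_data(store_buffer, load_addr, load_size):
    for store in reversed(store_buffer):          # newest store first
        store_addr, data = store['addr'], store['data']
        if store_addr <= load_addr and \
                load_addr + load_size <= store_addr + len(data):
            offset = load_addr - store_addr
            return data[offset:offset + load_size]
    return None   # no cover: the load must go to memory

stores = [{'addr': 0x1000, 'data': bytes(range(8))}]
print(forward_store_data(stores, 0x1002, 4))   # b'\x02\x03\x04\x05'
print(forward_store_data(stores, 0x2000, 4))   # None -> read from memory
\endverbatim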

At the end of transfers, requests which are LSQ::LSQRequest::Complete (are
faulting, are cacheable stores, or have been sent to memory and received a
response) can be picked off by Execute and committed
(ExecContext::completeAcc) and, for stores, sent to the store buffer.

Barrier instructions do not prevent cacheable loads from progressing to memory
but do cause a stream change which will discard that load. Stores will not be
committed to the store buffer if they are in the shadow of the barrier but
before the new instruction stream has arrived at Execute. As all other memory
transactions are delayed at the end of the requests queue until they are at
the head of Execute::inFlightInsts, they will be discarded by any barrier
stream change.

After commit, LSQ::BarrierDataRequest requests are inserted into the
store buffer to track each barrier until all preceding memory transactions
have drained from the store buffer. No further memory transactions will be
issued from the ends of FUs until after the barrier has drained.

\subsubsection drain Draining

Draining is mostly handled by the Execute stage. When initiated by calling
MinorCPU::drain, Pipeline::evaluate checks the draining status of each unit
each cycle and keeps the pipeline active until draining is complete. It is
Pipeline that signals the completion of draining. Execute is triggered by
MinorCPU::drain and starts stepping through its Execute::DrainState state
machine, starting from state Execute::NotDraining, in this order:

<table>
<tr>
<td><b>State</b></td>
<td><b>Meaning</b></td>
</tr>
<tr>
<td>Execute::NotDraining</td>
<td>Not trying to drain, normal execution</td>
</tr>
<tr>
<td>Execute::DrainCurrentInst</td>
<td>Draining micro-ops to complete inst.</td>
</tr>
<tr>
<td>Execute::DrainHaltFetch</td>
<td>Halt fetching instructions</td>
</tr>
<tr>
<td>Execute::DrainAllInsts</td>
<td>Discarding all instructions presented</td>
</tr>
</table>

When complete, a drained Execute unit will be in the Execute::DrainAllInsts
state where it will continue to discard instructions but has no knowledge of
the drained state of the rest of the model.

\section debug Debug options

The model provides a number of debug flags which can be passed to gem5 with
the --debug-flags option.

The available flags are:

<table>
<tr>
<td><b>Debug flag</b></td>
<td><b>Unit which will generate debugging output</b></td>
</tr>
<tr>
<td>Activity</td>
<td>Debug ActivityMonitor actions</td>
</tr>
<tr>
<td>Branch</td>
<td>Fetch2 and Execute branch prediction decisions</td>
</tr>
<tr>
<td>MinorCPU</td>
<td>CPU global actions such as wakeup/thread suspension</td>
</tr>
<tr>
<td>Decode</td>
<td>Decode</td>
</tr>
<tr>
<td>MinorExec</td>
<td>Execute behaviour</td>
</tr>
<tr>
<td>Fetch</td>
<td>Fetch1 and Fetch2</td>
</tr>
<tr>
<td>MinorInterrupt</td>
<td>Execute interrupt handling</td>
</tr>
<tr>
<td>MinorMem</td>
<td>Execute memory interactions</td>
</tr>
<tr>
<td>MinorScoreboard</td>
<td>Execute scoreboard activity</td>
</tr>
<tr>
<td>MinorTrace</td>
<td>Generate MinorTrace cyclic state trace output (see below)</td>
</tr>
<tr>
<td>MinorTiming</td>
<td>MinorTiming instruction timing modification operations</td>
</tr>
</table>

The group flag Minor enables all of the flags beginning with Minor.
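
For example, a run combining several of these flags might be invoked as
below; the build path, the CPU type name and the workload are illustrative
and depend on the gem5 version and build in use:

\verbatim
./build/ARM/gem5.opt --debug-flags=Fetch,MinorExec,MinorMem \
    --debug-file=minor-debug.txt \
    configs/example/se.py --cpu-type=minor --caches \
    -c tests/test-progs/hello/bin/arm/linux/hello
\endverbatim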

\section trace MinorTrace and minorview.py

The debug flag MinorTrace causes cycle-by-cycle state data to be printed which
can then be processed and viewed by the minorview.py tool. This output is
very verbose and so it is recommended it only be used for small examples.

\subsection traceformat MinorTrace format

There are three types of line outputted by MinorTrace:

\subsubsection state MinorTrace - Ticked unit cycle state

For example:

\verbatim
110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0
\endverbatim

For each time step, the MinorTrace flag will cause one MinorTrace line to be
printed for every named element in the model.

\subsubsection traceunit MinorInst - summaries of instructions issued by \
Decode

For example:

\verbatim
140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 addr=0x5c \
inst=" mov r0, #0" class=IntAlu
\endverbatim

MinorInst lines are currently only generated for instructions which are
committed.

\subsubsection tracefetch1 MinorLine - summaries of line fetches issued by \
Fetch1

For example:

\verbatim
92000: system.cpu.icachePort: MinorLine: id=0/1.1/1 size=36 \
vaddr=0x5c paddr=0x5c
\endverbatim

\subsection minorview minorview.py

Minorview (util/minorview.py) can be used to visualise the data created by
MinorTrace.

\verbatim
usage: minorview.py [-h] [--picture picture-file] [--prefix name]
                    [--start-time time] [--end-time time] [--mini-views]
                    event-file

Minor visualiser

positional arguments:
  event-file

optional arguments:
  -h, --help            show this help message and exit
  --picture picture-file
                        markup file containing blob information (default:
                        <minorview-path>/minor.pic)
  --prefix name         name prefix in trace for CPU to be visualised
                        (default: system.cpu)
  --start-time time     time of first event to load from file
  --end-time time       time of last event to load from file
  --mini-views          show tiny views of the next 10 time steps
\endverbatim

Raw debugging output can be passed to minorview.py as the event-file. It will
pick out the MinorTrace lines; other lines which name units in the simulation
(such as system.cpu.dcachePort in the above example) will appear as 'comments'
when those units are clicked on in the visualiser.

Clicking on a unit which contains instructions or lines will bring up a speech
bubble giving extra information derived from the MinorInst/MinorLine lines.

--start-time and --end-time allow only sections of debug files to be loaded.

--prefix allows the name prefix of the CPU to be inspected to be supplied.
This defaults to 'system.cpu'.

In the visualiser, the buttons Start, End, Back, Forward, Play and Stop can be
used to control the displayed simulation time.

The diagonally striped coloured blocks are showing the InstId of the
instruction or line they represent. Note that lines in Fetch1 and f1ToF2.F
only show the id fields of a line and that instructions in Fetch2, f2ToD, and
decode.inputBuffer do not yet have execute sequence numbers. The T/S.P/L/F.E
buttons can be used to toggle parts of InstId on and off to make it easier to
understand the display. Useful combinations are:

<table>
<tr>
<td><b>Combination</b></td>
<td><b>Reason</b></td>
</tr>
<tr>
<td>E</td>
<td>just show the final execute sequence number</td>
</tr>
<tr>
<td>F/E</td>
<td>show the instruction-related numbers</td>
</tr>
<tr>
<td>S/P</td>
<td>show just the stream-related numbers (watch the stream sequence
change with branches and not change with predicted branches)</td>
</tr>
<tr>
<td>S/E</td>
<td>show instructions and their stream</td>
</tr>
</table>

The key to the right shows all the displayable colours (some of the colour
choices are quite bad!):

<table>
<tr>
<td><b>Symbol</b></td>
<td><b>Meaning</b></td>
</tr>
<tr>
<td>U</td>
<td>Unknown data</td>
</tr>
<tr>
<td>B</td>
<td>Blocked stage</td>
</tr>
<tr>
<td>-</td>
<td>Bubble</td>
</tr>
<tr>
<td>E</td>
<td>Empty queue slot</td>
</tr>
<tr>
<td>R</td>
<td>Reserved queue slot</td>
</tr>
<tr>
<td>F</td>
<td>Fault</td>
</tr>
<tr>
<td>r</td>
<td>Read (used as the leftmost stripe on data in the dcachePort)</td>
</tr>
<tr>
<td>w</td>
<td>Write (likewise)</td>
</tr>
<tr>
<td>0 to 9</td>
<td>last decimal digit of the corresponding data</td>
</tr>
</table>

\verbatim

,---------------. .--------------. *U
| |=|->|=|->|=| | ||=|||->||->|| | *- <- Fetch queues/LSQ
`---------------' `--------------' *R
=== ====== *w <- Activity/Stage activity
,--------------. *1
,--. ,. ,. | ============ | *3 <- Scoreboard
| |-\[]-\||-\[]-\||-\[]-\| ============ | *5 <- Execute::inFlightInsts
| | :[] :||-/[]-/||-/[]-/| -. -------- | *7
| |-/[]-/|| ^ || | | --------- | *9
| | || | || | | ------ |
[]->| | ->|| | || | | ---- |
| |<-[]<-||<-+-<-||<-[]<-| | ------ |->[] <- Execute to Fetch1,
'--` `' ^ `' | -' ------ | Fetch2 branch data
---. | ---. `--------------'
---' | ---' ^ ^
| ^ | `------------ Execute
MinorBuffer ----' input `-------------------- Execute input buffer
buffer
\endverbatim

Stages show the colours of the instructions currently being
generated/processed.

Forward FIFOs between stages show the data being pushed into them at the
current tick (to the left), the data in transit, and the data available at
their outputs (to the right).

The backwards FIFO between Fetch2 and Fetch1 shows branch prediction data.

In general, all displayed data is correct at the end of a cycle's activity at
the time indicated but before the inter-stage FIFOs are ticked. Each FIFO
has, therefore, an extra slot to show the asserted new input data, and all the
data currently within the FIFO.

Input buffers for each stage are shown below the corresponding stage and show
the contents of those buffers as horizontal strips. Strips marked as reserved
(cyan by default) are reserved to be filled by the previous stage. An input
buffer with all reserved or occupied slots will, therefore, block the previous
stage from generating output.

Fetch queues and LSQ show the lines/instructions in the queues of each
interface and show the number of lines/instructions in TLB and memory in the
two striped colours of the top of their frames.

Inside Execute, the horizontal bars represent the individual FU pipelines.
The vertical bar to the left is the input buffer and the bar to the right, the
instructions committed this cycle. The background of Execute shows
instructions which are being committed this cycle in their original FU
pipeline positions.

The strip at the top of the Execute block shows the current streamSeqNum that
Execute is committing. A similar stripe at the top of Fetch1 shows that
stage's expected streamSeqNum and the stripe at the top of Fetch2 shows its
issuing predictionSeqNum.

The scoreboard shows the number of instructions in flight which will commit a
result to the register in the position shown. The scoreboard contains slots
for each integer and floating point register.

The Execute::inFlightInsts queue shows all the instructions in flight in
Execute with the oldest instruction (the next instruction to be committed) to
the right.

'Stage activity' shows the signalled activity (as E/1) for each stage (with
CPU miscellaneous activity to the left).

'Activity' shows a count of stage and pipe activity.

\subsection picformat minor.pic format

The minor.pic file (src/minor/minor.pic) describes the layout of the
model's blocks on the visualiser. Its format is described in the supplied
minor.pic file.

*/

}