In addition to any information which your simulation script prints out, after running gem5, there are three files generated in a directory called m5out
:
config.ini : Contains a list of every SimObject created for the simulation and the values for its parameters.
config.json : The same as config.ini, but in json format.
stats.txt : A text representation of all of the gem5 statistics registered for the simulation.
This file is the definitive version of what was simulated. All of the parameters for each SimObject that is simulated, whether they were set in the configuration scripts or the defaults were used, are shown in this file.
Below is pulled from the config.ini generated when the simple.py
configuration file from simple-config-chapter is run.
[root] type=Root children=system eventq_index=0 full_system=false sim_quantum=0 time_sync_enable=false time_sync_period=100000000000 time_sync_spin_threshold=100000000 [system] type=System children=clk_domain cpu dvfs_handler mem_ctrl membus boot_osflags=a cache_line_size=64 clk_domain=system.clk_domain default_p_state=UNDEFINED eventq_index=0 exit_on_work_items=false init_param=0 kernel= kernel_addr_check=true kernel_extras= kvm_vm=Null load_addr_mask=18446744073709551615 load_offset=0 mem_mode=timing ... [system.membus] type=CoherentXBar children=snoop_filter clk_domain=system.clk_domain default_p_state=UNDEFINED eventq_index=0 forward_latency=4 frontend_latency=3 p_state_clk_gate_bins=20 p_state_clk_gate_max=1000000000000 p_state_clk_gate_min=1000 point_of_coherency=true point_of_unification=true power_model= response_latency=2 snoop_filter=system.membus.snoop_filter snoop_response_latency=4 system=system use_default_range=false width=16 master=system.cpu.interrupts.pio system.cpu.interrupts.int_slave system.mem_ctrl.port slave=system.cpu.icache_port system.cpu.dcache_port system.cpu.interrupts.int_master system.system_port [system.membus.snoop_filter] type=SnoopFilter eventq_index=0 lookup_latency=1 max_capacity=8388608 system=system
Here we see that at the beginning of the description of each SimObject is first its name as created in the configuration file surrounded by square brackets (e.g., [system.membus]
).
Next, every parameter of the SimObject is shown with its value, including parameters not explicitly set in the configuration file. For instance, the configuration file sets the clock domain to be 1 GHz (1000 ticks in this case). However, it did not set the cache line size (which is 64 in the system
) object.
The config.ini
file is a valuable tool for ensuring that you are simulating what you think you're simulating. There are many possible ways to set default values, and to override default values, in gem5. It is a “best-practice” to always check the config.ini
as a sanity check that values set in the configuration file are propagated to the actual SimObject instantiation.
gem5 has a flexible statistics generating system. gem5 statistics is covered in some detail on the gem5 wiki site. Each instantiation of a SimObject has it's own statistics. At the end of simulation, or when special statistic-dumping commands are issued, the current state of the statistics for all SimObjects is dumped to a file.
First, the statistics file contains general statistics about the execution:
---------- Begin Simulation Statistics ---------- simSeconds 0.000057 # Number of seconds simulated (Second) simTicks 57467000 # Number of ticks simulated (Tick) finalTick 57467000 # Number of ticks from beginning of simulation (restored from checkpoints and never reset) (Tick) simFreq 1000000000000 # The number of ticks per simulated second ((Tick/Second)) hostSeconds 0.03 # Real time elapsed on the host (Second) hostTickRate 2295882330 # The number of ticks simulated per host second (ticks/s) ((Tick/Second)) hostMemory 665792 # Number of bytes of host memory used (Byte) simInsts 6225 # Number of instructions simulated (Count) simOps 11204 # Number of ops (including micro ops) simulated (Count) hostInstRate 247382 # Simulator instruction rate (inst/s) ((Count/Second)) hostOpRate 445086 # Simulator op (including micro ops) rate (op/s) ((Count/Second)) ---------- Begin Simulation Statistics ---------- simSeconds 0.000490 # Number of seconds simulated (Second) simTicks 490394000 # Number of ticks simulated (Tick) finalTick 490394000 # Number of ticks from beginning of simulation (restored from checkpoints and never reset) (Tick) simFreq 1000000000000 # The number of ticks per simulated second ((Tick/Second)) hostSeconds 0.03 # Real time elapsed on the host (Second) hostTickRate 15979964060 # The number of ticks simulated per host second (ticks/s) ((Tick/Second)) hostMemory 657488 # Number of bytes of host memory used (Byte) simInsts 6225 # Number of instructions simulated (Count) simOps 11204 # Number of ops (including micro ops) simulated (Count) hostInstRate 202054 # Simulator instruction rate (inst/s) ((Count/Second)) hostOpRate 363571 # Simulator op (including micro ops) rate (op/s) ((Count/Second))
The statistic dump begins with ---------- Begin Simulation Statistics ----------
. There may be multiple of these in a single file if there are multiple statistic dumps during the gem5 execution. This is common for long running applications, or when restoring from checkpoints.
Each statistic has a name (first column), a value (second column), and a description (last column preceded by #) followed by the unit of the statistic.
Most of the statistics are self explanatory from their descriptions. A couple of important statistics are sim_seconds
which is the total simulated time for the simulation, sim_insts
which is the number of instructions committed by the CPU, and host_inst_rate
which tells you the performance of gem5.
Next, the SimObjects' statistics are printed. For instance, the CPU statistics, which contains information on the number of syscalls, statistics for cache system and translation buffers, etc.
system.clk_domain.clock 1000 # Clock period in ticks (Tick) system.clk_domain.voltage_domain.voltage 1 # Voltage in Volts (Volt) system.cpu.numCycles 57467 # Number of cpu cycles simulated (Cycle) system.cpu.numWorkItemsStarted 0 # Number of work items this cpu started (Count) system.cpu.numWorkItemsCompleted 0 # Number of work items this cpu completed (Count) system.cpu.dcache.demandHits::cpu.data 1941 # number of demand (read+write) hits (Count) system.cpu.dcache.demandHits::total 1941 # number of demand (read+write) hits (Count) system.cpu.dcache.overallHits::cpu.data 1941 # number of overall hits (Count) system.cpu.dcache.overallHits::total 1941 # number of overall hits (Count) system.cpu.dcache.demandMisses::cpu.data 133 # number of demand (read+write) misses (Count) system.cpu.dcache.demandMisses::total 133 # number of demand (read+write) misses (Count) system.cpu.dcache.overallMisses::cpu.data 133 # number of overall misses (Count) system.cpu.dcache.overallMisses::total 133 # number of overall misses (Count) system.cpu.dcache.demandMissLatency::cpu.data 14301000 # number of demand (read+write) miss ticks (Tick) system.cpu.dcache.demandMissLatency::total 14301000 # number of demand (read+write) miss ticks (Tick) system.cpu.dcache.overallMissLatency::cpu.data 14301000 # number of overall miss ticks (Tick) system.cpu.dcache.overallMissLatency::total 14301000 # number of overall miss ticks (Tick) system.cpu.dcache.demandAccesses::cpu.data 2074 # number of demand (read+write) accesses (Count) system.cpu.dcache.demandAccesses::total 2074 # number of demand (read+write) accesses (Count) system.cpu.dcache.overallAccesses::cpu.data 2074 # number of overall (read+write) accesses (Count) system.cpu.dcache.overallAccesses::total 2074 # number of overall (read+write) accesses (Count) system.cpu.dcache.demandMissRate::cpu.data 0.064127 # miss rate for demand accesses (Ratio) system.cpu.dcache.demandMissRate::total 0.064127 # miss rate for demand accesses (Ratio) system.cpu.dcache.overallMissRate::cpu.data 0.064127 # miss rate for overall accesses (Ratio) system.cpu.dcache.overallMissRate::total 0.064127 # miss rate for overall accesses (Ratio) system.cpu.dcache.demandAvgMissLatency::cpu.data 107526.315789 # average overall miss latency ((Cycle/Count)) system.cpu.dcache.demandAvgMissLatency::total 107526.315789 # average overall miss latency ((Cycle/Count)) system.cpu.dcache.overallAvgMissLatency::cpu.data 107526.315789 # average overall miss latency ((Cycle/Count)) system.cpu.dcache.overallAvgMissLatency::total 107526.315789 # average overall miss latency ((Cycle/Count)) ... system.cpu.mmu.dtb.rdAccesses 1123 # TLB accesses on read requests (Count) system.cpu.mmu.dtb.wrAccesses 953 # TLB accesses on write requests (Count) system.cpu.mmu.dtb.rdMisses 11 # TLB misses on read requests (Count) system.cpu.mmu.dtb.wrMisses 9 # TLB misses on write requests (Count) system.cpu.mmu.dtb.walker.power_state.pwrStateResidencyTicks::UNDEFINED 57467000 # Cumulative time (in ticks) in various power states (Tick) system.cpu.mmu.itb.rdAccesses 0 # TLB accesses on read requests (Count) system.cpu.mmu.itb.wrAccesses 7940 # TLB accesses on write requests (Count) system.cpu.mmu.itb.rdMisses 0 # TLB misses on read requests (Count) system.cpu.mmu.itb.wrMisses 37 # TLB misses on write requests (Count) system.cpu.mmu.itb.walker.power_state.pwrStateResidencyTicks::UNDEFINED 57467000 # Cumulative time (in ticks) in various power states (Tick) system.cpu.power_state.pwrStateResidencyTicks::ON 57467000 # Cumulative time (in ticks) in various power states (Tick) system.cpu.thread_0.numInsts 0 # Number of Instructions committed (Count) system.cpu.thread_0.numOps 0 # Number of Ops committed (Count) system.cpu.thread_0.numMemRefs 0 # Number of Memory References (Count) system.cpu.workload.numSyscalls 11 # Number of system calls (Count)
Later in the file is memory controller statistics. This has information like the bytes read by each component and the average bandwidth used by those components.
system.mem_ctrl.bytesReadWrQ 0 # Total number of bytes read from write queue (Byte) system.mem_ctrl.bytesReadSys 23168 # Total read bytes from the system interface side (Byte) system.mem_ctrl.bytesWrittenSys 0 # Total written bytes from the system interface side (Byte) system.mem_ctrl.avgRdBWSys 403153113.96105593 # Average system read bandwidth in Byte/s ((Byte/Second)) system.mem_ctrl.avgWrBWSys 0.00000000 # Average system write bandwidth in Byte/s ((Byte/Second)) system.mem_ctrl.totGap 57336000 # Total gap between requests (Tick) system.mem_ctrl.avgGap 158386.74 # Average gap between requests ((Tick/Count)) system.mem_ctrl.requestorReadBytes::cpu.inst 14656 # Per-requestor bytes read from memory (Byte) system.mem_ctrl.requestorReadBytes::cpu.data 8512 # Per-requestor bytes read from memory (Byte) system.mem_ctrl.requestorReadRate::cpu.inst 255033323.472601681948 # Per-requestor bytes read from memory rate ((Byte/Second)) system.mem_ctrl.requestorReadRate::cpu.data 148119790.488454252481 # Per-requestor bytes read from memory rate ((Byte/Second)) system.mem_ctrl.requestorReadAccesses::cpu.inst 229 # Per-requestor read serviced memory accesses (Count) system.mem_ctrl.requestorReadAccesses::cpu.data 133 # Per-requestor read serviced memory accesses (Count) system.mem_ctrl.requestorReadTotalLat::cpu.inst 6234000 # Per-requestor read total memory access latency (Tick) system.mem_ctrl.requestorReadTotalLat::cpu.data 4141000 # Per-requestor read total memory access latency (Tick) system.mem_ctrl.requestorReadAvgLat::cpu.inst 27222.71 # Per-requestor read average memory access latency ((Tick/Count)) system.mem_ctrl.requestorReadAvgLat::cpu.data 31135.34 # Per-requestor read average memory access latency ((Tick/Count)) system.mem_ctrl.dram.bytesRead::cpu.inst 14656 # Number of bytes read from this memory (Byte) system.mem_ctrl.dram.bytesRead::cpu.data 8512 # Number of bytes read from this memory (Byte) system.mem_ctrl.dram.bytesRead::total 23168 # Number of bytes read from this memory (Byte) system.mem_ctrl.dram.bytesInstRead::cpu.inst 14656 # Number of instructions bytes read from this memory (Byte) system.mem_ctrl.dram.bytesInstRead::total 14656 # Number of instructions bytes read from this memory (Byte) system.mem_ctrl.dram.numReads::cpu.inst 229 # Number of read requests responded to by this memory (Count) system.mem_ctrl.dram.numReads::cpu.data 133 # Number of read requests responded to by this memory (Count) system.mem_ctrl.dram.numReads::total 362 # Number of read requests responded to by this memory (Count) system.mem_ctrl.dram.bwRead::cpu.inst 255033323 # Total read bandwidth from this memory ((Byte/Second)) system.mem_ctrl.dram.bwRead::cpu.data 148119790 # Total read bandwidth from this memory ((Byte/Second)) system.mem_ctrl.dram.bwRead::total 403153114 # Total read bandwidth from this memory ((Byte/Second)) system.mem_ctrl.dram.bwInstRead::cpu.inst 255033323 # Instruction read bandwidth from this memory ((Byte/Second)) system.mem_ctrl.dram.bwInstRead::total 255033323 # Instruction read bandwidth from this memory ((Byte/Second)) system.mem_ctrl.dram.bwTotal::cpu.inst 255033323 # Total bandwidth to/from this memory ((Byte/Second)) system.mem_ctrl.dram.bwTotal::cpu.data 148119790 # Total bandwidth to/from this memory ((Byte/Second)) system.mem_ctrl.dram.bwTotal::total 403153114 # Total bandwidth to/from this memory ((Byte/Second)) system.mem_ctrl.dram.readBursts 362 # Number of DRAM read bursts (Count) system.mem_ctrl.dram.writeBursts 0 # Number of DRAM write bursts (Count)