layout: post title: “gem5-22.1 Released!” author: Bobby R. Bruce date: 2022-12-30 categories: project

The latter half of 2022 saw 500 commits submitted to gem5, from 48 unique contributors

Our top 10 contributors for the v22.1 release were:

  • Bobby R. Bruce (143 commits)
  • Giacomo Travaglini (63 commits)
  • Gabe Black (53 commits)
  • Matthew Poremba (49 commits)
  • Jason Lowe-Power (19 commits)
  • Yu-hsin Wang (18 commits)
  • Zhantong Qiu (11 commits)
  • Earl Ou (10 commits)
  • Tiago Mück (9 commits)
  • Quentin Forcioli (9 commits)

We wish to thank all of the gem5 community who made this gem5 release possible. We look forward to your continued support for the upcoming v23.0 release.

The latest version of gem5 can be obtained by pulling the latest version of the gem5 git repo stable branch:

git clone https://gem5.googlesource.com/public/gem5

# or, within the gem5 repo

git switch stable
git pull

Major New Features

Several new features were included in the v22.1 release. Here we outline some key highlights and how to use them. For a more comprehensive list of contributions, please consult the Release Notes.

Multiple ISA Compilations

Prior to v22.1 users of gem5 could only compile a single ISA target into a gem5 binary. E.g., scons build/X86/gem5.opt would create a gem5 binary with the X86 ISA but no other. In gem5 v22.1 users can compile any combination of ISAs into a gem5 binary. The build_opts directory includes ALL which can be used to compile a binary containing all ISA targets with scons build/ALL/gem5.opt.

With multi-ISA binaries the ISA used in simulation specified via which core is used. For example, using an X86TimingSimpleCPU will use the X86 ISA. When using the standard library API, you can carry out simulations as before by specifying the ISA to setting up the system processor.

Below is a simple ARM program in SE mode:

from gem5.isas import ISA
from gem5.utils.requires import requires
from gem5.resources.resource import Resource
from gem5.components.memory import SingleChannelDDR3_1600
from gem5.components.processors.cpu_types import CPUTypes
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.simulate.simulator import Simulator

# This ensures that the ARM ISA is compiled into the binary.
requires(isa_required=ISA.ARM)

cache_hierarchy = NoCache()
memory = SingleChannelDDR3_1600(size="32MB")

# Here we must specify the ISA we are using via the `isa` parameter.
processor = SimpleProcessor(cpu_type=CPUTypes.TIMING, isa=ISA.ARM, num_cores=1)

board = SimpleBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)

board.set_se_binary_workload(Resource("arm-hello64-static"))

simulator = Simulator(board=board)
simulator.run()

This can be run with build/ALL/gem5.opt and ARM ISA will automatically be used as it was specified via the isa parameter when setting SimpleProcessor.

API to set tick-based Exit Events

It is common when running gem5 to want the simulation to Exit at a particular simulation tick. The typical manner in which this is achieved is by moving the MAX_TICK value of the simulation but this is limiting as it only allows for a single tick to be specified as en exit event. Until v22.1, the API for specifying exits at others ticks was not well exposed and difficult to use. As such the following functions have been added to the m5 Python module:

  • setMaxTick(tick) : Used to to specify the maximum simulation tick.
  • getMaxTick() : Used to obtain the maximum simulation tick value.
  • getTicksUntilMax(): Used to get the number of ticks remaining until the maximum tick is reached.
  • scheduleTickExitFromCurrent(tick) : Used to schedule an exit exit event a specified number of ticks in the future.
  • scheduleTickExitAbsolute(tick) : Used to schedule an exit event as a specified tick.

The setMaxTick function provides a cleaner interface for setting the maximum tick the simulation is to run to. When reached a MAX_TICK exit event is returned by the simulator. The scheduleTickExit functions allow for the scheduling of any number of tick exit events. When reached a SCHEDULED_TICK exit event is returned by the simulator. The scheduleTickExitFromCurrent function schedules the exit event N ticks in the future, with N being provided by the user. The scheduleTickExitAbsolute function allows for the scheduling at a specific simulation tick (e.g., at Tick 1,000).

Below is a simple simulation showing the scheduling of SCHEDULED_TICK exit events at 100, 1000, 10000, and 100000 ticks in the future.

from gem5.resources.resource import Resource
from gem5.isas import ISA
from gem5.components.memory import SingleChannelDDR3_1600
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.simulate.simulator import Simulator
from gem5.simulate.exit_event import ExitEvent

import m5

board = SimpleBoard(
    clk_freq="3GHz",
    processor=SimpleProcessor(
        cpu_type=CPUTypes.TIMING,
        isa=ISA.X86,
        num_cores=1,
    ),
    memory=SingleChannelDDR3_1600(),
    cache_hierarchy=NoCache(),
)
board.set_se_binary_workload(Resource("x86-hello64-static"))


def scheduled_tick_generator():
    while True:
        print(f"Exiting at: {m5.curTick()}")
        yield False


simulator = Simulator(
    board=board,
    on_exit_event={ExitEvent.SCHEDULED_TICK: scheduled_tick_generator()},
)

m5.scheduleTickExitFromCurrent(100)
m5.scheduleTickExitFromCurrent(1000)
m5.scheduleTickExitFromCurrent(10000)
m5.scheduleTickExitFromCurrent(100000)

simulator.run()

In this script the Simulator's on_exit_event parameter is utilized to handle the exit event by printing the tick number at exit, then continuing the simulation. The output of this simulation will be:

Exiting at: 100
Exiting at: 1000
Exiting at: 10000
Exiting at: 100000

The RISCVMatchedBoard

For some time there has been a desire to distribute SimObjects, stdlib components, and systems with “known-good” properties. That is, configurations capable of running simulations with reasonable fidelity to their real-world counter parts. While still early days, the RISCVMatchedBoard is our first step into this endeavor.

The RISCVMatchedBoard is based on SiFive's HiFive Unmatched board: a RISC-V, 64 bit Linux development platform with a SiFive Freedom U740 multi-core processor. Research at UC Davis sat down with the HiFive Unmatched board and carefully benchmarked its properties, then translated this to a gem5 design. The design has been incorporated into the gem5 Standard Library and can be found at “src/python/gem5/prebuilt/riscvmatched/riscvmatched_board.py”.

Below is an example of using the RISCVMatchedBoard to run a Full-System simulation.

from gem5.prebuilt.riscvmatched.riscvmatched_board import RISCVMatchedBoard
from gem5.utils.requires import requires
from gem5.isas import ISA
from gem5.simulate.simulator import Simulator
from gem5.resources.workload import Workload

requires(isa_required=ISA.RISCV)

board = RISCVMatchedBoard(
    clk_freq="1.2GHz",
    l2_size="2MB",
    is_fs=True,
)

workload = Workload("riscv-ubuntu-20.04-boot")
board.set_workload(workload)

simulator = Simulator(board=board)
simulator.run()

Work is still on-going to fine tune the parameters of this board to have it more-so emulate the real-world board's behavior and details will be published in the future to highlight our overall success. We will continue to develop “known-good” systems for gem5 and incorporate them as part of future gem5 releases.

SimPoints

An API for SimPoints has been added. SimPointing is a technique to substantially improve simulation time. It works by only sampling representative parts of a simulation then extrapolating statistical data accordingly. By doing this remaining parts of a simulated program can be skipped.

In combination with gem5 Workloads (see the section below for more information on Workloads) we can distribute binaries with SimPoint information and gem5 Checkpoints. These can then be executed via the SimPoint API, thus producing faster simulation runs.

Examples of using SimPoints with gem5 can be found in “configs/example/gem5_library/checkpoints/simpoints-se-checkpoint.py” and “configs/example/gem5_library/checkpoints/simpoints-se-restore.py”.

We are going to continue to expand and fine-tune this API. Of note, we are going to incorporate LoopPoints into the gem5 framework, which will allow for SimPoints to work in multi-core simulations (a current limitation of the SimPoints framework).

Workloads

As an expansion of the gem5-resources infrastructure, the concept of a Workload has been introduced. The gem5-resources infrastructure has, for several major releases, provides resources. As an example:

from gem5.resources.resource import Resource

image = Resource("x86-npb")
kernel = Resource("x86-linux-kernel-5.4.49")
binary = Resource("x86-print-this")

...

board.set_se_simpoint_workload(
    binary=binary,
    arguments=["hello", 1500]
)

Here we three resources are requested. “x86-npb” is an X86 disk image containing the NAS Parallel Benchmark suite built atop the Ubuntu operating system, “x86-linux-kernel-5.4.49” is the v5.4.49 Linux Kernel, and “x86-print-this” is a binary which accepts two arguments: a string to be printed, and the number of times to print it". The board is then set to run the “x86-print-this” binary with the arguments “hello” and “1500”.

While powerful, obtaining resources in this manner has some limitations. First of all, running a simulation may require multiple resources to be maintained. Some resources will almost always require other resources to run. For example, our “x86-npb” disk image resource is useless without a kernel.

The other issue beyond obvious couplings of resources is resources may require particular parameters to be passed to be useful. For example, the “x86-npb” contains a suite of benchmark applications but specific command line parameters must be passed to specify what benchmark with what input is to be run. In efforts to simplify usage of gem5, we want users to simply specify what they want their simulated system to run. For example, “x86-npb-FS-input-A”.

The solution to these problems are Workloads. Workloads allow for the bundling of resources and any input parameters. User's need only specify the workload they wish to run and the gem5 Standard Library, interfacing, with the gem5-resources infrastructure, will setup the simulation to run correctly.

from gem5.prebuilt.demo.x86_demo_board import X86DemoBoard
from gem5.resources.workload import Workload
from gem5.simulate.simulator import Simulator

board = X86DemoBoard()

board.set_workload(Workload("x86-ubuntu-18.04-boot"))

simulator = Simulator(board=board)
simulator.run()

“kernel” : “x86-linux-kernel-5.4.49”, “disk_image”:“x86-ubuntu-18.04-img”

Below we show using the “x86-ubuntu-18.04-boot” workload. This workload will use the “x86-linux-kernel-5.4.49” resource for the simulation kernel and the “x86-ubuntu-18.04-img” resource for the disk image. Upon boot completion the simulation will exit.

Another example would be the “x86-print-this-15000-with-simpoints” Workload. This specifies an SE workload running the “x86-print-this” binary resource, passing the parameters “print this” and “1500” with the “x86-print-this-1500-simpoints” simpoint resource.

At the time of writing the following Workloads are available to use:

  • “x86-ubuntu-18.04-boot” : Runs an X86 Ubuntu 18.04 boot.
  • “riscv-ubuntu-20.04-boot” : Runs an RISC-V Ubuntu 20.04 boot.
  • “arm64-ubuntu-20.04-boot” : Runs an ARM-64 Ubuntu 20.04 boot.
  • “x86-print-this-15000-with-simpoints” : Runs the “print-this” binary, print 15000 “print this” string to the terminal, with SimPoints.
  • “x86-print-this-15000-with-simpoints-and-checkpoint” : Runs the “print-this” binary, print 15000 “print this” string to the terminal, with SimPoints and checkpoints.

More Workloads will be added overtime to provide the gem5 community with a rich variety of workloads they may plug into their simulations.

pre-commit checks for developers

In order to help gem5 developers adhere to the gem5 style guide, we have added the pre-commit framework to the gem5 repository. The pre-commit framework manages git hooks for the repository. We use this to run a suite of checks on code users wish to submit to code-review via a hook run when the git commit command is executed.

In particular, we now utilize the black Python code formatter on all Python code submitted to the gem5 repository.

In v22.1, when compiling gem5, the user will be asked if they wish to install the pre-commit hook if not already installed. With consent from the user, the hook will be installed. If a user wishes to install the hook manually they may do so by running the following in the gem5 repository:

pip install -r requirements.txt
./util/pre-commit-install.sh

Once installed, any staged code will be processed when git commit is executed. Users may also run the pre-commit on any file or directory they wish by using the following command:

pre-commit run --files path/to/file/or/directory

We strongly advise users to install the pre-commit hooks. Our code-review pre-submit CI tests have been updated with the pre-commit tests. If they fail, users will be unable to incorporate their patches to the code-base until the code is refactored so the pre-commit tests pass.

GPU Full-System mode

In v21.2 we introduced GPU support with Syscall-Emulation (SE) mode and in v22.0 we laid the early framework for Full-System (FS) Support support. In v22.1 we're happy to announce substantially improved support for GPU simulation in FS mode.

Until v22.1, SE mode was preferred due to its relative stability. However, GPU SE simulation needed the user to have a very specific environment on their host system, primarily due to the GPU simulator requiring a very specific ROCm software stack for dynamic linking to workloads. A Docker container was created to provide this environment this, naturally, required all simulations to be carried out within the container.

The full incorporation of FS mode for GPU simulation removes all host requirements and all SE-mode simulations may be run as an FS mode simulation.

The GPU FS mode has also has improved simulated speed by functionally simulating memory copies, and provides an easier update path for gem5 developers.

For the meantime, we strongly advise users to run GPU FS mode on an X86 host using KVM mode to skip boot and other irrelevant simulation tasks. GPU simulation is very resource intensive so care should be taken as to not simulate unnecessary code.

Community Affairs

In the summer of 2022 we held a gem5 Bootcamp at UC Davis. The first of its kind, the Bootcamp was setup to give early-stage computer architecture researchers an opportunity to learn gem5 for an intensive 5 day course. With over 80 applicants we selected 50 due to venue constraints and held the Bootcamp in July, hosting all attendees in UC Davis accommodation.

The course was carefully designed to take the researchers, most of who were 1st year PhD students, from a very basic understanding of gem5 through to complex tasks such as adding ISA instructions, developing models, learning about various differences in gem5 CPU models, accelerating simulations, and the gem5 GPU model. Leveraging expertise and UC Davis, we were also able to give attendees tailored advice on how to use gem5 for their particular research agendas in special 1-on-1 sessions. In a survey taken after the event, it was found that 95% of attendees “strongly agreed” that the Bootcamp was valuable to them in getting starting with gem5 and 100% of attendees were "more likely or “much more likely” to use gem5 in their research.

We have made efforts to archive the events via our YouTube channel and have encapsulated teaching materials in the Bootcamp website. We encourage users to utilize these resources for learning gem5.

Future events

In the February 2023 we'll be holding a gem5 Tutorial at HPCA. This tutorial will give attendees a 3-hour crash course in using gem5. This will include emphasis on new gem5 features, such as that incorporated into the v22.1 release. Based on feedback from previous tutorials, we will also include a short session on using the GPU FS model in gem5.

In July (10th to the 14th), UC Davis will be hosting the 2nd gem5 Bootcamp. With similar goals to the Bootcamp held in 2023, this event will give an intensive 5 day course for early-career computer architecture researchers, particularly those in the first two years of a PhD programme. Please join the gem5 announce mailing list or keep an eye on or gem5 events page for upcoming information on registering interest for this event.