---
layout: post
title: "gem5-20.1 Roadmap"
author: Jason Lowe-Power and Bobby Bruce
date: 2020-07-01
categories: project
---

After our [successful release]({% post_url 2020-05-21-gem5-20 %}) of gem5-20, it's time to start thinking about gem5's next release, gem5-20.1!

We received great feedback during the gem5 town hall held with the gem5 workshop about the new features that the community would like to see for the next release.

Based on this feedback, below are the major projects, with links to the Jira issues, that we are planning to complete for the gem5-20.1 release. We welcome any feedback on this roadmap! Please use the gem5-dev mailing list to let us know what you think!

If you would like to suggest other features or fixes to be included in gem5-20.1, please mark them as “fix version” “gem5 20.1” on Jira. You can find all of the current issues that we are planning to complete by the 20.1 release on this Jira page (note: you may need to be logged in to see the release page).

We're planning on releasing gem5-20.1 at the end of August. We expect to create the staging branch and close the merge window on August 24, giving everyone a little more than 6 weeks to get these major changes done!

Continuing to define gem5's APIs

In gem5-20, we started defining internal APIs for gem5. You can see the work we did in the Doxygen documentation. In gem5-20, our goal was modest: we only added information on SimObject and the classes that SimObject depends on. Specifically, we codified APIs for SimObject, Drain, Serialize, EventQueue, and Statistics.

To add these to the API, we have tagged the methods that are part of the API with Doxygen groups. For now, we will rely on code review to catch any changes to the API. We will strive to keep APIs constant, but in order to improve the simulator, APIs will have to change. However, APIs will only change at “major” releases, and we will add information to the release notes with details on how to update your code if it was using the changed API.

For gem5-20.1, we will be further expanding the methods covered by the API definitions. We will focus on the following:

  • More documentation for the current APIs: statistics, drain, serialize, event queue, and SimObject.
  • The “external” APIs for the event queue. These are useful to other simulators that want to hook into gem5, either by driving the event queue themselves or by having gem5's event queue drive the external simulator.
  • The port interface. Ports are one of the most widely used objects in gem5. They facilitate communication between all memory system objects. This interface has changed somewhat over the past few years and the documentation hasn't kept up. We will make sure this interface is well documented.
  • “Base” utility classes. There are many classes in the base/ directory that are used throughout the codebase. We will add these as APIs to make sure these widely used interfaces don't change without documentation.
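
To make the idea of an external simulator driving an event queue a little more concrete, here is a toy sketch in plain Python. It is not gem5's event queue and not the planned external API: `ToyEventQueue`, `schedule`, and `run_until` are hypothetical names chosen only for illustration.

```python
import heapq
import itertools


class ToyEventQueue:
    """A toy discrete-event queue (hypothetical; not gem5's EventQueue)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps insertion order
        self.cur_tick = 0

    def schedule(self, when, callback):
        """Schedule `callback` to run at tick `when`."""
        if when < self.cur_tick:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._heap, (when, next(self._order), callback))

    def run_until(self, tick):
        """Service every event scheduled at or before `tick`."""
        while self._heap and self._heap[0][0] <= tick:
            when, _, callback = heapq.heappop(self._heap)
            self.cur_tick = when
            callback()
        # Never move time backwards if the caller passes an old tick.
        self.cur_tick = max(self.cur_tick, tick)


# The "external simulator" owns the time loop and advances the queue
# in fixed-size quanta.
log = []
queue = ToyEventQueue()
queue.schedule(5, lambda: log.append(("a", queue.cur_tick)))
queue.schedule(12, lambda: log.append(("b", queue.cur_tick)))

for quantum_end in (10, 20):
    queue.run_until(quantum_end)
```

The reverse arrangement, where the event queue drives the external simulator, would amount to scheduling a callback that steps the other simulator.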

We are also considering the best way to document the input and output of the simulator. Specifically, we believe that the output files (e.g., stats.txt, config.ini, checkpoints, etc.) should be stable across releases. However, Doxygen is probably not the right way to document these interfaces. We'd love to hear your ideas on how best to document and enforce these interfaces.

See this Jira Epic to follow the current progress on this issue.

Improve statistics

The most common feedback we received at the gem5 users town hall was that statistics were difficult to use and not flexible enough. To this end, we'll be working to improve the statistics in gem5-20.1.

Our first step will be to convert the stats for all SimObjects to “new style” statistics. About a year ago, we added support for hierarchical statistics. This change significantly improves the usability of the statistics package. However, we decided to keep backwards compatibility, and at the time, we didn't update all of the SimObjects to use this new API. For gem5-20.1, we will migrate all of the SimObjects to use the new style, and we will officially deprecate the old API.

The other major feedback we received about statistics was that the current output is not easily machine readable. To this end, we will add new output formats that are more machine readable. The major formats we're currently considering are CSV, something compatible with pandas DataFrames, and protobuf for streaming time series stats.
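
As a rough illustration of what a more machine-readable format buys, the sketch below converts `stats.txt`-style text into CSV using only the Python standard library. It is not the planned gem5 implementation. The sample text, `parse_stats`, and `to_csv` are assumptions made for this example, and the parser only keeps the first value on each line, so vector and distribution stats would need more care.

```python
import csv
import io

# Hypothetical sample in the usual "name  value  # description" layout.
SAMPLE_STATS = """\
---------- Begin Simulation Statistics ----------
sim_seconds                                  0.000010                       # Number of seconds simulated
sim_ticks                                    10000000                       # Number of ticks simulated
system.cpu.numCycles                            20000                       # number of cpu cycles simulated
---------- End Simulation Statistics   ----------
"""


def parse_stats(text):
    """Return (name, value, description) tuples from stats.txt-style text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the begin/end banner lines.
        if not line or line.startswith("-"):
            continue
        body, _, description = line.partition("#")
        fields = body.split()
        if len(fields) < 2:
            continue
        # Values are kept as strings; only the first value is recorded.
        rows.append((fields[0], fields[1], description.strip()))
    return rows


def to_csv(rows):
    """Serialize parsed rows as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "value", "description"])
    writer.writerows(rows)
    return out.getvalue()


rows = parse_stats(SAMPLE_STATS)
csv_text = to_csv(rows)
```

CSV text like this can be loaded directly with `pandas.read_csv`, which is part of the appeal of a tabular format.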

In addition to the new Jira issues linked above, there is some ongoing work with changesets currently on gerrit (or recently merged).

Improve Python interfaces

The Python<->C++ interface is one of gem5's best features. However, much of this code was written 10+ years ago, and there have been many additions to it since. This is especially true of the “default” configuration files, se.py and fs.py, which have grown to thousands of lines of code (when the imported files are included).

For gem5-20.1, we will begin to refactor some of this Python code to improve the design, add more documentation, and make things easier to use.

For now, the refactored code will live in a new library (gem5 instead of m5) so that all of the configs that are currently used keep working.

We will also develop a “unit tests” interface for testing Python-level SimObjects. This won't be exactly unit tests, but will allow SimObject designers to test their objects without having to run an entire gem5 simulation. This will require building all of gem5 for now, but we'll think about a future where this isn't required.

We will document the Python API with pydoc documentation. This will allow us to define this API more clearly. We will also begin to include Python type annotations. However, since type annotations require Python 3.5+, we will initially just document this in the documentation strings, and move it into the main code at a later date.
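
As a small sketch of what this could look like, the first function below records its types in the docstring only, and the second shows the same signature once annotations move into the code. `to_bytes` and `to_bytes_annotated` are hypothetical helpers invented for this example, not part of gem5, and the docstring field style shown is just one common convention.

```python
def to_bytes(size, unit="kB"):
    """Convert a size in the given unit to bytes.

    :param size: magnitude of the size
    :type size: int
    :param unit: one of "B", "kB", "MB", or "GB" (powers of 1024)
    :type unit: str
    :returns: the size in bytes
    :rtype: int
    """
    multipliers = {"B": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
    return size * multipliers[unit]


# The same interface once type annotations can live in the code itself.
def to_bytes_annotated(size: int, unit: str = "kB") -> int:
    """Convert a size in the given unit to bytes."""
    return to_bytes(size, unit)
```

Either way, `help(to_bytes)` or the `pydoc` tool shows the documented types to the user. The annotated form additionally lets external type checkers verify callers.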

Finally, we will start developing a pure-Python model library. This model library will allow users to write simpler Python scripts to configure the simulator. Eventually, this library will supplant se.py and fs.py. Additionally, it is going to be the basis of gem5's “known-good configurations” that will be publicly validated models based on real hardware.

Testing

There are a number of ways we would like to improve testing for the gem5-20.1 release. One of the highest priorities is replacing kokoro with a self-hosted CI solution. The Google-hosted kokoro process has been very helpful in catching bugs. However, it's difficult for people outside of Google to configure, and it currently uses an old VM/Docker container with GCC 4.8. Moving to a self-hosted solution will give us more flexibility and control. It will also help us accomplish some of our other testing goals.

Another goal is to get nightly builds up and running. We have both “quick” and “long” regressions marked, but we currently don't run most of the “long” regressions (they take 16-24 hours). We are planning to run these nightly, but this wasn't possible with the kokoro system. We're also going to look into adding other kinds of tests, such as checking that gem5 builds with all of our supported compilers and running address sanitizers and static analysis tools.

To accomplish these goals, we believe the best way forward is to use our own self-hosted Jenkins instance. For now, we will run this instance on Google's cloud infrastructure. However, one of the benefits of Jenkins is that we can easily move it to any public cloud or local compute resources.

We'll also be working on fixing some failing integration tests, fixing some of the failing Linux boot tests, and continuing to increase our unit test coverage.

Full system GPU support

As discussed in the [gem5 users workshop]({% post_url 2020-06-01-towards-full %}), developers at AMD are working to make it possible to use the GPU model in full system mode in gem5. The goal is to be able to use the GPU model with the upstream driver and runtimes. Currently, the SE mode-based GPU model is tied to an old version of the ROCm stack, and by implementing this full system support we'll be able to support newer runtime versions.

The current status of this support can be found on this Jira Epic. There are a lot of little things to get done here, but the developers are hard at work on them!

Memory system improvements

There are a number of improvements to gem5's memory system that we are targeting for gem5-20.1, including an NVM model, support for transactional memory, and a new Ruby protocol that supports a flexible cache hierarchy.

NVM Model

Wendy Elsasser and other developers at Arm have been working on an improved memory controller that has both DRAM and non-volatile memory attached. More details on this development can be found in her [gem5 workshop presentation]({% post_url 2020-05-29-memory-controller %}).

There is a set of patches on gerrit right now for this new support. We fully expect these to be integrated into gem5 before the gem5-20.1 release.

You can follow the development on this Jira issue.

HTM Model

Hardware transactional memory has moved from research to products, with Intel and ARM supporting some form of transactional memory in their ISAs.

There is current work to support HTM in both the ISA models and Ruby. We expect that these changesets on gerrit will be merged before the release of gem5-20.1.

You can follow the progress on the Ruby Jira issue and the ARM TME Jira issue.

HeteroGarnet (Garnet 3.0)

As described in [the gem5 workshop talk]({% post_url 2020-05-29-heterogarnet %}), there has been work to extend Garnet to support 3D integration and other heterogeneous systems. We will be working to merge this support in time for the gem5-20.1 release.

A discussion on ISA support

gem5 supports a large number of ISAs. This is one of gem5's best features, but the support for each ISA isn't uniform. Each ISA has varying completeness, support for the latest revisions, fidelity, system architecture support, and testing.

We are going to solicit feedback on how to proceed with our ISA support during the gem5 20.1 development cycle.

This issue is about having a conversation about removing support for some ISAs. Whether or not we remove these ISAs is up to the community at large.

Other goals