<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>gem5</title>
<!-- SITE FAVICON -->
<link rel="shortcut icon" type="image/gif" href="/assets/img/gem5ColorVert.gif"/>
<link rel="canonical" href="http://localhost:4000/2015/06/09/gem5-horrors.html">
<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,300,700,800,600' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Muli:400,300' rel='stylesheet' type='text/css'>
<!-- FAVICON -->
<link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">
<!-- BOOTSTRAP -->
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
<!-- CUSTOM CSS -->
<link rel="stylesheet" href="/css/main.css">
</head>
<body>
<nav class="navbar navbar-expand-md navbar-light bg-light">
<a class="navbar-brand" href="/">
<img src="/assets/img/gem5ColorLong.gif" alt="gem5" height=45px>
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNavDropdown" aria-controls="navbarNavDropdown" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarNavDropdown">
<ul class="navbar-nav ml-auto">
<li class="nav-item ">
<a class="nav-link" href="/">Home</a>
</li>
<li class="nav-item dropdown ">
<a class="nav-link dropdown-toggle" href="/about" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
About
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink">
<a class="dropdown-item" href="/about">About</a>
<a class="dropdown-item" href="/publications">Publications</a>
<a class="dropdown-item" href="/governance">Governance</a>
</div>
</li>
<li class="nav-item dropdown ">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Documentation
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink">
<!-- Pull navigation from _data/documentation.yml -->
<a class="dropdown-item" href="/introduction">Introduction</a>
<a class="dropdown-item" href="/building">Getting Started</a>
<a class="dropdown-item" href="/environment">Modifying/Extending</a>
<a class="dropdown-item" href="/MSIintro">Modeling Cache Coherence with Ruby</a>
</div>
</li>
<li class="nav-item ">
<a class="nav-link" href="/contributing">Contributing</a>
</li>
<li class="nav-item ">
<a class="nav-link" href="/blog">Blog</a>
</li>
<li class="nav-item ">
<a class="nav-link" href="/search">Search</a>
</li>
</ul>
</div>
</nav>
<main>
<br><br>
<div class="container blog">
<h1>gem5 Horrors and what we can do about it</h1>
<time>Jun 9, 2015 • Jason Lowe-Power</time>
<hr>
<p><img src="/assets/img/gem5-horrors.png" alt="image" /></p>
<p>This post mostly follows the talk that I am giving at
the <a href="http://gem5.org/User_workshop_2015">gem5 Users Workshop</a>. It
contains more details on problems that I skipped in my talk and
some references that I was not able to include in the presentation. You
can view my presentation on Google Drive
<a href="https://docs.google.com/presentation/d/1QGA5UVaVJkkMITF2TXCY_KlwmfWef1KBzfDP6ocbj7I/pub?start=false&amp;loop=false&amp;delayms=3000">here</a>.</p>
<h2 id="i-3-gem5">I &lt;3 gem5</h2>
<p>Before I get into the negative aspects of <a href="http://gem5.org">gem5</a>, I
first want to point out that it is a great tool. gem5 is used by a large
number of computer architecture researchers, both in industry and in
academia. Here at Wisconsin, and at other universities, gem5 is used in
the classroom to teach students about computer architecture and how to
do computer architecture research.</p>
<p>gem5 is, without a doubt, the most full-featured architecture simulator.
It leverages execute-at-execute semantics for high-fidelity
cycle-by-cycle simulation. gem5 can boot a mostly unmodified Linux
image. It has multiple different CPU and memory models. gem5 has a
modular design which makes it simple to embed and extend. This has
allowed gem5 to be used in a large number of projects (see
<a href="http://gem5.org/Projects">http://gem5.org/Projects</a> and <a href="http://gem5.org/Publications">http://gem5.org/Publications</a>).</p>
<p>However, as great as gem5 is, its growth has not been without pain. Now
that gem5 is nearing 15 years of development (if you include the
original m5 and GEMS projects from which gem5 was born), I believe it’s
time to look at some of its deficiencies and talk about what we can do
to mitigate them.</p>
<h2 id="gem5-horrors">gem5 horrors</h2>
<p>Below, I discuss a few specific pain points that I and others have
experienced with gem5. However, before I get to that, I’d like to talk
about what I think the root of these issues is. gem5 has two main
problems:</p>
<p>1) There is no formal governance model.<br>
2) The gem5 developers do not think of the user first.</p>
<p>Later, after I give some examples of specific problems, I will discuss
what I think can be done to fix these two issues.</p>
<p>Next, I discuss four specific “gem5 horrors” that either I have
personally experienced or others have described to me. These issues
go deeper than just bugs, even if sometimes they can
be solved with simple changes. After describing each issue, I will also
briefly discuss a possible way to mitigate the problem.</p>
<h3 id="horror-1-merges">Horror 1: Merges</h3>
<p>There are a number of projects that build on top of gem5. In fact, I
would argue that this is the main use case for gem5. Everyone that I
know who uses gem5 for research takes the mainline gem5 and builds
their own changes on top of it.</p>
<p>The problem with this model, where people build on top of gem5, is that
when new features are added or bugs are fixed in the mainline,
downstream users have to consume these changes. If the downstream users
do a good job managing their patch queues, this should be a
straightforward thing to do. However, I have found that even when
careful development practices are followed, merging gem5 changes is
incredibly difficult.</p>
<p>Below I discuss a few specific problems that I have run into when
merging new changes in gem5. I believe the problems can be summed up
with two high-level issues we currently have in gem5.</p>
<p>1) There is no well-defined static API. The interfaces to different
modules are constantly in a state of flux.<br>
2) The regression suite we have in gem5 has poor coverage. There are
many features that users depend on that are not covered by the
regression tester.</p>
<h4 id="merge-headache-1-pointless-code-changes">Merge headache #1: Pointless code changes</h4>
<p>Examples from Ruby and Slicc and packet.</p>
<h4 id="merge-headache-2-features-break-between-versions">Merge headache #2: Features break between versions</h4>
<p>Ruby backing store, checkpointing</p>
<h4 id="merge-headache-3-apis-are-a-moving-target">Merge headache #3: APIs are a moving target</h4>
<p>Example with the minimal gem5 script.</p>
<h4 id="how-to-mitigate">How to mitigate</h4>
<p>I believe that there are two things we can do as the gem5
development community to make merging upstream changes much easier.
First, we need a stable set of APIs. Second, we need a robust testing
and regression structure. I discuss some specifics of these two
characteristics below.</p>
<h4 id="stable-apis">Stable APIs</h4>
<p>Today in gem5, it is just as easy to change widely used interfaces, like
the port interface, as it is to change the implementation of a rarely
used function. We need to change this. I think that we need to choose a
set of interfaces and make them stable. This is similar to how the Linux
kernel operates.</p>
<p>I’m not suggesting that stable interfaces, once chosen, can
never change, only that it should be more onerous to change stable
APIs than other things. This has the added benefit that
“gem5-stable” can actually mean something. We could then have a stable
version, which has non-changing APIs, and a dev version that we can’t
necessarily count on to have constant APIs.</p>
<p>I personally do not know what the API should be. I would like to see the
community come together and talk about what they see as important
interfaces. Then, once we identify these interfaces, we can architect
them and hopefully make gem5 easier to use.</p>
<h4 id="testing-structure">Testing structure</h4>
<p>I do not think that this is a very controversial issue, but gem5 needs a
better regression structure. If all of the features that we used in
gem5-gpu had been part of the regression suite, then we would have had
many fewer problems.</p>
<p>Again, I do not know exactly how to make the regression suite better,
but I do think a good idea would be to require new features and bug
fixes to include a unit test or something similar. We really need a
software engineer to sit down and architect a new regression system.
This would be a great project for someone who is new to the gem5
codebase.</p>
<h3 id="horror-2-configuration-files">Horror 2: Configuration files</h3>
<p>gem5 has an incredibly flexible configuration system. But with
flexibility often comes complexity. In fact, I ran SLOCcount on the
configs directory and found there were more than 4000 lines of Python
code. According to the SLOCcount tool, this represents 16
person-months and a quarter of a million dollars worth of code!</p>
<p>All of this complexity causes a number of issues. In my talk, I touched
on the fact that the defaults are confusing, and in some cases
inconsistent.</p>
<h4 id="how-to-mitigate-1">How to mitigate</h4>
<p>Since the m5 and GEMS integration, I have noticed that the
number of command line parameters has continued to grow significantly.
It seems that every time a new feature has been added, we have added
some new command line parameters as well. I think this is the wrong way
to do it.</p>
<p>There is an amazing C++-Python wrapper in gem5. We should be taking
advantage of the scripting capabilities of Python.</p>
<p>I have created a simple script that is under 30 lines of Python. I think
we need to encourage our users to script in Python instead of adding
more and more command line parameters, which, in my experience, really
just leads to scripting in bash instead of in Python anyway.</p>
<h3 id="horror-3-unexpected-results">Horror 3: Unexpected results</h3>
<p>This was a very surprising error that I ran into while
creating a homework assignment for a graduate-level computer
architecture course. The point of the homework was to compare the
performance impact of instruction latency versus instruction throughput. I
wanted the students to take a particular instruction and change the
number of execution units, the latency, and how much the units were
pipelined. To do this, we looked at the divide instruction, since it is
a long-latency instruction. Below is the code that we used:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for (int i = 0; i &lt; N; i++) {
    /* Each iteration's divide is independent of every other iteration. */
    Y[i] = X[i] / alpha + Y[i];
}
</code></pre></div></div>
<p>In this code, every divide is totally independent from every other.
Therefore, we would expect that, with the out-of-order CPU,
pipelining the divide unit will speed up the code in proportion to how
deeply the unit is pipelined.</p>
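<p>As a rough sanity check on this expectation, the sketch below models a
single divide unit in Python. It is a back-of-the-envelope model, not gem5
code: the function name and its parameters are made up for illustration, the
trip count is hypothetical, and it assumes the divides are the only
bottleneck (ignoring the loads, stores, adds, and every other limit of the
pipeline). It assumes the unit can start a new divide every issue_interval
cycles and that each divide finishes latency cycles after it starts.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def divide_cycles(num_divides, latency, issue_interval):
    """Cycles for one unit to finish num_divides independent divides.

    Assumption: a new divide can start every issue_interval cycles and
    each one finishes latency cycles after it starts.
    """
    if num_divides == 0:
        return 0
    # The last divide starts after (num_divides - 1) issue intervals,
    # then needs its full latency to complete.
    return (num_divides - 1) * issue_interval + latency


n = 1000  # hypothetical trip count for the loop above
unpipelined = divide_cycles(n, latency=10, issue_interval=10)
pipelined = divide_cycles(n, latency=10, issue_interval=1)
print(unpipelined, pipelined, round(unpipelined / pipelined, 2))
</code></pre></div></div>
<p>Under these assumptions, the speedup approaches the ratio of the two
issue intervals (10x here) as the number of divides grows.</p>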
<p>To test this, I looked at two different configurations: a 10-cycle
latency divide with <em>no</em> pipelining, and a 10-cycle latency divide that
is fully pipelined. Below is the data I found for ARM and x86. I only
changed the “obvious” options. Each functional unit has an option for
the execution latency and issue latency. If the issue latency is 1, then
the functional unit is fully pipelined. (This has since become a boolean flag.)
All of the data is relative to x86 with no pipelining.</p>
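<table>
<thead>
<tr><th>Configuration</th><th>Latency</th><th>Issue latency</th><th>x86 perf.</th><th>ARM perf.</th></tr>
</thead>
<tbody>
<tr><td>No pipeline</td><td>10 cycles</td><td>10 cycles</td><td>1.0x</td><td>8.0x</td></tr>
<tr><td>Full pipeline</td><td>10 cycles</td><td>1 cycle</td><td>1.0x</td><td>9.6x (1.2x)</td></tr>
</tbody>
</table>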
<p>There are two very weird results in this data. First, when we fully
pipelined the divide unit, there was no performance change (at all!!) in
x86. Second, when running the exact same code with ARM, there was an 8x
speedup compared to x86! I find it very hard to believe that the ARM ISA
is inherently better at divide than x86.</p>
<h4 id="how-to-mitigate-2">How to mitigate</h4>
<p>This is a much harder problem to mitigate than the others on this list.
Nilay Vaish has taken a step in the right direction with these two
patches on reviewboard <a href="http://reviews.gem5.org/r/2744/">http://reviews.gem5.org/r/2744/</a> and
<a href="http://reviews.gem5.org/r/2744/">http://reviews.gem5.org/r/2744/</a>, which have been incorporated in gem5.</p>
<p>The underlying problem is that the implementations for ARM and x86 are
totally distinct. It is not clear to me what the right way to unify the
ISA implementations is. As a stop-gap, developers who are working on
implementing x86 features need to make sure that they perform similarly
to ARM features. Maybe a solution is to have a single set of C programs
which exercise all ISAs and compare the performance across ISAs. There
should be some performance differences, but not an order of magnitude.</p>
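<p>One way such a cross-ISA comparison could be automated is sketched below
in Python. Everything here is hypothetical: the function name, the dictionary
layout, and the threshold are assumptions made for illustration, not part of
gem5's regression suite. The example input reuses the relative performance
numbers from the unpipelined results above (x86 at 1.0x, ARM at 8.0x), and
the 5x threshold is an arbitrary choice.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from itertools import combinations


def suspicious_pairs(perf_by_isa, max_ratio):
    """Return (faster, slower, ratio) for ISA pairs that differ too much.

    perf_by_isa maps an ISA name to its positive relative performance on
    the same C program (higher is faster). max_ratio is the tolerance.
    """
    flagged = []
    for a, b in combinations(sorted(perf_by_isa), 2):
        faster, slower = (a, b) if perf_by_isa[a] > perf_by_isa[b] else (b, a)
        ratio = perf_by_isa[faster] / perf_by_isa[slower]
        if ratio > max_ratio:
            flagged.append((faster, slower, ratio))
    return flagged


# Relative performance from the unpipelined results above.
print(suspicious_pairs({"x86": 1.0, "ARM": 8.0}, max_ratio=5.0))
</code></pre></div></div>
<p>Note that with a strict order-of-magnitude tolerance (max_ratio=10.0) the
8x gap would not be flagged, so the threshold itself is something the
community would need to agree on.</p>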
<h3 id="horror-4-lack-of-new-user-support">Horror 4: Lack of new-user support</h3>
<h4 id="how-to-mitigate-3">How to mitigate</h4>
<p>What I think we need to do is to create a “gem5 for Dummies” book or a
“Learning gem5” book. This book would be similar to Learning Python or
Learning Mercurial. The book would be open source for anyone to
contribute to. In fact, it should be required to update the book if a
developer makes an API-breaking change.</p>
<p>An initial implementation of this book, which currently only includes
about a chapter of “getting started” and is in fact already out of date,
can be found here:
<a href="http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/gem5-tutorial/index.html">gem5-tutorial</a>.
I began working on this in conjunction with the graduate computer
architecture class at Wisconsin, so it may currently have some
Wisconsin-specific text. I hope to continue working on this in my
<em>copious free time</em>.</p>
<p>There are many other horrors that other people experience as well. Here
I only discussed some of the horrors that I have heard people
discussing. The purpose of presenting these horrors is not to say that
gem5 is a bad simulator! The purpose is to highlight that there are
currently issues that need to be addressed by the gem5 development
community.</p>
<h2 id="what-can-we-do-about-it">What can we do about it?</h2>
<p>A lot of the problems that I have discussed above come down to poor
software engineering. And yes, we are architects, not software
engineers, and there are a lot of things we could do better if we just
focused on software engineering. However, I do not think that this is
the underlying issue.</p>
<p>I believe these four horrors stem from two systemic problems in the gem5
development community.</p>
<p>1) There is no formal governance model.<br>
2) The gem5 developers do not think of the user first.</p>
<p>I believe that if we start to solve these high-level issues, gem5 will
be a much better tool for everyone. Next, I discuss one possible way to
address these two points.</p>
<h2 id="gem5-foundation">gem5 Foundation</h2>
<p>First, I want to say that I do not believe this is the only way, or the
right way, to move gem5 forward. This is one possibility that I believe
will make gem5 a better tool. I hope that this is a place to begin the
discussion and I am sure that others in our community can come up with
even better suggestions than this!</p>
<p><em>I think we should create a gem5 Foundation.</em> The gem5 Foundation will
be the center for the gem5 community. It will be a formal way for the
community to set goals and push gem5 forward.</p>
<p>There are two main things I think the gem5 Foundation can help us with.
It can set up a formal governance structure and be a place for outside
interests to contribute money towards making gem5 better for everyone.</p>
<h3 id="formalizing-a-governance-structure">Formalizing a governance structure</h3>
<p>First, we need a governance structure. This is a document which defines
how decisions are made in the community, what matters to the community,
how to contribute to the community, etc.</p>
<p>There is a lot of documentation on how to write governance models and
what they are. <a href="http://oss-watch.ac.uk/">OSS-Watch</a> is a great source
for this. Here is a link to a definition of a governance model, which
does a much better job than I can of explaining it:
<a href="http://oss-watch.ac.uk/resources/governancemodels">http://oss-watch.ac.uk/resources/governancemodels</a>. Additionally, here
is a link to an example governance model from an academic open-source
project:
<a href="http://www.taverna.org.uk/about/legal-stuff/taverna-governance-model/">http://www.taverna.org.uk/about/legal-stuff/taverna-governance-model/</a></p>
<h3 id="money-money-money">Money, Money, Money</h3>
<p>I think the main solution to all of these problems is to pay
software developers (<em>not computer architects!</em>) to solve some of
them. Already, within ARM and AMD there are a number of people who
get paid to work on gem5. However, these companies do not have gem5’s
best interests as their key focus. Their focus is what ARM and AMD find
interesting.</p>
<p>So, I think that if we have something like the gem5 Foundation, these
companies, and academia, can donate money towards coding things that are
good for the community as a whole. The gem5 Foundation can hire software
engineers to work on the parts of gem5 that grad students and
researchers do not want to do. If you look at other academic
communities, they often hire non-researchers to do the “grunt work”.
Overall, I think this is a good idea for computer architects too, and
specifically for gem5.</p>
<p>I recognize that this may be a crazy idea. I would love to hear what
others think. I am sure we will have some interesting discussion at the
gem5 workshop, and hopefully I will write another post with what other
people thought! Feel free to leave comments below.</p>
</div>
</main>
<footer class="page-footer">
<div class="container">
<div class="row">
<div class="col-12 col-sm-4">
<p>gem5</p>
<p><a href="/about">About</a></p>
<p><a href="/publications">Publications</a></p>
<p><a href="/contributing">Contributing</a></p>
<p><a href="/governance">Governance</a></p>
<br></div>
<div class="col-12 col-sm-4">
<p>Docs</p>
<p><a href="/introduction">Documentation</a></p>
<p><a href="http://gem5.org/Documentation">Old Documentation</a></p>
<p><a href="https://gem5.googlesource.com/public/gem5">Source</a></p>
<br></div>
<div class="col-12 col-sm-4">
<p>Help</p>
<p><a href="/search">Search</a></p>
<p><a href="#">Mailing Lists</a></p>
<p><a href="https://github.com/gem5/new-website/tree/master/">Source For This Site</a></p>
<br></div>
</div>
</div>
</footer>
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
<script>
// When the user scrolls down 20px from the top of the document, show the button
window.onscroll = function() {scrollFunction()};
function scrollFunction() {
var btn = document.getElementById("myBtn");
// This page defines no "myBtn" element, so bail out instead of throwing.
if (!btn) { return; }
if (document.body.scrollTop > 100 || document.documentElement.scrollTop > 20) {
btn.style.display = "block";
} else {
btn.style.display = "none";
}
}
// When the user clicks on the button, scroll to the top of the document
function topFunction() {
document.body.scrollTop = 0;
document.documentElement.scrollTop = 0;
}
</script>
</body>
</html>