| <?xml version="1.0" encoding="UTF-8"?> |
| <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"> |
| <channel> |
| <title>gem5</title> |
| <description></description> |
| <link>http://localhost:4000/</link> |
| <atom:link href="http://localhost:4000/feed.xml" rel="self" type="application/rss+xml" /> |
| <pubDate>Mon, 21 Jan 2019 12:53:57 -0800</pubDate> |
| <lastBuildDate>Mon, 21 Jan 2019 12:53:57 -0800</lastBuildDate> |
| <generator>Jekyll v3.7.4</generator> |
| |
| <item> |
| <title>Visualizing Spectre with gem5</title> |
| <description><p><a href="https://meltdownattack.com/">Spectre and Meltdown</a> took much of our |
| community by surprise. I personally found these attacks fascinating |
| because they didn’t rely on a <em>bug</em> in any particular hardware |
| implementation, but leveraged microarchitectural behavior that the ISA does not specify. Specifically, Spectre |
| and Meltdown can exfiltrate potentially secret memory data by detecting |
| the effects of speculative instructions <em>that are later squashed</em>.</p> |
| |
| <p>Very cool!</p> |
| |
| <p>Out of order processors are very complex. Speculation attacks like Spectre and |
| Meltdown would be easier to understand if we had a way to <em>visualize</em> them. |
| Luckily, gem5 already has a way to view the details of its out of order CPU’s |
| pipeline.</p> |
| |
| <p><img src="/assets/img/o3-example.png" alt="o3 pipeline view example" /></p> |
| |
| <p>The image above was created using the O3 pipeline viewer that is |
| included with gem5. In this post, I’ll explain how to use the O3 |
| pipeline viewer and how to generate images like the one above. There is also |
| a new project, <a href="https://github.com/shioyadan/Konata">Konata</a> by Ryota Shioya, which makes it easier to navigate large pipeline traces |
| and is useful for comparing different pipeline designs. |
| Ryota gave a presentation on Konata at a recent <a href="http://learning.gem5.org/tutorial/index.html">Learning gem5 |
| tutorial</a>; you can find |
| the PDF of his presentation |
| <a href="http://learning.gem5.org/tutorial/presentations/vis-o3-gem5.pdf">here</a>. |
| Konata is a cool tool written in JavaScript, and Ryota describes |
| it as “Google Maps for an out of order pipeline”.</p> |
| |
| <h2 id="running-spectre">Running Spectre</h2> |
| |
| <p>The first step to visualizing what is going on in the pipeline during a |
| Spectre attack is getting proof of concept exploit code. I used the code |
| that Erik August posted to a GitHub gist soon after the attack |
| was announced. You can get that code here: |
| <a href="https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9e3d4bb6">https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9e3d4bb6</a>.</p> |
| |
| <p>First, you need to compile the proof of concept code on your native |
| machine (note: I’ll be using x86 for all of my examples).</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc spectre.c -o spectre -static |
| </code></pre></div></div> |
| |
| <p>I used gcc 7.2 (the default on Ubuntu 17.10) for my tests, and you may |
| want to do the same. <a href="#effects-of-compilers">Below</a> I discuss the |
| effects different compilers have on the Spectre attack. For instance, if |
| you use clang instead you may not be able to reproduce the Spectre |
| attack in gem5.</p> |
| |
| <p>My native machine is still vulnerable to Spectre, so when I run the |
| binary generated above, I get the following output.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Reading 40 bytes: |
| Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=’T’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cb... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cc... Success: 0x4D=’M’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cd... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ce... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cf... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d0... Success: 0x63=’c’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d1... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d2... Success: 0x57=’W’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d3... Success: 0x6F=’o’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d4... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d5... Success: 0x64=’d’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d6... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d7... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d8... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d9... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76da... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76db... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dc... Success: 0x53=’S’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dd... Success: 0x71=’q’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76de... Success: 0x75=’u’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76df... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e0... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e1... Success: 0x6D=’m’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e2... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e3... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e4... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e5... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e6... Success: 0x50=’P’ score=9 (second best: 0x06 score=2) |
| Reading at malicious_x = 0xffffffffffdd76e7... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e8... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e9... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ea... Success: 0x66=’f’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76eb... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ec... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ed... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ee... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ef... Success: 0x2E=’.’ score=2 |
| </code></pre></div></div> |
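<p>When comparing runs, for example this native run and the gem5 run later in this post, it can help to pull the leaked bytes back out of the output instead of reading it line by line. The following is a minimal Python sketch. It assumes every result line has the <code class="highlighter-rouge">Success: 0xNN=</code> or <code class="highlighter-rouge">Unclear: 0xNN=</code> form printed by the proof of concept. The <code class="highlighter-rouge">leaked_string</code> helper is a hypothetical name of my own.</p>

```python
import re

# Matches the best-guess byte on a result line, e.g. "Success: 0x54=".
# Only the hex value is used, so straight or curly quotes around the
# printed character do not matter.
RESULT = re.compile(r"(Success|Unclear): 0x([0-9A-Fa-f]{2})=")


def leaked_string(output):
    """Reassemble the bytes guessed by the PoC, in the order they were read."""
    chars = []
    for line in output.splitlines():
        match = RESULT.search(line)
        if match:
            chars.append(chr(int(match.group(2), 16)))
    return "".join(chars)


sample = """Reading 40 bytes:
Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54='T' score=2
Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68='h' score=2
Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65='e' score=2"""

print(leaked_string(sample))  # The
```

<p>Running it on two complete outputs and comparing the strings character by character quickly shows any byte that one run guessed differently.</p>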
| |
| <h3 id="running-spectre-in-gem5">Running Spectre in gem5</h3> |
| |
| <p>To find out if gem5’s out of order CPU implementation is vulnerable to |
| Spectre, we need to run the code in gem5. The simplest and fastest way |
| to do this is by running in gem5’s syscall-emulation (SE) mode. In SE |
| mode we won’t be modeling an OS or any user-mode to kernel-mode |
| interaction, but this is okay for Spectre since this proof of concept code |
| runs entirely in user mode. If we were investigating Meltdown, we would have to |
| use full-system (FS) mode since Meltdown specifically allows user-mode |
| processes to read data that should only be accessible in kernel mode.</p> |
| |
| <p>When running something in gem5, the first step is to create a Python |
| runscript, since this is <a href="http://learning.gem5.org/book/part1/simple_config.html">the “interface” to |
| gem5</a>. For this |
| example, we need a system with one CPU, caches, and memory. |
| For simplicity, I’m going to modify one of the existing scripts, |
| specifically the <code class="highlighter-rouge">two_level.py</code> script from the <a href="http://learning.gem5.org/">Learning gem5 |
| book</a>.</p> |
| |
| <p>In the file <code class="highlighter-rouge">gem5/configs/learning_gem5/part1/two_level.py</code>, I simply |
| changed the CPU from <code class="highlighter-rouge">TimingSimpleCPU()</code> to |
| <code class="highlighter-rouge">DerivO3CPU(branchPred=LTAGE())</code>, which also switches the O3CPU from the |
| default tournament branch predictor to the LTAGE branch predictor. |
| Using LTAGE is important because better branch |
| predictors actually make Spectre easier to exploit, as discussed further |
| <a href="#effects-of-branch-predictor">below</a>.</p> |
| |
| <p>Now, we simply need to build gem5 and run it.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scons -j8 build/X86/gem5.opt |
| |
| build/X86/gem5.opt configs/learning_gem5/part1/two_level.py spectre |
| </code></pre></div></div> |
| |
| <p>The output I get is the following, just like above when I ran |
| the <code class="highlighter-rouge">spectre</code> binary natively.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gem5 Simulator System. http://gem5.org |
| gem5 is copyrighted software; use the --copyright option for details. |
| |
| gem5 compiled May 10 2018 09:40:08 |
| gem5 started May 24 2018 11:21:16 |
| gem5 executing on palisade, pid 27173 |
| command line: build/X86/gem5.opt configs/learning_gem5/part1/two_level.py spectre |
| |
| Global frequency set at 1000000000000 ticks per second |
| warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes) |
| 0: system.remote_gdb: listening for remote gdb on port 7000 |
| Beginning simulation! |
| info: Entering event queue @ 0. Starting simulation... |
| warn: readlink() called on '/proc/self/exe' may yield unexpected results in various settings. |
| Returning '/home/jlp/Code/gem5/spectre-vis/spectre' |
| info: Increasing stack size by one page. |
| warn: ignoring syscall access(...) |
| Reading 40 bytes: tput cols |
| Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=’T’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cb... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cc... Success: 0x4D=’M’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cd... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ce... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cf... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d0... Success: 0x63=’c’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d1... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d2... Success: 0x57=’W’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d3... Success: 0x6F=’o’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d4... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d5... Success: 0x64=’d’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d6... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d7... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d8... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d9... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76da... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76db... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dc... Success: 0x53=’S’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dd... Success: 0x71=’q’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76de... Success: 0x75=’u’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76df... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e0... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e1... Success: 0x6D=’m’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e2... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e3... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e4... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e5... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e6... Success: 0x4F=’O’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e7... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e8... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e9... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ea... Success: 0x66=’f’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76eb... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ec... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ed... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ee... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ef... Success: 0x2E=’.’ score=2 |
| Exiting @ tick 113568969000 because exiting with last active thread context |
| </code></pre></div></div> |
| |
| <h2 id="visualizing-the-out-of-order-pipeline">Visualizing the out of order pipeline</h2> |
| |
| <p>To generate pipeline visualizations, we first need to generate a trace |
| file of all of the instructions executed by the out of order CPU. To |
| create this trace, we can use the <code class="highlighter-rouge">O3PipeView</code> debug flag.</p> |
| |
| <p>Now, the trace for the O3 CPU can be <em>very</em> large, up to many GBs. When |
| creating this trace, you need to be careful to create the smallest trace |
| possible. Also, it’s important to dump the trace to a file and not to |
| <code class="highlighter-rouge">stdout</code>, which is the default when using debug flags. You can redirect |
| the trace to a file by using the <code class="highlighter-rouge">--debug-file</code> option to gem5.</p> |
| |
| <p>To create the trace file, I used the following methodology:</p> |
| |
| <ol> |
| <li>Start running spectre in gem5, then hit ctrl-c after the first |
| couple of letters. At this point, I wrote down the tick at which gem5 |
| exited (13062347000 for me).</li> |
| <li>Run gem5 with the debug flag <code class="highlighter-rouge">O3PipeView</code> enabled, starting the trace at that tick.</li> |
| <li>Watch the output and kill gem5 with ctrl-c once two more letters |
| have appeared than in step 1.</li> |
| </ol> |
| |
| <p>To generate the trace, I ran the following command. Note: you may have a |
| different value for when to start the debugging trace. Also note: when |
| producing the trace gem5 will run <em>much</em> slower.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>build/X86/gem5.opt --debug-flags=O3PipeView --debug-file=pipeview.txt --debug-start=13062347000 configs/learning_gem5/part1/two_level.py spectre |
| </code></pre></div></div> |
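<p>If you script these runs, it may be easier to build the command from the tick you noted in step 1. Below is a minimal Python sketch that reuses the paths from the command above. The <code class="highlighter-rouge">pipeview_command</code> helper is hypothetical. I believe gem5 also accepts <code class="highlighter-rouge">--debug-end=TICK</code>, which would bound the trace without pressing ctrl-c. Treat that as an assumption and check <code class="highlighter-rouge">gem5.opt --help</code> for your version before relying on it.</p>

```python
import shlex


def pipeview_command(start_tick, end_tick=None,
                     gem5="build/X86/gem5.opt",
                     config="configs/learning_gem5/part1/two_level.py",
                     binary="spectre",
                     debug_file="pipeview.txt"):
    """Build the gem5 argument list for an O3PipeView trace run.

    gem5's own options must come before the config script; anything after
    the script is passed to the script itself.
    """
    cmd = [gem5,
           "--debug-flags=O3PipeView",
           f"--debug-file={debug_file}",
           f"--debug-start={start_tick}"]
    if end_tick is not None:
        # Assumption: this gem5 build supports --debug-end.
        cmd.append(f"--debug-end={end_tick}")
    cmd += [config, binary]
    return cmd


print(shlex.join(pipeview_command(13062347000)))
```

<p>The list can be passed straight to <code class="highlighter-rouge">subprocess.run(..., check=True)</code>, which avoids any shell quoting issues.</p>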
| |
| <p>My trace file (<code class="highlighter-rouge">pipeview.txt</code>) was 600 MB, and it covers just two letters |
| of the output.</p> |
| |
| <p>Now, we can process this file to generate the visualization with a |
| script: <code class="highlighter-rouge">util/o3-pipeview.py</code>. This script requires the path to the file |
| that contains the output generated with the <code class="highlighter-rouge">O3PipeView</code> debug flag. |
| Above, we put the output into the file <code class="highlighter-rouge">pipeview.txt</code>, and this file was |
| created in the default output directory of gem5 (<code class="highlighter-rouge">m5out/</code>).</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>util/o3-pipeview.py --store_completions m5out/pipeview.txt --color -w 150 |
| </code></pre></div></div> |
| |
| <p>In the above command, I wanted to see when the stores completed |
| (<code class="highlighter-rouge">--store_completions</code>), asked for colored output (<code class="highlighter-rouge">--color</code>), |
| and set the width to 150 characters (<code class="highlighter-rouge">-w 150</code>). Processing a |
| file as large as this one (600 MB) may take a few minutes. The output will be |
| in a file called <code class="highlighter-rouge">o3-pipeview.out</code> in the current working directory.</p> |
| |
| <p>You can view this file with <code class="highlighter-rouge">less -r o3-pipeview.out</code>. You may want to |
| use the <code class="highlighter-rouge">-S</code> option with less if your terminal is less than 150 |
| characters wide (or whatever width value you used). Below is a |
| screenshot of the top of my trace.</p> |
| |
| <h3 id="understanding-the-o3-pipeline-viewer">Understanding the O3 pipeline viewer</h3> |
| |
| <p><img src="/assets/img/o3-example-annotated.png" alt="o3 pipeline view example" /></p> |
| |
| <p>The above image details how to interpret the output from the pipeline |
| viewer. Each <code class="highlighter-rouge">.</code> or <code class="highlighter-rouge">=</code> represents one cycle of time, which moves from |
| left to right. The “tick” column shows the tick of the leftmost <code class="highlighter-rouge">.</code> or |
| <code class="highlighter-rouge">=</code>. <code class="highlighter-rouge">=</code> is used to mark the instructions that were later squashed. The |
| address of the instruction (and the micro-op number) as well as the |
| disassembly are also shown. The sequence number can usually be ignored: it |
| increases monotonically and gives the total order of every dynamic |
| instruction. Finally, each stage of the O3 pipeline is shown with a |
| different letter and color.</p> |
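<p>To make the row format concrete, here is a simplified Python sketch of how one instruction's stage ticks could be turned into a row like these. It is not the code in <code class="highlighter-rouge">util/o3-pipeview.py</code>. The 1000 ticks per character follow from the 1 GHz clock used by the Learning gem5 scripts and gem5's 10^12 ticks per second (see the output above). The stage letters (f, d, n, p, i, c, r) are my assumption of what the viewer uses. <code class="highlighter-rouge">render_row</code> is a hypothetical helper.</p>

```python
# Stage letters as I believe util/o3-pipeview.py uses them (an assumption).
STAGES = [("fetch", "f"), ("decode", "d"), ("rename", "n"),
          ("dispatch", "p"), ("issue", "i"), ("complete", "c"),
          ("retire", "r")]


def render_row(ticks, start_tick, width, cycle_time=1000, squashed=False):
    """Draw one instruction as a fixed-width row of cycles.

    ticks maps a stage name to the tick at which the instruction reached
    that stage; stages it never reached (such as retire for a squashed
    instruction) are simply left out. Each character covers cycle_time
    ticks, counting from start_tick.
    """
    fill = "=" if squashed else "."
    row = [fill] * width
    for stage, letter in STAGES:
        if stage not in ticks:
            continue
        column = (ticks[stage] - start_tick) // cycle_time
        if column in range(width):  # stages outside the window are dropped
            row[column] = letter
    return "".join(row)


# A load that completed and was then squashed, so it never retires.
print(render_row({"fetch": 0, "decode": 1000, "rename": 2000,
                  "dispatch": 3000, "issue": 4000, "complete": 9000},
                 start_tick=0, width=12, squashed=True))  # fdnpi====c==
```

<p>Reading a real row works the same way in reverse. The gap between <code class="highlighter-rouge">i</code> and <code class="highlighter-rouge">c</code> is how many cycles the instruction waited after issue, which is how long cache misses show up in the trace.</p>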
| |
| <h2 id="digging-deeper-into-spectre">Digging deeper into Spectre</h2> |
| |
| <p>First, let’s examine the actual instructions that are executed during |
| the Spectre attack. The vulnerability is in the <code class="highlighter-rouge">victim_function</code> in |
| <code class="highlighter-rouge">spectre.c</code>.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) { |
| if (x &lt; array1_size) { |
| temp &amp;= array2[array1[x] * 512]; |
| } |
| } |
| </code></pre></div></div> |
| |
| <p>When this is compiled and then dumped with <code class="highlighter-rouge">objdump</code>, we get the |
| following instructions that will be executed. Your code may be slightly |
| different, especially the exact addresses of each instruction, depending |
| on the version of the compiler and other system-specific configurations.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># NOTE: the movzbl below is MOVZX_B_R_M in gem5. |
| # it is implemented with the following microcode. |
| # ld t1, seg, sib, disp, dataSize=1 |
| # zexti reg, t1, 7 |
| # |
| 000000000040105e &lt;victim_function&gt;: |
| 40105e: 55 push %rbp |
| 40105f: 48 89 e5 mov %rsp,%rbp |
| 401062: 48 89 7d f8 mov %rdi,-0x8(%rbp) |
| 401066: 8b 05 14 f0 2b 00 mov 0x2bf014(%rip),%eax # 6c0080 &lt;array1_size&gt; load array1_size (misses because the PoC flushes array1_size before each call) |
| 40106c: 89 c0 mov %eax,%eax |
| 40106e: 48 3b 45 f8 cmp -0x8(%rbp),%rax # if (x &lt; array1_size) rax is array1_size, -8(%rbp) is x |
| 401072: 76 2b jbe 40109f &lt;victim_function+0x41&gt; # if (x &lt; array1_size) |
| 401074: 48 8b 45 f8 mov -0x8(%rbp),%rax # load x from the stack into rax |
| 401078: 48 05 a0 00 6c 00 add $0x6c00a0,%rax # calculate array1 offset (x+array1) |
| 40107e: 0f b6 00 movzbl (%rax),%eax # load array1[x] |
| 401081: 0f b6 c0 movzbl %al,%eax # zero extend to 32 bits |
| 401084: c1 e0 09 shl $0x9,%eax # multiply by 512 |
| 401087: 48 98 cltq # sign-extend eax |
| 401089: 0f b6 90 80 1d 6c 00 movzbl 0x6c1d80(%rax),%edx # load array2[array1[x]*512] **** This is the magic! |
| 401090: 0f b6 05 e9 0c 2e 00 movzbl 0x2e0ce9(%rip),%eax # 6e1d80 &lt;temp&gt; Load temp. |
| 401097: 21 d0 and %edx,%eax |
| 401099: 88 05 e1 0c 2e 00 mov %al,0x2e0ce1(%rip) # 6e1d80 &lt;temp&gt; |
| 40109f: 5d pop %rbp |
| 4010a0: c3 retq |
| </code></pre></div></div> |
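<p>The arithmetic in this listing is worth spelling out. The attacker passes an enormous <code class="highlighter-rouge">x</code>, so the bounds check fails architecturally. Speculatively, however, <code class="highlighter-rouge">array1 + x</code> wraps around the 64-bit address space and points at the secret. <code class="highlighter-rouge">shl $0x9</code> then multiplies the secret byte by 512, so each possible value touches a different part of <code class="highlighter-rouge">array2</code>. The Python sketch below reproduces those steps. <code class="highlighter-rouge">ARRAY1</code> and <code class="highlighter-rouge">ARRAY2</code> are taken from the displacements in the listing. <code class="highlighter-rouge">SECRET</code> and <code class="highlighter-rouge">ARRAY1_SIZE</code> are hypothetical, and 64-byte cache lines are assumed.</p>

```python
WORD = 2**64        # size_t arithmetic wraps modulo 2**64
LINE = 64           # assumed cache-line size in bytes
ARRAY1 = 0x6C00A0   # array1, from "add $0x6c00a0,%rax" above
ARRAY2 = 0x6C1D80   # array2, from "movzbl 0x6c1d80(%rax),%edx" above
SECRET = 0x4A0000   # hypothetical address of the secret string
ARRAY1_SIZE = 16    # hypothetical value of array1_size


def attacker_x(target):
    """x such that array1 + x lands on target, like (size_t)(target - array1)."""
    return (target - ARRAY1) % WORD


def passes_bounds_check(x):
    """The architectural check: is x less than array1_size?"""
    return ARRAY1_SIZE > x


def speculative_load_address(x):
    """Address read by the speculative load of array1[x]."""
    return (ARRAY1 + x) % WORD


def probe_address(secret_byte):
    """Address read by the dependent load of array2[array1[x] * 512]."""
    return ARRAY2 + secret_byte * 512   # shl $0x9 is a multiply by 512


x = attacker_x(SECRET)
print(hex(x), passes_bounds_check(x), hex(speculative_load_address(x)))
# 0xffffffffffddff60 False 0x4a0000

# Every possible byte value maps to its own cache line in array2.
print(len({probe_address(b) // LINE for b in range(256)}))  # 256
```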
| |
| <p>Now, we can search for the instruction that we care about in the trace. |
| In this case, we want to find a time where the <code class="highlighter-rouge">movzbl</code> at address |
| <code class="highlighter-rouge">0x401089</code> is executed speculatively. When searching through the |
| pipeline viewer output (use <code class="highlighter-rouge">/</code> in less), we’re looking for a time where the |
| load completes for the instruction at <code class="highlighter-rouge">0x401089</code> and it is later |
| squashed (surrounded by <code class="highlighter-rouge">=</code>). An example is shown below.</p> |
| |
| <p><img src="/assets/img/o3-spectre-annotated.png" alt="annotated O3 pipeline view of |
| spectre" /></p> |
| |
| <p>The image above is from my presentation at <a href="http://caslab.csl.yale.edu/workshops/hasp2018/">Hardware and Architectural |
| Support for Security and Privacy (HASP) |
| 2018</a>.</p> |
| |
| <p>What we see in this image is that the instruction at <code class="highlighter-rouge">0x401066</code> causes a |
| cache miss (there is a long time between when the load is issued and the |
| data is returned from memory). Since the load of <code class="highlighter-rouge">array1_size</code> was a |
| cache miss, the jump at <code class="highlighter-rouge">0x401072</code> is speculated to be <em>not</em> taken |
| (incorrectly). This causes the following instructions to be executed |
| speculatively, and, eventually, squashed.</p> |
| |
| <p>The key thing in this trace that <em>is</em> the Spectre vulnerability is that |
| the load for the instruction at <code class="highlighter-rouge">0x40107e</code>, which loads secret data, |
| happens during the mis-speculated instructions. Then, this data is |
| loaded into the registers and operated on (instruction <code class="highlighter-rouge">0x401084</code>). |
| Finally, the load at address <code class="highlighter-rouge">0x401089</code> is executed and loads the value |
| from memory <em>that is dependent on the secret data loaded previously</em>. |
| Thus, we can later probe the cache to retrieve the secret data.</p> |
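<p>That last step, probing the cache, can be illustrated with a toy model. The sketch below is not attack code and does no timing. It models the cache as a Python set of line numbers, assumes 64-byte lines, and treats "resident" as standing in for "fast to reload". The attacker flushes the 256 candidate lines of <code class="highlighter-rouge">array2</code>. The victim's squashed load leaves one of them cached, and the attacker recovers the byte by checking which line is present. All names and the base address are hypothetical.</p>

```python
LINE = 64            # assumed cache-line size in bytes
ARRAY2 = 0x6C1D80    # hypothetical base address of array2
STRIDE = 512         # array1[x] * 512, as in victim_function


class ToyCache:
    """A cache with unlimited capacity: a set of resident line numbers."""

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        self.lines.add(addr // LINE)

    def flush(self, addr):
        self.lines.discard(addr // LINE)

    def is_hit(self, addr):
        # Stands in for "the reload was fast" in a real timing probe.
        return addr // LINE in self.lines


def leak_byte(secret_byte):
    cache = ToyCache()
    for guess in range(256):            # attacker flushes every probe line
        cache.flush(ARRAY2 + guess * STRIDE)
    # Victim: this load is later squashed, but its cache fill is not undone.
    cache.access(ARRAY2 + secret_byte * STRIDE)
    hits = [g for g in range(256) if cache.is_hit(ARRAY2 + g * STRIDE)]
    return hits[0] if len(hits) == 1 else None


print(hex(leak_byte(0x54)))  # 0x54
```

<p>Real caches are noisy, so the actual proof of concept repeats the experiment many times and keeps a score for each candidate byte. That is the <code class="highlighter-rouge">score</code> value in the output shown earlier.</p>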
| |
| <h3 id="effects-of-compilers">Effects of compilers</h3> |
| |
| <p>As previously mentioned, the specific compiler version and compiler |
| options have a significant effect on the attack. Below are two traces, |
| one from GCC 7.2 and one from clang 4.0.</p> |
| |
| <h4 id="gcc-72">GCC 7.2</h4> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) { |
| 400b2d: 55 push %rbp |
| 400b2e: 48 89 e5 mov %rsp,%rbp |
| 400b31: 48 89 7d f8 mov %rdi,-0x8(%rbp) |
| if (x &lt; array1_size) { |
| 400b35: 8b 05 c5 c5 2c 00 mov 0x2cc5c5(%rip),%eax # 6cd100 &lt;array1_size&gt; |
| 400b3b: 89 c0 mov %eax,%eax |
| 400b3d: 48 39 45 f8 cmp %rax,-0x8(%rbp) |
| 400b41: 73 34 jae 400b77 &lt;victim_function+0x4a&gt; |
| temp &amp;= array2[array1[x] * 512]; |
| 400b43: 48 8d 15 d6 c5 2c 00 lea 0x2cc5d6(%rip),%rdx # 6cd120 &lt;array1&gt; |
| 400b4a: 48 8b 45 f8 mov -0x8(%rbp),%rax |
| 400b4e: 48 01 d0 add %rdx,%rax |
| 400b51: 0f b6 00 movzbl (%rax),%eax |
| 400b54: 0f b6 c0 movzbl %al,%eax |
| 400b57: c1 e0 09 shl $0x9,%eax |
| 400b5a: 48 63 d0 movslq %eax,%rdx |
| 400b5d: 48 8d 05 9c f6 2c 00 lea 0x2cf69c(%rip),%rax # 6d0200 &lt;array2&gt; |
| 400b64: 0f b6 14 02 movzbl (%rdx,%rax,1),%edx |
| 400b68: 0f b6 05 91 e1 2c 00 movzbl 0x2ce191(%rip),%eax # 6ced00 &lt;temp&gt; |
| 400b6f: 21 d0 and %edx,%eax |
| 400b71: 88 05 89 e1 2c 00 mov %al,0x2ce189(%rip) # 6ced00 &lt;temp&gt; |
| } |
| } |
| 400b77: 90 nop |
| 400b78: 5d pop %rbp |
| 400b79: c3 retq |
| </code></pre></div></div> |
| |
| <iframe height="500" src="/assets/img/gcc72-static-tage.html" frameborder="0"> |
| </iframe> |
| <p>However, clang generates the following code.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) { |
| 400ac0: 55 push %rbp |
| 400ac1: 48 89 e5 mov %rsp,%rbp |
| 400ac4: 48 89 7d f8 mov %rdi,-0x8(%rbp) |
| if (x &lt; array1_size) { |
| 400ac8: 48 8b 7d f8 mov -0x8(%rbp),%rdi |
| 400acc: 8b 04 25 90 c0 6c 00 mov 0x6cc090,%eax |
| 400ad3: 89 c1 mov %eax,%ecx |
| 400ad5: 48 39 cf cmp %rcx,%rdi |
| 400ad8: 0f 83 2f 00 00 00 jae 400b0d &lt;victim_function+0x4d&gt; |
| temp &amp;= array2[array1[x] * 512]; |
| 400ade: 48 8b 45 f8 mov -0x8(%rbp),%rax |
| 400ae2: 0f b6 0c 05 a0 c0 6c movzbl 0x6cc0a0(,%rax,1),%ecx |
| 400ae9: 00 |
| 400aea: c1 e1 09 shl $0x9,%ecx |
| 400aed: 48 63 c1 movslq %ecx,%rax |
| 400af0: 0f b6 0c 05 40 f2 6c movzbl 0x6cf240(,%rax,1),%ecx |
| 400af7: 00 |
| 400af8: 0f b6 14 25 50 dc 6c movzbl 0x6cdc50,%edx |
| 400aff: 00 |
| 400b00: 21 ca and %ecx,%edx |
| 400b02: 40 88 d6 mov %dl,%sil |
| 400b05: 40 88 34 25 50 dc 6c mov %sil,0x6cdc50 |
| 400b0c: 00 |
| } |
| } |
| 400b0d: 5d pop %rbp |
| 400b0e: c3 retq |
| 400b0f: 90 nop |
| </code></pre></div></div> |
| |
| <iframe height="500" src="/assets/img/clang-static-tage.html" frameborder="0"> |
| </iframe> |
| <p>Interestingly, the clang-compiled <code class="highlighter-rouge">spectre</code> binary is not able to read |
| the secret data! (At least not in gem5. It is able to read the secret |
| data on my native machine.)</p> |
| |
| <p>We can look into the two traces to see the difference between the clang |
| version and the GCC version.</p> |
| |
| <p>The main difference is that in the clang version, the load generated by |
| the instruction at <code class="highlighter-rouge">0x400af0</code> never completes (and thus was presumably |
| never issued to the memory system).</p> |
| |
| <p>I’m not sure of the exact cause of this difference. It could be that the |
| instruction uses a different addressing mode |
| (<code class="highlighter-rouge">movzbl 0x6cf240(,%rax,1),%ecx</code> in clang vs <code class="highlighter-rouge">movzbl (%rdx,%rax,1),%edx</code> |
| in GCC). If you have ideas, please leave a comment!</p> |
| |
| <p>Either way, minor differences in the code generated can have large |
| impacts on the speculative execution!</p> |
| |
| <h3 id="effects-of-branch-predictor">Effects of branch predictor</h3> |
| |
| <p>When I was first playing around with Spectre and gem5, I ran into a |
| problem where I could only <em>sometimes</em> get Spectre to “work” with the |
| out of order CPU. After significant digging, I found that the choice of branch |
| predictor makes a big difference to how quickly the vulnerability |
| appears. The trace below (with the same code as the first disassembly of <code class="highlighter-rouge">victim_function</code> above) shows |
| what happens when using the tournament branch predictor.</p> |
| |
| <iframe height="500" src="/assets/img/gcc-static-tourn.html" frameborder="0"> |
| </iframe> |
| <p>Here, we see that the original branch misprediction comes much earlier |
| than the jump instruction in <code class="highlighter-rouge">victim_function</code> that is at address |
| <code class="highlighter-rouge">0x401072</code>. Thus, by the time the load instructions in <code class="highlighter-rouge">victim_function</code> |
| are executed, the ROB and load-store queue resources have been taken by |
| other instructions and the rogue loads are not issued to memory. There |
| are still a few times that the two loads are executed speculatively, but |
| this is much rarer than with the TAGE predictor. When using the TAGE |
| branch predictor, only the exact branch that the attacker wants to |
| mispredict is mispredicted.</p> |
| |
| <p>Interestingly, this shows that a “smarter” system is actually <em>more</em> |
| vulnerable to speculation-based attacks!</p> |
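<p>Part of what the branch predictor sees is the training pattern in the proof of concept. As far as I can tell from <code class="highlighter-rouge">spectre.c</code> in the gist, each round calls <code class="highlighter-rouge">victim_function</code> 30 times. A branchless bit-mask expression makes every sixth call pass the malicious <code class="highlighter-rouge">x</code>, while the other five pass an in-bounds training value. The Python sketch below reproduces only the resulting schedule, using a plain conditional instead of the bit masks. It is a simplification, not the gist's code, and <code class="highlighter-rouge">call_schedule</code> is a hypothetical name.</p>

```python
def call_schedule(training_x, malicious_x, calls=30):
    """Values of x passed to victim_function in one round, in call order.

    Mirrors a loop "for (j = 29; j >= 0; j--)" in which j % 6 == 0 selects
    the malicious value and every other j selects the training value.
    """
    schedule = []
    for j in range(calls - 1, -1, -1):
        schedule.append(malicious_x if j % 6 == 0 else training_x)
    return schedule


MALICIOUS = 0xFFFFFFFFFFDD76C8   # a malicious_x value from the output above
sched = call_schedule(training_x=3, malicious_x=MALICIOUS)
print("".join("M" if x == MALICIOUS else "t" for x in sched))
# tttttMtttttMtttttMtttttMtttttM
```

<p>This is the history the predictor is trained on before each malicious call. Which earlier branches it also mispredicts determines whether the rogue loads in <code class="highlighter-rouge">victim_function</code> still get ROB and load-store queue entries.</p>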
| </description> |
| <pubDate>Fri, 01 Jun 2018 00:00:00 -0700</pubDate> |
| <link>http://localhost:4000/2018/06/01/gem5-spectre.html</link> |
| <guid isPermaLink="true">http://localhost:4000/2018/06/01/gem5-spectre.html</guid> |
| |
| |
| </item> |
| |
| <item> |
| <title>Setting up gem5 full system</title> |
| <description><p>This is partially a followup to <a href="http://www.lowepower.com/jason/creating-disk-images-for-gem5.html">Creating disk images for |
| gem5</a> |
| and partially how to setup x86 full system for gem5. In this post, I’ll |
| discuss how to create a disk image from scratch and start using it with |
| gem5.</p> |
| |
| <p>It is important for computer architecture research to use the most |
| up-to-date software on the systems we are simulating. Too much computer |
| architecture research reports results using kernels from 5+ years ago or |
| ancient system software. Hopefully, this post will help others |
| keep up with the ever-changing system software. This way, researchers |
| can use up-to-date versions of Linux and easily update their kernels.</p> |
| |
| <p>This post takes a different approach than <a href="http://www.lowepower.com/jason/creating-disk-images-for-gem5.html">Creating disk images for |
| gem5</a>. |
| Instead of using the gem5 tools, this post uses qemu to create, edit, |
| and set up the disk for gem5 usage.</p> |
| |
| <p>This post assumes that you have installed qemu on your system. In |
| Ubuntu, this can be done with</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils |
| </code></pre></div></div> |
| |
| <p>I also assume you have downloaded and built gem5. All of the full system |
| examples use the simple full system scripts that are covered in |
| <a href="http://learning.gem5.org/book/part3/index.html">Learning gem5</a>.</p> |
| |
| <h2 id="step-1-create-an-empty-disk">Step 1: Create an empty disk</h2> |
| |
| <p>Using the qemu disk tools, create a blank raw disk image. In this case, |
| I chose to create a disk named “ubuntu-test.img” that is 8GB.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qemu-img create ubuntu-test.img 8G |
| </code></pre></div></div> |
| |
| <h2 id="step-2-install-ubuntu-with-qemu">Step 2: Install ubuntu with qemu</h2> |
| |
| <p>Now that we have a blank disk, we are going to use qemu to install |
| Ubuntu on the disk. I would encourage you to use the server version of |
| Ubuntu since gem5 does not have great support for displays. Thus, the |
| desktop environment isn’t very useful.</p> |
| |
| <p>First, you need to download the installation CD image from the <a href="https://www.ubuntu.com/download/server">Ubuntu |
| website</a>.</p> |
| |
| <p>Next, use qemu to boot off of the CD image, and set the disk in the |
| system to be the blank disk you created above. Ubuntu needs at least 1GB |
| of memory to install correctly, so be sure to configure qemu to use at |
| least 1GB memory.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qemu-system-x86_64 -hda ../gem5-fs-testing/ubuntu-test.img -cdrom ubuntu-16.04.1-server-amd64.iso -m 1024 -enable-kvm -boot d |
| </code></pre></div></div> |
| |
| <p>With this, you can simply follow the on-screen directions to install |
| Ubuntu to the disk image. The only gotcha in the installation is that |
| gem5’s IDE drivers don’t seem to play nicely with logical partitions. |
| Thus, during the Ubuntu install, be sure to manually partition the disk |
| and remove any logical partitions. You don’t need any swap space on the |
| disk anyway, unless you’re doing something specifically with swap space.</p> |
| |
| <h2 id="step-3-boot-up-and-install-needed-software">Step 3: Boot up and install needed software</h2> |
| |
| <p>Once you have installed Ubuntu on the disk, quit qemu and remove the |
| <code class="highlighter-rouge">-boot d</code> option so that you are not booting off of the CD anymore. Now, |
| you can again boot off of the main disk image you have installed Ubuntu |
| on.</p> |
| |
| <p>Since we’re using qemu, you should have a network connection (although |
| <a href="http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29">ping won’t |
| work</a>). |
| When booting in qemu, you can just use <code class="highlighter-rouge">sudo apt-get install</code> and |
| install any software you need on your disk.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qemu-system-x86_64 -hda ../gem5-fs-testing/ubuntu-test.img -cdrom ubuntu-16.04.1-server-amd64.iso -m 1024 -enable-kvm |
| </code></pre></div></div> |
| |
| <h2 id="step-4-build-a-kernel">Step 4: Build a kernel</h2> |
| |
| <p>Next, you need to build a Linux kernel. Unfortunately, the |
| out-of-the-box Ubuntu kernel doesn’t play well with gem5. See the |
| “panic: KVM: Unexpected exit” error in the problems section below.</p> |
| |
| <p>First, download the latest kernel from |
| <a href="https://www.kernel.org/">kernel.org</a>. To build the kernel, you |
| will want to start from a known-good config file. |
| The config file that I used for kernel version 4.8.13 can be |
| downloaded <a href="{filename}files/config">here</a>. Move the |
| good config to <code class="highlighter-rouge">.config</code> and then run <code class="highlighter-rouge">make oldconfig</code>, which starts the |
| kernel configuration process from an existing config file.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mv &lt;good config&gt; .config |
| make oldconfig |
| </code></pre></div></div> |
| |
| <p>At this point you can select any extra drivers you want to build into |
| the kernel. Note: You cannot use any kernel modules unless you are |
| planning on copying the modules onto the guest disk at the correct |
| location. All drivers must be built into the kernel binary.</p> |
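| |
| <p>One way to catch drivers that are still set to build as modules is to |
| scan <code class="highlighter-rouge">.config</code> for options set to <code class="highlighter-rouge">m</code>. The hypothetical helper below just |
| lists them. Each one it prints should be switched to built-in (<code class="highlighter-rouge">y</code>), |
| for example with <code class="highlighter-rouge">make menuconfig</code> or the kernel’s <code class="highlighter-rouge">scripts/config</code> tool.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: print every option a kernel config builds as a module. |
| list_module_options() { |
|   local config=${1:-.config} |
|   grep -E '^CONFIG_[A-Za-z0-9_]+=m$' "$config" |
| } |
| |
| # Usage, from the top of the kernel source tree: |
| #   list_module_options .config |
| # To build one of them in instead (E1000 is just an example option): |
| #   ./scripts/config --enable E1000 &amp;&amp; make olddefconfig |
| </code></pre></div></div> |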
| |
| <p>It may be possible to use modules by compiling the binary on the guest |
| disk via qemu, but I have not tested this.</p> |
| |
| <p>Finally, you need to build the kernel.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make -j5 |
| </code></pre></div></div> |
| |
| <h2 id="step-5-update-init-script">Step 5: Update init script</h2> |
| |
| <p>By default, gem5 expects a modified init script which loads a script off |
| of the host to execute in the guest. To use this feature, you need to |
| follow the steps below.</p> |
| |
| <p>Alternatively, you can install the precompiled x86 binaries from |
| my website. From qemu, run the following, which completes the |
| manual steps below for you.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget http://cs.wisc.edu/~powerjg/files/gem5-guest-tools-x86.tgz |
| tar xzvf gem5-guest-tools-x86.tgz |
| cd gem5-guest-tools/ |
| sudo ./install |
| </code></pre></div></div> |
| |
| <p>Now, you can use the <code class="highlighter-rouge">system.readfile</code> parameter in your Python config |
| scripts. This file will automatically be loaded (by the <code class="highlighter-rouge">gem5init</code> |
| script) and executed.</p> |
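| |
| <p>For example, the file you point <code class="highlighter-rouge">system.readfile</code> at could be a short |
| shell script like the one below. The file name and the workload are |
| placeholders. The script brackets the region of interest with the <code class="highlighter-rouge">m5</code> |
| utility’s <code class="highlighter-rouge">resetstats</code> and <code class="highlighter-rouge">dumpstats</code> commands. It does not call |
| <code class="highlighter-rouge">m5 exit</code> itself, because the <code class="highlighter-rouge">gem5init</code> script does that once the |
| script returns.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># On the host: write a hypothetical workload script for system.readfile. |
| cat &gt; run-workload.sh &lt;&lt;'EOF' |
| #!/bin/bash |
| echo "Running inside gem5 on kernel $(uname -r)" |
| /sbin/m5 resetstats   # zero the statistics before the region of interest |
| ls /                  # placeholder: replace with your benchmark |
| /sbin/m5 dumpstats    # write out the statistics for the region of interest |
| EOF |
| chmod 755 run-workload.sh |
| </code></pre></div></div> |
| |
| <p>Then set <code class="highlighter-rouge">system.readfile</code> to the path of <code class="highlighter-rouge">run-workload.sh</code> in your |
| Python config script.</p> |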
| |
| <h3 id="manually-installing-the-gem5-init-script">Manually installing the gem5 init script</h3> |
| |
| <p>First, build the m5 binary on the host.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd util/m5 |
| make -f Makefile.x86 |
| </code></pre></div></div> |
| |
| <p>Then, copy this binary to the guest and put it in <code class="highlighter-rouge">/sbin</code>. Also, create |
| a symbolic link to it at <code class="highlighter-rouge">/sbin/gem5</code>.</p> |
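| |
| <p>If you have the guest’s root partition mounted on the host (for example, |
| with <code class="highlighter-rouge">util/gem5img.py mount</code>), the copy and the link can be scripted as |
| below. The function name and the mount point are hypothetical. The link |
| direction (<code class="highlighter-rouge">/sbin/gem5</code> pointing at <code class="highlighter-rouge">m5</code>) is my reading of the step above.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: install the host-built m5 binary into a mounted guest root. |
| install_m5() { |
|   local m5_bin=$1 guest_root=$2 |
|   install -m 755 "$m5_bin" "$guest_root/sbin/m5" || return 1 |
|   # Relative target, so the link still resolves when the guest boots. |
|   ln -sf m5 "$guest_root/sbin/gem5" |
| } |
| |
| # Usage, as root, from the gem5 source directory: |
| #   install_m5 util/m5/m5 mnt |
| </code></pre></div></div> |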
| |
| <p>Then, to get the init script to execute when gem5 boots, create the file |
| <code class="highlighter-rouge">/lib/systemd/system/gem5.service</code> with the following:</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Unit] |
| Description=gem5 init script |
| Documentation=http://gem5.org |
| After=getty.target |
| |
| [Service] |
| Type=idle |
| ExecStart=/sbin/gem5init |
| StandardOutput=tty |
| StandardInput=tty-force |
| StandardError=tty |
| |
| [Install] |
| WantedBy=default.target |
| </code></pre></div></div> |
| |
| <p>Enable the gem5 service and disable the ttyS0 service.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl enable gem5.service |
| systemctl disable serial-getty@ttyS0.service |
| </code></pre></div></div> |
| |
| <p>Finally, create the init script that is executed by the service. In |
| <code class="highlighter-rouge">/sbin/gem5init</code>:</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash -</span> |
| |
| <span class="nv">CPU</span><span class="o">=</span><span class="sb">`</span><span class="nb">cat</span> /proc/cpuinfo | <span class="nb">grep </span>vendor_id | head <span class="nt">-n</span> 1 | cut <span class="nt">-d</span> <span class="s1">' '</span> <span class="nt">-f2-</span><span class="sb">`</span> |
| <span class="nb">echo</span> <span class="s2">"Got CPU type: </span><span class="nv">$CPU</span><span class="s2">"</span> |
| |
| <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$CPU</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"M5 Simulator"</span> <span class="o">]</span><span class="p">;</span> |
| <span class="k">then |
| </span><span class="nb">echo</span> <span class="s2">"Not in gem5. Not loading script"</span> |
| <span class="nb">exit </span>0 |
| <span class="k">fi</span> |
| |
| <span class="c"># Try to read in the script from the host system</span> |
| /sbin/m5 readfile <span class="o">&gt;</span> /tmp/script |
| chmod 755 /tmp/script |
| <span class="k">if</span> <span class="o">[</span> <span class="nt">-s</span> /tmp/script <span class="o">]</span> |
| <span class="k">then</span> |
| <span class="c"># If there is a script, execute the script and then exit the simulation</span> |
| su root <span class="nt">-c</span> <span class="s1">'/tmp/script'</span> <span class="c"># gives script full privileges as root user in multi-user mode</span> |
| sync |
| sleep 10 |
| /sbin/m5 <span class="nb">exit |
| </span><span class="k">fi |
| </span><span class="nb">echo</span> <span class="s2">"No script found"</span> |
| </code></pre></div></div> |
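| |
| <p>systemd can only start <code class="highlighter-rouge">/sbin/gem5init</code> if the file is executable. It is |
| therefore worth setting its permissions and checking its syntax before you shut |
| down qemu. Below is a minimal sketch with a hypothetical helper name.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: make the init script executable and syntax-check it. |
| prepare_init_script() { |
|   local script=${1:-/sbin/gem5init} |
|   chmod 755 "$script" || return 1 |
|   bash -n "$script"   # parses the script without running it |
| } |
| |
| # Usage, as root inside the guest: |
| #   prepare_init_script /sbin/gem5init &amp;&amp; echo "gem5init looks OK" |
| </code></pre></div></div> |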
| |
| <h2 id="problems-and-some-solutions">Problems and (some) solutions</h2> |
| |
| <h3 id="failed-to-early-mount-api-filesystems">Failed to early mount API filesystems</h3> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Write protecting the kernel read-only data: 8192k |
| Freeing unused kernel memory: 1956K (ffff880001417000 - ffff880001600000) |
| Freeing unused kernel memory: 456K (ffff88000178e000 - ffff880001800000) |
| [!!!!!!] Failed to early mount API filesystems, freezing. |
| </code></pre></div></div> |
| |
| <p>Solutions tried: enabling cgroups in the kernel. That did not help. I think |
| this is the same problem as “Can’t mount /dev” below.</p> |
| |
| <h3 id="cant-mount-dev">Can’t mount /dev</h3> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Failed to mount devtmpfs at /dev: No such device |
| Freezing execution. |
| </code></pre></div></div> |
| |
| <p>Something like the above (this was taken from an Arch Linux boot). The |
| problem is that devtmpfs is not compiled into the kernel. |
| You need to make sure that devtmpfs (<code class="highlighter-rouge">CONFIG_DEVTMPFS</code>) is enabled.</p> |
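| |
| <p>To check a kernel config for this before booting gem5, a hypothetical |
| helper like the one below looks for <code class="highlighter-rouge">CONFIG_DEVTMPFS</code> and |
| <code class="highlighter-rouge">CONFIG_DEVTMPFS_MOUNT</code>. The second option makes the kernel mount devtmpfs on |
| <code class="highlighter-rouge">/dev</code> itself. Treat it as a recommendation rather than a hard requirement. |
| The commented fix uses the <code class="highlighter-rouge">scripts/config</code> tool that ships in the kernel |
| source tree.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical check: report whether devtmpfs is built in and auto-mounted. |
| check_devtmpfs() { |
|   local config=${1:-.config} opt status=0 |
|   for opt in CONFIG_DEVTMPFS CONFIG_DEVTMPFS_MOUNT; do |
|     if grep -q "^${opt}=y$" "$config"; then |
|       echo "$opt=y" |
|     else |
|       echo "$opt is not built in" |
|       status=1 |
|     fi |
|   done |
|   return "$status" |
| } |
| |
| # To fix it, from the top of the kernel source tree: |
| #   ./scripts/config --enable DEVTMPFS --enable DEVTMPFS_MOUNT |
| #   make olddefconfig |
| </code></pre></div></div> |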
| |
| <h3 id="panic-kvm-unexpected-exit-exit_reason-8">panic: KVM: Unexpected exit (exit_reason: 8)</h3> |
| |
| <p>Exit reason 8 is “shutdown”. See |
| <a href="http://lxr.free-electrons.com/source/include/uapi/linux/kvm.h#L188">http://lxr.free-electrons.com/source/include/uapi/linux/kvm.h#L188</a>. |
| This seems to happen when there is a triple fault: |
| <a href="http://lxr.free-electrons.com/source/arch/x86/kvm/x86.c#L6498">http://lxr.free-electrons.com/source/arch/x86/kvm/x86.c#L6498</a></p> |
| |
| <p>I get this error every time I try to boot the unmodified Ubuntu kernel. |
| I don’t know how to solve this problem. Instead of trying to solve the |
| problem, I used a different config file for “oldconfig” when I compiled |
| the kernel from scratch.</p> |
| |
| <h3 id="slow-boot">Slow boot</h3> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[ TIME ] Timed out waiting for device dev-di...\x2da115\x2de3f263d7b53a.device. |
| [DEPEND] Dependency failed for /dev/disk/by-...382-f41d-4c99-a115-e3f263d7b53a. |
| [DEPEND] Dependency failed for Swap. |
| </code></pre></div></div> |
| |
| <p>This may happen if you have changed the disk without updating the fstab |
| on the disk. To fix it, you can boot the disk in qemu and update fstab |
| with the correct UUID.</p> |
| |
| <p>I ran into this when I was resizing the disk.</p> |
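| |
| <p>Here is a minimal sketch of that fix, run as root inside the guest booted with |
| qemu. <code class="highlighter-rouge">blkid -s UUID -o value</code> prints just the UUID of a partition. The |
| device name <code class="highlighter-rouge">/dev/sda1</code> and the helper <code class="highlighter-rouge">set_fstab_uuid</code> are assumptions for |
| illustration. The helper rewrites the first field of every non-comment entry |
| whose mount point matches. Ubuntu lists swap with the mount point <code class="highlighter-rouge">none</code>.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: point the fstab entries for a mount point at a new UUID. |
| set_fstab_uuid() { |
|   local fstab=$1 mountpoint=$2 uuid=$3 tmp status |
|   tmp=$(mktemp) || return 1 |
|   # Skip comments: Ubuntu's fstab has lines like "# / was on /dev/sda1 ...". |
|   awk -v mp="$mountpoint" -v u="$uuid" \ |
|     '$1 !~ /^#/ &amp;&amp; $2 == mp { $1 = "UUID=" u } { print }' "$fstab" &gt; "$tmp" &amp;&amp; |
|     cat "$tmp" &gt; "$fstab"   # overwrite in place, keeping owner and mode |
|   status=$? |
|   rm -f "$tmp" |
|   return "$status" |
| } |
| |
| # Usage, as root inside the guest (the device name is an example): |
| #   set_fstab_uuid /etc/fstab / "$(blkid -s UUID -o value /dev/sda1)" |
| </code></pre></div></div> |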
| |
| <h3 id="disk-is-too-small-for-what-you-want-to-do">Disk is too small for what you want to do</h3> |
| |
| <p>Resizing a disk image is pretty easy. You can use the same method you would if |
| you wanted to resize a partition on a regular hard drive.</p> |
| |
| <p>First, resize the image with <code class="highlighter-rouge">qemu-img</code>:</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qemu-img resize ubuntu-test.img +8G |
| </code></pre></div></div> |
| |
| <p>Now, you have a disk that has 8 GB of free space at the end of the disk. |
| You need to resize the partitions to use this free space. To do this, I |
| suggest using gparted just like you would for a real hard drive.</p> |
| |
| <p>You can download a gparted ISO from <a href="http://gparted.org/livecd.php">http://gparted.org/livecd.php</a>. |
| Once you download the ISO, you can boot it with qemu the same way as we |
| booted the installation CD. Then, once it’s booted, you can select the |
| disk you want to modify and follow the how-to |
| (<a href="http://gparted.org/display-doc.php?name=help-manual">http://gparted.org/display-doc.php?name=help-manual</a>).</p> |
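| |
| <p>Putting the resize and the GParted boot together, a sketch might look |
| like this. The helper name and the ISO file name are placeholders. The |
| qemu options mirror the install command used earlier. Drop |
| <code class="highlighter-rouge">-enable-kvm</code> if your host does not support KVM.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: grow a raw image, then boot a GParted live ISO against it. |
| grow_and_boot_gparted() { |
|   local img=$1 iso=$2 extra=${3:-+8G} |
|   qemu-img resize -f raw "$img" "$extra" || return 1 |
|   qemu-img info "$img"   # "virtual size" should now include the extra space |
|   qemu-system-x86_64 -hda "$img" -cdrom "$iso" -m 1024 -enable-kvm -boot d |
| } |
| |
| # Usage (the ISO name is a placeholder for the version you downloaded): |
| #   grow_and_boot_gparted ubuntu-test.img gparted-live.iso +8G |
| </code></pre></div></div> |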
| </description> |
| <pubDate>Fri, 13 Jan 2017 00:00:00 -0800</pubDate> |
| <link>http://localhost:4000/tools/2017/01/13/gem5-fs.html</link> |
| <guid isPermaLink="true">http://localhost:4000/tools/2017/01/13/gem5-fs.html</guid> |
| |
| |
| <category>tools</category> |
| |
| </item> |
| |
| <item> |
| <title>Creating disk images for gem5</title> |
| <description><p>When using gem5 in full-system mode, you have to have a disk image with |
| the operating system and all of your data on it. This is just like |
| having a physical disk in a physical machine. In this post, I’m going to |
| walk through how to create a new disk and install a (semi-)current |
| version of Ubuntu on the disk. By the end of this post, you should be |
| able to create your own disk with whatever extra data and applications |
| you want.</p> |
| |
| <p>This post assumes that you have already checked out a version of gem5 |
| and can build and run gem5 in full-system mode. The <a href="http://www.lowepower.com/jason/learning_gem5/">Learning |
| gem5</a> documentation is a |
| good place to start. This post uses the x86 ISA for gem5, and is mostly |
| applicable to other ISAs. More details on setting up ARM systems can be |
| found on the gem5 wiki: |
| <a href="http://gem5.org/Ubuntu_Disk_Image_for_ARM_Full_System">http://gem5.org/Ubuntu_Disk_Image_for_ARM_Full_System</a>.</p> |
| |
| <p>In the future, this post may be folded into <a href="http://www.lowepower.com/jason/learning_gem5/">Learning |
| gem5</a>.</p> |
| |
| <h2 id="creating-a-blank-disk-image">Creating a blank disk image</h2> |
| |
| <p>The first step is to create a blank disk image (usually a .img file). |
| Luckily, the gem5 developers have already made this easy with a tool |
| that is simple to use. To create a blank disk image, which is formatted |
| with ext2 by default, simply run the following.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; util/gem5img.py init ubuntu-14.04.img 4096 |
| </code></pre></div></div> |
| |
| <p>This command creates a new image, called “ubuntu-14.04.img” that is 4096 |
| MB. This command may require you to enter the sudo password, if you |
| don’t have permission to create loopback devices. <em>You should never run |
| commands as the root user that you don’t understand! You should look at |
| the file util/gem5img.py and ensure that it isn’t going to do anything |
| malicious to your computer!</em></p> |
| |
| <p>We will be using util/gem5img.py heavily throughout this post, so you |
| may want to understand it better. If you just run <code class="highlighter-rouge">util/gem5img.py</code>, it |
| displays all of the possible commands.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Usage: %s [command] &lt;command arguments&gt; |
| where [command] is one of |
| init: Create an image with an empty file system. |
| mount: Mount the first partition in the disk image. |
| umount: Unmount the first partition in the disk image. |
| new: File creation part of "init". |
| partition: Partition part of "init". |
| format: Formatting part of "init". |
| Watch for orphaned loopback devices and delete them with |
| losetup -d. Mounted images will belong to root, so you may need |
| to use sudo to modify their contents |
| </code></pre></div></div> |
| |
| <h2 id="copying-root-files-to-the-disk">Copying root files to the disk</h2> |
| |
| <p>Now that we have created a blank disk, we need to populate it with all |
| of the OS files. Ubuntu distributes a set of files explicitly for this |
| purpose. You can find the <a href="https://wiki.ubuntu.com/Core">Ubuntu core</a> |
| distribution for 14.04 at |
| <a href="http://cdimage.ubuntu.com/ubuntu-core/releases/14.04/release/">http://cdimage.ubuntu.com/ubuntu-core/releases/14.04/release/</a>. Since I |
| am simulating an x86 machine, I chose the file |
| <code class="highlighter-rouge">ubuntu-core-14.04-core-amd64.tar.gz</code>. Download whatever image is |
| appropriate for the system you are simulating.</p> |
| |
| <p>Next, we need to mount the blank disk and copy all of the files onto the |
| disk.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir mnt |
| util/gem5img.py mount ubuntu-14.04.img mnt |
| wget http://cdimage.ubuntu.com/ubuntu-core/releases/14.04/release/ubuntu-core-14.04-core-amd64.tar.gz |
| sudo tar xzvf ubuntu-core-14.04-core-amd64.tar.gz -C mnt |
| </code></pre></div></div> |
| |
| <p>The next step is to copy a required file from your working system |
| onto the disk so that we can chroot into the new disk. We need to copy |
| <code class="highlighter-rouge">/etc/resolv.conf</code> onto the new disk so that name resolution (for example, for <code class="highlighter-rouge">apt-get</code>) works inside the chroot.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo cp /etc/resolv.conf mnt/etc/ |
| </code></pre></div></div> |
| |
| <h2 id="setting-up-gem5-specific-files">Setting up gem5-specific files</h2> |
| |
| <h3 id="create-a-serial-terminal">Create a serial terminal</h3> |
| |
| <p>By default, gem5 uses the serial port to allow communication from the |
| host system to the simulated system. To use this, we need to create a |
| serial tty. Since Ubuntu uses upstart to control the init process, we |
| need to add a file to /etc/init which will initialize our terminal. |
| Also, in this file, we will add some code to detect if there was a |
| script passed to the simulated system. If there is a script, we will |
| execute the script instead of creating a terminal.</p> |
| |
| <p>Put the following code into a file called <code class="highlighter-rouge">/etc/init/tty-gem5.conf</code>.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ttyS0 - getty |
| # |
| # This service maintains a getty on ttyS0 from the point the system is |
| # started until it is shut down again, unless there is a script passed to gem5. |
| # If there is a script, the script is executed then simulation is stopped. |
| |
| start on stopped rc RUNLEVEL=[12345] |
| stop on runlevel [!12345] |
| |
| console owner |
| respawn |
| script |
| # Create the serial tty if it doesn't already exist |
| if [ ! -c /dev/ttyS0 ] |
| then |
| mknod -m 660 /dev/ttyS0 c 4 64 |
| fi |
| |
| # Try to read in the script from the host system |
| /sbin/m5 readfile &gt; /tmp/script |
| chmod 755 /tmp/script |
| if [ -s /tmp/script ] |
| then |
| # If there is a script, execute the script and then exit the simulation |
| # Do not exec here: exec would replace this shell and "m5 exit" would never run |
| su root -c '/tmp/script' # gives script full privileges as root user in multi-user mode |
| /sbin/m5 exit |
| else |
| # If there is no script, login the root user and drop to a console |
| # Use m5term to connect to this console |
| exec /sbin/getty --autologin root -8 38400 ttyS0 |
| fi |
| end script |
| </code></pre></div></div> |
| |
| <h3 id="setup-localhost">Setup localhost</h3> |
| |
| <p>We also need to set up the localhost loopback device if we are going to |
| use any applications that use it. To do this, we need to add the |
| following to the <code class="highlighter-rouge">/etc/hosts</code> file.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>127.0.0.1 localhost |
| ::1 localhost ip6-localhost ip6-loopback |
| fe00::0 ip6-localnet |
| ff00::0 ip6-mcastprefix |
| ff02::1 ip6-allnodes |
| ff02::2 ip6-allrouters |
| ff02::3 ip6-allhosts |
| </code></pre></div></div> |
| |
| <h3 id="update-fstab">Update fstab</h3> |
| |
| <p>Next, we need to create an entry in <code class="highlighter-rouge">/etc/fstab</code> for each partition we |
| want to be able to access from the simulated system. Only one partition |
| is absolutely required (<code class="highlighter-rouge">/</code>); however, you may want to add additional |
| partitions, like a swap partition.</p> |
| |
| <p>The following should appear in the file <code class="highlighter-rouge">/etc/fstab</code>.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># /etc/fstab: static file system information. |
| # |
| # Use 'blkid' to print the universally unique identifier for a |
| # device; this may be used with UUID= as a more robust way to name devices |
| # that works even if disks are added and removed. See fstab(5). |
| # |
| # &lt;file system&gt; &lt;mount point&gt; &lt;type&gt; &lt;options&gt; &lt;dump&gt; &lt;pass&gt; |
| /dev/hda1 / ext3 noatime 0 1 |
| </code></pre></div></div> |
| |
| <h3 id="copy-the-m5-binary-to-the-disk">Copy the <code class="highlighter-rouge">m5</code> binary to the disk</h3> |
| |
| <p>gem5 comes with an extra binary application that executes |
| pseudo-instructions to allow the simulated system to interact with the |
| host system. To build this binary, run <code class="highlighter-rouge">make -f Makefile.&lt;isa&gt;</code> in the |
| <code class="highlighter-rouge">util/m5</code> directory, where <code class="highlighter-rouge">&lt;isa&gt;</code> is the ISA that you are simulating |
| (e.g., x86). After this, you should have an <code class="highlighter-rouge">m5</code> binary file. Copy this |
| file to /sbin on your newly created disk.</p> |
| |
| <p>After updating the disk with all of the gem5-specific files, unless you |
| are going on to add more applications or copy additional files, |
| unmount the disk image.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; util/gem5img.py umount mnt |
| </code></pre></div></div> |
| |
| <h2 id="install-new-applications">Install new applications</h2> |
| |
| <p>The easiest way to install new applications onto your disk is to use |
| <code class="highlighter-rouge">chroot</code>. This program logically changes the root directory (“/”) to a |
| different directory, mnt in this case. Before you can change the root, |
| you first have to set up the special directories in your new root. To do |
| this, we use <code class="highlighter-rouge">mount -o bind</code>.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; sudo /bin/mount -o bind /sys mnt/sys |
| &gt; sudo /bin/mount -o bind /dev mnt/dev |
| &gt; sudo /bin/mount -o bind /proc mnt/proc |
| </code></pre></div></div> |
| |
| <p>After binding those directories, you can now <code class="highlighter-rouge">chroot</code>:</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; sudo /usr/sbin/chroot mnt /bin/bash |
| </code></pre></div></div> |
| |
| <p>At this point you will see a root prompt and you will be in the <code class="highlighter-rouge">/</code> |
| directory of your new disk.</p> |
| |
| <p>You should update your repository information.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; apt-get update |
| </code></pre></div></div> |
| |
| <p>You may want to add the universe repositories to your list with the |
| following commands. Note: The first command is required in 14.04.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; apt-get install software-properties-common |
| &gt; add-apt-repository universe |
| &gt; apt-get update |
| </code></pre></div></div> |
| |
| <p>Now, you are able to install any applications you could install on a |
| native Ubuntu machine via <code class="highlighter-rouge">apt-get</code>.</p> |
| |
| <p>Remember, after you exit you need to unmount all of the directories we |
| used bind on.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; sudo /bin/umount mnt/sys |
| &gt; sudo /bin/umount mnt/proc |
| &gt; sudo /bin/umount mnt/dev |
| </code></pre></div></div> |
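| |
| <p>Because it is easy to forget one of these unmounts, you may want to wrap |
| the whole sequence in a shell function. The sketch below (the name |
| <code class="highlighter-rouge">enter_chroot</code> is hypothetical) bind-mounts the same three directories and |
| runs a shell, or the given command, inside the chroot. It then unmounts |
| whatever it mounted, in reverse order, even if the chroot command fails.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: bind-mount /sys, /dev and /proc, chroot, then clean up. |
| enter_chroot() { |
|   local root=$1; shift |
|   [ $# -gt 0 ] || set -- /bin/bash   # default to an interactive shell |
|   local d status=0 |
|   local mounted=() |
|   for d in sys dev proc; do |
|     if sudo mount -o bind "/$d" "$root/$d"; then |
|       mounted=("$d" "${mounted[@]}")   # prepend, so we unmount in reverse order |
|     else |
|       status=1 |
|       break |
|     fi |
|   done |
|   if [ "$status" -eq 0 ]; then |
|     sudo chroot "$root" "$@" |
|     status=$? |
|   fi |
|   for d in "${mounted[@]}"; do |
|     sudo umount "$root/$d" |
|   done |
|   return "$status" |
| } |
| |
| # Usage: |
| #   enter_chroot mnt                   # interactive root shell |
| #   enter_chroot mnt apt-get update    # or a single command |
| </code></pre></div></div> |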
| </description> |
| <pubDate>Tue, 24 Nov 2015 00:00:00 -0800</pubDate> |
| <link>http://localhost:4000/jekyll/update/2015/11/24/gem5-disks.html</link> |
| <guid isPermaLink="true">http://localhost:4000/jekyll/update/2015/11/24/gem5-disks.html</guid> |
| |
| |
| <category>jekyll</category> |
| |
| <category>update</category> |
| |
| </item> |
| |
| <item> |
| <title>gem5 Horrors and what we can do about it</title> |
| <description><p><img src="/assets/img/gem5-horrors.png" alt="image" /></p> |
| |
| <p>This post mostly follows the talk that I am giving at |
| the <a href="http://gem5.org/User_workshop_2015">gem5 Users Workshop</a>. This post |
| contains some more details on problems that I skipped in my talk and |
| some references that I was not able to include in a presentation. You |
| can view my presentation on Google Drive |
| <a href="https://docs.google.com/presentation/d/1QGA5UVaVJkkMITF2TXCY_KlwmfWef1KBzfDP6ocbj7I/pub?start=false&amp;loop=false&amp;delayms=3000">here</a>.</p> |
| |
| <h2 id="i-3-gem5">I &lt;3 gem5</h2> |
| |
| <p>Before I get into the negative aspects of <a href="http://gem5.org">gem5</a>, I |
| first want to point out that it is a great tool. gem5 is used by a large |
| number of computer architecture researchers, both in industry and in |
| academia. Here at Wisconsin, and at other universities, gem5 is used in |
| the classroom to teach students about computer architecture and how to |
| do computer architecture research.</p> |
| |
| <p>gem5 is, without a doubt, the most full-featured architecture simulator. |
| It leverages execute-in-execute semantics for high-fidelity |
| cycle-by-cycle simulation. gem5 can boot a mostly unmodified Linux |
| image. It has multiple different CPU and memory models. gem5 has a |
| modular design which makes it simple to embed and extend. This has |
| allowed gem5 to be used in a large number of projects (see |
| <a href="http://gem5.org/Projects">http://gem5.org/Projects</a> and <a href="http://gem5.org/Publications">http://gem5.org/Publications</a>).</p> |
| |
| <p>However, as great as gem5 is, its growth has not been without pain. Now |
| that gem5 is nearing 15 years of development (if you include the |
| original m5 and GEMS project from which gem5 was born), I believe it’s |
| time to look at some of its deficiencies and talk about what we can do |
| to mitigate them.</p> |
| |
| <h2 id="gem5-horrors">gem5 horrors</h2> |
| |
| <p>Below, I discuss a few specific pain points that I and others have |
| experienced with gem5. However, before I get to that, I’d like to talk |
| about what I think the root of these issues is. gem5 has two main |
| problems:</p> |
| |
| <ol> |
|   <li>There is no formal governance model.</li> |
|   <li>The gem5 developers do not think of the user first.</li> |
| </ol> |
| |
| <p>Later, after I give some examples of specific problems, I will discuss |
| what I think can be done to fix these two issues.</p> |
| |
| <p>Next, I discuss four specific “gem5 horrors” that I have either |
| experienced personally or heard about from others who have experienced |
| them. These issues are deeper than just bugs, even if sometimes they can |
| be solved with simple changes. After describing each issue, I will also |
| quickly discuss a possible way to mitigate the problem.</p> |
| |
| <h3 id="horror-1-merges">Horror 1: Merges</h3> |
| |
| <p>There are a number of projects that build on top of gem5. In fact, I |
| would argue that this is the main use case for gem5. Everyone that I |
| know who uses gem5 for research takes the mainline gem5 and builds |
| their own changes on top of it.</p> |
| |
| <p>The problem with this model, where people build on top of gem5, is that |
| when new features are added or bugs are fixed in the mainline, |
| downstream users have to consume these changes. If the downstream users |
| do a good job managing their patch queues, this should be a |
| straightforward thing to do. However, I have found that even when |
| careful development practices are followed, merging gem5 changes is |
| incredibly difficult.</p> |
| |
| <p>Below I discuss a few specific problems that I have run into when |
| merging new changes in gem5. I believe the problems can be summed up |
| with two high-level issues that we currently have in gem5.</p> |
| |
| <ol> |
|   <li>There is no well-defined static API. The interface to different |
|   modules is constantly in a state of flux.</li> |
|   <li>The regression suite we have in gem5 has poor coverage. There are |
|   many features that users depend on that are not covered by the |
|   regression tester.</li> |
| </ol> |
| |
| <h4 id="merge-headache-1-pointless-code-changes">Merge headache #1: Pointless code changes</h4> |
| |
| <p>Examples from Ruby and Slicc and packet.</p> |
| |
| <h4 id="merge-headache-2-features-break-between-versions">Merge headache #2: Features break between versions</h4> |
| |
| <p>Ruby backing store, checkpointing</p> |
| |
| <h4 id="merge-headache-3-apis-are-a-moving-target">Merge headache #3: APIs are a moving target</h4> |
| |
| <p>Example with the minimal gem5 script.</p> |
| |
| <h4 id="how-to-mitigate">How to mitigate</h4> |
| |
| <p>I believe that there are two things we can do as the gem5 |
| development community to make merging upstream changes much easier. |
| First, we need a stable set of APIs. Second, we need a robust testing |
| and regression structure. I discuss some specifics of these two |
| characteristics below.</p> |
| |
| <h4 id="stable-apis">Stable APIs</h4> |
| |
| <p>Today in gem5, it is just as easy to change widely used interfaces, like |
| the port interface, as it is to change the implementation of a rarely |
| used function. We need to change this. I think that we need to choose a |
| set of interfaces and make them stable. This is similar to how the Linux |
| kernel operates.</p> |
| |
| <p>Once we have chosen a set of stable interfaces, I’m not suggesting that |
| they never change, only that it should be more onerous to change stable |
| APIs than other things. This also has the benefit that |
| “gem5-stable” can actually mean something. We can now have a stable |
| version, which has non-changing APIs, and a dev version that we can’t |
| necessarily count on to have constant APIs.</p> |
| |
| <p>I personally do not know what the API should be. I would like to see the |
| community come together and talk about what they see as important |
| interfaces. Then, once we find these interfaces, we can architect these |
| interfaces and hopefully make gem5 easier to use.</p> |
| |
| <h4 id="testing-structure">Testing structure</h4> |
| |
| <p>I do not think that this is a very controversial issue, but gem5 needs a |
| better regression structure. If all of the features that we used in |
| gem5-gpu had been part of the regression suite, then we would have had |
| far fewer problems.</p> |
| |
| <p>Again, I do not know exactly how to make the regression suite better, |
| but I do think a good idea would be to require new features, and bug |
| fixes, to include a unit-test or something like that. We really need a |
| software engineer to sit down and architect a new regression system. |
| This would be a great project for someone who is new to the gem5 |
| codebase.</p> |
| |
| <h3 id="horror-2-configuration-files">Horror 2: Configuration files</h3> |
| |
| <p>gem5 has an incredibly flexible configuration system. But with |
| flexibility often comes complexity. In fact, I ran SLOCCount on the |
| configs directory and found more than 4000 lines of Python |
| code. According to the SLOCCount tool, this represents 16 |
| person-months and a quarter of a million dollars worth of code!</p> |
| |
| <p>All of this complexity causes a number of issues. In my talk, I touched |
| on the fact that the defaults are confusing, and in some cases |
| inconsistent.</p> |
| |
| <h4 id="how-to-mitigate-1">How to mitigate</h4> |
| |
| <p>Since the m5 and GEMS integration, I have noticed a trend that the |
| number of command line parameters has continued to grow significantly. |
| It seems that every time a new feature has been added, we have added |
| some new command line parameters as well. I think this is the wrong way |
| to do it.</p> |
| |
| <p>There is an amazing C++-Python wrapper in gem5. We should be taking |
| advantage of the scripting capabilities of Python.</p> |
| |
| <p>I have created a simple script that is under 30 lines of Python. I think |
| we need to encourage our users to script in Python instead of adding |
| more and more command line parameters, which, in my experience, really |
| just leads to scripting in bash instead of in Python anyway.</p> |
| |
| <h3 id="horror-3-unexpected-results">Horror 3: Unexpected results</h3> |
| |
| <p>This was a very surprising error that I ran into while working on |
| creating a homework assignment for a graduate-level computer |
| architecture course. The point of the homework was to compare the |
| performance of instruction latency versus instruction throughput. I |
| wanted the students to take a particular instruction and change the |
| number of execution units, the latency, and how much the units were |
| pipelined. To do this, we looked at the divide instruction, since it is |
| a long latency instruction. Below is the code that we used:</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for (int i = 0; i &lt; N; i++) { |
| Y[i] = X[i] / alpha + Y[i]; |
| } |
| </code></pre></div></div> |
| |
| <p>In this code, every divide is totally independent from every other. |
| Therefore, we would expect that, with the out-of-order CPU, pipelining |
| the divide unit would speed up the code by roughly the degree to which |
| the divide unit is pipelined.</p> |
| |
| <p>To test this, I looked at two different configurations: a 10-cycle |
| latency divide with <em>no</em> pipelining, and a 10-cycle latency divide that |
| is fully pipelined. Below is the data I found for ARM and x86. I only |
| changed the “obvious” options. Each functional unit has an option for |
| the execution latency and issue latency. If the issue latency is 1, then |
| the functional unit is fully pipelined. (Now this is a boolean flag.) |
| All of the data is relative to x86 with no pipelining.</p> |
| |
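| <table> |
| <thead> |
| <tr><th>Configuration</th><th>Latency</th><th>Issue latency</th><th>x86 perf.</th><th>ARM perf.</th></tr> |
| </thead> |
| <tbody> |
| <tr><td>No Pipeline</td><td>10 cycles</td><td>10 cycles</td><td>1.0x</td><td>8.0x</td></tr> |
| <tr><td>Full Pipeline</td><td>10 cycles</td><td>1 cycle</td><td>1.0x</td><td>9.6x (1.2x)</td></tr> |
| </tbody> |
| </table> |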
| |
| <p>There are two very weird results in this data. First, when we fully |
| pipelined the divide unit, there was no performance change (at all!!) in |
| x86. Second, when running the exact same code with ARM, there was an 8x |
| speedup compared to x86! I find it very hard to believe that the ARM ISA |
| is inherently better at divide than x86.</p> |
| |
| <h4 id="how-to-mitigate-2">How to mitigate</h4> |
| |
| <p>This is a much harder problem to mitigate than the others on this list. |
| Nilay Vaish has taken a step in the right direction with these two |
| patches on Review Board, <a href="http://reviews.gem5.org/r/2744/">http://reviews.gem5.org/r/2744/</a> and |
| <a href="http://reviews.gem5.org/r/2744/">http://reviews.gem5.org/r/2744/</a>, which have been incorporated into gem5.</p> |
| |
| <p>The underlying problem is that the implementations for ARM and x86 are |
| totally distinct. It is not clear to me what the right way to unify the |
| ISA implementations is. As a stop-gap, developers who are |
| implementing x86 features need to make sure that they perform similarly |
| to the corresponding ARM features. Maybe a solution is to have a single set of C programs |
| that exercises all ISAs and compares the performance across ISAs. There |
| should be some performance differences, but not an order of magnitude.</p> |
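| |
| <p>The cross-ISA comparison could be automated with a simple check over each benchmark's results. The sketch below is hypothetical. The <code>isa_result</code> record, the function name, and the use of relative run times are all assumptions, and it omits how the timings are collected from gem5. It flags a benchmark when the slowest ISA is more than <code>max_ratio</code> times slower than the fastest.</p> |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: run time of one benchmark on one ISA
 * (simulated cycles or any other consistent time unit). */
struct isa_result {
    const char *isa;
    double time;
};

/* Returns 1 if max(time) / min(time) across the n ISAs exceeds
 * max_ratio, i.e. the ISAs disagree by more than we consider plausible. */
int isa_spread_too_large(const struct isa_result *r, size_t n, double max_ratio)
{
    assert(r != NULL && n > 0 && max_ratio >= 1.0);
    double lo = r[0].time, hi = r[0].time;
    for (size_t i = 1; i < n; i++) {
        if (r[i].time < lo)
            lo = r[i].time;
        if (r[i].time > hi)
            hi = r[i].time;
    }
    assert(lo > 0.0);
    return hi / lo > max_ratio;
}
```

| <p>As an example, take the unpipelined divide results above as relative run times (x86 = 1.0, ARM = 1/8.0 = 0.125). Their spread is 8x. A <code>max_ratio</code> of 2.0 flags this case, but a literal order-of-magnitude threshold of 10 would miss it, so the bound needs to be tighter than 10x.</p> |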
| |
| <h3 id="horror-4-lack-of-new-user-support">Horror 4: Lack of new-user support</h3> |
| |
| <h4 id="how-to-mitigate-3">How to mitigate</h4> |
| |
| <p>I think we need to create a “gem5 for Dummies” book or a |
| “Learning gem5” book. This book would be similar to Learning Python or |
| Learning Mercurial. The book would be open source for anyone to |
| contribute to. In fact, it should be required to update the book if a |
| developer makes an API-breaking change.</p> |
| |
| <p>An initial implementation of this book, which currently only includes |
| about a chapter of “getting started” and is in fact already out of date, |
| can be found here: |
| <a href="http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/gem5-tutorial/index.html">gem5-tutorial</a>. |
| I began working on this in conjunction with the graduate computer |
| architecture class at Wisconsin, so it may currently have some |
| Wisconsin-specific text. I hope to continue working on this in my |
| <em>copious free time</em>.</p> |
| |
| <p>There are many other horrors that other people experience as well. Here |
| I have discussed only some of the horrors that I have heard people |
| talk about. The purpose of presenting these horrors is not to say that |
| gem5 is a bad simulator! The purpose is to highlight that there are |
| currently issues that need to be addressed by the gem5 development |
| community.</p> |
| |
| <h2 id="what-can-we-do-about-it">What can we do about it?</h2> |
| |
| <p>A lot of the problems that I have discussed above come down to poor |
| software engineering. And yes, we are architects, not software |
| engineers, and there are a lot of things we could do better if we just |
| focused on software engineering. However, I do not think that this is |
| the underlying issue.</p> |
| |
| <p>I believe these four horrors stem from two systemic problems in the gem5 |
| development community.</p> |
| |
| <ol> |
| <li>There is no formal governance model.</li> |
| <li>The gem5 developers do not think of the user first.</li> |
| </ol> |
| |
| <p>I believe that if we start to solve these high-level issues, gem5 will |
| be a much better tool for everyone. Next, I discuss one possible way to |
| address these two points.</p> |
| |
| <h2 id="gem5-foundation">gem5 Foundation</h2> |
| |
| <p>First, I want to say that I do not believe this is the only way, or the |
| right way, to move gem5 forward. This is one possibility that I believe |
| will make gem5 a better tool. I hope that this is a place to begin the |
| discussion and I am sure that others in our community can come up with |
| even better suggestions than this!</p> |
| |
| <p><em>I think we should create a gem5 Foundation.</em> The gem5 Foundation will |
| be the center for the gem5 community. It will be a formal way for the |
| community to set goals and push gem5 forward.</p> |
| |
| <p>There are two main things I think the gem5 Foundation can help us with. |
| It can set up a formal governance structure and be a place for outside |
| interests to contribute money towards making gem5 better for everyone.</p> |
| |
| <h3 id="formalizing-a-governance-structure">Formalizing a governance structure</h3> |
| |
| <p>First, we need a governance structure. This is a document which defines |
| how decisions are made in the community, what matters to the community, |
| how to contribute to the community, etc.</p> |
| |
| <p>There is a lot of documentation on how to write governance models and |
| what they are. <a href="http://oss-watch.ac.uk/">OSS-Watch</a> is a great source |
| for this. Here is a link to a definition of a governance model, which |
| does a much better job of explaining it than I can: |
| <a href="http://oss-watch.ac.uk/resources/governancemodels">http://oss-watch.ac.uk/resources/governancemodels</a>. Additionally, here |
| is a link to an example governance model from an academic open-source |
| project: |
| <a href="http://www.taverna.org.uk/about/legal-stuff/taverna-governance-model/">http://www.taverna.org.uk/about/legal-stuff/taverna-governance-model/</a></p> |
| |
| <h3 id="money-money-money">Money, Money, Money</h3> |
| |
| <p>I think the main solution to all of these problems is to pay |
| software developers (<em>not computer architects!</em>) to solve some of these |
| problems. Already, within ARM and AMD there are a number of people who |
| get paid to work on gem5. However, these companies do not have gem5’s |
| best interests as their key focus. Their focus is what ARM and AMD find |
| interesting.</p> |
| |
| <p>So, I think that if we have something like the gem5 Foundation, these |
| companies and academia can donate money towards coding things that are |
| good for the community as a whole. The gem5 Foundation can hire software |
| engineers to work on the parts of gem5 that grad students and |
| researchers do not want to do. If you look at other academic |
| communities, they often hire non-researchers to do the “grunt work”. |
| Overall, I think this is a good idea for computer architects too, and |
| specifically for gem5.</p> |
| |
| <p>I recognize that this may be a crazy idea. I would love to hear what |
| others think. I am sure we will have some interesting discussion at the |
| gem5 workshop, and hopefully I will write another post with what other |
| people thought! Feel free to leave comments below.</p> |
| </description> |
| <pubDate>Tue, 09 Jun 2015 00:00:00 -0700</pubDate> |
| <link>http://localhost:4000/2015/06/09/gem5-horrors.html</link> |
| <guid isPermaLink="true">http://localhost:4000/2015/06/09/gem5-horrors.html</guid> |
| |
| |
| </item> |
| |
| </channel> |
| </rss> |