| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="x-ua-compatible" content="ie=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| |
| <title>gem5</title> |
| |
| <!-- SITE FAVICON --> |
| <link rel="shortcut icon" type="image/gif" href="/assets/img/gem5ColorVert.gif"/> |
| |
| <link rel="canonical" href="http://localhost:4000/2018/06/01/gem5-spectre.html"> |
| <link href='https://fonts.googleapis.com/css?family=Open+Sans:400,300,700,800,600' rel='stylesheet' type='text/css'> |
| <link href='https://fonts.googleapis.com/css?family=Muli:400,300' rel='stylesheet' type='text/css'> |
| |
| <!-- FAVICON --> |
| <link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css"> |
| |
| <!-- BOOTSTRAP --> |
| <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous"> |
| |
| <!-- CUSTOM CSS --> |
| <link rel="stylesheet" href="/css/main.css"> |
| </head> |
| |
| |
| <body> |
| <nav class="navbar navbar-expand-md navbar-light bg-light"> |
| <a class="navbar-brand" href="/"> |
| <img src="/assets/img/gem5ColorLong.gif" alt="gem5" height=45px> |
| </a> |
| <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNavDropdown" aria-controls="navbarNavDropdown" aria-expanded="false" aria-label="Toggle navigation"> |
| <span class="navbar-toggler-icon"></span> |
| </button> |
| <div class="collapse navbar-collapse" id="navbarNavDropdown"> |
| <ul class="navbar-nav ml-auto"> |
| <li class="nav-item "> |
| <a class="nav-link" href="/">Home</a> |
| </li> |
| |
| <li class="nav-item dropdown "> |
| <a class="nav-link dropdown-toggle" href="/about" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| About |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> |
| <a class="dropdown-item" href="/about">About</a> |
| <a class="dropdown-item" href="/publications">Publications</a> |
| <a class="dropdown-item" href="/governance">Governance</a> |
| </div> |
| </li> |
| |
| <li class="nav-item dropdown "> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Documentation |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink"> |
| <!-- Pull navigation from _data/documentation.yml --> |
| |
| <a class="dropdown-item" href="/introduction">Introduction</a> |
| |
| <a class="dropdown-item" href="/building">Getting Started</a> |
| |
| <a class="dropdown-item" href="/environment">Modifying/Extending</a> |
| |
| <a class="dropdown-item" href="/MSIintro">Modeling Cache Coherence with Ruby</a> |
| |
| </div> |
| </li> |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="/contributing">Contributing</a> |
| </li> |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="/blog">Blog</a> |
| </li> |
| |
| <li class="nav-item "> |
| <a class="nav-link" href="/search">Search</a> |
| </li> |
| </ul> |
| </div> |
| </nav> |
| |
| <main> |
| <br><br> |
| <div class="container post"> |
| <h1>Visualizing Spectre with gem5</h1> |
| <time>Jun 1, 2018 • Jason Lowe-Power</time> |
| <hr> |
| <p><a href="https://meltdownattack.com/">Spectre and Meltdown</a> took much of our |
| community by surprise. I personally found these attacks fascinating |
| because they didn’t rely on a <em>bug</em> in any particular hardware |
| implementation, but leveraged undefined behavior. Specifically, Spectre |
| and Meltdown can exfiltrate potentially secret memory data by detecting |
| the effects of speculative instructions <em>that are later squashed</em>.</p> |
| |
| <p>Very cool!</p> |
| |
| <p>Out-of-order processors are very complex. It would be easier to |
| understand exactly what causes speculation attacks like Spectre and |
| Meltdown if we had a way to <em>visualize</em> the attacks. Luckily, gem5 |
| already has a way to view the details of its out-of-order CPU’s |
| pipeline.</p> |
| |
| <p><img src="/assets/img/o3-example.png" alt="o3 pipeline view example" /></p> |
| |
| <p>The image above was created using the O3 pipeline viewer that is |
| included with gem5. In this post, I’ll explain how to use the O3 |
| pipeline viewer and how to generate images like the above. There is also |
| a new project which makes it easier to navigate large pipeline traces |
| and it is useful for comparing different pipeline designs: |
| <a href="https://github.com/shioyadan/Konata">Konata</a> created by Ryota Shioya. |
| Ryota gave a presentation on Konata at a recent <a href="http://learning.gem5.org/tutorial/index.html">Learning gem5 |
| tutorial</a>. You can find |
| the pdf of his presentation |
| <a href="http://learning.gem5.org/tutorial/presentations/vis-o3-gem5.pdf">here</a>. |
| Konata is a cool tool that’s written in JavaScript, and Ryota describes |
| it as “Google maps for an out of order pipeline”.</p> |
| |
| <h2 id="running-spectre">Running Spectre</h2> |
| |
| <p>The first step to visualizing what is going on in the pipeline during a |
| Spectre attack is getting proof of concept exploit code. I used the code |
| that was posted to a GitHub gist by Erik August soon after the attack |
| was announced. You can get that code here: |
| <a href="https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9e3d4bb6">https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9e3d4bb6</a>.</p> |
| |
| <p>First, you need to compile the proof of concept code on your native |
| machine (note: I’ll be using x86 for all of my examples).</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc spectre.c -o spectre -static |
| </code></pre></div></div> |
| |
| <p>I used gcc 7.2 (the default on Ubuntu 17.10) for my tests, and you may |
| want to do the same. <a href="#effects-of-compilers">Below</a> I discuss the |
| effects different compilers have on the Spectre attack. For instance, if |
| you use clang instead, you may not be able to reproduce the Spectre |
| attack in gem5.</p> |
| |
| <p>My native machine is still vulnerable to Spectre so when I run the |
| binary generated above, I get the following output.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Reading 40 bytes: |
| Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=’T’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cb... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cc... Success: 0x4D=’M’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cd... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ce... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cf... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d0... Success: 0x63=’c’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d1... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d2... Success: 0x57=’W’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d3... Success: 0x6F=’o’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d4... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d5... Success: 0x64=’d’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d6... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d7... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d8... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d9... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76da... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76db... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dc... Success: 0x53=’S’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dd... Success: 0x71=’q’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76de... Success: 0x75=’u’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76df... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e0... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e1... Success: 0x6D=’m’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e2... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e3... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e4... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e5... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e6... Success: 0x50=’P’ score=9 (second best: 0x06 score=2) |
| Reading at malicious_x = 0xffffffffffdd76e7... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e8... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e9... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ea... Success: 0x66=’f’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76eb... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ec... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ed... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ee... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ef... Success: 0x2E=’.’ score=2 |
| </code></pre></div></div> |
| |
| <h3 id="running-spectre-in-gem5">Running Spectre in gem5</h3> |
| |
| <p>To find out if gem5’s out of order CPU implementation is vulnerable to |
| Spectre, we need to run the code in gem5. The simplest and fastest way |
| to do this is by running in gem5’s syscall-emulation (SE) mode. In SE |
| mode we won’t be modeling an OS or any user-mode to kernel-mode |
| interaction, but this is okay for Spectre since this proof of concept code |
| is all in user-mode. If we were investigating Meltdown, we would have to |
| use full-system (FS) mode since Meltdown specifically allows user-mode |
| processes to read data that should only be accessible in kernel mode.</p> |
| |
| <p>So, when running something in gem5, the first step is to create a Python |
| runscript since this is <a href="http://learning.gem5.org/book/part1/simple_config.html">the “interface” to |
| gem5</a>. For this |
| example, what we need is a system with one CPU, an L1 cache, and memory. |
| For simplicity, I’m going to modify one of the existing scripts, |
| specifically the <code class="highlighter-rouge">two_level.py</code> script from the <a href="http://learning.gem5.org/">Learning gem5 |
| book</a>.</p> |
| |
| <p>In the file <code class="highlighter-rouge">gem5/configs/learning_gem5/part1/two_level.py</code>, I simply |
| changed the CPU from <code class="highlighter-rouge">TimingSimpleCPU()</code> to |
| <code class="highlighter-rouge">DerivO3CPU(branchPred=LTAGE())</code>. The <code class="highlighter-rouge">branchPred</code> argument sets the O3CPU to use the LTAGE |
| branch predictor instead of the default tournament branch predictor. |
| It’s important to use the LTAGE branch predictor, as better branch |
| predictors actually make Spectre easier to exploit, as discussed further |
| <a href="#effects-of-branch-predictor">below</a>.</p> |
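| |
| <p>If you would rather script this edit than make it by hand, the following is a minimal sketch. It only does the text substitution described above; the helper name <code class="highlighter-rouge">switch_to_o3</code> and the one-line excerpt are made up for illustration, and I am assuming the config assigns the CPU on a line like <code class="highlighter-rouge">system.cpu = TimingSimpleCPU()</code>.</p> |
| |
```python
# Constructor call named in the text as the original CPU model.
OLD_CPU = "TimingSimpleCPU()"


def switch_to_o3(config_source: str, predictor: str = "LTAGE") -> str:
    """Return config_source with the simple CPU swapped for the O3 CPU.

    Hypothetical helper: it edits text only and does not import gem5.
    """
    if OLD_CPU not in config_source:
        raise ValueError("no TimingSimpleCPU() found; was the script already edited?")
    new_cpu = f"DerivO3CPU(branchPred={predictor}())"
    return config_source.replace(OLD_CPU, new_cpu)


# Assumed one-line excerpt standing in for the real two_level.py contents.
example = "system.cpu = TimingSimpleCPU()\n"
print(switch_to_o3(example), end="")
```
| |
| <p>Passing a different class name as <code class="highlighter-rouge">predictor</code> is one way to switch branch predictors when comparing runs, provided that class exists in your gem5 version.</p> |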
| |
| <p>Now, we simply need to build gem5 and run it.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scons -j8 build/X86/gem5.opt |
| |
| build/X86/gem5.opt configs/learning_gem5/part1/two_level.py spectre |
| </code></pre></div></div> |
| |
| <p>And the output that I get is the following, just like above when I ran |
| the <code class="highlighter-rouge">spectre</code> binary natively.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gem5 Simulator System. http://gem5.org |
| gem5 is copyrighted software; use the --copyright option for details. |
| |
| gem5 compiled May 10 2018 09:40:08 |
| gem5 started May 24 2018 11:21:16 |
| gem5 executing on palisade, pid 27173 |
| command line: build/X86/gem5.opt configs/learning_gem5/part1/two_level.py spectre |
| |
| Global frequency set at 1000000000000 ticks per second |
| warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes) |
| 0: system.remote_gdb: listening for remote gdb on port 7000 |
| Beginning simulation! |
| info: Entering event queue @ 0. Starting simulation... |
| warn: readlink() called on '/proc/self/exe' may yield unexpected results in various settings. |
| Returning '/home/jlp/Code/gem5/spectre-vis/spectre' |
| info: Increasing stack size by one page. |
| warn: ignoring syscall access(...) |
| Reading 40 bytes: tput cols |
| Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=’T’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cb... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cc... Success: 0x4D=’M’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cd... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ce... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76cf... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d0... Success: 0x63=’c’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d1... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d2... Success: 0x57=’W’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d3... Success: 0x6F=’o’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d4... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d5... Success: 0x64=’d’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d6... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d7... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d8... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76d9... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76da... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76db... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dc... Success: 0x53=’S’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76dd... Success: 0x71=’q’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76de... Success: 0x75=’u’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76df... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e0... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e1... Success: 0x6D=’m’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e2... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e3... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e4... Success: 0x68=’h’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e5... Success: 0x20=’ ’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e6... Success: 0x4F=’O’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e7... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e8... Success: 0x73=’s’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76e9... Success: 0x69=’i’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ea... Success: 0x66=’f’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76eb... Success: 0x72=’r’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ec... Success: 0x61=’a’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ed... Success: 0x67=’g’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ee... Success: 0x65=’e’ score=2 |
| Reading at malicious_x = 0xffffffffffdd76ef... Success: 0x2E=’.’ score=2 |
| Exiting @ tick 113568969000 because exiting with last active thread context |
| </code></pre></div></div> |
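| |
| <p>To compare the native and gem5 runs without reading forty lines by eye, a small script can join the reported bytes back into a string. This is only a sketch: <code class="highlighter-rouge">decode_leak</code> is a name I made up, and the sample lines are abbreviated copies of the output above with the quote characters simplified.</p> |
| |
```python
import re

# Matches the hex byte in lines such as "... Success: 0x54=T score=2".
BYTE_RE = re.compile(r"Success: 0x([0-9A-Fa-f]{2})=")


def decode_leak(output: str) -> str:
    """Join the byte reported on each 'Success' line into one string."""
    return "".join(chr(int(m.group(1), 16)) for m in BYTE_RE.finditer(output))


# Abbreviated sample copied from the first three lines of the output above.
sample = (
    "Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=T score=2\n"
    "Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=h score=2\n"
    "Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=e score=2\n"
)
print(decode_leak(sample))
```
| |
| <p>Only the first byte on each line is used, so a trailing "second best" guess does not end up in the decoded string.</p> |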
| |
| <h2 id="visualizing-the-out-of-order-pipeline">Visualizing the out of order pipeline</h2> |
| |
| <p>To generate pipeline visualizations, we first need to generate a trace |
| file of all of the instructions executed by the out of order CPU. To |
| create this trace, we can use the <code class="highlighter-rouge">O3PipeView</code> debug flag.</p> |
| |
| <p>Now, the trace for the O3 CPU can be <em>very</em> large, up to many GBs. When |
| creating this trace, you need to be careful to create the smallest trace |
| possible. Also, it’s important to dump the trace to a file and not to |
| <code class="highlighter-rouge">stdout</code>, which is the default when using debug flags. You can redirect |
| the trace to a file by using the <code class="highlighter-rouge">--debug-file</code> option to gem5.</p> |
| |
| <p>To create the trace file, I used the following methodology:</p> |
| |
| <ol> |
| <li>Start running spectre in gem5, then hit ctrl-c after the first |
| couple of letters. At this point, I wrote down the tick at which gem5 |
| exited (13062347000 for me).</li> |
| <li>Run gem5 with the debug flag <code class="highlighter-rouge">O3PipeView</code> enabled.</li> |
| <li>Watch the output and kill gem5 with ctrl-c after two more letters |
| appeared than in step 1.</li> |
| </ol> |
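| |
| <p>Step 1 can be done without copying the tick by hand if you save gem5's output. The sketch below assumes an interrupted run prints an exit line of the same shape as the "Exiting @ tick ..." line shown in the output earlier; <code class="highlighter-rouge">exit_tick</code> is a hypothetical helper name.</p> |
| |
```python
import re

# Shape taken from the last line of the gem5 output shown earlier.
EXIT_RE = re.compile(r"Exiting @ tick (\d+)")


def exit_tick(gem5_output: str) -> int:
    """Return the tick from the first 'Exiting @ tick N' line in the output."""
    match = EXIT_RE.search(gem5_output)
    if match is None:
        raise ValueError("no 'Exiting @ tick' line found")
    return int(match.group(1))


line = "Exiting @ tick 113568969000 because exiting with last active thread context"
print(exit_tick(line))
```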
| |
| <p>To generate the trace, I ran the following command. Note: you may have a |
| different value for when to start the debugging trace. Also note: when |
| producing the trace gem5 will run <em>much</em> slower.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>build/X86/gem5.opt --debug-flags=O3PipeView --debug-file=pipeview.txt --debug-start=13062347000 configs/learning_gem5/part1/two_level.py spectre |
| </code></pre></div></div> |
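| |
| <p>Since the start tick will differ from machine to machine, it can help to build the command from a variable. The sketch below only assembles the argument list using the same options as the command above; the function name and its defaults are my own choices for illustration.</p> |
| |
```python
def trace_command(start_tick,
                  binary="spectre",
                  trace_file="pipeview.txt",
                  gem5="build/X86/gem5.opt",
                  config="configs/learning_gem5/part1/two_level.py"):
    """Build the gem5 argument list that starts O3PipeView tracing at start_tick."""
    return [
        gem5,
        "--debug-flags=O3PipeView",
        f"--debug-file={trace_file}",   # written under gem5's output directory
        f"--debug-start={start_tick}",  # tick noted from the interrupted run
        config,
        binary,
    ]


# Prints the same command line as the one shown above.
print(" ".join(trace_command(13062347000)))
```
| |
| <p>The list form can be handed to <code class="highlighter-rouge">subprocess.run</code> directly if you want to launch gem5 from the same script.</p> |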
| |
| <p>My tracefile (<code class="highlighter-rouge">pipeview.txt</code>) was 600 MB for catching just two letters |
| in the output.</p> |
| |
| <p>Now, we can process this file to generate the visualization with a |
| script: <code class="highlighter-rouge">util/o3-pipeview.py</code>. This script requires the path to the file |
| that contains the output generated with the <code class="highlighter-rouge">O3PipeView</code> debug flag. |
| Above, we put the output into the file <code class="highlighter-rouge">pipeview.txt</code>, and this file was |
| created in the default output directory of gem5 (<code class="highlighter-rouge">m5out/</code>).</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>util/o3-pipeview.py --store_completions m5out/pipeview.txt --color -w 150 |
| </code></pre></div></div> |
| |
| <p>In the above command, I wanted to see when the stores completed |
| (<code class="highlighter-rouge">--store_completions</code>) and specified to use color (<code class="highlighter-rouge">--color</code>) in the |
| output and use a width of 150 characters (<code class="highlighter-rouge">-w 150</code>). Processing a large |
| file like this one of 600 MB may take a few minutes. The output will be |
| in a file called <code class="highlighter-rouge">o3-pipeview.out</code> in the current working directory.</p> |
| |
| <p>You can view this file with <code class="highlighter-rouge">less -r o3-pipeview.out</code>. You may want to |
| use the <code class="highlighter-rouge">-S</code> option with less if your terminal is less than 150 |
| characters wide (or whatever width value you used). Below is a |
| screenshot of the top of my trace.</p> |
| |
| <h3 id="understanding-the-o3-pipeline-viewer">Understanding the O3 pipeline viewer</h3> |
| |
| <p><img src="/assets/img/o3-example-annotated.png" alt="o3 pipeline view example" /></p> |
| |
| <p>The above image details how to interpret the output from the pipeline |
| viewer. Each <code class="highlighter-rouge">.</code> or <code class="highlighter-rouge">=</code> represents one cycle of time, which moves from |
| left to right. The “tick” column shows the tick of the leftmost <code class="highlighter-rouge">.</code> or |
| <code class="highlighter-rouge">=</code>. <code class="highlighter-rouge">=</code> is used to mark the instructions that were later squashed. The |
| address of the instruction (and the micro-op number) as well as the |
| disassembly is also shown. The sequence number can be ignored as it is |
| always monotonically increasing and is the total order of every dynamic |
| instruction. Finally, each stage of the O3 pipeline is shown with a |
| different letter and color.</p> |
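| |
| <p>To make the row format concrete, here is a toy renderer that draws one instruction the way the description above reads. It is not the code from <code class="highlighter-rouge">util/o3-pipeview.py</code>: the stage letters, the value of 1000 ticks per cycle, and the lack of line wrapping are all assumptions made for this sketch.</p> |
| |
```python
def render_row(stages, start_tick, width, ticks_per_cycle=1000, squashed=False):
    """Draw one instruction as a row of cycles.

    stages maps a stage letter (hypothetical letters here) to the tick at
    which that stage happened. Each character is one cycle; '.' fills a
    committed instruction and '=' fills a squashed one.
    """
    filler = "=" if squashed else "."
    row = [filler] * width
    for letter, tick in stages.items():
        column = (tick - start_tick) // ticks_per_cycle
        if 0 <= column < width:
            row[column] = letter
    return "[" + "".join(row) + "]"


# An instruction that retires, then one that is squashed before retiring.
committed = {"f": 1000, "d": 2000, "i": 5000, "c": 6000, "r": 8000}
wrong_path = {"f": 1000, "d": 2000, "i": 5000, "c": 6000}
print(render_row(committed, start_tick=0, width=10))
print(render_row(wrong_path, start_tick=0, width=10, squashed=True))
```
| |
| <p>The second row has a complete stage but no retire stage, which is the pattern to look for when hunting for speculative loads.</p> |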
| |
| <h2 id="digging-deeper-into-spectre">Digging deeper into Spectre</h2> |
| |
| <p>First, let’s examine the actual instructions that are executed during |
| the Spectre attack. The vulnerability is in the <code class="highlighter-rouge">victim_function</code> in |
| <code class="highlighter-rouge">spectre.c</code>.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) { |
| if (x < array1_size) { |
| temp &= array2[array1[x] * 512]; |
| } |
| } |
| </code></pre></div></div> |
| |
| <p>When this is compiled and then dumped with <code class="highlighter-rouge">objdump</code>, we get the |
| following instructions that will be executed. Your code may be slightly |
| different, especially the exact addresses of each instruction, depending |
| on the version of the compiler and other system-specific configurations.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># NOTE: the movzbl below is MOVZX_B_R_M in gem5. |
| # it is implemented with the following microcode. |
| # ld t1, seg, sib, disp, dataSize=1 |
| # zexti reg, t1, 7 |
| # |
| 000000000040105e <victim_function>: |
| 40105e: 55 push %rbp |
| 40105f: 48 89 e5 mov %rsp,%rbp |
| 401062: 48 89 7d f8 mov %rdi,-0x8(%rbp) |
| 401066: 8b 05 14 f0 2b 00 mov 0x2bf014(%rip),%eax # 6c0080 <array1_size> load array1_size (first time is always a miss) |
| 40106c: 89 c0 mov %eax,%eax |
| 40106e: 48 3b 45 f8 cmp -0x8(%rbp),%rax # if (x < array1_size) rax is array1_size, -8(%rbp) is x |
| 401072: 76 2b jbe 40109f <victim_function+0x41> # if (x < array1_size) |
| 401074: 48 8b 45 f8 mov -0x8(%rbp),%rax # load x from the stack into rax |
| 401078: 48 05 a0 00 6c 00 add $0x6c00a0,%rax # calculate array1 offset (x+array1) |
| 40107e: 0f b6 00 movzbl (%rax),%eax # load array1[x] |
| 401081: 0f b6 c0 movzbl %al,%eax # zero extend to 32 bits |
| 401084: c1 e0 09 shl $0x9,%eax # multiply by 512 |
| 401087: 48 98 cltq # sign-extend eax |
| 401089: 0f b6 90 80 1d 6c 00 movzbl 0x6c1d80(%rax),%edx # load array2[array1[x]*512] **** This is the magic! |
| 401090: 0f b6 05 e9 0c 2e 00 movzbl 0x2e0ce9(%rip),%eax # 6e1d80 <temp> Load temp. |
| 401097: 21 d0 and %edx,%eax |
| 401099: 88 05 e1 0c 2e 00 mov %al,0x2e0ce1(%rip) # 6e1d80 <temp> |
| 40109f: 5d pop %rbp |
| 4010a0: c3 retq |
| </code></pre></div></div> |
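| |
| <p>The address touched by the load marked "the magic" in the listing above can be worked out by hand. The sketch below uses the displacement <code class="highlighter-rouge">0x6c1d80</code> and the shift by 9 from that listing; the function name is made up, and your addresses will differ if your binary differs.</p> |
| |
```python
ARRAY2_BASE = 0x6C1D80  # displacement of the movzbl at 0x401089 in the listing
SHIFT = 9               # shl $0x9,%eax, i.e. multiply by 512


def probed_address(secret_byte: int) -> int:
    """Address of array2[secret_byte * 512] for the listing above."""
    return ARRAY2_BASE + (secret_byte << SHIFT)


# Shifting left by 9 is the same as multiplying by 512.
assert (1 << SHIFT) == 512

# The first leaked byte in the output was 0x54 ('T').
print(hex(probed_address(ord("T"))))  # 0x6cc580
```
| |
| <p>Because each possible byte value lands 512 bytes apart, every value maps to its own cache line, which is what lets the attacker tell the values apart later.</p> |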
| |
| <p>Now, we can search for the instruction that we care about in the trace. |
| In this case, we want to find a time where the <code class="highlighter-rouge">movzbl</code> at address |
| <code class="highlighter-rouge">0x401089</code> is executed speculatively. When searching through the |
| pipeline viewer (use <code class="highlighter-rouge">/</code> in less), we’re looking for a time where the |
| load completes for the instruction at <code class="highlighter-rouge">0x401089</code> and it is later |
| squashed (surrounded by <code class="highlighter-rouge">=</code>). An example is shown below.</p> |
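| |
| <p>For a large trace, a script can narrow down the candidates before you look by eye. This sketch assumes each instruction is on one line of the viewer output, that the line contains the instruction address as text, and that colors show up as ANSI escape codes; the sample lines are invented for illustration, not copied from a real trace.</p> |
| |
```python
import re

# Colour escape sequences, assumed to be what --color writes into the file.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def squashed_lines(lines, address="401089"):
    """Return (line number, text) for lines that mention address and contain '='."""
    hits = []
    for number, line in enumerate(lines, start=1):
        plain = ANSI_RE.sub("", line)
        if address in plain and "=" in plain:
            hits.append((number, plain.rstrip("\n")))
    return hits


# Invented sample rows; real rows will look different.
sample_rows = [
    "[.f.d.ic.r.] 0x00401066.0 MOV_R_M\n",
    "\x1b[34m[=f=d=ic===]\x1b[0m 0x00401089.0 MOVZX_B_R_M\n",
    "[.fdic.r...] 0x00401089.0 MOVZX_B_R_M\n",
]
for number, text in squashed_lines(sample_rows):
    print(number, text)
```
| |
| <p>This is only a filter: it does not check that the load actually completed, so each hit still needs to be inspected in the viewer.</p> |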
| |
| <p><img src="/assets/img/o3-spectre-annotated.png" alt="annotated O3 pipeline view of |
| spectre" /></p> |
| |
| <p>The image above is from my presentation at <a href="http://caslab.csl.yale.edu/workshops/hasp2018/">Hardware and Architectural |
| Support for Security and Privacy (HASP) |
| 2018</a>.</p> |
| |
| <p>What we see in this image is that the instruction at <code class="highlighter-rouge">0x401066</code> causes a |
| cache miss (there is a long time between when the load is issued and the |
| data is returned from memory). Since the load of <code class="highlighter-rouge">array1_size</code> was a |
| cache miss, the jump at <code class="highlighter-rouge">0x401072</code> is speculated to be <em>not</em> taken |
| (incorrectly). This causes the following instructions to be executed |
| speculatively, and, eventually, squashed.</p> |
| |
| <p>The key thing in this trace that <em>is</em> the Spectre vulnerability is that |
| the load for the instruction at <code class="highlighter-rouge">0x40107e</code>, which loads secret data, |
| happens during the mis-speculated instructions. Then, this data is |
| loaded into the registers and operated on (instruction <code class="highlighter-rouge">0x401084</code>). |
| Finally, the load at address <code class="highlighter-rouge">0x401089</code> is executed and loads the value |
| from memory <em>that is dependent on the secret data loaded previously</em>. |
| Thus, we can later probe the cache to retrieve the secret data.</p> |
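| |
| <p>The sequence above can be imitated with a toy model. This is a deliberately simplified sketch, not how gem5 or real hardware works: the cache is just a set of touched offsets, speculation is a boolean flag, and all names apart from <code class="highlighter-rouge">victim_function</code> are invented for the example.</p> |
| |
```python
STRIDE = 512  # same multiplier as array2[array1[x] * 512]


def victim_function(x, memory, array1_size, cache, predicted_in_bounds):
    """Model one call; returns True only if the access was architecturally valid."""
    in_bounds = x < array1_size
    if in_bounds or predicted_in_bounds:
        # Runs either for real or speculatively; the cache cannot tell which.
        secret = memory[x]
        cache.add(secret * STRIDE)  # footprint left by the array2 load
    # A squashed call produces no architectural result, only the cache footprint.
    return in_bounds


def probe(cache):
    """Return every byte value whose array2 line is present in the toy cache."""
    return [guess for guess in range(256) if guess * STRIDE in cache]


memory = bytes([1, 2, 3, 4]) + b"T"  # array1 (4 bytes) followed by a secret byte
cache = set()                         # "flushed" cache: no array2 lines present
victim_function(4, memory, 4, cache, predicted_in_bounds=True)
print([chr(guess) for guess in probe(cache)])  # ['T']
```
| |
| <p>With <code class="highlighter-rouge">predicted_in_bounds=False</code> the out-of-bounds call touches nothing, which mirrors the traces where the rogue loads are never issued to memory.</p> |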
| |
| <h3 id="effects-of-compilers">Effects of compilers</h3> |
| |
| <p>As previously mentioned, the specific compiler version and compiler |
| options have a significant effect on the attack. Below are two traces, |
| one from GCC 7.2 and one from clang 4.0.</p> |
| |
| <h4 id="gcc-72">GCC 7.2</h4> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) { |
| 400b2d: 55 push %rbp |
| 400b2e: 48 89 e5 mov %rsp,%rbp |
| 400b31: 48 89 7d f8 mov %rdi,-0x8(%rbp) |
| if (x < array1_size) { |
| 400b35: 8b 05 c5 c5 2c 00 mov 0x2cc5c5(%rip),%eax # 6cd100 <array1_size> |
| 400b3b: 89 c0 mov %eax,%eax |
| 400b3d: 48 39 45 f8 cmp %rax,-0x8(%rbp) |
| 400b41: 73 34 jae 400b77 <victim_function+0x4a> |
| temp &= array2[array1[x] * 512]; |
| 400b43: 48 8d 15 d6 c5 2c 00 lea 0x2cc5d6(%rip),%rdx # 6cd120 <array1> |
| 400b4a: 48 8b 45 f8 mov -0x8(%rbp),%rax |
| 400b4e: 48 01 d0 add %rdx,%rax |
| 400b51: 0f b6 00 movzbl (%rax),%eax |
| 400b54: 0f b6 c0 movzbl %al,%eax |
| 400b57: c1 e0 09 shl $0x9,%eax |
| 400b5a: 48 63 d0 movslq %eax,%rdx |
| 400b5d: 48 8d 05 9c f6 2c 00 lea 0x2cf69c(%rip),%rax # 6d0200 <array2> |
| 400b64: 0f b6 14 02 movzbl (%rdx,%rax,1),%edx |
| 400b68: 0f b6 05 91 e1 2c 00 movzbl 0x2ce191(%rip),%eax # 6ced00 <temp> |
| 400b6f: 21 d0 and %edx,%eax |
| 400b71: 88 05 89 e1 2c 00 mov %al,0x2ce189(%rip) # 6ced00 <temp> |
| } |
| } |
| 400b77: 90 nop |
| 400b78: 5d pop %rbp |
| 400b79: c3 retq |
| </code></pre></div></div> |
| |
| <iframe height="500" src="/assets/img/gcc72-static-tage.html" frameborder="0"> |
| </iframe> |
| <p>However, clang generates the following code.</p> |
| |
| <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) { |
| 400ac0: 55 push %rbp |
| 400ac1: 48 89 e5 mov %rsp,%rbp |
| 400ac4: 48 89 7d f8 mov %rdi,-0x8(%rbp) |
| if (x < array1_size) { |
| 400ac8: 48 8b 7d f8 mov -0x8(%rbp),%rdi |
| 400acc: 8b 04 25 90 c0 6c 00 mov 0x6cc090,%eax |
| 400ad3: 89 c1 mov %eax,%ecx |
| 400ad5: 48 39 cf cmp %rcx,%rdi |
| 400ad8: 0f 83 2f 00 00 00 jae 400b0d <victim_function+0x4d> |
| temp &= array2[array1[x] * 512]; |
| 400ade: 48 8b 45 f8 mov -0x8(%rbp),%rax |
| 400ae2: 0f b6 0c 05 a0 c0 6c movzbl 0x6cc0a0(,%rax,1),%ecx |
| 400ae9: 00 |
| 400aea: c1 e1 09 shl $0x9,%ecx |
| 400aed: 48 63 c1 movslq %ecx,%rax |
| 400af0: 0f b6 0c 05 40 f2 6c movzbl 0x6cf240(,%rax,1),%ecx |
| 400af7: 00 |
| 400af8: 0f b6 14 25 50 dc 6c movzbl 0x6cdc50,%edx |
| 400aff: 00 |
| 400b00: 21 ca and %ecx,%edx |
| 400b02: 40 88 d6 mov %dl,%sil |
| 400b05: 40 88 34 25 50 dc 6c mov %sil,0x6cdc50 |
| 400b0c: 00 |
| } |
| } |
| 400b0d: 5d pop %rbp |
| 400b0e: c3 retq |
| 400b0f: 90 nop |
| </code></pre></div></div> |
| |
| <iframe height="500" src="/assets/img/clang-static-tage.html" frameborder="0"> |
| </iframe> |
| <p>Interestingly, the clang-compiled <code class="highlighter-rouge">spectre</code> binary is not able to read |
| the secret data! (At least not in gem5. It is able to read the secret |
| data on my native machine.)</p> |
| |
| <p>We can look into the two traces to see the difference between the clang |
| version and the GCC version.</p> |
| |
| <p>The main difference is that in the clang version, the load generated by |
| the instruction at <code class="highlighter-rouge">0x400af0</code> never completes (and thus, must not have |
| been issued to the memory system).</p> |
| |
| <p>I’m not sure of the exact cause of this difference. It could be that the |
| instruction uses a different addressing mode |
| (<code class="highlighter-rouge">movzbl 0x6cf240(,%rax,1),%ecx</code> in clang vs <code class="highlighter-rouge">movzbl (%rdx,%rax,1),%edx</code> |
| in GCC). If you have ideas, please leave a comment!</p> |
| |
| <p>Either way, minor differences in the code generated can have large |
| impacts on the speculative execution!</p> |
| |
| <h3 id="effects-of-branch-predictor">Effects of branch predictor</h3> |
| |
| <p>When I was first playing around with Spectre and gem5, I ran into a |
| problem where I could only <em>sometimes</em> get Spectre to “work” with the |
| out of order CPU. After significant digging, I found that the branch |
| predictor chosen makes a big difference to how quickly the vulnerability |
| happens. The trace below (with the same code as GCC 4.8 above) shows |
| what happens when using the tournament branch predictor.</p> |
| |
| <iframe height="500" src="/assets/img/gcc-static-tourn.html" frameborder="0"> |
| </iframe> |
| <p>Here, we see that the original branch misprediction comes much earlier |
| than the jump instruction in <code class="highlighter-rouge">victim_function</code> that is at address |
| <code class="highlighter-rouge">0x401072</code>. Thus, by the time the load instructions in <code class="highlighter-rouge">victim_function</code> |
| are executed, the ROB and load-store queue resources have been taken by |
| other instructions and the rogue loads are not issued to memory. There |
| are still a few times that the two loads are executed speculatively, but |
| this is much rarer than with the TAGE predictor. When using the TAGE |
| branch predictor, only the exact branch that the attacker wants to |
| mispredict is mispredicted.</p> |
| |
| <p>This interestingly shows that a “smarter” system is actually <em>more</em> |
| vulnerable to speculation-based attacks!</p> |
| |
| </div> |
| |
| </main> |
| |
| <footer class="page-footer"> |
| <div class="container"> |
| <div class="row"> |
| |
| <div class="col-12 col-sm-4"> |
| <p>gem5</p> |
| <p><a href="/about">About</a></p> |
| <p><a href="/publications">Publications</a></p> |
| <p><a href="/contributing">Contributing</a></p> |
| <p><a href="/governance">Governance</a></p> |
| <br></div> |
| |
| <div class="col-12 col-sm-4"> |
| <p>Docs</p> |
| <p><a href="/introduction">Documentation</a></p> |
| <p><a href="http://gem5.org/Documentation">Old Documentation</a></p> |
| <p><a href="https://gem5.googlesource.com/public/gem5">Source</a></p> |
| <br></div> |
| |
| <div class="col-12 col-sm-4"> |
| <p>Help</p> |
| <p><a href="/search">Search</a></p> |
| <p><a href="#">Mailing Lists</a></p> |
| <p><a href="https://github.com/gem5/new-website/tree/master/">Website Source</a></p> |
| <br></div> |
| |
| </div> |
| </div> |
| </footer> |
| |
| |
| |
| <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script> |
| <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script> |
| |
| <script> |
| // When the user scrolls down 20px from the top of the document, show the button |
| window.onscroll = function() {scrollFunction()}; |
| |
| function scrollFunction() { |
| var button = document.getElementById("myBtn"); |
| // Do nothing on pages that have no scroll-to-top button |
| if (!button) { return; } |
| if (document.body.scrollTop > 100 || document.documentElement.scrollTop > 20) { |
| button.style.display = "block"; |
| } else { |
| button.style.display = "none"; |
| } |
| } |
| |
| // When the user clicks on the button, scroll to the top of the document |
| function topFunction() { |
| document.body.scrollTop = 0; |
| document.documentElement.scrollTop = 0; |
| } |
| </script> |
| |
| </body> |
| |
| |
| </html> |