
 <!DOCTYPE html>
 <html>
 <head>
 	<!-- Global site tag (gtag.js) - Google Analytics -->
 	<script async src="https://www.googletagmanager.com/gtag/js?id=UA-133422980-2"></script>
 	<script>
 	  window.dataLayer = window.dataLayer || [];
 	  function gtag(){dataLayer.push(arguments);}
 	  gtag('js', new Date());

 	  gtag('config', 'UA-133422980-2');
 	</script>

 	<meta charset="utf-8">
 	<meta http-equiv="x-ua-compatible" content="ie=edge">
 	<meta name="viewport" content="width=device-width, initial-scale=1">

 	<title>gem5</title>

 	<!-- SITE FAVICON -->
 	<link rel="shortcut icon" type="image/gif" href="/assets/img/gem5ColorVert.gif"/>

 	<link rel="canonical" href="http://localhost:4000/2018/06/01/gem5-spectre.html">
 	<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,300,700,800,600' rel='stylesheet' type='text/css'>
 	<link href='https://fonts.googleapis.com/css?family=Muli:400,300' rel='stylesheet' type='text/css'>

 	<!-- FONT AWESOME -->
 	<link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">

 	<!-- BOOTSTRAP -->
 	<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">

 	<!-- CUSTOM CSS -->
 	<link rel="stylesheet" href="/css/main.css">
 </head>


 <body>
 	<nav class="navbar navbar-expand-md navbar-light bg-light">
   <a class="navbar-brand" href="/">
 		<img src="/assets/img/gem5ColorLong.gif" alt="gem5" height=45px>
 	</a>
   <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarNavDropdown" aria-controls="navbarNavDropdown" aria-expanded="false" aria-label="Toggle navigation">
     <span class="navbar-toggler-icon"></span>
   </button>
   <div class="collapse navbar-collapse" id="navbarNavDropdown">
     <ul class="navbar-nav ml-auto">
       <li class="nav-item ">
         <a class="nav-link" href="/">Home</a>
       </li>

 			<li class="nav-item dropdown ">
 				<a class="nav-link dropdown-toggle" href="/about" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
 					About
 				</a>
 				<div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink">
           <a class="dropdown-item" href="/about">About</a>
           <a class="dropdown-item" href="/publications">Publications</a>
           <a class="dropdown-item" href="/governance">Governance</a>
 				</div>
 			</li>

 			<li class="nav-item dropdown ">
 				<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownMenuLink" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
 					Documentation
 				</a>
 				<div class="dropdown-menu" aria-labelledby="navbarDropdownMenuLink">
 					<!-- Pull navigation from _data/documentation.yml -->

             <a class="dropdown-item" href="/introduction">Introduction</a>

             <a class="dropdown-item" href="/building">Getting Started</a>

             <a class="dropdown-item" href="/environment">Modifying/Extending</a>

             <a class="dropdown-item" href="/MSIintro">Modeling Cache Coherence with Ruby</a>

 				</div>
 			</li>

       <li class="nav-item ">
         <a class="nav-link" href="/contributing">Contributing</a>
       </li>

       <li class="nav-item ">
         <a class="nav-link" href="/blog">Blog</a>
       </li>

 			<li class="nav-item ">
         <a class="nav-link" href="/search">Search</a>
       </li>
     </ul>
   </div>
 </nav>

 	<main>
 		<br><br>
 <div class="container post">
   <h1>Visualizing Spectre with gem5</h1>
   <time>Jun 1, 2018 • Jason Lowe-Power</time>
   <hr>
   <p><a href="https://meltdownattack.com/">Spectre and Meltdown</a> took much of our
 community by surprise. I personally found these attacks fascinating
 because they didn’t rely on a <em>bug</em> in any particular hardware
 implementation, but leveraged undefined behavior. Specifically, Spectre
 and Meltdown can exfiltrate potentially secret memory data by detecting
 the effects of speculative instructions <em>that are later squashed</em>.</p>

 <p>Very cool!</p>

 <p>Out-of-order processors are very complex. Speculation attacks like
 Spectre and Meltdown would be much easier to understand if we had a way
 to <em>visualize</em> them. Luckily, gem5 already has a way to view the
 details of its out-of-order CPU’s pipeline.</p>

 <p><img src="/assets/img/o3-example.png" alt="o3 pipeline view example" /></p>

 <p>The image above was created using the O3 pipeline viewer that is
 included with gem5. In this post, I’ll explain how to use the O3
 pipeline viewer and how to generate images like the one above. There is
 also a newer project,
 <a href="https://github.com/shioyadan/Konata">Konata</a>, created by Ryota Shioya,
 which makes it easier to navigate large pipeline traces and to compare
 different pipeline designs. Ryota gave a presentation on Konata at a
 recent <a href="http://learning.gem5.org/tutorial/index.html">Learning gem5
 tutorial</a>. You can find
 the PDF of his presentation
 <a href="http://learning.gem5.org/tutorial/presentations/vis-o3-gem5.pdf">here</a>.
 Konata is a cool tool written in JavaScript, and Ryota describes
 it as “Google maps for an out of order pipeline”.</p>

 <h2 id="running-spectre">Running Spectre</h2>

 <p>The first step to visualizing what is going on in the pipeline during a
 Spectre attack is getting proof-of-concept exploit code. I used the code
 that Erik August posted as a GitHub gist soon after the attack
 was announced. You can get that code here:
 <a href="https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9e3d4bb6">https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9e3d4bb6</a>.</p>

 <p>First, you need to compile the proof of concept code on your native
 machine (note: I’ll be using x86 for all of my examples).</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc spectre.c -o spectre -static
 </code></pre></div></div>

 <p>I used gcc 7.2 (the default on Ubuntu 17.10) for my tests, and you may
 want to do the same. <a href="#effects-of-compilers">Below</a> I discuss the
 effects different compilers have on the Spectre attack. For instance, if
 you use clang instead, you may not be able to reproduce the Spectre
 attack in gem5.</p>

 <p>My native machine is still vulnerable to Spectre so when I run the
 binary generated above, I get the following output.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Reading 40 bytes:
 Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=’T’ score=2
 Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=’h’ score=2
 Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76cb... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76cc... Success: 0x4D=’M’ score=2
 Reading at malicious_x = 0xffffffffffdd76cd... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76ce... Success: 0x67=’g’ score=2
 Reading at malicious_x = 0xffffffffffdd76cf... Success: 0x69=’i’ score=2
 Reading at malicious_x = 0xffffffffffdd76d0... Success: 0x63=’c’ score=2
 Reading at malicious_x = 0xffffffffffdd76d1... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76d2... Success: 0x57=’W’ score=2
 Reading at malicious_x = 0xffffffffffdd76d3... Success: 0x6F=’o’ score=2
 Reading at malicious_x = 0xffffffffffdd76d4... Success: 0x72=’r’ score=2
 Reading at malicious_x = 0xffffffffffdd76d5... Success: 0x64=’d’ score=2
 Reading at malicious_x = 0xffffffffffdd76d6... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76d7... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76d8... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76d9... Success: 0x72=’r’ score=2
 Reading at malicious_x = 0xffffffffffdd76da... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76db... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76dc... Success: 0x53=’S’ score=2
 Reading at malicious_x = 0xffffffffffdd76dd... Success: 0x71=’q’ score=2
 Reading at malicious_x = 0xffffffffffdd76de... Success: 0x75=’u’ score=2
 Reading at malicious_x = 0xffffffffffdd76df... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76e0... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76e1... Success: 0x6D=’m’ score=2
 Reading at malicious_x = 0xffffffffffdd76e2... Success: 0x69=’i’ score=2
 Reading at malicious_x = 0xffffffffffdd76e3... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76e4... Success: 0x68=’h’ score=2
 Reading at malicious_x = 0xffffffffffdd76e5... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76e6... Success: 0x50=’P’ score=9 (second best: 0x06 score=2)
 Reading at malicious_x = 0xffffffffffdd76e7... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76e8... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76e9... Success: 0x69=’i’ score=2
 Reading at malicious_x = 0xffffffffffdd76ea... Success: 0x66=’f’ score=2
 Reading at malicious_x = 0xffffffffffdd76eb... Success: 0x72=’r’ score=2
 Reading at malicious_x = 0xffffffffffdd76ec... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76ed... Success: 0x67=’g’ score=2
 Reading at malicious_x = 0xffffffffffdd76ee... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76ef... Success: 0x2E=’.’ score=2
 </code></pre></div></div>

 <h3 id="running-spectre-in-gem5">Running Spectre in gem5</h3>

 <p>To find out if gem5’s out-of-order CPU implementation is vulnerable to
 Spectre, we need to run the code in gem5. The simplest and fastest way
 to do this is by running in gem5’s syscall-emulation (SE) mode. In SE
 mode we won’t be modeling an OS or any user-mode to kernel-mode
 interaction, but this is okay for Spectre since this proof-of-concept code
 is all in user-mode. If we were investigating Meltdown, we would have to
 use full-system (FS) mode since Meltdown specifically allows user-mode
 processes to read data that should only be accessible in kernel mode.</p>

 <p>When running something in gem5, the first step is to create a Python
 runscript since this is <a href="http://learning.gem5.org/book/part1/simple_config.html">the “interface” to
 gem5</a>. For this
 example, what we need is a system with one CPU, an L1 cache, and memory.
 For simplicity, I’m going to modify one of the existing scripts,
 specifically the <code class="highlighter-rouge">two_level.py</code> script from the <a href="http://learning.gem5.org/">Learning gem5
 book</a>.</p>

 <p>In the file <code class="highlighter-rouge">gem5/configs/learning_gem5/part1/two_level.py</code>, I simply
 changed the CPU from <code class="highlighter-rouge">TimingSimpleCPU()</code> to
 <code class="highlighter-rouge">DerivO3CPU(branchPred=LTAGE())</code>. This sets the O3CPU to use the LTAGE
 branch predictor instead of the default tournament branch predictor.
 It’s important to use the LTAGE branch predictor because better branch
 predictors actually make Spectre easier to exploit, as discussed further
 <a href="#effects-of-branch-predictor">below</a>.</p>
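 <p>As a rough sketch of that edit, the change is a single line. This is a
 fragment, not a complete script: it assumes the surrounding file already
 imports everything from <code class="highlighter-rouge">m5.objects</code> and has created <code class="highlighter-rouge">system</code>, and
 class names may differ in other gem5 versions.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Fragment of configs/learning_gem5/part1/two_level.py (not a complete script).
# Assumes "from m5.objects import *" and an existing "system" object.

# Before: simple in-order timing CPU
# system.cpu = TimingSimpleCPU()

# After: out-of-order CPU using the LTAGE branch predictor
system.cpu = DerivO3CPU(branchPred=LTAGE())
</code></pre></div></div>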

 <p>Now, we simply need to build gem5 and run it.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scons -j8 build/X86/gem5.opt

 build/X86/gem5.opt configs/learning_gem5/part1/two_level.py spectre
 </code></pre></div></div>

 <p>The output that I get is the following, just like above when I ran
 the <code class="highlighter-rouge">spectre</code> binary natively.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gem5 Simulator System.  http://gem5.org
 gem5 is copyrighted software; use the --copyright option for details.

 gem5 compiled May 10 2018 09:40:08
 gem5 started May 24 2018 11:21:16
 gem5 executing on palisade, pid 27173
 command line: build/X86/gem5.opt configs/learning_gem5/part1/two_level.py spectre

 Global frequency set at 1000000000000 ticks per second
 warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
 0: system.remote_gdb: listening for remote gdb on port 7000
 Beginning simulation!
 info: Entering event queue @ 0.  Starting simulation...
 warn: readlink() called on '/proc/self/exe' may yield unexpected results in various settings.
       Returning '/home/jlp/Code/gem5/spectre-vis/spectre'
 info: Increasing stack size by one page.
 warn: ignoring syscall access(...)
 Reading 40 bytes:
 Reading at malicious_x = 0xffffffffffdd76c8... Success: 0x54=’T’ score=2
 Reading at malicious_x = 0xffffffffffdd76c9... Success: 0x68=’h’ score=2
 Reading at malicious_x = 0xffffffffffdd76ca... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76cb... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76cc... Success: 0x4D=’M’ score=2
 Reading at malicious_x = 0xffffffffffdd76cd... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76ce... Success: 0x67=’g’ score=2
 Reading at malicious_x = 0xffffffffffdd76cf... Success: 0x69=’i’ score=2
 Reading at malicious_x = 0xffffffffffdd76d0... Success: 0x63=’c’ score=2
 Reading at malicious_x = 0xffffffffffdd76d1... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76d2... Success: 0x57=’W’ score=2
 Reading at malicious_x = 0xffffffffffdd76d3... Success: 0x6F=’o’ score=2
 Reading at malicious_x = 0xffffffffffdd76d4... Success: 0x72=’r’ score=2
 Reading at malicious_x = 0xffffffffffdd76d5... Success: 0x64=’d’ score=2
 Reading at malicious_x = 0xffffffffffdd76d6... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76d7... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76d8... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76d9... Success: 0x72=’r’ score=2
 Reading at malicious_x = 0xffffffffffdd76da... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76db... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76dc... Success: 0x53=’S’ score=2
 Reading at malicious_x = 0xffffffffffdd76dd... Success: 0x71=’q’ score=2
 Reading at malicious_x = 0xffffffffffdd76de... Success: 0x75=’u’ score=2
 Reading at malicious_x = 0xffffffffffdd76df... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76e0... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76e1... Success: 0x6D=’m’ score=2
 Reading at malicious_x = 0xffffffffffdd76e2... Success: 0x69=’i’ score=2
 Reading at malicious_x = 0xffffffffffdd76e3... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76e4... Success: 0x68=’h’ score=2
 Reading at malicious_x = 0xffffffffffdd76e5... Success: 0x20=’ ’ score=2
 Reading at malicious_x = 0xffffffffffdd76e6... Success: 0x4F=’O’ score=2
 Reading at malicious_x = 0xffffffffffdd76e7... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76e8... Success: 0x73=’s’ score=2
 Reading at malicious_x = 0xffffffffffdd76e9... Success: 0x69=’i’ score=2
 Reading at malicious_x = 0xffffffffffdd76ea... Success: 0x66=’f’ score=2
 Reading at malicious_x = 0xffffffffffdd76eb... Success: 0x72=’r’ score=2
 Reading at malicious_x = 0xffffffffffdd76ec... Success: 0x61=’a’ score=2
 Reading at malicious_x = 0xffffffffffdd76ed... Success: 0x67=’g’ score=2
 Reading at malicious_x = 0xffffffffffdd76ee... Success: 0x65=’e’ score=2
 Reading at malicious_x = 0xffffffffffdd76ef... Success: 0x2E=’.’ score=2
 Exiting @ tick 113568969000 because exiting with last active thread context
 </code></pre></div></div>

 <h2 id="visualizing-the-out-of-order-pipeline">Visualizing the out of order pipeline</h2>

 <p>To generate pipeline visualizations, we first need to generate a trace
 file of all of the instructions executed by the out of order CPU. To
 create this trace, we can use the <code class="highlighter-rouge">O3PipeView</code> debug flag.</p>

 <p>Now, the trace for the O3 CPU can be <em>very</em> large, up to many GBs. When
 creating this trace, you need to be careful to create the smallest trace
 possible. Also, it’s important to dump the trace to a file and not to
 <code class="highlighter-rouge">stdout</code>, which is the default when using debug flags. You can redirect
 the trace to a file by using the <code class="highlighter-rouge">--debug-file</code> option to gem5.</p>

 <p>To create the trace file, I used the following methodology:</p>

 <ol>
   <li>Start running spectre in gem5, then hit ctrl-c after the first
 couple of letters. At this point, I wrote down the tick at which gem5
 exited (13062347000 for me).</li>
   <li>Run gem5 with the debug flag <code class="highlighter-rouge">O3PipeView</code> enabled.</li>
   <li>Watch the output and kill gem5 with ctrl-c after two more letters
 appeared than in step 1.</li>
 </ol>

 <p>To generate the trace, I ran the following command. Note: you may have a
 different value for when to start the debugging trace. Also note: when
 producing the trace gem5 will run <em>much</em> slower.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>build/X86/gem5.opt --debug-flags=O3PipeView --debug-file=pipeview.txt --debug-start=13062347000 configs/learning_gem5/part1/two_level.py spectre
 </code></pre></div></div>

 <p>My trace file (<code class="highlighter-rouge">pipeview.txt</code>) was 600 MB for catching just two letters
 in the output.</p>

 <p>Now, we can process this file to generate the visualization with a
 script: <code class="highlighter-rouge">util/o3-pipeview.py</code>. This script requires the path to the file
 that contains the output generated with the <code class="highlighter-rouge">O3PipeView</code> debug flag.
 Above, we put the output into the file <code class="highlighter-rouge">pipeview.txt</code>, and this file was
 created in the default output directory of gem5 (<code class="highlighter-rouge">m5out/</code>).</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>util/o3-pipeview.py --store_completions m5out/pipeview.txt --color -w 150
 </code></pre></div></div>

 <p>In the above command, I wanted to see when the stores completed
 (<code class="highlighter-rouge">--store_completions</code>), and I asked for color (<code class="highlighter-rouge">--color</code>) in the
 output and a width of 150 characters (<code class="highlighter-rouge">-w 150</code>). Processing a
 600 MB file like this one may take a few minutes. The output will be
 in a file called <code class="highlighter-rouge">o3-pipeview.out</code> in the current working directory.</p>

 <p>You can view this file with <code class="highlighter-rouge">less -r o3-pipeview.out</code>. You may want to
 use the <code class="highlighter-rouge">-S</code> option with less if your terminal is less than 150
 characters wide (or whatever width value you used). Below is a
 screenshot of the top of my trace.</p>

 <h3 id="understanding-the-o3-pipeline-viewer">Understanding the O3 pipeline viewer</h3>

 <p><img src="/assets/img/o3-example-annotated.png" alt="o3 pipeline view example" /></p>

 <p>The above image details how to interpret the output from the pipeline
 viewer. Each <code class="highlighter-rouge">.</code> or <code class="highlighter-rouge">=</code> represents one cycle of time, which moves from
 left to right. The “tick” column shows the tick of the leftmost <code class="highlighter-rouge">.</code> or
 <code class="highlighter-rouge">=</code>. <code class="highlighter-rouge">=</code> is used to mark the instructions that were later squashed. The
 address of the instruction (and the micro-op number) and the
 disassembly are also shown. The sequence number can be ignored: it is
 always monotonically increasing and gives the total order of every dynamic
 instruction. Finally, each stage of the O3 pipeline is shown with a
 different letter and color.</p>
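 <p>Since <code class="highlighter-rouge">=</code> marks squashed instructions, a small filter can pull
 squashed occurrences of one instruction out of a large rendered file.
 The sketch below is only illustrative: it assumes that each rendered
 line starts with the timeline in square brackets followed by the
 address, and the sample lines are made up rather than copied from real
 viewer output. The helper name <code class="highlighter-rouge">squashed_lines</code> is hypothetical.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Terminal color codes, which appear in the file when --color is used.
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def squashed_lines(lines, address):
    """Yield color-stripped lines for `address` whose timeline contains '='."""
    for line in lines:
        plain = ANSI_COLOR.sub("", line)
        # Assumption: the timeline is the first [...] group on the line.
        start = plain.find("[")
        end = plain.find("]", start + 1)
        if start == -1 or end == -1:
            continue
        timeline = plain[start + 1:end]
        if address in plain[end:] and "=" in timeline:
            yield plain.rstrip()


# Made-up lines in the assumed layout, for illustration only.
sample = [
    "[..f.d.i.c.r..] 0x00401066.0 mov",
    "[==f=d=i=c====] 0x00401089.0 movzbl",
    "\x1b[1;34m[==f=d=i======]\x1b[0m 0x0040107e.0 movzbl",
]

for hit in squashed_lines(sample, "401089"):
    print(hit)
</code></pre></div></div>

 <p>To use it on a real file, pass an open file object (for example,
 <code class="highlighter-rouge">open("o3-pipeview.out")</code>) instead of <code class="highlighter-rouge">sample</code>, and adjust the
 parsing if your viewer output is laid out differently.</p>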

 <h2 id="digging-deeper-into-spectre">Digging deeper into Spectre</h2>

 <p>First, let’s examine the actual instructions that are executed during
 the Spectre attack. The vulnerability is in the <code class="highlighter-rouge">victim_function</code> in
 <code class="highlighter-rouge">spectre.c</code>.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) {
   if (x &lt; array1_size) {
     temp &amp;= array2[array1[x] * 512];
   }
 }
 </code></pre></div></div>

 <p>When this is compiled and then dumped with <code class="highlighter-rouge">objdump</code>, we get the
 following instructions that will be executed. Your code may be slightly
 different, especially the exact addresses of each instruction, depending
 on the version of the compiler and other system-specific configurations.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># NOTE: the movzbl below is MOVZX_B_R_M in gem5.
 # it is implemented with the following microcode.
 #    ld t1, seg, sib, disp, dataSize=1
 #    zexti reg, t1, 7
 #
 000000000040105e &lt;victim_function&gt;:
   40105e:   55                      push   %rbp
   40105f:   48 89 e5                mov    %rsp,%rbp
   401062:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
   401066:   8b 05 14 f0 2b 00       mov    0x2bf014(%rip),%eax # 6c0080 &lt;array1_size&gt; load array1_size (first time is always a miss)
   40106c:   89 c0                   mov    %eax,%eax
   40106e:   48 3b 45 f8             cmp    -0x8(%rbp),%rax  # if (x &lt; array1_size) rax is array1_size, -8(%rbp) is x
   401072:   76 2b                   jbe    40109f &lt;victim_function+0x41&gt; # if (x &lt; array1_size)
   401074:   48 8b 45 f8             mov    -0x8(%rbp),%rax # load x from the stack into rax
   401078:   48 05 a0 00 6c 00       add    $0x6c00a0,%rax  # calculate array1 offset (x+array1)
   40107e:   0f b6 00                movzbl (%rax),%eax # load array1[x]
   401081:   0f b6 c0                movzbl %al,%eax    # zero extend to 32 bits
   401084:   c1 e0 09                shl    $0x9,%eax   # multiply by 512
   401087:   48 98                   cltq               # sign-extend eax
   401089:   0f b6 90 80 1d 6c 00    movzbl 0x6c1d80(%rax),%edx  # load array2[array1[x]*512] **** This is the magic!
   401090:   0f b6 05 e9 0c 2e 00    movzbl 0x2e0ce9(%rip),%eax        # 6e1d80 &lt;temp&gt; Load temp.
   401097:   21 d0                   and    %edx,%eax
   401099:   88 05 e1 0c 2e 00       mov    %al,0x2e0ce1(%rip)        # 6e1d80 &lt;temp&gt;
   40109f:   5d                      pop    %rbp
   4010a0:   c3                      retq
 </code></pre></div></div>

 <p>Now, we can search for the instruction that we care about in the trace.
 In this case, we want to find a point where the <code class="highlighter-rouge">movzbl</code> at address
 <code class="highlighter-rouge">0x401089</code> is executed speculatively. When searching through the
 pipeline viewer (use <code class="highlighter-rouge">/</code> in less), we’re looking for a point where the
 load completes for the instruction at <code class="highlighter-rouge">0x401089</code> and it is later
 squashed (surrounded by <code class="highlighter-rouge">=</code>). An example is shown below.</p>

 <p><img src="/assets/img/o3-spectre-annotated.png" alt="annotated O3 pipeline view of
 spectre" /></p>

 <p>The image above is from my presentation at <a href="http://caslab.csl.yale.edu/workshops/hasp2018/">Hardware and Architectural
 Support for Security and Privacy (HASP)
 2018</a>.</p>

 <p>What we see in this image is that the instruction at <code class="highlighter-rouge">0x401066</code> causes a
 cache miss (there is a long time between when the load is issued and the
 data is returned from memory). Because the load of <code class="highlighter-rouge">array1_size</code> is a
 cache miss, the comparison cannot be resolved until the data returns, so
 the jump at <code class="highlighter-rouge">0x401072</code> is predicted to be <em>not</em> taken
 (incorrectly). This causes the following instructions to be executed
 speculatively and, eventually, squashed.</p>

 <p>The key part of this trace, and the heart of the Spectre vulnerability,
 is that the load for the instruction at <code class="highlighter-rouge">0x40107e</code>, which loads secret
 data, happens during the mis-speculated instructions. This data is then
 loaded into a register and operated on (instruction <code class="highlighter-rouge">0x401084</code>).
 Finally, the load at address <code class="highlighter-rouge">0x401089</code> is executed and loads a value
 from memory <em>whose address depends on the secret data loaded previously</em>.
 Thus, we can later probe the cache to retrieve the secret data.</p>
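 <p>To make the “probe the cache” step concrete, here is a toy Python model
 of the idea. It is not the proof-of-concept code and not how gem5 models
 caches: the <code class="highlighter-rouge">ToyCache</code> class, the <code class="highlighter-rouge">mispredict</code> flag, and the memory
 layout are all assumptions made for illustration. Only the stride of
 512 comes from <code class="highlighter-rouge">victim_function</code> above.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LINE_STRIDE = 512  # same stride as array2[array1[x] * 512] in spectre.c


class ToyCache:
    """Remembers which array2 offsets were touched (stand-in for a real cache)."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def touch(self, offset):
        self.lines.add(offset)

    def is_cached(self, offset):
        return offset in self.lines


def victim_function(memory, array1_size, x, cache, mispredict):
    """The loads run if the bounds check passes OR the branch is mispredicted."""
    in_bounds = x in range(array1_size)
    if in_bounds or mispredict:
        value = memory[x]                 # load array1[x], possibly out of bounds
        cache.touch(value * LINE_STRIDE)  # dependent load of array2[value * 512]
    # A squash would discard "value", but the touched cache line remains.


def recover_byte(cache):
    """Probe all 256 candidate lines; a single cached line reveals the byte."""
    hits = [b for b in range(256) if cache.is_cached(b * LINE_STRIDE)]
    return hits[0] if len(hits) == 1 else None


array1 = bytes(range(1, 17))
secret = b"The Magic Words"
memory = array1 + secret  # toy layout: the secret sits right after array1

cache = ToyCache()
leaked = bytearray()
for i in range(len(secret)):
    cache.flush()
    victim_function(memory, len(array1), len(array1) + i, cache, mispredict=True)
    leaked.append(recover_byte(cache))

print(leaked.decode())  # prints "The Magic Words"
</code></pre></div></div>

 <p>With <code class="highlighter-rouge">mispredict=False</code> the out-of-bounds call touches nothing, so the
 probe finds no cached line and nothing leaks. In this toy model,
 everything hinges on the speculative loads reaching the cache before
 the squash.</p>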

 <h3 id="effects-of-compilers">Effects of compilers</h3>

 <p>As previously mentioned, the specific compiler version and compiler
 options have a significant effect on the attack. Below are two traces,
 one from GCC 7.2 and one from clang 4.0.</p>

 <h4 id="gcc-72">GCC 7.2</h4>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) {
   400b2d:       55                      push   %rbp
   400b2e:       48 89 e5                mov    %rsp,%rbp
   400b31:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
   if (x &lt; array1_size) {
   400b35:       8b 05 c5 c5 2c 00       mov    0x2cc5c5(%rip),%eax        # 6cd100 &lt;array1_size&gt;
   400b3b:       89 c0                   mov    %eax,%eax
   400b3d:       48 39 45 f8             cmp    %rax,-0x8(%rbp)
   400b41:       73 34                   jae    400b77 &lt;victim_function+0x4a&gt;
     temp &amp;= array2[array1[x] * 512];
   400b43:       48 8d 15 d6 c5 2c 00    lea    0x2cc5d6(%rip),%rdx        # 6cd120 &lt;array1&gt;
   400b4a:       48 8b 45 f8             mov    -0x8(%rbp),%rax
   400b4e:       48 01 d0                add    %rdx,%rax
   400b51:       0f b6 00                movzbl (%rax),%eax
   400b54:       0f b6 c0                movzbl %al,%eax
   400b57:       c1 e0 09                shl    $0x9,%eax
   400b5a:       48 63 d0                movslq %eax,%rdx
   400b5d:       48 8d 05 9c f6 2c 00    lea    0x2cf69c(%rip),%rax        # 6d0200 &lt;array2&gt;
   400b64:       0f b6 14 02             movzbl (%rdx,%rax,1),%edx
   400b68:       0f b6 05 91 e1 2c 00    movzbl 0x2ce191(%rip),%eax        # 6ced00 &lt;temp&gt;
   400b6f:       21 d0                   and    %edx,%eax
   400b71:       88 05 89 e1 2c 00       mov    %al,0x2ce189(%rip)        # 6ced00 &lt;temp&gt;
   }
 }
   400b77:       90                      nop
   400b78:       5d                      pop    %rbp
   400b79:       c3                      retq
 </code></pre></div></div>

 <iframe height="500" src="/assets/img/gcc72-static-tage.html" frameborder="0">
 </iframe>
 <p>However, clang generates the following code.</p>

 <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void victim_function(size_t x) {
   400ac0:       55                      push   %rbp
   400ac1:       48 89 e5                mov    %rsp,%rbp
   400ac4:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
   if (x &lt; array1_size) {
   400ac8:       48 8b 7d f8             mov    -0x8(%rbp),%rdi
   400acc:       8b 04 25 90 c0 6c 00    mov    0x6cc090,%eax
   400ad3:       89 c1                   mov    %eax,%ecx
   400ad5:       48 39 cf                cmp    %rcx,%rdi
   400ad8:       0f 83 2f 00 00 00       jae    400b0d &lt;victim_function+0x4d&gt;
     temp &amp;= array2[array1[x] * 512];
   400ade:       48 8b 45 f8             mov    -0x8(%rbp),%rax
   400ae2:       0f b6 0c 05 a0 c0 6c    movzbl 0x6cc0a0(,%rax,1),%ecx
   400ae9:       00
   400aea:       c1 e1 09                shl    $0x9,%ecx
   400aed:       48 63 c1                movslq %ecx,%rax
   400af0:       0f b6 0c 05 40 f2 6c    movzbl 0x6cf240(,%rax,1),%ecx
   400af7:       00
   400af8:       0f b6 14 25 50 dc 6c    movzbl 0x6cdc50,%edx
   400aff:       00
   400b00:       21 ca                   and    %ecx,%edx
   400b02:       40 88 d6                mov    %dl,%sil
   400b05:       40 88 34 25 50 dc 6c    mov    %sil,0x6cdc50
   400b0c:       00
   }
 }
   400b0d:       5d                      pop    %rbp
   400b0e:       c3                      retq
   400b0f:       90                      nop
 </code></pre></div></div>

 <iframe height="500" src="/assets/img/clang-static-tage.html" frameborder="0">
 </iframe>
 <p>Interestingly, the clang-compiled <code class="highlighter-rouge">spectre</code> binary is not able to read
 the secret data! (At least not in gem5. It is able to read the secret
 data on my native machine.)</p>

 <p>We can look into the two traces to see the difference between the clang
 version and the GCC version.</p>

 <p>The main difference is that in the clang version, the load generated by
 the instruction at <code class="highlighter-rouge">0x400af0</code> never completes (and thus, must not have
 been issued to the memory system).</p>

 <p>I’m not sure of the exact cause of this difference. It could be that the
 instruction uses a different addressing mode
 (<code class="highlighter-rouge">movzbl 0x6cf240(,%rax,1),%ecx</code> in clang vs <code class="highlighter-rouge">movzbl (%rdx,%rax,1),%edx</code>
 in GCC). If you have ideas, please leave a comment!</p>

 <p>Either way, minor differences in the code generated can have large
 impacts on the speculative execution!</p>

 <h3 id="effects-of-branch-predictor">Effects of branch predictor</h3>

 <p>When I was first playing around with Spectre and gem5, I ran into a
 problem where I could only <em>sometimes</em> get Spectre to “work” with the
 out of order CPU. After significant digging, I found that the branch
 predictor chosen makes a big difference to how quickly the vulnerability
 happens. The trace below (with the same code as GCC 4.8 above) shows
 what happens when using the tournament branch predictor.</p>

 <iframe height="500" src="/assets/img/gcc-static-tourn.html" frameborder="0">
 </iframe>
 <p>Here, we see that the original branch misprediction comes much earlier
 than the jump instruction in <code class="highlighter-rouge">victim_function</code> that is at address
 <code class="highlighter-rouge">0x401072</code>. Thus, by the time the load instructions in <code class="highlighter-rouge">victim_function</code>
 are executed, the ROB and load-store queue resources have been taken by
 other instructions and the rogue loads are not issued to memory. The two
 loads are still occasionally executed speculatively, but
 this is much rarer than with the TAGE predictor. When using the TAGE
 branch predictor, only the exact branch that the attacker wants to
 mispredict is mispredicted.</p>

 <p>This interestingly shows that a “smarter” system is actually <em>more</em>
 vulnerable to speculation-based attacks!</p>

   <div class="commentbox"></div>

 </div>

 	</main>

 	<footer class="page-footer">
 	<div class="container">
 		<div class="row">

 			<div class="col-12 col-sm-4">
 				<p>gem5</p>
 				<p><a href="/about">About</a></p>
 				<p><a href="/publications">Publications</a></p>
 				<p><a href="/contributing">Contributing</a></p>
 				<p><a href="/governance">Governance</a></p>
 			<br></div>

 			<div class="col-12 col-sm-4">
 				<p>Docs</p>
 				<p><a href="/introduction">Documentation</a></p>
 				<p><a href="http://gem5.org/Documentation">Old Documentation</a></p>
 				<p><a href="https://gem5.googlesource.com/public/gem5">Source</a></p>
 			<br></div>

 			<div class="col-12 col-sm-4">
 				<p>Help</p>
 				<p><a href="/search">Search</a></p>
 				<p><a href="#">Mailing Lists</a></p>
 				<p><a href="https://github.com/gem5/new-website/tree/master/">Website Source</a></p>
 			<br></div>

 		</div>
 	</div>
 </footer>


 	<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
 	<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
 	<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
 	<script src="https://unpkg.com/commentbox.io/dist/commentBox.min.js"></script>

 	<script>
 	  // When the user scrolls down 20px from the top of the document, show the button
 	  window.onscroll = function() {scrollFunction()};

 	  function scrollFunction() {
 	      if (document.body.scrollTop > 20 || document.documentElement.scrollTop > 20) {
 	          document.getElementById("myBtn").style.display = "block";
 	      } else {
 	          document.getElementById("myBtn").style.display = "none";
 	      }
 	  }

 	  // When the user clicks on the button, scroll to the top of the document
 	  function topFunction() {
 	      document.body.scrollTop = 0;
 	      document.documentElement.scrollTop = 0;
 	  }

 		// commentBox is available as a global variable from the CDN script loaded above.

 		commentBox('my-project-id');

 	</script>

 </body>


 </html>