website: Added 3rd gem5 users' workshop blogs

Added the gem5 users' workshop blogs, as well as links to video
presentations.

Change-Id: Ie48f505e99e886e4a86aed21e4b24db4a72f7bf7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5-website/+/29692
Reviewed-by: Bobby R. Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby R. Bruce <bbruce@ucdavis.edu>
Tested-by: Bobby R. Bruce <bbruce@ucdavis.edu>
diff --git a/_pages/events/isca-2020.md b/_pages/events/isca-2020.md
index ec2b98b..833ff7b 100644
--- a/_pages/events/isca-2020.md
+++ b/_pages/events/isca-2020.md
@@ -66,20 +66,23 @@
 this schedule should be viewed as an order of presentation and only a rough
 guide for presentation start times.**
 
-| Time        | Event                        | Authors (presenter in bold)|
-|:-----------:|:-----------------------------|:---------------------------|
-|1600 -- 1610 | Introduction to the workshop | **Jason Lowe-Power**|
-|1610 -- 1620 | A Modular and Secure System Architecture for IoT | **Nils Asmussen**|
-|1620 -- 1630 | Memory controller updates for new DRAM technologies, NVM interfaces and flexible memory topologies | **Wendy Elsasser**, Nikos Nikoleris|
-|1630 -- 1640 | HeteroGarnet - A Detailed Simulator for Diverse Interconnect Systems | **Srikant Bharadqaj**, Jieming Ying, Bradford Beckmann, Tushar Krishna|
-|1640 -- 1650 | Heterogeneous systems modeling with Adaptive Traffic Profiles | Matteo Andreozzi, **Frances Conboy**, Giovanni Stea, Raffaele Zippo|
-|1650 -- 1700 | Enabling Multi-GPU Support in gem5 | **Bobbi Winema Yogatama**, Matthew Sinclair, Michael Swift|
-|1700 -- 1710 | gem5 GUI | **Shivam Desai**, Rohit Dhamankar, Ravishdeep Singh, Ahmed Farooqui, Jason Lowe-Power, Bobby R. Bruce|
-|1710 -- 1720 | Towards full-system discrete GPU simulation | Mattew Porembra, Alexandru Dutu, Gaurav Jain, Pouya Fotouhi, Michael Boyer, Bradford M. Beckmann|
-|1720 -- 1730 | gem5art: Zen and the Art of gem5 Experiments | **Ayaz Akram**, Mahyar Samani, Hoa Nguyen, Krithiga Murugavel, Trivikram Reddy, Marjan Fariborz, Pouya Fotouhi, Jason Lowe-Power|
-|1730 -- 1740 | Modeling Modern GPU Applications in gem5 | Kyle Roarty, **Matthew Sinclair** |
-|1740 -- 1750 | Implementation of a flexible cache coherency protocol for the Ruby memory system | **Tiago Muck**, Pedro Benedicte|
-|1750 -- 1800 | Workshop wrap-up | **Jason Lowe-Power** |
+All gem5 Workshop presentations can be found [here](
+https://www.youtube.com/playlist?list=PL_hVbFs_loVQ8FDTRCmRvmkPFzf6swgZh).
+
+| Time        | Event                        | Authors (presenter in bold)| Presentation | Blog Post |
+|:-----------:|:-----------------------------|:---------------------------|:------------:|:---------:|
+|1600 -- 1610 | Introduction to the workshop | **Jason Lowe-Power**| N/A | N/A |
+|1610 -- 1620 | A Modular and Secure System Architecture for IoT | **Nils Asmussen**|[Link](https://youtu.be/2jPiXOhboko)|[Link](/2020/05/29/modular-and-secure.html)|
+|1620 -- 1630 | Memory controller updates for new DRAM technologies, NVM interfaces and flexible memory topologies | **Wendy Elsasser**, Nikos Nikoleris|[Link 1](https://youtu.be/ttJ9_I_Avyc), [Link 2](https://youtu.be/t2PRoZPwwpk)|[Link](/2020/05/27/memory-controller.html)|
+|1630 -- 1640 | HeteroGarnet - A Detailed Simulator for Diverse Interconnect Systems | **Srikant Bharadwaj**, Jieming Ying, Bradford Beckmann, Tushar Krishna|[Link](https://youtu.be/AH9r44r2lHA)|[Link](/2020/05/27/heterogarnet.html)|
+|1640 -- 1650 | Heterogeneous systems modeling with Adaptive Traffic Profiles | Matteo Andreozzi, **Frances Conboy**, Giovanni Stea, Raffaele Zippo|[Link](https://youtu.be/UhWAozvZ9mU)|-|
+|1650 -- 1700 | Enabling Multi-GPU Support in gem5 | **Bobbi Winema Yogatama**, Matthew Sinclair, Michael Swift|[Link](https://youtu.be/TSULdaGw0V8)|-|
+|1700 -- 1710 | gem5 GUI | **Shivam Desai**, Rohit Dhamankar, Ravishdeep Singh, Ahmed Farooqui, Jason Lowe-Power, Bobby R. Bruce|[Link](https://youtu.be/ab0ZUSTkYEk)|[Link](/2020/05/29/gem5-gui.html)|
+|1710 -- 1720 | Towards full-system discrete GPU simulation | **Matthew Poremba**, Alexandru Dutu, Gaurav Jain, Pouya Fotouhi, Michael Boyer, Bradford M. Beckmann|[Link](https://youtu.be/o1gF2LXNQFQ)|-|
+|1720 -- 1730 | gem5art: Zen and the Art of gem5 Experiments | **Ayaz Akram**, Mahyar Samani, Hoa Nguyen, Krithiga Murugavel, Trivikram Reddy, Marjan Fariborz, Pouya Fotouhi, Jason Lowe-Power|[Link](https://youtu.be/x2GQa26xwzs)|-|
+|1730 -- 1740 | Modeling Modern GPU Applications in gem5 | Kyle Roarty, **Matthew Sinclair** |[Link](https://youtu.be/HhLiMrjqCvA)|[Link](/2020/05/27/modern-gpu-applications.html)|
+|1740 -- 1750 | Implementation of a flexible cache coherency protocol for the Ruby memory system | **Tiago Muck**, Pedro Benedicte|[Link](https://youtu.be/OOEqCZekJbA)|[Link](/2020/05/29/flexible-cache.html)|
+|1750 -- 1800 | Workshop wrap-up | **Jason Lowe-Power** | N/A | N/A |
 
 ### Workshop schedule **June 4th**
 
diff --git a/_posts/2020-05-29-flexible-cache.md b/_posts/2020-05-29-flexible-cache.md
new file mode 100644
index 0000000..715b922
--- /dev/null
+++ b/_posts/2020-05-29-flexible-cache.md
@@ -0,0 +1,36 @@
+---
+layout: post
+title:  "A flexible cache coherency protocol for the Ruby memory system"
+author: Tiago Mück
+date:   2020-05-29
+---
+
+gem5's Ruby memory subsystem provides flexible on-chip network models and
+multiple cache coherency protocols modeled in detail. However, simple
+experiments are sometimes difficult to pull off. For instance, modifying an
+existing configuration by just adding another shared cache level requires
+either:
+
+1. switching to an entirely new protocol that models the desired cache hierarchy; or
+2. modifying an existing protocol.
+
+While (1) is not always an option, (2) is a non-trivial task since Ruby
+protocols can be very complex and hard to debug. This creates a major
+flexibility gap between gem5's "classic" memory subsystem and Ruby.
+
+# New protocol implementation
+
+We are working on a new protocol implementation that aims at addressing this
+configurability limitation. Our new protocol provides a single cache controller
+that can be reused at multiple levels of the cache hierarchy and configured to
+model multiple instances of MESI and MOESI cache coherency protocols. This
+implementation is based on [Arm's AMBA 5 CHI specification](
+https://static.docs.arm.com/ihi0050/d/IHI0050D_amba_5_chi_architecture_spec.pdf)
+and provides a scalable framework for the design space exploration of large SoC
+designs.
+
+# Presentation
+
+To learn more, please take a look at our workshop presentation:
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/OOEqCZekJbA" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
diff --git a/_posts/2020-05-29-gem5-gui.md b/_posts/2020-05-29-gem5-gui.md
new file mode 100644
index 0000000..0805bd3
--- /dev/null
+++ b/_posts/2020-05-29-gem5-gui.md
@@ -0,0 +1,125 @@
+---
+layout: post
+title:  "gem5 GUI"
+author:  Shivam Desai, Rohit Dhamankar, Ravishdeep Singh, Ahmed Farooqui, Jason Lowe-Power, and Bobby R. Bruce
+date:   2020-05-29
+---
+
+## Overview
+
+### Background and Motivation
+
+The current state of gem5 requires the development of Python scripts to define and generate different architectures. This can be tedious, since a script-based flow does not lend itself to visual thinking. We wanted a tool with a better user flow that lets developers create and tweak these models in a visual manner, much like Chisel. To make gem5 more accessible and allow new users to utilize all of its capabilities, we created a user interface that provides this ease of use and functionality.
+
+
+### Description
+
+We developed a user interface that allows users to search for simobjects in the left-hand catalog, place them in the canvas, and move them around to create an architectural hierarchy that can be instantiated and simulated. Selecting an object allows a user to modify parameters in the attribute table on the bottom left. Placing objects inside other objects establishes a parent-child relationship. Drawing wires using the wire tool allows for port connections between objects.
+
+### Approach
+
+The image below shows our initial diagram of the basic structure of the GUI, as well as a very high-level overview of its interaction with our "back-end", which in this case was the gem5 repository. Our GUI had a distinct front-end and back-end, which are linked by a State class and the SymObject class.
+
+ ![](https://lh3.googleusercontent.com/wjVauYFnztL0aIxrHWjf-dgybE87O4_nTb2dcB3mOpZezpZfnenHZ8csDD0EOwaGaCWd_c1Ysb6HSWvdz-mbfKwMAkVXUMrjLwsyyg4A2aR-Pl3OSn_T2r-zHbBRMiNR1s6pEdnF)
+
+### Technical Specifications
+
+We decided to develop the GUI using only Python, so the natural choice was a Python binding for **Qt**, one of the most popular GUI development libraries. We chose **PySide2** over PyQt5 since we would need a commercial license to release code under PyQt5.
+
+
+## Installation and Setup
+
+### 1. Prerequisites
+
+gem5 requires the Linux operating system to run, so the GUI does not support cross-platform development. You need to have a compiled gem5 installation on your machine as well. Visit [gem5 download](http://www.gem5.org/documentation/general_docs/building) for instructions on how to set up gem5.
+
+### 2. Basic Setup
+
+To begin the setup process, clone the [repository](https://github.com/afarooqui98/gem5_GUI) directly into the gem5 directory. Once complete, enter the gem5\_GUI directory and install the dependencies using:
+
+```pip3 install -r requirements.txt```
+
+### 3. Running the application
+
+Once the dependencies are installed, users can run the GUI with the command:
+
+```<gem5.opt path> gui.py```
+
+See the README.md file in the repository for help with setup and running issues.
+
+## Features and Functionality
+
+### GUI Overview
+
+Attached below is a view of the GUI on successful launch:
+
+**![](https://lh5.googleusercontent.com/CFVJ2WTCP-fm_eeWc1hU_a3kHUz9TNeFe7y2UpBRp8JfrFWwZfPxxWh_QshoAuESh_zDqXyp6_G2bf4PeKuDuI4COeeS1KNhSxXhTHBsy2ZYZX60d6R73K4BQ3e6vQ5xgj8yhwSs)**
+
+On the left side lies the **catalog view** as well as the **attribute view**.
+
+The former is used to select a SimObject, and the latter is used to configure a selected SimObject. The majority of the screen is populated by the **canvas**. This is where most of the user interaction will occur, and where users will build their system. The menu bar contains multiple convenience functions typical of GUI software, from copy-pasting to file saving, but there are also tabs for **debugging**, **running**, and **importing**. These are key functions of the GUI that work in tandem with gem5 to provide the users with the ability to check their system configuration, import both UI objects and configured subclasses, and instantiate their systems. Underneath the top menu is a button that allows the user to draw port connections between objects.
+
+### Catalog View
+**![](https://lh5.googleusercontent.com/iv-iXWbl-zvDkwHlkJ9Adlp4xjj-vP9g_kb4yZYRMtSTrtOUnrlsVTdY73JieBOCWHBDno7JHm0YxuohtawUyQ5tb1EjewX45XU6Q5Z8NOC8WoIYGeZECXX4tcqR5dfbEmMt7Hp6 )**
+
+The catalog holds all the available SimObjects. Users can maneuver the tree view by a specific category or search for an object at the top search bar. Double clicking an object places what we call a SymObject on the canvas. This is a GUI representation of an m5 SimObject that allows a user to interact with it in a tactile way.
+
+### Attribute Table
+
+**![](https://lh3.googleusercontent.com/gxPj5FWqqfpwkIxH--c_LYTa0eCnfYEqxaqX2iR7ZFf9UwyQQSWjLWBjjDDfbSJrE-0oWVk9rkpOMOZnRdcNzUXfyP8h1144lTWUn-Hgt96BeBitZqRTqXr8Bv8A6RCwkzb7kOdt)**
+
+Selecting an object brings up its attribute table. This table lists the object name, every child object, and all the object parameters. The parameter fields are modifiable, while the "Name" and "Child Objects" fields are read-only. Hovering over a parameter name gives a description of the parameter; hovering over its value shows the type. Attributes of the selected object can also be searched, except for the "Name" and "Child Objects" fields, which always sit at the top of the table.
+
+### Wiring
+
+To enable wire drawing, click the wire icon between the menu and the catalog. While in wire drawing mode, objects cannot be interacted with. Clicking this button changes the cursor to a crosshair, allowing the user to connect ports with wires. Failing to connect two ports or connecting incompatible ports will result in an error message. Right-clicking a wire brings up a context menu, which will allow for deletion and inspection (printing information about the end connections).
+
+**![](https://lh5.googleusercontent.com/k3X4PbsV-p_0oNeMGzexuSvBhwoxifQ28G0GGwRPh3QdDB7Q_zl1dCq-dSx7yF7OOA5lsIbB2maPyrQl_yaHlal2H-QIfMKeph4FpgnbwPfTdk0qWnVR9CFmdGq7VeEDV1wT5I9Q)**
+
+### Context
+
+An important part of understanding the gem5 GUI is the way user context works. Whenever an object is created or selected, it is set as the currently selected SymObject. Selected objects are typically highlighted green unless there are required attributes that still need to be set by the user; in that case, the object is highlighted red. Any time an object is selected, the user can move it around freely and resize it in the canvas, and its attributes are populated in the attribute table. Finally, it's important to note that wiring _is not_ dependent on the current context, so any ports for any objects may be connected to others regardless of whether they are in context.
+
+### Menu Overview
+
+The menu contains tools to interact with the GUI. Most of these correspond to self-explanatory standard window functionality, and all options in the menu correspond to a keyboard shortcut.
+
+### Run
+
+**![](https://lh3.googleusercontent.com/w7zPXdpEmfvKHBrRCVVazTPXdZGHD4JeemIjvkwXMbomtK5lTdlMmlwplL3d6lF66SYRkinzCPXO1FbHSE4Ou-RjZbbX17yxBO1zkqwt6NBYw23eF7eRHQUiYMHP_WxubpwfzqVR)**
+
+The run tab contains the instantiate and simulate options. Note that the tab is greyed out until the user drops in at least a root object, which prevents instantiating without a root. Once the instantiate button is pressed, the user _must_ save the file (since objects cannot be modified once instantiated), after which the results of instantiation are displayed on the command line. Once instantiation has completed, the user can interact with the simulate button, which will again show output on the command line.
+
+### Debug
+**![](https://lh4.googleusercontent.com/Plh1yAE1Sx3wgVY3w-fHRgmEh1ZyY2uCe0O2SX1984lYe4kgiUJH-C6_2aLZngWX0-eraXf8m--xC--ouErySfQJFGbAkLe-TuzNa3O5QKRf6F-UAThoJ5oyWi0KRsgLLH6LQVU2)**
+
+Pressing the Debug button on the toolbar opens the debug window on the right side of the GUI. There are two checkboxes: "Log to File" and "Log to Stdout," with the former set by default. Logging to file sends debug and error messages to a file, which can be renamed at any time in the text field below. Logging to stdout prints these messages in the terminal running the GUI. Below these options is a table of debug flags that are native to gem5. These flags can be set and unset, and setting a flag enables the gem5 debug messages related to it. The flags are also searchable through an accompanying search bar.
+
+### Import and Export UI Objects
+
+**![](https://lh3.googleusercontent.com/siEBwmcd6tGS7QHymwpDtJK7Is0zEQFH30jnCnhSTcjVKfo7rD3oPJskbw_Cty_lu05ifpWIkkwO3wKJqMoDKqtL7XqrEgVqwAXp9X57DmRIjZqkMRpErWt1kLeJYvXZ9Qn2YftK)**
+
+Under File, the import and export UI object options allow users to save and load clusters of SymObjects. Exporting saves the configuration as a .obj file in JSON format. Importing places the custom object in the catalog, where it can be accessed in the same way as regular SimObjects.
+
+### Import SimObject
+
+**![](https://lh6.googleusercontent.com/4ZZ0gDvbqz6o9YIB4-pHTZUBB2Cpxwd7CnC2szd_-dy-fO88NMUCzU1I6YmRTR_oSYgk5wgHl8gce3AAXd3RF7iRJcIZ9mU-qgJyBx13WN-7Ploe03yXLUQHQBnSpfUOaNo-AJoB)**
+
+Users can import custom SimObjects, which are Python classes that inherit from base SimObjects. These imports appear in the catalog and can be added to the canvas just like regular objects. Imported SimObjects are saved as part of a .ui file, so you do not need to import them again when opening a .ui file that uses them.
+
+## Future
+
+Over the development timeline, we faced roadblocks and came up with new ideas, so we were not able to accomplish everything we initially planned to do. Future development can address these features, as well as others that may be desired:
+
+- Export to multiple file formats (current instantiation generates config.json)
+- Visualization of simulation results, comparison to other simulations
+- Parameterize objects to allow for running multiple simulations in parallel and comparing results / identifying optimal parameter values
+
+Although we did our best to address the bugs we came across during development and user testing, a sandbox application with few restrictions on the user, such as this GUI, will inevitably expose more bugs through unforeseen usage. If you come across a bug, let us know by opening an [issue](https://github.com/afarooqui98/gem5_GUI/issues), submitting a [pull request](https://github.com/afarooqui98/gem5_GUI/pulls), or contacting us or the gem5 team directly. Please document the steps that led to the fault and include a screenshot of the terminal.
+
+## Workshop Presentation
+
+<iframe width="560" height="315"
+src="https://www.youtube.com/embed/ab0ZUSTkYEk" frameborder="0"
+allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
+allowfullscreen></iframe>
diff --git a/_posts/2020-05-29-heterogarnet.md b/_posts/2020-05-29-heterogarnet.md
new file mode 100644
index 0000000..80ff5da
--- /dev/null
+++ b/_posts/2020-05-29-heterogarnet.md
@@ -0,0 +1,16 @@
+---
+layout: post
+title:  "HeteroGarnet - A Detailed Simulator for Diverse Interconnect Systems"
+author: Srikant Bharadwaj, Jieming Ying, Bradford Beckmann, and Tushar Krishna
+date:   2020-05-27
+---
+
+Networks-on-Chips (NoCs) have become inevitably more complex with the increased heterogeneity of Systems-On-Chip (SoCs). Recent advances in die-stacking and 2.5D chip integration introduce in-package network heterogeneities that can complicate the interconnect design. Detailed modeling of such complex systems necessitates accurate modeling of their characteristics. Unfortunately, NoC simulators today lack the flexibility and features required to model these diverse interconnects.
+
+We present HeteroGarnet, which improves upon the widely-popular Garnet 2.0 network model by enabling accurate simulation of emerging interconnect systems. Specifically, HeteroGarnet adds support for clock-domain islands, network crossings supporting multiple frequency domains, and network interface controllers capable of attaching to multiple physical links. It also supports variable bandwidth links and routers by introducing a new configurable Serializer-Deserializer component. Our recent work using HeteroGarnet [1] shows how accurate interconnect modeling can lead to better network designs. In this presentation, we will introduce HeteroGarnet and its benefits for modeling modern heterogeneous systems. HeteroGarnet is planned to be integrated into the gem5 repository and will be identified as Garnet 3.0.
+
+# Workshop Presentation
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/AH9r44r2lHA"
+frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope;
+picture-in-picture" allowfullscreen></iframe>
diff --git a/_posts/2020-05-29-memory-controller.md b/_posts/2020-05-29-memory-controller.md
new file mode 100644
index 0000000..6c5217e
--- /dev/null
+++ b/_posts/2020-05-29-memory-controller.md
@@ -0,0 +1,58 @@
+---
+layout: post
+title:  "Memory controller updates for new DRAM technologies, NVM interfaces and flexible memory topologies"
+author: Wendy Elsasser and Nikos Nikoleris
+date:   2020-05-27
+---
+
+## Adding LPDDR5 support to DRAMCtrl
+
+LPDDR5 is currently in mass production for use in multiple markets including mobile, automotive, AI, and 5G. This technology is expected to become the mainstream Flagship Low-Power DRAM by 2021 with anticipated longevity due to proposed speed grade extensions. The specification defines a flexible architecture and multiple options to optimize across different use cases, trading off power, performance, reliability and complexity.  To evaluate these tradeoffs, the gem5 model has been updated with LPDDR5 configurations and architecture support.
+
+LPDDR5 is mostly an evolutionary uptick from LPDDR4 with 3 key motivations: flexibility, performance, and power. The specification offers a multitude of options to enable varied use-cases with a user programmable bank architecture and new lower power features to balance power and performance tradeoffs. Similar to previous generations, LPDDR5 increases the data rates and the current version of the specification supports data-rates up to 6.4Gbps (giga-bits per second) for a maximum I/O bandwidth of 12.8GB/s (giga-bytes per second) with a 16-bit channel. A new clocking architecture is defined leveraging concepts from other technologies like GDDR, but with a low-power twist. With the new clocking architecture, commands are transferred at a lower frequency with some commands requiring multiple clock cycles. The new clocking architecture also includes the additional requirement of data clock synchronization, potentially done dynamically as bursts issue. Due to these changes, additional considerations are required to ensure adequate command bandwidth in some high-speed scenarios. These new LPDDR5 features require new checks and optimizations in gem5 to ensure the model integrity when comparing to real hardware.
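+
+As a quick sanity check of the figures above, the peak I/O bandwidth follows directly from the per-pin data rate and the channel width. The snippet below is only an illustrative calculation; the function name is hypothetical and is not part of gem5.
+
+```python
+def peak_bandwidth_gbytes(data_rate_gbps, channel_width_bits):
+    """Peak I/O bandwidth of one channel in GB/s."""
+    # Every pin carries data_rate_gbps gigabits per second; 8 bits make a byte
+    return data_rate_gbps * channel_width_bits / 8
+
+
+# LPDDR5 at 6.4 Gbps per pin on a 16-bit channel
+print(peak_bandwidth_gbytes(6.4, 16))
+```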
+
+Support for multi-cycle commands and lower frequency command transfer motivated a new check in gem5 to verify command bandwidth. The DRAM controller historically did not verify contention on the command bus and assumed unlimited command bandwidth. With the evolution of new technologies this assumption is not always valid. One potential solution is to align all commands to a clock boundary and ensure that two commands are not issued simultaneously. Given that the gem5 model is not a cycle accurate model, this solution was deemed overly complicated. Alternatively, a rolling window has been defined and the model calculates the maximum number of commands that can issue within that window. Prior to issuing a command, the model will verify that the window in which the command will issue still has slots available. If the slots are full, the command will be shifted to the next window. This will be done until a window with a free command slot is found. The window is currently defined by the time required to transfer a burst, which is typically defined by the tBURST parameter.
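+
+The window check described above can be sketched roughly as follows. This is a simplified illustration and not the gem5 implementation: it assumes fixed, aligned windows rather than a true rolling window, and the names `find_command_slot`, `window`, `max_cmds` and `issued` are hypothetical.
+
+```python
+def find_command_slot(cmd_at, window, max_cmds, issued):
+    """Return the tick at which a command requested for cmd_at can issue.
+
+    issued maps a window index to the number of commands already
+    placed in that window; it is updated in place.
+    """
+    # Window that contains the requested issue time
+    idx = cmd_at // window
+    # Shift to later windows until one still has a free command slot
+    while issued.get(idx, 0) >= max_cmds:
+        idx += 1
+    issued[idx] = issued.get(idx, 0) + 1
+    # A shifted command issues at the start of its new window
+    return max(cmd_at, idx * window)
+
+
+issued = {}
+# Window of 10 ticks (standing in for tBURST), two commands per window
+ticks = [find_command_slot(t, 10, 2, issued) for t in (3, 5, 7, 12, 13)]
+print(ticks)
+```
+
+In this example the third command does not fit into the first window and is pushed to the start of the second one, and the fifth command is pushed to the third window in the same way.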
+
+At higher data rates, the ability to transfer a burst seamlessly depends on the bank architecture in LPDDR5. When configured using a bank group architecture, which defines a total of 16 banks split across 4 bank groups, a burst of 32 cannot be transferred seamlessly. The data instead will be transferred with gaps in the middle of the burst. Essentially half the burst will be transferred in 2 cycles, followed by a 2-cycle gap, with the second half of the burst transferred after the gap. To mitigate the effect on data bus utilization and IO bandwidth, LPDDR5 supports interleaved bursts. The gem5 model has also been updated to support burst interleaving and with these changes, the model is able to achieve high data bus utilization as expected (and in many cases required).
+
+All of these changes will be discussed in the gem5 workshop. In the workshop, we will review LPDDR5 requirements and detail the changes made in gem5. While these changes have been incorporated specifically for LPDDR5, some of them are also applicable to other memory technologies. I look forward to the discussion in the workshop!
+
+### Workshop Presentation
+
+<iframe width="560" height="315"
+src="https://www.youtube.com/embed/ttJ9_I_Avyc" frameborder="0"
+allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
+allowfullscreen></iframe>
+
+## Refactoring the DRAMCtrl and creating an initial NVM interface
+
+The gem5 DRAM controller provides the interface to external, user addressable memory, which is traditionally DRAM. The controller consists of 2 main components: the memory controller and the DRAM interface. The memory controller includes the port connecting to the on-chip fabric. It receives command packets from the fabric, enqueues them into the read and write queues and manages the command scheduling algorithm for read and write requests. The DRAM interface contains media specific information that defines the architecture and timing parameters of the DRAM, and manages the media specific operations like activation, precharge, refresh and low power modes.
+
+With the advent of SCM (storage class memory), emerging NVM could also exist on a memory interface, potentially alongside DRAM. NVM support could simply be layered on top of the existing DRAM controller with the changes integrated into the current DRAM interface. However, with a more systematic approach, the model could be modified to provide a mechanism that enables easier integration of new interfaces to support future memory technologies. To do this, the memory controller has been refactored. Instead of a single DRAM controller (DRAMCtrl) object, two objects have been defined: DRAMCtrl and DRAMInterface. Memory configurations are now defined as a DRAM interface and the DRAM specific parameters and functions have been moved from the controller to the interface. This includes the DRAM architecture, timing and IDD parameters. To connect the two objects, a new parameter has been defined in the DRAM controller Python object. This parameter, ‘dram’, is a pointer to the DRAM interface.
+
+```
+    # Interface to volatile, DRAM media
+    dram = Param.DRAMInterface(NULL, "DRAM interface")
+```
+
+Functions specific to DRAM opcodes have also been pulled out of the controller and moved to the interface. For example, the Rank class and associated functions are now defined within the interface. The DRAM interface is defined as an AbstractMemory, enabling an address range to be defined for the actual media interface instead of the controller. With this change, the controller has been modified to be a ClockedObject.
+
+Now, the DRAM controller is a generic memory controller and non-DRAM interfaces can be defined and easily connected. In that regard, an initial NVM interface, NVMInterface, has been defined, which mimics the behavior of NVDIMM-P. Similar to the DRAM interface, the NVM interface is defined as an AbstractMemory, with an address range defined for the interface. A new parameter, ‘nvm’, has been defined in Python to connect the controller to the NVM interface when configured.
+
+```
+    # Interface to non-volatile media
+    nvm = Param.NVMInterface(NULL, "NVM interface")
+```
+
+The NVM interface is media agnostic and simply defines read and write operations. The intent of the interface is to support a wide variety of media types, many less performant than DRAM. While DRAM is accessed with deterministic timing, internal operations within the NVM could create longer tail latency distributions requiring non-deterministic delays. To manage non-determinism, the reads have been split into 2 stages: Read Request and Data Burst. The first stage, the Read Request, simply issues a read command and schedules a ReadReady event. The event will be triggered when the read completes and data is available. At that time, the NVM interface will trigger a controller event to issue a data burst.
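+
+The two-stage read can be illustrated with a small event-queue sketch. It is a toy model under stated assumptions, not gem5 code: the function `simulate_reads`, the per-request media latency and the single shared data bus are all hypothetical simplifications.
+
+```python
+import heapq
+
+
+def simulate_reads(requests, burst_time):
+    """Model split reads; requests is a list of (issue_tick, media_latency).
+
+    Returns a dict mapping the request index to the tick at which
+    its data burst finishes.
+    """
+    # Stage 1: every Read Request schedules a ReadReady event for the
+    # tick at which the media has the data available
+    ready = [(issue + latency, i)
+             for i, (issue, latency) in enumerate(requests)]
+    heapq.heapify(ready)
+
+    bus_free = 0
+    done = {}
+    # Stage 2: each ReadReady event lets the controller issue a Data
+    # Burst as soon as the data bus is free
+    while ready:
+        tick, i = heapq.heappop(ready)
+        start = max(tick, bus_free)
+        bus_free = start + burst_time
+        done[i] = bus_free
+    return done
+
+
+# The second read is issued later but hits faster media, so its burst
+# completes first
+print(simulate_reads([(0, 100), (10, 20)], burst_time=5))
+```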
+
+While the write latency and write bandwidth of emerging NVM are typically orders of magnitude better than those of flash, for many technologies they are not yet on par with DRAM. To mitigate the longer write delay and lower bandwidth, the NVM interface in gem5 models a near NVM write buffer. This buffer offloads write commands and data from the memory controller and provides push-back when full, inhibiting further write commands from issuing until an entry is popped. The entries are popped when the write completes, using parameters defined in the NVM Interface.
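+
+The push-back behavior of the write buffer can be sketched as a bounded queue. This is only an illustration of the idea; the class `NearWriteBuffer` and its methods are hypothetical and do not mirror the NVMInterface code.
+
+```python
+from collections import deque
+
+
+class NearWriteBuffer:
+    """Bounded buffer that refuses new writes while it is full."""
+
+    def __init__(self, capacity):
+        self.capacity = capacity
+        self.entries = deque()
+
+    def is_full(self):
+        return len(self.entries) >= self.capacity
+
+    def push(self, addr):
+        """Accept a write, or return False to push back on the controller."""
+        if self.is_full():
+            return False
+        self.entries.append(addr)
+        return True
+
+    def write_done(self):
+        """Pop the oldest entry once its write to the media completes."""
+        return self.entries.popleft()
+
+
+buf = NearWriteBuffer(capacity=2)
+print(buf.push(0x100), buf.push(0x140), buf.push(0x180))
+buf.write_done()
+print(buf.push(0x180))
+```
+
+The third write is rejected while both entries are occupied and is accepted only after the oldest write has completed and freed its entry.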
+
+After refactoring the controller and creating unique DRAM and NVM interfaces, a variety of potential memory sub-system topologies are possible in gem5. A system can incorporate NVM and DRAM on a single channel or have dedicated channels defined per media. Configurations can be defined to provide a multitude of scenarios for NVM+DRAM simulations to analyze the tradeoffs of new memory technologies and methods to optimize future memory subsystems.
+
+### Workshop Presentation
+
+<iframe width="560" height="315"
+src="https://www.youtube.com/embed/t2PRoZPwwpk" frameborder="0"
+allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
+allowfullscreen></iframe>
diff --git a/_posts/2020-05-29-modern-gpu-applications.md b/_posts/2020-05-29-modern-gpu-applications.md
new file mode 100644
index 0000000..c0af9e0
--- /dev/null
+++ b/_posts/2020-05-29-modern-gpu-applications.md
@@ -0,0 +1,66 @@
+---
+layout: post
+title:  "Modeling Modern GPU Applications in gem5"
+author: Kyle Roarty and Matthew D. Sinclair
+date:   2020-05-27
+---
+
+In 2018, AMD added support for an updated gem5 GPU model based on their GCN3 architecture. Having a high-fidelity GPU model allows for more accurate research into optimizing modern GPU applications.  However, the complexity of getting the necessary libraries and drivers, needed for this model to run GPU applications in gem5, made it difficult to use.  This post describes the work we have done with increasing the usability of the GPU model by simplifying the setup process, extending the types of applications that can be run, and optimizing parts of the software stack used by the GPU model.
+
+### Running the GPU model
+
+To provide accurate, high fidelity simulation, the AMD GPU model directly interfaces with the Radeon Open Compute platform (ROCm) driver.  Although gem5 can simulate the entire system (full system mode, or FS mode), including devices and an operating system, currently the AMD GPU model uses the syscall emulation (SE) mode.  SE mode only simulates user-space execution and provides system services (e.g., malloc) in the simulator instead of executing kernel-space code.  As a result, the only portion of the ROCm software stack that must be emulated is the KFD (Kernel Fusion Driver).  Thus, in order to use the AMD GPU model, the user must first install ROCm on their machine.
+
+This presents a challenge, because gem5's GPU model supports a specific version of ROCm (version 1.6) and getting the drivers installed and interacting properly with gem5 is difficult.  Moreover, to run modern applications such as machine learning (ML), also referred to as machine intelligence (MI), applications, additional libraries (e.g., MIOpen, MIOpenGEMM, rocBLAS, and hipBLAS) need to be installed.  However, the versions of those libraries must be compatible with ROCm version 1.6.  Overall, figuring out the exact software versions and installing them is time consuming, error-prone, and creates a barrier to entry that discourages users from using the GPU model.
+
+To help address this issue, we have created and validated a Docker image that contains the proper software and libraries needed to run the GPU model in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model.  This Docker container has been integrated in the public gem5 repository, and we intend to use the image for continuous integration on the GPU model.  Furthermore, since the AMD GPU model currently models a tightly-coupled CPU-GPU system with a unified address space and coherent caches, this Docker image also includes the changes necessary to HIP and MIOpen to remove discrete GPU copies in these libraries wherever possible.
+
+### Using the Docker image
+
+The Dockerfile and an associated README are located at `util/dockerfiles/gcn-gpu`. This documentation can also be found on the [GCN3](/documentation/general_docs/gpu_models/GCN3) page of the gem5 website. We have also included a video demonstration of using the Docker image in our gem5 workshop presentation.  Next, we briefly summarize how to use the Docker image.
+
+#### Building the image
+
+```
+cd util/dockerfiles/gcn-gpu
+docker build -t <image_name> .
+```
+
+#### Running commands using the image
+
+```
+docker run --rm [-v /absolute/path/to/directory:/mapped/location -v...] [-w /working/directory] <image_name> [command]
+```
+
+* `--rm` removes the container after running (recommended, as containers are meant to be single-use)
+* `-v` takes an absolute path from the local machine, and places it at the mapped location in the container
+* `-w` sets the working directory of the container, where the passed in command is executed
+
+For example, assuming the image was built as `gem5-gcn`, the following command builds gem5 in a container:
+
+```
+docker run --rm -v /path/to/gem5:/gem5 -w /gem5 gem5-gcn scons -sQ -j$(nproc) build/GCN3_X86/gem5.opt
+```
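
The options described above can also be assembled programmatically, for instance when scripting many builds or runs. The following is a minimal sketch, not part of gem5 or Docker: `docker_run_command` is a hypothetical helper, and `-j4` stands in for `-j$(nproc)` because an argument list is not expanded by a shell.

```python
import posixpath
import shlex


def docker_run_command(image, command, volumes=None, workdir=None, remove=True):
    """Build the argument list for `docker run` (hypothetical helper)."""
    args = ["docker", "run"]
    if remove:
        args.append("--rm")  # remove the container after it exits
    for host_path, container_path in (volumes or {}).items():
        # -v expects an absolute path on the local machine
        if not posixpath.isabs(host_path):
            raise ValueError(f"host path must be absolute: {host_path}")
        args += ["-v", f"{host_path}:{container_path}"]
    if workdir:
        args += ["-w", workdir]  # directory in which the command is executed
    args.append(image)
    args += list(command)
    return args


# Assumes the image was built as gem5-gcn, as in the example above.
build_cmd = docker_run_command(
    image="gem5-gcn",
    command=["scons", "-sQ", "-j4", "build/GCN3_X86/gem5.opt"],
    volumes={"/path/to/gem5": "/gem5"},
    workdir="/gem5",
)
print(shlex.join(build_cmd))
```

The resulting list can be passed to `subprocess.run(build_cmd, check=True)` on a machine where Docker is installed.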
+
+### Optimizing the software stack for MI workloads
+
+Creating the Docker image makes it easy to run HIP applications in gem5.  However, running modern applications such as MI applications is more complex and requires additional changes.  Largely, the issues stem from the MI libraries using features that were not designed with simulation in mind.
+
+MIOpen is an open-source MI library designed to execute on AMD GPUs.  MIOpen has HIP and OpenCL backends and implements optimized assembly kernels for many common DNN algorithms.  It chooses which of these backends to use at compile time.  Then, at runtime, MIOpen uses the appropriate backend to execute a given MI application on an AMD GPU.  Although this support works well for real GPUs, simulating the selection of the backend, the choice of GPU kernel, and the configuration of the data the kernel operates on is time-consuming and not part of the region of interest for simulation.
+
+For example, MIOpen calls the backend to search for an appropriate kernel that is optimized for the given parameters.  On real hardware, this process runs multiple different kernel options, then picks the fastest one and compiles it using clang-ocl.  As part of this process, MIOpen caches the kernel binary locally for subsequent uses of the same kernel.  Since online compilation is computationally intensive and currently unsupported in gem5, we bypass online kernel compilation in gem5 by running the applications on a real GPU beforehand to obtain MIOpen’s cached kernel binaries.  Alternatively, if an AMD GPU is not available, it is also possible to compile the necessary kernels on the command line with clang-ocl.
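
The search-then-cache pattern described above can be illustrated with a small sketch. This is not MIOpen's implementation: `find_kernel`, `fake_measure`, the candidate names, and the costs are all hypothetical. It only shows why a cache populated beforehand lets later runs skip the expensive search.

```python
def find_kernel(params, candidates, cache, measure):
    """Return the cached choice for params, or search once and cache it."""
    key = tuple(sorted(params.items()))
    if key in cache:
        return cache[key]  # cache hit: no search and no compilation
    # Cache miss: try every candidate and keep the cheapest one.
    best = min(candidates, key=lambda name: measure(name, params))
    cache[key] = best
    return best


measured = []  # records every (expensive) measurement that was performed


def fake_measure(name, params):
    """Stand-in for running a kernel on hardware; the costs are made up."""
    measured.append(name)
    return {"direct": 3.0, "winograd": 1.0, "fft": 2.0}[name]


params = {"n": 1, "c": 3, "h": 32, "w": 32}
candidates = ["direct", "winograd", "fft"]
cache = {}

first = find_kernel(params, candidates, cache, fake_measure)
searches_after_first = len(measured)
second = find_kernel(params, candidates, cache, fake_measure)
print(first, second, searches_after_first, len(measured))
```

The second call performs no measurements because the result is already cached, which is the situation gem5 is placed in when the cached kernel binaries are supplied up front.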
+
+Moreover, GEMM kernels are extremely common in MI applications.  For these kernels, MIOpen uses MIOpenGEMM to identify and create the best kernel for the parameters of the input matrices.  Unfortunately, MIOpenGEMM does this by dynamically creating a database of possible GEMM kernels and then selecting the kernel that best matches the application’s matrices.  Since this happens dynamically every time a program is run, it is difficult to bypass.  Thus, to avoid the overhead of simulating this process, we backported support from newer versions of ROCm that allows MIOpen to use rocBLAS instead of MIOpenGEMM.  Using rocBLAS removes the repeated, dynamic database creation from the critical path in simulation, since rocBLAS generates the database of optimal solutions on installation.
+
+Overall, these changes avoid simulating work that is not part of the application’s region of interest and have enabled us to simulate a number of native MI applications in gem5.
+
+### What's next?
+
+Our work has increased the usability of the gem5 GPU model and shown how to run a variety of GPU applications, including native MI applications, in gem5.  As mentioned above, we are currently in the process of integrating the Docker image into the develop branch of gem5 to enable continuous integration testing of future GPU commits.  Moving forward, we hope that this work can serve as a springboard to running high-level frameworks such as Caffe, TensorFlow, and PyTorch in the simulator.  However, since high-level frameworks have large models and significant runtimes, we plan to make them easier to simulate by extending checkpointing support to include the GPU model, which will allow us to focus on simulating potential regions of interest.
+
+# Workshop Presentation
+
+<iframe width="560" height="315"
+src="https://www.youtube.com/embed/HhLiMrjqCvA" frameborder="0"
+allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
+allowfullscreen></iframe>
diff --git a/_posts/2020-05-29-modular-and-secure.md b/_posts/2020-05-29-modular-and-secure.md
new file mode 100644
index 0000000..90853a2
--- /dev/null
+++ b/_posts/2020-05-29-modular-and-secure.md
@@ -0,0 +1,69 @@
+---

+layout: post

+title:  "A Modular and Secure System Architecture for the IoT"

+author: Nils Asmussen, Hermann Härtig, and Gerhard Fettweis

+date:   2020-05-29

+---

+

+Introduction

+------------

+

+The "Internet of Things (IoT)" is already pervasive in industrial production and it is expected to become ubiquitous in many other sectors, too. For example, such connected devices have great potential to better automate and optimize critical infrastructure such as electrical grids and transportation networks and are also promising for health care applications. However, a one-size-fits-all solution for the compute hardware and system software of all these devices is infeasible, primarily due to cost pressure and energy constraints, but also because each domain requires different compute capacity, sensors, and actuators. Instead, customized solutions are needed for both the hardware and the software that drives it. System designers should be able to easily assemble these specialized computers and their operating systems (OSes) from reusable building blocks, requiring modularity at both the hardware and the OS level.

+

+Besides modularity, security is essential due to the interaction of IoT devices with the physical world and their attachment to the Internet, which enables attackers to cause harm to the environment or humans. For that reason, using encrypted communication is not sufficient; the IoT devices themselves need to be secured as well. At the software level, the high complexity and missing isolation between subsystems in monolithic OSes render them an inappropriate choice for this security-critical use case. Instead, microkernel-based systems such as L4 [1] are promising candidates due to their modular architecture and strong isolation between subsystems. In fact, it has been shown that a microkernel-based system could have at least reduced the severity of 96% of Linux's critical CVEs by restricting the impact to a single subsystem and could have completely eliminated 40% of the CVEs [2].

+

+We argue that the idea behind microkernel-based systems, splitting the system into multiple isolated components, can and should also be applied to hardware. For example, system-on-a-chip designers often buy hardware components (IP blocks) from third-party vendors. However, IP blocks such as modems or accelerators can be complex and should therefore not be trusted. Furthermore, the recently found side-channel attacks on modern general-purpose cores such as Meltdown [3], Spectre [4], and ZombieLoad [5] have raised the question of whether we should still trust these complex cores to properly enforce isolation boundaries between different software components. Thus, we believe that hardware components such as modems, accelerators, and cores should be strongly separated, just as software components are in microkernel-based systems.

+

+System Architecture

+-------------------

+<p align="center">

+  <img src="{{site.url}}/assets/img/blog/modular-and-secure-fig-1a.png"/>

+  <br>

+  <b>Figure 1a</b>

+

+  <br>

+  <br>

+

+  <img src="{{site.url}}/assets/img/blog/modular-and-secure-fig-1b.png"/>

+  <br>

+  <b>Figure 1b</b>

+</p>

+

+

+

+Our system architecture [6], depicted in Figure 1a, builds upon tiled architectures, which already allow hardware components to be integrated in a modular way into separate tiles. However, although the tiles are physically separate, they typically still have unrestricted access to the network-on-chip (NoC) that connects them. We propose adding a new and simple hardware component between each tile and the NoC that restricts the tile's access to the NoC. This hardware component is called the trusted communication unit (TCU). Besides isolating the tiles from each other, the TCU allows communication channels between tiles to be established and used.

+

+The operating system, called M³ and shown in Figure 1b, is designed as a microkernel-based system and leverages the TCU to isolate hardware and software components while selectively allowing them to communicate. The kernel of M³ runs exclusively on a dedicated *kernel tile*, whereas services and applications run on the *user tiles*. The kernel, as the only privileged component in the system, is the only one that can establish communication channels between tiles. User tiles can afterwards use the established channels to communicate directly with other user tiles without involving the kernel. However, user tiles cannot change or add channels. Due to the physical separation between tiles, M³ places no specific requirements on the tiles, such as user/kernel mode or memory management units. For that reason, not only compute cores but arbitrary hardware components such as modems, accelerators, or devices can be integrated as user tiles, and the kernel can control their communication permissions in a uniform way.
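
The permission model described above can be sketched as a toy Python model. This is only an illustration under simplifying assumptions, not the M³ or TCU implementation: the class names, tile names, and message format are hypothetical, and Python itself cannot enforce that only the kernel configures a TCU, which real hardware would.

```python
class TCU:
    """Toy per-tile communication unit holding the tile's channel table."""

    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.allowed = set()  # destination tiles this tile may send to
        self.inbox = []       # messages received from other tiles


class NoC:
    """Toy network-on-chip: every send passes through the sender's TCU."""

    def __init__(self, tile_ids):
        self.tcus = {tile_id: TCU(tile_id) for tile_id in tile_ids}

    def send(self, src, dest, message):
        if dest not in self.tcus[src].allowed:
            raise PermissionError(f"tile {src} has no channel to tile {dest}")
        self.tcus[dest].inbox.append((src, message))


class Kernel:
    """The only component that is meant to modify channel tables."""

    def __init__(self, noc):
        self._noc = noc

    def establish_channel(self, src, dest):
        self._noc.tcus[src].allowed.add(dest)


noc = NoC(["kernel", "app", "fs"])
kernel = Kernel(noc)

# The kernel sets up a one-way channel; afterwards "app" talks to "fs"
# directly, without involving the kernel.
kernel.establish_channel("app", "fs")
noc.send("app", "fs", "read request")

try:
    noc.send("fs", "app", "unsolicited message")  # no channel in this direction
    blocked = False
except PermissionError:
    blocked = True

print(noc.tcus["fs"].inbox, blocked)
```

Because the check sits between the tile and the NoC, the same mechanism applies regardless of what kind of component occupies the tile.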

+

+Simulation with gem5

+--------------------

+

+Besides working on an FPGA-based implementation, we are prototyping the system architecture in gem5 to evaluate its feasibility. To simulate the system architecture, we represent each tile as a `System` object and connect the tiles with a `NoncoherentXBar`. The `System` object implements a custom loader for M³ to load the kernel and other components onto the individual tiles. In our simulation we use x86, ARMv7, and RISC-V. However, our hardware implementation will use RISC-V cores due to their simplicity and openness. Since gem5 only supported syscall emulation for RISC-V, we contributed full-system support for RISC-V to gem5, which enables us to run our OS and make use of virtual memory.

+

+Conclusion

+----------

+

+IoT devices that interface with the physical world and the Internet require both security and modularity. We are investigating a new system architecture that takes the ideas from microkernel-based systems for software and applies them to hardware as well. The key idea is to build upon tiled architectures and add a new and simple hardware component, called the trusted communication unit, to each tile for isolation and communication. The microkernel-based OS called M³ builds on top of this hardware platform and establishes communication channels between otherwise isolated tiles. We believe that modularity at both the hardware and software level and the strong isolation between components enable us to deliver a suitable foundation for future IoT devices.

+

+Bibliography

+------------

+

+[1] Hermann Härtig, Michael Hohmuth, Norman Feske, Christian Helmuth, Adam Lackorzynski, Frank Mehnert, and Michael Peter. The Nizza secure-system architecture. In 2005 International Conference on Collaborative Computing: Networking, Applications and Worksharing, pages 10–pp. IEEE, 2005.

+

+[2] Simon Biggs, Damon Lee, and Gernot Heiser. The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security. Proceedings of the 9th Asia-Pacific Workshop on Systems. 2018.

+

+[3] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown. CoRR, abs/1801.01207, 2018.

+

+[4] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. CoRR, abs/1801.01203, 2018.

+

+[5] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, Thomas Prescher, and Daniel Gruss. ZombieLoad: Cross-privilege-boundary data sampling. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 753–768. 2019.

+

+[6] Nils Asmussen, Marcus Völp, Benedikt Nöthen, Hermann Härtig, and Gerhard Fettweis. M³: A hardware/operating-system co-design to tame heterogeneous manycores. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'16, pages 189–203. ACM, 2016.

+

+Workshop Presentation

+---------------------

+

+<iframe width="560" height="315"

+src="https://www.youtube.com/embed/2jPiXOhboko" frameborder="0"

+allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"

+allowfullscreen></iframe>

diff --git a/assets/img/blog/modular-and-secure-fig-1a.png b/assets/img/blog/modular-and-secure-fig-1a.png
new file mode 100644
index 0000000..d8cbbb2
--- /dev/null
+++ b/assets/img/blog/modular-and-secure-fig-1a.png
Binary files differ
diff --git a/assets/img/blog/modular-and-secure-fig-1b.png b/assets/img/blog/modular-and-secure-fig-1b.png
new file mode 100644
index 0000000..9bdbdeb
--- /dev/null
+++ b/assets/img/blog/modular-and-secure-fig-1b.png
Binary files differ