website: Merge branch 'stable' into develop

This is done periodically to keep develop up-to-date with improvements
on stable.

Change-Id: Ic2164dc91b14777e46c97fdc89806d75af5784aa
diff --git a/_data/documentation.yml b/_data/documentation.yml
index d07e906..b999970 100755
--- a/_data/documentation.yml
+++ b/_data/documentation.yml
@@ -34,6 +34,8 @@
           url: http://doxygen.gem5.org/release/v20-0-0-3/index.html
         - page: v20.1.0.0
           url: http://doxygen.gem5.org/release/v20-1-0-0/index.html
+        - page: v20.1.0.1
+          url: http://doxygen.gem5.org/release/v20-1-0-1/index.html
 
     - title: gem5 Resources
       id: gem5_resources
@@ -227,6 +229,10 @@
           url: /documentation/learning_gem5/part2/memoryobject
         - page: Creating a simple cache object
           url: /documentation/learning_gem5/part2/simplecache
+        - page: ARM Power Modelling
+          url: /documentation/learning_gem5/part2/arm_power_modelling
+        - page: ARM DVFS Support
+          url: /documentation/learning_gem5/part2/arm_dvfs_support
 
     - title: Modeling Cache Coherence with Ruby
       id: part3
diff --git a/_includes/head.html b/_includes/head.html
index d043276..495697a 100755
--- a/_includes/head.html
+++ b/_includes/head.html
@@ -17,6 +17,8 @@
 		gem5{% if page.title And page.title != "" And page.title != nil%}: {{page.title}} {% endif %}
 	</title>
 
+    {% if page.canonical And page.canonical != "" And page.canonical != nil%}<link rel="canonical" href="{{page.canonical}}"/> {% endif %}
+
 	<!-- SITE FAVICON -->
 	<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
 	<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
diff --git a/_pages/documentation/general_docs/fullsystem/building_arm_kernel.md b/_pages/documentation/general_docs/fullsystem/building_arm_kernel.md
index 3c3dd70..72a1c96 100644
--- a/_pages/documentation/general_docs/fullsystem/building_arm_kernel.md
+++ b/_pages/documentation/general_docs/fullsystem/building_arm_kernel.md
@@ -10,7 +10,8 @@
 
 This page contains instructions for building up-to-date kernels for gem5 running on ARM. 
 
-If you don't want to build the Kernel on your own you could still [download a prebuilt version](./guest_binaries/)
+If you don't want to build the Kernel on your own you could still [download a
+prebuilt version](./guest_binaries).
 
 ## Prerequisites
 These instructions are for running headless systems. That is a more "server" style system where there is no frame-buffer. The description has been created using the latest known-working tag in the repositories linked below, however the tables in each section list previous tags that are known to work. To built the kernels on an x86 host you'll need ARM cross compilers and the device tree compiler. If you're running a reasonably new version of Ubuntu or Debian you can get required software through apt:
@@ -19,11 +20,18 @@
 apt-get install  gcc-arm-linux-gnueabihf gcc-aarch64-linux-gnu device-tree-compiler
 ```
 
-If you can't use these pre-made compilers the next easiest way to obtain the required compilers from [Linaro](http://releases.linaro.org/latest/components/toolchain/binaries/). 
+If you can't use these pre-made compilers, the next easiest way is to obtain the
+required compilers from ARM:
+- [Cortex-A cross-compilers](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-a/downloads)
+- [Cortex-R/M cross-compilers](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
+
+Download (one of) these and make sure the binaries are on your `PATH`.
 
 Depending on the exact source of your cross compilers, the compiler names used below will required small changes.
 
-To actually run the kernel, you'll need to download or compile gem5's bootloader. See the (bootloaders)(#bootloaders) section in this documents for details.
+To actually run the kernel, you'll need to download or compile gem5's
+bootloader. See the [bootloaders](#bootloaders) section in this document for
+details.
 
 ## Linux 4.x
 Newer gem5 kernels for ARM (v4.x and later) are based on the vanilla Linux kernel and typically have a small number of patches to make them work better with gem5. The patches are optional and you should be able to use a vanilla kernel as well. However, this requires you to configure the kernel yourself. Newer kernels all use the VExpress\_GEM5\_V1 gem5 platform for both AArch32 and AArch64. The required DTB files to describe the hardware to the OS ship with gem5. To build them, execute this command:
@@ -143,4 +151,5 @@
 make -C system/arm/bootloader/arm64
 ```
 
-Once you have compiled the binaries, put them in the binaries directory in your M5\_PATH.
+Once you have compiled the binaries, put them in the binaries directory in your
+`M5_PATH`.
diff --git a/_pages/documentation/general_docs/gem5_resources.md b/_pages/documentation/general_docs/gem5_resources.md
index 6a4131b..f3b4c01 100644
--- a/_pages/documentation/general_docs/gem5_resources.md
+++ b/_pages/documentation/general_docs/gem5_resources.md
@@ -32,21 +32,29 @@
 
 The gem5 resources are hosted on our Google Cloud Bucket. Listed below are the
 compiled resources presently available, as well as links to their sources in
-the gem5 resources repository. **These are resources sources for gem5 20.0**.
+the gem5 resources repository. **These are the resource sources for gem5 20.1**.
 
 |Resource |Compiled/Built Resource |Source |
 |:--------|:-----------------------|:------|
-|asmtest | 351 test binaries, downloadable with `https://dist.gem5.org/dist/v20/test-progs/asmtest/bin/<binary>` | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/asmtest) |
-|riscv-tests | [dhryston.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/dhrystone.riscv), [median.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/median.riscv), [mm.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/mm.riscv), [mt-matmul.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/mt-matmul.riscv), [mt-vvadd.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/mt-vvadd.riscv), [multiply.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/multiply.riscv), [pmp.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/pmp.riscv), [qsort.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/qsort.riscv), [rsort.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/rsort.riscv), [spmv.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/spmv.riscv), [towers.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/towers.riscv), [vvadd.riscv](http://dist.gem5.org/dist/v20/test-progs/riscv-tests/vvadd.riscv) |[here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/riscv-tests) | 
-|insttests | [insttest-rv64a](http://dist.gem5.org/dist/v20/test-progs/insttest/bin/riscv/linux/insttest-rv64a), [insttest-rv64c](http://dist.gem5.org/dist/v20/test-progs/insttest/bin/riscv/linux/insttest-rv64c), [insttest-rv64d](http://dist.gem5.org/dist/v20/test-progs/insttest/bin/riscv/linux/insttest-rv64d), [insttest-rv64f](http://dist.gem5.org/dist/v20/test-progs/insttest/bin/riscv/linux/insttest-rv64f), [insttest-rv64i](http://dist.gem5.org/dist/v20/test-progs/insttest/bin/riscv/linux/insttest-rv64i), [insttest-rv64m](http://dist.gem5.org/dist/v20/test-progs/insttest/bin/riscv/linux/insttest-rv64m) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/insttest) |
-|pthreads | --- | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/pthreads) |
-|square | --- | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/square) |
+|asmtest | 351 test binaries, downloadable with `https://dist.gem5.org/dist/v20-1/test-progs/asmtest/bin/<binary>` | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/asmtest) |
+|riscv-tests | [dhrystone.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/dhrystone.riscv), [median.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/median.riscv), [mm.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/mm.riscv), [mt-matmul.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/mt-matmul.riscv), [mt-vvadd.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/mt-vvadd.riscv), [multiply.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/multiply.riscv), [pmp.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/pmp.riscv), [qsort.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/qsort.riscv), [rsort.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/rsort.riscv), [spmv.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/spmv.riscv), [towers.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/towers.riscv), [vvadd.riscv](http://dist.gem5.org/dist/v20-1/test-progs/riscv-tests/vvadd.riscv) |[here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/riscv-tests) |
+|insttests | [insttest-rv64a](http://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/riscv/linux/insttest-rv64a), [insttest-rv64c](http://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/riscv/linux/insttest-rv64c), [insttest-rv64d](http://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/riscv/linux/insttest-rv64d), [insttest-rv64f](http://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/riscv/linux/insttest-rv64f), [insttest-rv64i](http://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/riscv/linux/insttest-rv64i), [insttest-rv64m](http://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/riscv/linux/insttest-rv64m) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/insttest) |
+|simple/pthread (x86) | [test_pthread_create_seq](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_pthread_create_seq), [test_pthread_create_para](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_pthread_create_para), [test_pthread_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_pthread_mutex), [test_atomic](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_atomic), [test_pthread_cond](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_pthread_cond), [test_std_thread](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_std_thread), [test_std_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_std_mutex), [test_std_condition_variable](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/x86/test_std_condition_variable) | [here (Along with other 'simple' executables)](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/simple) |
+|simple/pthread (aarch32) | [test_pthread_create_seq](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_pthread_create_seq), [test_pthread_create_para](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_pthread_create_para), [test_pthread_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_pthread_mutex), [test_atomic](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_atomic), [test_pthread_cond](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_pthread_cond), [test_std_thread](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_std_thread), [test_std_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_std_mutex), [test_std_condition_variable](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch32/test_std_condition_variable) | [here (Along with other 'simple' executables)](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/simple) |
+|simple/pthread (aarch64) | [test_pthread_create_seq](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_pthread_create_seq), [test_pthread_create_para](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_pthread_create_para), [test_pthread_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_pthread_mutex), [test_atomic](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_atomic), [test_pthread_cond](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_pthread_cond), [test_std_thread](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_std_thread), [test_std_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_std_mutex), [test_std_condition_variable](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/aarch64/test_std_condition_variable) | [here (Along with other 'simple' executables)](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/simple) |
+|simple/pthread (riscv64) | [test_pthread_create_seq](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_pthread_create_seq), [test_pthread_create_para](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_pthread_create_para), [test_pthread_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_pthread_mutex), [test_atomic](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_atomic), [test_pthread_cond](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_pthread_cond), [test_std_thread](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_std_thread), [test_std_mutex](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_std_mutex), [test_std_condition_variable](http://dist.gem5.org/dist/v20-1/test-progs/pthreads/riscv64/test_std_condition_variable) | [here (Along with other 'simple' executables)](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/simple) |
+|simple/hello| [x86 executable](https://dist.gem5.org/dist/v20-1/test-progs/hello/bin/x86/linux/hello), [arm executable](https://dist.gem5.org/dist/v20-1/test-progs/hello/bin/arm/linux/hello), [mips executable](https://dist.gem5.org/dist/v20-1/test-progs/hello/bin/mips/linux/hello), [power executable](https://dist.gem5.org/dist/v20-1/test-progs/hello/bin/power/linux/hello), [riscv executable](https://dist.gem5.org/dist/v20-1/test-progs/hello/bin/riscv/linux/hello), [sparc executable](https://dist.gem5.org/dist/v20-1/test-progs/hello/bin/sparc/linux/hello) | [here (Along with other 'simple' executables)](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/simple) |
+|simple/m5_exit | [x86 executable](https://dist.gem5.org/dist/v20-1/test-progs/m5-exit/bin/x86/linux/m5_exit) | [here (Along with other 'simple' executables)](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/simple) |
+|insttest | [insttest](https://dist.gem5.org/dist/v20-1/test-progs/insttest/bin/sparc/linux/insttest)| [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/insttest) |
+|square | [square.o](https://dist.gem5.org/dist/v20-1/test-progs/square/square.o) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/square) |
 |spec-2006 | --- | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/spec-2017) |
 |spec-2017 | --- | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/spec-2006) |
-|gapbs | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20/images/x86/ubuntu-18-04/gapbs.img.gz) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gapbs) |
-|parsec | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20/images/x86/ubuntu-18-04/parsec.img.gz) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/parsec) |
-|npb | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20/images/x86/ubuntu-18-04/npb.img.gz) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb) |
-|Linux boot-exit | [Disk Image (GZIPPED)](http://dist.gem5.org/v20/images/x86/ubuntu-18-04/boot-exit.img.gz) |[here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/boot-exit) |
+|gapbs | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20-1/images/x86/ubuntu-18-04/gapbs.img.gz) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/gapbs) |
+|parsec | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20-1/images/x86/ubuntu-18-04/parsec.img.gz) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/parsec) |
+|npb | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20-1/images/x86/ubuntu-18-04/npb.img.gz) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/npb) |
+|Linux boot-exit | [Disk Image (GZIPPED)](http://dist.gem5.org/dist/v20-1/images/x86/ubuntu-18-04/boot-exit.img.gz) |[here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/boot-exit) |
+|hack-back| --- | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/hack-back) |
+|linux kernels | [v4.4.186](http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-4.4.186), [v4.9.186](http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-4.9.186), [v4.14.134](http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-4.14.134), [v4.19.83](http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-4.19.83), [v5.4.49](http://dist.gem5.org/dist/v20-1/kernels/x86/static/vmlinux-5.4.49) | [here](https://gem5.googlesource.com/public/gem5-resources/+/refs/heads/stable/src/linux-kernel/) |
 
 ## How do I obtain the gem5 resource sources?
 
diff --git a/_pages/documentation/learning_gem5/part1/part1_6_extending_configs.md b/_pages/documentation/learning_gem5/part1/part1_6_extending_configs.md
index 73223a2..ae4117a 100644
--- a/_pages/documentation/learning_gem5/part1/part1_6_extending_configs.md
+++ b/_pages/documentation/learning_gem5/part1/part1_6_extending_configs.md
@@ -1,17 +1,17 @@
 ---
 layout: documentation
-title: Extending gem5 to run ARM binaries 
+title: Extending gem5 to run ARM binaries
 doc: Learning gem5
 parent: part1
 permalink: /documentation/learning_gem5/part1/extending_configs
-author: Julian T. Angeles 
+author: Julian T. Angeles, Thomas E. Hansen
 ---
 
 Extending gem5 for ARM
 ======================
 
 This chapter assumes you've already built a basic x86 system with
-gem5 and created a simple configuration script. 
+gem5 and created a simple configuration script.
 
 Downloading ARM Binaries
 ------------------------
@@ -108,3 +108,81 @@
 -50000
 Exiting @ tick 258647411000 because exiting with last active thread context
 ```
+
+ARM Full System Simulation
+--------------------------
+To run an ARM FS Simulation, there are some changes required to the setup.
+
+If you haven't already, from the gem5 repository's root directory, `cd` into
+the directory `util/term/` by running
+
+```bash
+$ cd util/term/
+```
+
+and then compile the `m5term` binary by running
+
+```bash
+$ make
+```
+
+The gem5 repository comes with example system setups and configurations. These
+can be found in the `configs/example/arm/` directory.
+
+A collection of full system Linux image files is available
+[here](https://www.gem5.org/documentation/general_docs/fullsystem/guest_binaries).
+Save these in a directory and remember the path to it. For example, you could
+store them in
+
+```
+/path/to/user/gem5/fs_images/
+```
+
+The `fs_images` directory will be assumed to contain the extracted FS images
+for the rest of this example.
+
+With the image(s) downloaded, execute the following command in your terminal:
+
+```bash
+$ export IMG_ROOT=/absolute/path/to/fs_images/<image-directory-name>
+```
+
+replacing "\<image-directory-name\>" with the name of the directory extracted
+from the downloaded image file, without the angle-brackets.
+
+We are now ready to run an FS ARM simulation. From the root of the gem5
+repository, run:
+
+```bash
+$ ./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py \
+    --caches \
+    --bootloader="$IMG_ROOT/binaries/<bootloader-name>" \
+    --kernel="$IMG_ROOT/binaries/<kernel-name>" \
+    --disk="$IMG_ROOT/disks/<disk-image-name>" \
+    --bootscript=path/to/bootscript.rcS
+```
+
+replacing anything in angle-brackets with the name of the directory or file,
+without the angle-brackets.
+
+You can then attach to the simulation by, in a different terminal window,
+running:
+
+```bash
+$ ./util/term/m5term 3456
+```
+
+The full details of what the `fs_bigLITTLE.py` script supports can be obtained by
+running:
+
+```bash
+$ ./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py --help
+```
+
+> **An aside on FS simulations:**
+>
+> Note that FS simulations take a long time; like "1 hour to load the kernel"
+> long time! There are ways to "fast-forward" a simulation and then resume the
+> detailed simulation at the interesting point, but these are beyond the scope
+> of this chapter.
+
diff --git a/_pages/documentation/learning_gem5/part2/part2_7_arm_power_modelling.md b/_pages/documentation/learning_gem5/part2/part2_7_arm_power_modelling.md
new file mode 100644
index 0000000..58e31de
--- /dev/null
+++ b/_pages/documentation/learning_gem5/part2/part2_7_arm_power_modelling.md
@@ -0,0 +1,296 @@
+---
+layout: documentation
+title: ARM Power Modelling
+doc: Learning gem5
+parent: part2
+permalink: /documentation/learning_gem5/part2/arm_power_modelling/
+author: Thomas E. Hansen
+---
+
+
+ARM Power Modelling
+===================
+
+It is possible to model and monitor the energy and power usage of a gem5
+simulation. This is done by using various stats already recorded by gem5 in a
+`MathExprPowerModel`, a way to model power usage through mathematical
+equations. This chapter of the tutorial details what the various components
+required for power modelling are and explains how to add them to an existing
+ARM simulation.
+
+This chapter draws on the `fs_power.py` configuration script, provided in the
+`configs/example/arm` directory, and also provides instructions for how to
+extend this script or other scripts.
+
+Note that power models can only be applied when using the more detailed
+"timing" CPUs.
+
+An overview of how power modelling is built into gem5 and which other parts of
+the simulator it interacts with can be found in [Sascha Bischoff's
+presentation](https://youtu.be/3gWyUWHxVj4) from the 2017 ARM Research Summit.
+
+Dynamic Power States
+--------------------
+
+Power Models consist of two functions which describe how to calculate the power
+consumption in different power states. The power states are (from
+`src/sim/PowerState.py`):
+
+- `UNDEFINED`: Invalid state, no power state derived information is available.
+   This state is the default.
+- `ON`: The logic block is actively running and consuming dynamic and leakage
+   energy depending on the amount of processing required.
+- `CLK_GATED`: The clock circuitry within the block is gated to save dynamic
+   energy, the power supply to the block is still on and leakage energy is
+   being consumed by the block.
+- `SRAM_RETENTION`: The SRAMs within the logic blocks are pulled into retention
+   state to reduce leakage energy further.
+- `OFF`: The logic block is power gated and is not consuming any energy.
+
+A Power Model is assigned to each of the states, apart from `UNDEFINED`, using
+the `PowerModel` class's `pm` field. It is a list containing 4 Power Models,
+one for each state, in the following order:
+
+0. `ON`
+1. `CLK_GATED`
+2. `SRAM_RETENTION`
+3. `OFF`
+
+Note that although there are 4 different entries, these do not have to be
+different Power Models. The provided `fs_power.py` file uses one Power Model
+for the `ON` state and then the same Power Model for the remaining states.
+
+Power Usage Types
+-----------------
+
+The gem5 simulator models 2 types of power usage:
+
+- **static**: The power used by the simulated system regardless of activity.
+- **dynamic**: The power used by the system due to various types of activity.
+
+A Power Model must contain an equation for modelling both of these (although
+that equation can be as simple as `st = "0"` if, for example, static power is
+not desired or irrelevant in that Power Model).
+
+MathExprPowerModels
+-------------------
+
+The provided Power Models in `fs_power.py` extend the `MathExprPowerModel`
+class. `MathExprPowerModels` are specified as strings containing mathematical
+expressions for how to calculate the power used by the system. They typically
+contain a mix of stats and automatic variables (e.g. temperature), for example:
+
+```python
+class CpuPowerOn(MathExprPowerModel):
+    def __init__(self, cpu_path, **kwargs):
+        super(CpuPowerOn, self).__init__(**kwargs)
+        # 2A per IPC, 3nA per cache miss
+        # and then convert to Watt
+        self.dyn = "voltage * (2 * {}.ipc + 3 * 0.000000001 * " \
+                   "{}.dcache.overall_misses / sim_seconds)".format(cpu_path,
+                                                                    cpu_path)
+        self.st = "4 * temp"
+```
+
+(The above power model is taken from the provided `fs_power.py` file.)
+
+We can see that the automatic variables (`voltage` and `temp`) do not require
+a path, whereas component-specific stats (the CPU's Instructions Per Cycle
+`ipc`) do. Further down in the file, in the `main` function, we can see that
+the CPU object has a `path()` function which returns the component's "path" in
+the system, e.g. `system.bigCluster.cpus0`. The `path` function is provided by
+`SimObject` and so can be used by any object in the system which extends this;
+for example, the L2 cache object uses it a couple of lines further down from
+where the CPU object uses it.
+
+(Note the division of `dcache.overall_misses` by `sim_seconds` to convert to
+Watts. This is a _power_ model, i.e. energy over time, and not an energy model.
+It is good to be cautious when using these terms as they are often used
+interchangeably, but mean very specific things when it comes to power and
+energy simulation/modelling.)
+
+Extending an existing simulation
+--------------------------------
+
+The provided `fs_power.py` script extends the existing `fs_bigLITTLE.py` script
+by importing it and then modifying the values. As part of this, several loops
+are used to iterate through the descendants of the SimObjects to apply the
+Power Models to. So to extend an existing simulation to support power models,
+it can be helpful to define a helper function which does this:
+
+```python
+def _apply_pm(simobj, power_model, so_class=None):
+    for desc in simobj.descendants():
+        if so_class is not None and not isinstance(desc, so_class):
+            continue
+
+        desc.power_state.default_state = "ON"
+        desc.power_model = power_model(desc.path())
+```
+
+The function above takes a SimObject, a Power Model, and optionally a class
+that the SimObject's descendants have to be instances of in order for the PM to
+be applied. If no class is specified, the PM is applied to all the descendants.
+
+Whether you decide to use the helper function or not, you now need to define
+some Power Models. This can be done by following the pattern seen in
+`fs_power.py`:
+
+0. Define a class for each of the power states you are interested in. These
+   classes should extend `MathExprPowerModel`, and contain a `dyn` and an `st`
+   field. Each of these fields should contain a string describing how to
+   calculate the respective type of power in this state. Their constructors
+   should take a path to be used through `format` in the strings describing the
+   power calculation equation, and a number of kwargs to be passed to the
+   super-constructor.
+1. Define a class to hold all the Power Models defined in the previous step.
+   This class should extend `PowerModel` and contain a single field `pm` which
+   contains a list of 4 elements: `pm[0]` should be an instance of the Power
+   Model for the "ON" power state; `pm[1]` should be an instance of the Power
+   Model for the "CLK_GATED" power state; etc. This class's constructor should
+   take the path to pass on to the individual Power Models, and a number of
+   kwargs which are passed to the super-constructor.
+2. With the helper function and the above classes defined, you can then extend
+   the `build` function to take these into account and optionally add a
+   command-line flag in the `addOptions` function if you want to be able to
+   toggle the use of the models.
+
+> **Example implementation:**
+>
+> ```python
+> class CpuPowerOn(MathExprPowerModel):
+>     def __init__(self, cpu_path, **kwargs):
+>         super(CpuPowerOn, self).__init__(**kwargs)
+>         self.dyn = "voltage * 2 * {}.ipc".format(cpu_path)
+>         self.st = "4 * temp"
+>
+>
+> class CpuPowerClkGated(MathExprPowerModel):
+>     def __init__(self, cpu_path, **kwargs):
+>         super(CpuPowerClkGated, self).__init__(**kwargs)
+>         self.dyn = "voltage / sim_seconds"
+>         self.st = "4 * temp"
+>
+>
+> class CpuPowerOff(MathExprPowerModel):
+>     dyn = "0"
+>     st = "0"
+>
+>
+> class CpuPowerModel(PowerModel):
+>     def __init__(self, cpu_path, **kwargs):
+>         super(CpuPowerModel, self).__init__(**kwargs)
+>         self.pm = [
+>             CpuPowerOn(cpu_path),       # ON
+>             CpuPowerClkGated(cpu_path), # CLK_GATED
+>             CpuPowerOff(),              # SRAM_RETENTION
+>             CpuPowerOff(),              # OFF
+>         ]
+>
+> [...]
+>
+> def addOptions(parser):
+>     [...]
+>     parser.add_argument("--power-models", action="store_true",
+>                         help="Add power models to the simulated system. "
+>                              "Requires using the 'timing' CPU.")
+>     return parser
+>
+>
+> def build(options):
+>     root = Root(full_system=True)
+>     [...]
+>     if options.power_models:
+>         if options.cpu_type != "timing":
+>             m5.fatal("The power models require the 'timing' CPUs.")
+>
+>         _apply_pm(root.system.bigCluster.cpus, CpuPowerModel,
+>                   so_class=m5.objects.BaseCPU)
+>         _apply_pm(root.system.littleCluster.cpus, CpuPowerModel)
+>
+>     return root
+>
+> [...]
+> ```
+
+Stat Names
+----------
+
+The stat names are usually the same as can be seen in the `stats.txt` file
+produced in the `m5out` directory after a simulation. However, there are some
+exceptions:
+
+- The CPU clock is referred to as `clk_domain.clock` in `stats.txt` but is
+  accessed in power models using `clock_period` and _not_ `clock`.
+
+Stat dump frequency
+-------------------
+
+By default, gem5 dumps simulation stats to the `stats.txt` file every simulated
+second. This can be controlled through the `m5.stats.periodicStatDump`
+function, which takes the desired frequency for dumping stats measured in
+simulated ticks, not seconds. Fortunately, `m5.ticks` provides a `fromSeconds`
+function for ease of usability.
+
+Below is an example of how stat dumping frequency affects result resolution,
+taken from [Sascha Bischoff's presentation](https://youtu.be/3gWyUWHxVj4) slide
+16:
+
+![A picture comparing a less detailed power graph with a more detailed one; a 1
+second sampling interval vs a 1 millisecond sampling
+interval.](/pages/static/figures/empowering_the_masses_slide16.png)
+
+How frequently stats are dumped directly affects the resolution of the graphs
+that can be produced based on the `stats.txt` file. However, it also affects
+the size of the output file. Dumping stats every simulated millisecond rather
+than every simulated second increases the file size by a factor of several
+hundred. Therefore, it makes sense to want to control the stat dump frequency.
+
+Using the provided `fs_power.py` script, this can be done as follows:
+
+```python
+[...]
+
+def addOptions(parser):
+    [...]
+    parser.add_argument("--stat-freq", type=float, default=1.0,
+                        help="Frequency (in seconds) to dump stats to the "
+                             "'stats.txt' file. Supports scientific notation, "
+                             "e.g. '1.0E-3' for milliseconds.")
+    return parser
+
+[...]
+
+def main():
+    [...]
+    m5.stats.periodicStatDump(m5.ticks.fromSeconds(options.stat_freq))
+    bL.run()
+
+[...]
+```
+
+The stat dump frequency could then be specified using
+```
+--stat-freq <val>
+```
+when invoking the simulation.
+
+Common Problems
+---------------
+
+- gem5 crashes when using the provided `fs_power.py`, with the message `fatal:
+  statistic '' (160) was not properly initialized by a regStats() function`
+- gem5 crashes when using the provided `fs_power.py`, with the message `fatal:
+  Failed to evaluate power expressions: [...]`
+
+These are due to gem5's stats framework recently having been refactored.
+Getting the latest version of the gem5 source code and re-building should fix
+the problem. If this is not desirable, the following two sets of patches are
+required:
+
+1. [https://gem5-review.googlesource.com/c/public/gem5/+/26643](https://gem5-review.googlesource.com/c/public/gem5/+/26643)
+2. [https://gem5-review.googlesource.com/c/public/gem5/+/26785](https://gem5-review.googlesource.com/c/public/gem5/+/26785)
+
+These can be checked out and applied by following the download instructions at
+their respective links.
+
diff --git a/_pages/documentation/learning_gem5/part2/part2_8_arm_dvfs_support.md b/_pages/documentation/learning_gem5/part2/part2_8_arm_dvfs_support.md
new file mode 100644
index 0000000..c1b2401
--- /dev/null
+++ b/_pages/documentation/learning_gem5/part2/part2_8_arm_dvfs_support.md
@@ -0,0 +1,316 @@
+---
+layout: documentation
+title: ARM DVFS Support
+doc: Learning gem5
+parent: part2
+permalink: /documentation/learning_gem5/part2/arm_dvfs_support/
+author: Thomas E. Hansen
+---
+
+ARM DVFS modelling
+==================
+
+Like most modern CPUs, ARM CPUs support DVFS. It is possible to model this and,
+for example, monitor the resulting power usage in gem5. DVFS modelling is done
+through the use of two components of Clocked Objects: Voltage Domains and Clock
+Domains. This chapter details the different components and shows different ways
+to add them to an existing simulation.
+
+Voltage Domains
+---------------
+
+Voltage Domains dictate the voltage values the CPUs can use. If no VD is
+specified when running a Full System simulation in gem5, a default value of
+1.0 Volts is used. This is to avoid forcing users to consider voltage when they
+are not interested in simulating this.
+
+Voltage Domains can be constructed from either a single value or a list of
+values, passed to the `VoltageDomain` constructor using the `voltage` kwarg. If
+a single value and multiple frequencies are specified, the voltage is used for
+all the frequencies in the Clock Domain. If a list of voltage values is
+specified, its number of entries must match the number of entries in the
+corresponding Clock Domain and the entries must be arranged in _descending_
+order. As with real hardware, a Voltage Domain applies to the entire processor
+socket. This means that if you want to have different VDs for the different
+processors (e.g. for a big.LITTLE setup) you need to make sure the big and the
+LITTLE cluster are on different sockets (check the `socket_id` value associated
+with the clusters).
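+
+The two accepted forms can be sketched as follows. This is a minimal
+illustration, assuming it runs inside a gem5 Python configuration script; the
+variable names and `domain_id` values are hypothetical, and the
+`SrcClockDomain` objects are included only to show how the entry counts line
+up:
+
+```python
+from m5.objects import SrcClockDomain, VoltageDomain
+
+# Form 1: a single voltage, shared by every frequency in the Clock Domain.
+shared_vd = VoltageDomain(voltage="1.0V")
+shared_cd = SrcClockDomain(clock=["2GHz", "1GHz", "500MHz"],
+                           voltage_domain=shared_vd,
+                           domain_id=0)  # hypothetical ID
+
+# Form 2: one voltage per frequency. Both lists have three entries and are
+# given in descending order.
+scaled_vd = VoltageDomain(voltage=["1.0V", "0.75V", "0.51V"])
+scaled_cd = SrcClockDomain(clock=["1510MHz", "1000MHz", "667MHz"],
+                           voltage_domain=scaled_vd,
+                           domain_id=1)  # hypothetical ID
+```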
+
+There are two ways to add a VD to an existing CPU/simulation: one is more
+flexible, the other more straightforward. The first method adds command-line
+flags to the provided `configs/example/arm/fs_bigLITTLE.py` file, while the
+second method adds custom classes.
+
+1. The most flexible way to add Voltage Domains to a simulation is to use
+   command-line flags. To add a command-line flag, find the `addOptions`
+   function in the file and add the flag there, optionally with some help
+   text.  
+   An example supporting both a single and multiple voltages:
+
+   ```python
+   def addOptions(parser):
+       [...]
+       parser.add_argument("--big-cpu-voltage", nargs="+", default="1.0V",
+                           help="Big CPU voltage(s).")
+       return parser
+   ```
+
+   The voltage domain value(s) could then be specified with
+
+   ```
+   --big-cpu-voltage <val1>V [<val2>V [<val3>V [...]]]
+   ```
+
+   This would then be accessed in the `build` function using
+   `options.big_cpu_voltage`.  The `nargs="+"` ensures that at least one
+   argument is required.
+   Example usage in `build`:
+
+   ```python
+   def build(options):
+       [...]
+       # big cluster
+       if options.big_cpus > 0:
+           system.bigCluster = big_model(system, options.big_cpus,
+                                         options.big_cpu_clock,
+                                         options.big_cpu_voltage)
+       [...]
+   ```
+
+   A similar flag and additions to the `build` function could be added to
+   support specifying voltage values for the LITTLE CPU. This approach allows
+   for very easy specification and modification of the voltages. The only
+   downside to this method is that the multiple command line arguments, some
+   being in list form, could clutter up the command used to invoke the
+   simulator.
+
+2. The less flexible way to specify Voltage Domains is by creating sub-classes
+   of the `CpuCluster`. Similar to the existing `BigCluster` and
+   `LittleCluster` sub-classes, these will extend the `CpuCluster` class.
+   In the constructor of the subclass, in addition to specifying a CPU type, we
+   also define a list of values for the Voltage Domain and pass this to the
+   call to the `super` constructor using the kwarg `cpu_voltage`.
+   Here is an example, for adding voltage to a `BigCluster`:
+
+   ```python
+   class VDBigCluster(devices.CpuCluster):
+       def __init__(self, system, num_cpus, cpu_clock=None, cpu_voltage=None):
+           # use the same CPU as the stock BigCluster
+           abstract_cpu = ObjectList.cpu_list.get("O3_ARM_v7a_3")
+           # voltage value(s)
+           my_voltages = [ '1.0V', '0.75V', '0.51V']
+
+           super(VDBigCluster, self).__init__(
+               cpu_voltage=my_voltages,
+               system=system,
+               num_cpus=num_cpus,
+               cpu_type=abstract_cpu,
+               l1i_type=devices.L1I,
+               l1d_type=devices.L1D,
+               wcache_type=devices.WalkCache,
+               l2_type=devices.L2
+           )
+   ```
+
+   Adding voltages to the `LittleCluster` could then be done by defining a
+   similar `VDLittleCluster` class.
+
+   With the subclass(es) defined, we still need to add an entry to the
+   `cpu_types` dictionary in the file, specifying a string name as the key and
+   a pair of classes as the value, e.g:
+
+   ```python
+   cpu_types = {
+       [...]
+       "vd-timing" : (VDBigCluster, VDLittleCluster)
+   }
+   ```
+
+   The CPUs with VDs could then be used by passing
+
+   ```
+   --cpu-type vd-timing
+   ```
+
+   to the command invoking the simulation.
+
+   Since any modifications to the voltage values have to be done by finding the
+   right subclass and modifying its code, or adding more subclasses and
+   `cpu_types` entries, this approach is a lot less flexible than the
+   flag-based approach.
+
+Clock Domains
+-------------
+
+Voltage Domains are used in conjunction with Clock Domains. As previously
+mentioned, if no custom voltage values have been specified, a default value of
+1.0V is used for all values in the Clock Domain.
+
+In contrast to Voltage Domains, there are 3 types of Clock Domains (from
+`src/sim/clock_domain.hh`):
+
+- `ClockDomain` -- provides a clock to a group of Clocked Objects bundled under
+  the same Clock Domain. The CDs are in turn grouped into Voltage Domains. The
+  CDs provide support for a hierarchical structure with "Source" and "Derived"
+  Clock Domains.
+- `SrcClockDomain` -- provides the notion of a CD that is connected to a
+  tunable clock source. It maintains the clock period and provides the methods
+  for setting/getting the clock, as well as the configuration parameters for
+  the CD that a handler is going to manage. This includes frequency values at
+  various performance levels, a Domain ID, and the current performance level.
+  Note that a performance level as requested by the software corresponds to one
+  of the frequency operation points the CD can operate at.
+- `DerivedClockDomain` -- provides the notion of a CD that is connected to a
+  parent CD which can either be a `SrcClockDomain` or a `DerivedClockDomain`.
+  It maintains the clock divider and provides methods for getting the clock.
+
+Adding Clock Domains to an existing simulation
+----------------------------------------------
+
+This example will use the same provided files as the VD examples, i.e.
+`configs/example/arm/fs_bigLITTLE.py` and `configs/example/arm/devices.py`.
+
+Like VDs, CDs can be a single value or a list of values. If a list of clock
+speeds is given, the same rules apply as for a list of voltages given to a VD,
+i.e. the number of values in the CD must match the number of values in the VD;
+and the clock speeds must be given in _descending_ order. The provided files
+come with support for specifying the clock as a single value (through the
+`--{big,little}-cpu-clock` flags), but not as a list of values.
+Extending/Modifying the behaviour of the provided flags is the simplest and
+most flexible way to add support for multi-value CDs, but it is also possible
+to do it by adding subclasses.
+
+1. To add multi-value support to the existing `--{big,little}-cpu-clock` flags,
+   locate the `addOptions` function in the
+   `configs/example/arm/fs_bigLITTLE.py` file. Amongst the various
+   `parser.add_argument` calls, find the ones that add the CPU-clock flags and
+   replace the kwarg `type=str` with `nargs="+"`:
+   ```python
+   def addOptions(parser):
+       [...]
+       parser.add_argument("--big-cpu-clock", nargs="+", default="2GHz",
+                           help="Big CPU clock frequency.")
+       parser.add_argument("--little-cpu-clock", nargs="+", default="1GHz",
+                           help="Little CPU clock frequency.")
+       [...]
+   ```
+   With this, multiple frequencies can be specified similarly to the flag used
+   for VDs:
+   ```
+   --{big,little}-cpu-clock <val1>GHz [<val2>MHz [<val3>MHz [...]]]
+   ```
+
+   Since this modifies existing flags, the flags' values are already wired up
+   to the relevant constructors and kwargs in the `build` function, so there is
+   nothing to be modified there.
+
+2. To add CDs in a subclass, the process is very similar to the process of
+   adding VDs as a subclass. The difference is that instead of specifying
+   voltages and using the `cpu_voltage` kwarg, we specify clock values and use
+   the `cpu_clock` kwarg in the `super` call:
+   ```python
+   class CDBigCluster(devices.CpuCluster):
+       def __init__(self, system, num_cpus, cpu_clock=None, cpu_voltage=None):
+           # use the same CPU as the stock BigCluster
+           abstract_cpu = ObjectList.cpu_list.get("O3_ARM_v7a_3")
+           # clock value(s)
+           my_freqs = [ '1510MHz', '1000MHz', '667MHz']
+
+           super(CDBigCluster, self).__init__(
+               cpu_clock=my_freqs,
+               system=system,
+               num_cpus=num_cpus,
+               cpu_type=abstract_cpu,
+               l1i_type=devices.L1I,
+               l1d_type=devices.L1D,
+               wcache_type=devices.WalkCache,
+               l2_type=devices.L2
+           )
+   ```
+   This could be combined with the VD example so as to specify both VDs and CDs
+   for the cluster.
+
+   As with adding VDs using this approach, you would need to define a class for
+   each of the CPU types you wanted to use and add a matching name-to-class-pair
+   entry to the `cpu_types` dictionary. This method also has the same
+   limitations and is a lot less flexible than the flag-based approach.
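+
+Combining the two subclass examples might look like the sketch below. The
+class name `DVFSBigCluster` and the operating points are hypothetical. The
+sketch assumes that the `CpuCluster` constructor accepts both the `cpu_clock`
+and `cpu_voltage` kwargs used in the examples above, and that the class is
+placed in `configs/example/arm/fs_bigLITTLE.py` next to the existing cluster
+classes, so that `devices` and `ObjectList` are already imported:
+
+```python
+class DVFSBigCluster(devices.CpuCluster):
+    def __init__(self, system, num_cpus, cpu_clock=None, cpu_voltage=None):
+        # use the same CPU as the stock BigCluster
+        abstract_cpu = ObjectList.cpu_list.get("O3_ARM_v7a_3")
+        # operating points: same number of entries, both in descending order
+        my_freqs = ['1510MHz', '1000MHz', '667MHz']
+        my_voltages = ['1.0V', '0.75V', '0.51V']
+
+        super(DVFSBigCluster, self).__init__(
+            cpu_clock=my_freqs,
+            cpu_voltage=my_voltages,
+            system=system,
+            num_cpus=num_cpus,
+            cpu_type=abstract_cpu,
+            l1i_type=devices.L1I,
+            l1d_type=devices.L1D,
+            wcache_type=devices.WalkCache,
+            l2_type=devices.L2
+        )
+```
+
+As with the other subclasses, it would still need its own entry in the
+`cpu_types` dictionary before it could be selected with `--cpu-type`.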
+
+Making sure CDs have a valid DomainID
+-------------------------------------
+
+Regardless of which of the previous methods is used, some additional
+modifications are required. These concern the provided
+`configs/example/arm/devices.py` file.
+
+In the file, locate the `CpuCluster` class and find the place where
+`self.clk_domain` is initialised to a `SrcClockDomain`. As noted in the comment
+concerning `SrcClockDomain` above, these have a Domain ID. If this is not set,
+as is the case in the provided setup, then the default ID of `-1` will be used.
+Instead of this, change the code to make sure the Domain ID is set:
+
+```python
+[...]
+self.clk_domain = SrcClockDomain(clock=cpu_clock,
+                                 voltage_domain=self.voltage_domain,
+                                 domain_id=system.numCpuClusters())
+[...]
+```
+
+The `system.numCpuClusters()` is used here since the CD applies to the entire
+cluster, i.e. it will be 0 for the first cluster, 1 for the second cluster,
+etc.
+
+If you don't set the Domain ID, you will get the following error when trying to
+run a DVFS-capable simulation as some internal checks catch the default Domain
+ID:
+
+```
+fatal: fatal condition domain_id == SrcClockDomain::emptyDomainID occurred:
+DVFS: Controlled domain system.bigCluster.clk_domain needs to have a properly
+assigned ID.
+```
+
+The DVFS Handler
+----------------
+
+If you specify VDs and CDs and then try to run your simulation, it will most
+likely run, but you might notice the following warning in the output:
+
+```
+warn: Existing EnergyCtrl, but no enabled DVFSHandler found.
+```
+
+The VDs and CDs have been added, but there is no `DVFSHandler` which the system
+can interface with to adjust the values. The simplest way to fix this is to add
+another command-line flag, in the `configs/example/arm/fs_bigLITTLE.py` file.
+
+As in the VD and CD examples, locate the `addOptions` function and append the
+following code to it:
+
+```python
+def addOptions(parser):
+    [...]
+    parser.add_argument("--dvfs", action="store_true",
+                        help="Enable the DVFS Handler.")
+    return parser
+```
+
+Then, locate the `build` function and append this code to it:
+
+```python
+def build(options):
+    [...]
+    if options.dvfs:
+        system.dvfs_handler.domains = [system.bigCluster.clk_domain,
+                                       system.littleCluster.clk_domain]
+        system.dvfs_handler.enable = options.dvfs
+
+    return root
+```
+
+With this in place, you should now be able to run a DVFS-capable simulation by
+using the `--dvfs` flag when invoking the simulation, with the option to
+specify the voltage and frequency operating points of both the big and the
+LITTLE cluster as necessary.
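+
+The `build` snippet above assumes that both clusters exist. A slightly more
+defensive variant is sketched below. It assumes that the `--big-cpus` and
+`--little-cpus` flags of the provided script (available as `options.big_cpus`
+and `options.little_cpus`) decide whether each cluster is created; the
+`dvfs_domains` name is purely illustrative:
+
+```python
+def build(options):
+    [...]
+    if options.dvfs:
+        # only register the Clock Domains of clusters that were created
+        dvfs_domains = []
+        if options.big_cpus > 0:
+            dvfs_domains.append(system.bigCluster.clk_domain)
+        if options.little_cpus > 0:
+            dvfs_domains.append(system.littleCluster.clk_domain)
+
+        system.dvfs_handler.domains = dvfs_domains
+        system.dvfs_handler.enable = True
+
+    return root
+```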
+
diff --git a/_pages/publications.md b/_pages/publications.md
index 82f1e08..cf32fbd 100644
--- a/_pages/publications.md
+++ b/_pages/publications.md
@@ -40,9 +40,9 @@
 
 *   [**Exploring system performance using elastic traces: Fast, accurate and portable**](https://doi.org/10.1109/SAMOS.2016.7818336). Radhika Jagtap, Matthias Jung, Stephan Diestelhorst, Andreas Hansson, Norbert Wehn. IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016
 
-## SystemC Couping<span class="anchor" data-clipboard-text="http://new.gem5.org/publications/#systemc-couping"></span>
+## SystemC Coupling<span class="anchor" data-clipboard-text="http://new.gem5.org/publications/#systemc-coupling"></span>
 
-*   [**System Simulation with gem5 and SystemC: The Keystone for Full Interoperability**](http://samos-conference.com/Resources_Samos_Websites/Proceedings_Repository_SAMOS/2017/49_Final_Paper.pdf). C. Menard, M. Jung, J. Castrillon, N. Wehn. IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS), July, 2017
+*   [**System Simulation with gem5 and SystemC: The Keystone for Full Interoperability**](https://ieeexplore.ieee.org/document/8344612). C. Menard, M. Jung, J. Castrillon, N. Wehn. IEEE International Conference on Embedded Computer Systems Architectures Modeling and Simulation (SAMOS), July, 2017
 
 # Derivative projects
 
diff --git a/_pages/static/figures/empowering_the_masses_slide16.png b/_pages/static/figures/empowering_the_masses_slide16.png
new file mode 100644
index 0000000..75d45a9
--- /dev/null
+++ b/_pages/static/figures/empowering_the_masses_slide16.png
Binary files differ
diff --git a/_posts/2020-10-27-tme.md b/_posts/2020-10-27-tme.md
new file mode 100644
index 0000000..b6d0486
--- /dev/null
+++ b/_posts/2020-10-27-tme.md
@@ -0,0 +1,340 @@
+---
+layout: post
+title:  "Arm's Transactional Memory Extension support in gem5"
+author: Timothy Hayes
+date:   2020-10-27
+canonical: https://community.arm.com/developer/research/b/articles/posts/arms-transactional-memory-extension-support-
+categories: project
+---
+
+**This post was originally published on the [Arm Research Blog](https://community.arm.com/developer/research/b/articles/posts/arms-transactional-memory-extension-support-).**
+
+## A shift to concurrency
+
+In 2005, Herb Sutter published his seminal article “The Free Lunch is Over” (Sutter, 2005). He outlined that the sequential performance of microprocessors would soon plateau, and the industry would respond by offering more performant processors by way of increased core counts. The consequence of this paradigm shift has been a move away from a purely sequential programming model for writing software to that of a concurrent one with multiple threads of execution. When applications inherently exhibit parallelism, dividing work between multiple threads can yield performance gains when the threads execute on different cores.
+
+Multithreaded concurrency has two principal drawbacks: (1) the overheads of synchronization, for example, the serializing nature of locks, and (2) the difficulty of programming, debugging and verifying it. There is often an inverse correlation between these two properties when characterizing different synchronization strategies, for example, coarse-grained locking, fine-grained locking and lock-free algorithms.
+
+![](/assets/img/TME-Blog-figure-1Asset-1_2D00_100.jpg)
+
+*Figure 1: Achieving more performant/scalable concurrency often comes at the cost of increased difficulty.*
+
+Hardware Transactional Memory (HTM) allows two or more threads to safely execute critical sections (a.k.a. transactions) in parallel without using serializing primitives such as mutexes. Transactions are executed speculatively with the properties of atomicity, consistency and isolation (Harris, et al., 2007), and the microarchitecture is in charge of detecting and recovering from race conditions, for example, when one thread writes to a memory location that another thread has read. This can provide increased parallelism with a simpler programming model.
+
+## Arm's TME
+
+The [Transactional Memory Extension (TME)](https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture) is part of Arm’s A-profile Future Architecture Technologies program, which provides advanced information on unreleased versions of the architecture. TME is a best-effort HTM architecture: it does not guarantee that transactions complete, so the programmer must provide a fallback path to guarantee progress, such as a mutex-guarded critical section. It provides strong isolation, meaning transactions are isolated from both other transactions and concurrent non-transactional memory accesses. It uses flattened nesting of transactions, in which nested transactions are subsumed by the outer transaction. The effects of a nested transaction do not become visible to other observers until the outer transaction commits. When a nested transaction aborts, it causes the outer transaction (and all transactions nested within it) to abort.
+
+TME comprises four instructions:
+
+- **TSTART \<Xd\>** - This instruction starts a new transaction. If the transaction started successfully, the destination register is set to zero and the processor enters _transactional state_. If the transaction failed or was canceled, then all state modifications that were performed transactionally are discarded. The destination register is then written with a nonzero value that encodes the cause of the failure.
+
+- **TCOMMIT** - This instruction commits the current transaction. If the current transaction is an outer transaction, then _transactional state_ is exited, and all state modifications performed transactionally are committed to the architectural state.
+
+- **TCANCEL #\<imm\>** - This instruction exits _transactional state_ and discards all state modifications that were performed transactionally. Execution continues at the instruction that follows the `TSTART` instruction of the outer transaction. The destination register of the `TSTART` instruction of the outer transaction is written with the immediate operand of `TCANCEL`.
+
+- **TTEST \<Xd\>** - This instruction writes the depth of the transaction to the destination register, or the value 0 if the processor is not in _transactional state_.
+
+![](/assets/img/TME-Blog-figure-2Asset-3_2D00_100.jpg)
+
+*Figure 2: Illustrates the semantics of the four TME instructions. (1) shows two threads creating and committing transactions. Thread T1 conflicts with thread T2, therefore T2 aborts its transaction and rolls back to the `TSTART` instruction. T2's `TCOMMIT` is never reached. (2) thread T1 creates a transaction but manually aborts it using `TCANCEL`. It aborts and rolls back to `TSTART`. T1’s `TCOMMIT` is never reached. (3) thread T1 creates and commits a nested transaction in which the transactional depth is tested. When `TTEST` is executed in the outer transaction it returns 1, whereas if it is executed in the inner transaction it returns 2.*
+
+## Modelling TME in gem5
+
+[gem5](https://www.gem5.org/) is an open-source system and processor simulator widely used in academia and industry for computer architecture research. gem5 offers two alternative memory systems: Classic and Ruby. The [Classic Memory System](https://www.gem5.org/documentation/general_docs/memory_system/classic-coherence-protocol/) is inherited from gem5’s predecessor, M5, and implements a MOESI coherence protocol (Sorin, Hill, & Wood, 2011). In contrast, the [Ruby Memory System](https://www.gem5.org/documentation/general_docs/ruby/) offers a flexible way of defining and emulating custom cache coherence protocols, including a range of interconnects and topologies. Protocols are specified through states, transitions, events, and actions using the domain-specific language SLICC (Specification Language for Implementing Cache Coherence). gem5 includes a variety of Ruby cache coherence protocols available for use by default; however, none of these protocols currently support HTM.
+
+HTM can be implemented in many ways with different trade-offs. To prototype Arm's TME, we have chosen one particular way of implementing the ISA extension in the microarchitecture. In short, it uses lazy version management and eager conflict detection with a cache line granularity (Bobba, et al., 2007). Other implementations are possible and the choices for the gem5 design do not necessarily reflect any silicon implementation – existing or future.
+
+TME requires that implementations must:
+
+- Checkpoint and restore the architectural register state.
+- Track transactionally read and written state.
+- Buffer speculative memory updates.
+
+![](/assets/img/TME-Blog-figure-3Asset-5_2D00_100.jpg)
+
+*Figure 3: The boxes highlighted in orange are components in the microarchitecture that have been added or modified to accommodate TME support.*
+
+There are several techniques that can be used for register checkpointing, including shadow register files and freezing physical registers. In gem5, we opt for a functionally correct checkpointing mechanism with no overhead, that is, a zero-cycle instantaneous backup of the entire register file. This allows us to share a common checkpointing mechanism between core models.
+
+To separate the HTM support from TME-specific functionality, a generic interface for checkpoint creation and restoration is added to the `src/arch/` directory. ISAs can implement this interface according to their particular needs, allowing much of the HTM functionality to be shared and reused. The TME implementation must also be able to roll back the architectural state and discard speculative updates on transactional failure or cancellation. This is achieved by repurposing gem5’s exception mechanism.
+
+To track a transaction’s read/write sets and buffer speculative memory updates, we leverage the cache coherence protocol. gem5 includes the Ruby protocol _MESI\_Three\_Level_. MESI refers to the states that a cache line can be in: Modified, exclusive, shared or invalid (Sorin, Hill, & Wood, 2011). The protocol utilizes private L1 data and instruction caches that are fed by a larger unified inclusive private L2 cache. The L2 caches are backed by a larger inclusive shared L3 cache and coherence directory.
+
+The _MESI\_Three\_Level_ protocol has been augmented to support TME. The L1 data cache is used to buffer speculative state isolated from the rest of the system. Because the L2 cache is inclusive, it contains the same lines used in a transaction’s read/write sets, but holds their pre-transactional values. The consequence of this configuration is that a transaction’s working set must reside solely in the L1 data cache. If a transactionally read or written line spills, that is, is evicted from the L1 data cache to the L2 cache, the transaction must abort, and any speculatively written data must be discarded.
+
+To track transactionally read and written state, two additional bits are added to the tag of each L1 data cache line: one bit indicates that the line is in a transaction’s read set, and the other that it is in the write set. These bits are then used when transitioning from one cache line state to another. To commit a transaction, both bits are cleared. To abort a transaction, the line is transitioned to _invalid_ if it is both _modified_ and in the transaction’s write set; as with committing, both bits are also cleared. We assume these bits can be cleared atomically so that, to external observers, either all the transactional state commits (becomes non-speculative) or it is discarded and rolled back. This satisfies the transactional memory property of atomicity.
+
+## Example program
+
+To test the new functionality in gem5, we outline a simple program written in C that uses TME transactions to update a histogram in parallel. This program uses manual lock elision—a lock is used to protect a shared data structure but is bypassed, that is, elided, whenever possible in favor of transactions. This satisfies the requirements of a fallback path if a transaction cannot make progress.
+
+We first define a very simple spinlock that works with [AArch64](https://developer.arm.com/architectures/learn-the-architecture/aarch64-instruction-set-architecture?_ga=2.17759802.282459154.1604342475-1664555334.1603995267)’s weak memory model.
+
+```cpp
+#include <stdatomic.h>
+
+typedef atomic_int lock_t;
+
+// `static inline` gives each translation unit its own definition, which
+// avoids undefined-reference link errors when a call is not inlined.
+static inline void lock_init(lock_t *lock) {
+    atomic_init(lock, 0);
+}
+
+static inline void lock_acquire(lock_t *lock) {
+    while (atomic_exchange_explicit(lock, 1, memory_order_acquire))
+        ; // spin until acquired
+}
+
+static inline int lock_is_acquired(lock_t *lock) {
+    return atomic_load_explicit(lock, memory_order_acquire);
+}
+
+static inline void lock_release(lock_t *lock) {
+    atomic_store_explicit(lock, 0, memory_order_release);
+}
+```
+
+Next, we write a function to elide the lock using a TME transaction. `lock_acquire_elided` returns 1 if the lock was successfully elided, otherwise 0. The function starts a new transaction and checks that the lock is still free, thereby adding it to the transaction’s read set. If the lock is not free, the transaction is aborted explicitly via `TCANCEL`. The particular 15-bit integer passed as a parameter is unimportant in our example; however, setting the MSB ensures the transaction can be retried.
+
+```cpp
+#include <arm_acle.h>
+
+#define TME_MAX_RETRIES         3
+#define TME_LOCK_IS_ACQUIRED    65535
+
+int lock_acquire_elided(lock_t *lock) {
+    int num_retries = 0;
+    uint64_t status;
+
+    do {
+        status = __tstart();
+        if (status == 0) {
+            // check if lock is acquired and add it to our read-set
+            if (lock_is_acquired(lock)) {
+                __tcancel(TME_LOCK_IS_ACQUIRED);
+                __builtin_unreachable();
+            }
+            return 1;
+        }
+        ++num_retries;
+    } while ((status & _TMFAILURE_RTRY) && (num_retries < TME_MAX_RETRIES));
+
+    // the transaction failed too many times
+    return 0;
+}
+
+void lock_release_elided() {
+    __tcommit();
+}
+```
+
+These spinlock and transaction routines are then leveraged to create the function `work` that updates a global shared array structure on the heap. This function can be called from multiple threads in parallel.
+
+```cpp
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "lock.h"
+
+#define ARRAYSIZE             512
+#define ITERATIONS            10000
+
+volatile long int histogram[ARRAYSIZE];
+lock_t global_lock;
+
+void* work(void* void_ptr) {
+    // Use thread id for RNG seed,
+    // this will prevent threads generating the same array indices.
+    long int idx = (long int)void_ptr;
+    unsigned int seedp = (unsigned int)idx;
+    int i, rc;
+
+    printf("Hello from thread %ld\n", idx);
+
+    for (i=0; i<ITERATIONS; i++)
+    {
+        int num1 = rand_r(&seedp)%ARRAYSIZE;
+
+        rc = lock_acquire_elided(&global_lock);
+        if (rc == 0) // eliding the lock failed
+            lock_acquire(&global_lock);
+
+        // start critical section
+        long int temp = histogram[num1];
+        temp += 1;
+        histogram[num1] = temp;
+        // end critical section
+
+        if (rc == 1)
+            lock_release_elided();
+        else
+            lock_release(&global_lock);
+    }
+
+    printf("Goodbye from thread %ld\n", idx);
+    return NULL; // nothing to report back to pthread_join
+}
+```
+
+Finally, we put this all together using a `main` function that spawns and joins worker threads.
+
+```cpp
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+
+
+int main() {
+    long int i, total, numberOfProcessors;
+    pthread_t *threads;
+    int rc;
+
+    numberOfProcessors = sysconf(_SC_NPROCESSORS_ONLN);
+
+    printf("TME parallel histogram with %ld procs\n", numberOfProcessors);
+
+    lock_init(&global_lock);
+
+    // initialise the array
+    for (i=0; i<ARRAYSIZE; i++)
+        histogram[i] = 0;
+
+    // spawn work
+    threads = (pthread_t*) malloc(sizeof(pthread_t)*numberOfProcessors);
+    for (i=0; i<numberOfProcessors-1; i++) {
+        rc = pthread_create(&threads[i], NULL, work, (void*)i);
+        assert(rc==0);
+    }
+    work((void*)(numberOfProcessors-1));
+
+    // wait for worker threads
+    for (i=0; i<numberOfProcessors-1; i++) {
+        rc = pthread_join(threads[i], NULL);
+        assert(rc==0);
+    }
+
+    // verify array contents
+    total = 0;
+    for (i=0; i<ARRAYSIZE; i++)
+        total += histogram[i];
+
+    // free resources
+    free(threads);
+
+    printf("Total is %ld\nExpected total is %ld\n",
+        total, ITERATIONS*numberOfProcessors);
+
+    return 0;
+}
+```
+
+## Compiling and running
+
+TME is supported in GCC as of [version 10](https://gcc.gnu.org/gcc-10/changes.html), including [ACLE intrinsics](https://developer.arm.com/documentation/101028/0010/Transactional-Memory-Extension--TME--intrinsics?_ga=2.211085974.282459154.1604342475-1664555334.1603995267). To compile source files with TME instructions, an AArch64 compiler must be used with the feature enabled via the `-march` flag, for example, `-march=armv8-a+tme`.
+
+```
+aarch64-linux-gnu-gcc -std=c11 -O2 -static -march=armv8-a+tme -pthread -o histogram.exe ./histogram.c
+```
+
+gem5 must then be compiled with the new Ruby `MESI_Three_Level_HTM` protocol.
+
+```
+scons CC=gcc CXX=g++ build/ARM_MESI_Three_Level_HTM/gem5.opt TARGET_ISA=arm PROTOCOL=MESI_Three_Level_HTM SLICC_HTML=True CPU_MODELS=AtomicSimpleCPU,TimingSimpleCPU,O3CPU -j 4
+```
+
+To run the histogram executable in syscall emulation (SE) mode:
+
+```
+./gem5/build/ARM_MESI_Three_Level_HTM/gem5.opt ./gem5/configs/example/se.py --ruby --num-cpus=2 --cpu-type=TimingSimpleCPU --cmd=./blogexample/histogram.exe
+```
+
+The output should look something like:
+
+```
+TME parallel histogram with 2 procs
+Hello from thread 1
+Hello from thread 0
+Goodbye from thread 1
+Goodbye from thread 0
+Total is 20000
+Expected total is 20000
+Exiting @ tick 718668000 because exiting with last active thread context
+```
+
+To verify whether any critical sections executed transactionally, we check `m5out/stats.txt` where several HTM-related statistics live.
+
+```
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::explicit           35     22.01%     22.01% # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::transaction_size           38     23.90%     45.91% # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::memory_conflict           86     54.09%    100.00% # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::total          159                       # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::samples         9927                       # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::mean    63.466103                       # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::gmean    56.438036                       # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::stdev    29.029108                       # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::32-47         4854     48.90%     48.90% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::48-63            2      0.02%     48.92% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::64-79          195      1.96%     50.88% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::80-95         4627     46.61%     97.49% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::96-111          188      1.89%     99.39% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::112-127           60      0.60%     99.99% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::128-143            1      0.01%    100.00% # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_cycles::total         9927                       # number of cycles spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_instructions::samples         9927                       # number of instructions spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_instructions::mean           12                       # number of instructions spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_instructions::gmean    12.000000                       # number of instructions spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_instructions::12-13         9927    100.00%    100.00% # number of instructions spent in an outer transaction
+system.ruby.l0_cntrl0.sequencer.htm_transaction_instructions::total         9927                       # number of instructions spent in an outer transaction
+```
+
+These are per-core statistics that report the length of each transaction (in cycles or instructions) and the reason each aborted transaction failed. _explicit_ is incremented when `TCANCEL` is used; in our example code, this happens when the global lock is observed to be taken after the transaction has already started. _memory\_conflict_ occurs when another processing element attempts to modify a cache line in the transaction’s read or write set. _transaction\_size_ indicates that the transaction spilled out of the L1 data cache; since this is difficult to track accurately, the statistic instead counts transactional cache lines that are evicted from the L1 data cache due to a load/store originating in the same core. Due to factors such as the cache set replacement policy, this statistic often exhibits false positives.
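+
+The abort-cause tallies can also be pulled out of the statistics file with a short script. The following is a minimal sketch and not part of gem5: the function name `parse_abort_causes` and the regular expression are assumptions made for illustration, and `SAMPLE_STATS` reproduces the four abort-cause lines shown above instead of reading `m5out/stats.txt`.
+
+```python
+import re
+
+# Sample lines copied from the stats.txt excerpt above.
+SAMPLE_STATS = """\
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::explicit           35     22.01%     22.01% # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::transaction_size           38     23.90%     45.91% # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::memory_conflict           86     54.09%    100.00% # cause of htm transaction abort
+system.ruby.l0_cntrl0.sequencer.htm_transaction_abort_cause::total          159                       # cause of htm transaction abort
+"""
+
+# Hypothetical pattern: <controller path>.htm_transaction_abort_cause::<cause> <count>
+ABORT_LINE = re.compile(r"^(\S+)\.htm_transaction_abort_cause::(\w+)\s+(\d+)")
+
+
+def parse_abort_causes(stats_text):
+    """Return {controller: {cause: count}}, skipping the '::total' rows."""
+    causes = {}
+    for line in stats_text.splitlines():
+        match = ABORT_LINE.match(line)
+        if match is None:
+            continue
+        controller, cause, count = match.groups()
+        if cause == "total":
+            continue
+        causes.setdefault(controller, {})[cause] = int(count)
+    return causes
+
+
+if __name__ == "__main__":
+    for controller, counts in parse_abort_causes(SAMPLE_STATS).items():
+        total = sum(counts.values())
+        for cause, count in counts.items():
+            share = 100.0 * count / total
+            print(f"{controller} {cause}: {count} ({share:.2f}%)")
+```
+
+To process a real run, the same function could be given the contents of `m5out/stats.txt`; each core's sequencer would then appear as a separate key in the returned dictionary.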
+
+```
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::samples          159                       # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::mean     0.729560                       # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::stdev     0.591988                       # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::0           55     34.59%     34.59% # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::1           92     57.86%     92.45% # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::2           12      7.55%    100.00% # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_read_set::total          159                       # read set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_write_set::samples          159                       # write set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_write_set::mean     0.169811                       # write set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_write_set::stdev     0.376653                       # write set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_write_set::0          132     83.02%     83.02% # write set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_write_set::1           27     16.98%    100.00% # write set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_aborted_write_set::total          159                       # write set size of a aborted transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::samples         9927                       # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::mean     1.987710                       # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::gmean     1.983035                       # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::stdev     0.110181                       # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::1          122      1.23%      1.23% # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::2         9805     98.77%    100.00% # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_read_set::total         9927                       # read set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_write_set::samples         9927                       # write set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_write_set::mean            1                       # write set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_write_set::gmean            1                       # write set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_write_set::1         9927    100.00%    100.00% # write set size of a committed transaction
+system.ruby.l0_cntrl0.Dcache.htm_transaction_committed_write_set::total         9927                       # write set size of a committed transaction
+```
+
+These are sampled histograms containing the sizes of transactions in terms of number of cache lines used. It can be observed that the majority of successful transactions read from two unique cache lines and write to one. These are useful metrics when characterizing TME-enabled applications.
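+
+The sample counts of these histograms also give a rough abort rate for the core. The sketch below assumes that every started transaction ends up in exactly one of the committed or aborted histograms, so that their `::samples` values add up to the number of attempts; the helper name `abort_rate` is hypothetical.
+
+```python
+# Sample counts for l0_cntrl0, copied from the statistics shown above.
+committed = 9927  # htm_transaction_committed_read_set::samples
+aborted = 159     # htm_transaction_aborted_read_set::samples
+
+
+def abort_rate(committed, aborted):
+    """Fraction of transaction attempts that aborted (0.0 if there were none)."""
+    attempts = committed + aborted
+    return aborted / attempts if attempts else 0.0
+
+
+rate = abort_rate(committed, aborted)
+print(f"{committed + aborted} attempts, {100.0 * rate:.2f}% aborted")
+# prints "10086 attempts, 1.58% aborted"
+```
+
+Under that assumption, fewer than 2% of the transaction attempts on this core aborted in the example run.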
+
+## Available now
+
+Arm is dedicated to working with the open-source software community, and views the innovation it enables as essential to the ongoing success of its ecosystem. Arm’s TME support in gem5 has been open-sourced and upstreamed; it is available from [v20.1](https://www.gem5.org/project/2020/10/01/gem5-20-1.html) onwards. This enables a range of use cases for our commercial and academic partners, for example:
+
+- Testing and benchmarking TME-enabled binaries before general silicon availability
+- Sensitivity analyses to determine how different microarchitectural parameters impact the efficacy of a TME implementation
+- Using gem5 as a research platform to discover and demonstrate how this technology could evolve
+
+This work was done in collaboration with Cray and was funded in part by the [DOE ECP PathForward program](https://www.exascaleproject.org/research-group/pathforward/). The code is based on a previous pull request by Pradip Vallathol, who developed HTM and TSX support in gem5 as part of his master’s thesis. The author would like to thank all of the internal and external code reviewers.
+
+## Works cited
+
+Bobba, J., Moore, K. E., Volos, H., Yen, L., Hill, M. D., Swift, M. M., & Wood, D. A. (2007). Performance pathologies in hardware transactional memory. _ACM SIGARCH Computer Architecture News_, _35(2)_, 81-91.
+
+Harris, T., Cristal, A., Unsal, O. S., Ayguade, E., Gagliardi, F., Smith, B., & Valero, M. (2007). Transactional memory: An overview. _IEEE Micro_, _27(3)_, 8-29.
+
+Sorin, D. J., Hill, M. D., & Wood, D. A. (2011). A Primer on Memory Consistency and Cache Coherence. _Synthesis lectures on computer architecture_, _6(3)_, 1-212.
+
+Sutter, H. (2005). The free lunch is over: A fundamental turn toward concurrency in software. _Dr. Dobb’s journal_, _30(3)_, 202-210.
diff --git a/assets/img/TME-Blog-figure-1Asset-1_2D00_100.jpg b/assets/img/TME-Blog-figure-1Asset-1_2D00_100.jpg
new file mode 100644
index 0000000..8dabbfa
--- /dev/null
+++ b/assets/img/TME-Blog-figure-1Asset-1_2D00_100.jpg
Binary files differ
diff --git a/assets/img/TME-Blog-figure-2Asset-3_2D00_100.jpg b/assets/img/TME-Blog-figure-2Asset-3_2D00_100.jpg
new file mode 100644
index 0000000..35899f8
--- /dev/null
+++ b/assets/img/TME-Blog-figure-2Asset-3_2D00_100.jpg
Binary files differ
diff --git a/assets/img/TME-Blog-figure-3Asset-5_2D00_100.jpg b/assets/img/TME-Blog-figure-3Asset-5_2D00_100.jpg
new file mode 100644
index 0000000..16a9d5b
--- /dev/null
+++ b/assets/img/TME-Blog-figure-3Asset-5_2D00_100.jpg
Binary files differ