Due 1pm, Tuesday, 9/29
You should do this assignment alone. No late assignments.
The purpose of this assignment is to give you experience with pipelined CPUs. You will simulate a given program with the timing simple CPU to understand the instruction mix of the program. Then, you will simulate the same program with a pipelined in-order CPU to understand how the latency and bandwidth of different parts of the pipeline affect performance. You will also be exposed to pseudo-instructions that are used for carrying out functions required by the underlying experiment. This homework is based on exercise 3.6 of CA:AQA 3rd edition.
#include <cstdio>
#include <random>

int main()
{
    const int N = 1000;
    double X[N], Y[N], alpha = 0.5;
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dis(1, 2);
    for (int i = 0; i < N; ++i)
    {
        X[i] = dis(gen);
        Y[i] = dis(gen);
    }

    // Start of daxpy loop
    for (int i = 0; i < N; ++i)
    {
        Y[i] = alpha * X[i] + Y[i];
    }
    // End of daxpy loop

    double sum = 0;
    for (int i = 0; i < N; ++i)
    {
        sum += Y[i];
    }
    printf("%lf\n", sum);
    return 0;
}
Your first task is to compile this code statically and simulate it with gem5 using the timing simple CPU. Compile the program with the -O2 flag to avoid running into unimplemented x87 instructions while simulating with gem5. Report the breakdown of instructions among the different op classes. For this, grep for op_class in the file stats.txt.
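If you would rather post-process stats.txt in code than with grep, the filtering step can be sketched as below. This is only an illustration: the function name filter_op_class is hypothetical, and the exact names and columns of the op_class statistics depend on your gem5 version, so the sketch keeps each matching line whole instead of parsing it.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Return every line of a stats dump that mentions "op_class",
// which is what `grep op_class stats.txt` would print.
// The lines are kept verbatim; no assumption is made about their columns.
std::vector<std::string> filter_op_class(std::istream& in) {
    std::vector<std::string> hits;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("op_class") != std::string::npos) {
            hits.push_back(line);
        }
    }
    return hits;
}
```

To use it, open stats.txt with std::ifstream, pass the stream to filter_op_class, and print the returned lines.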
Generate the assembly code for the program using the -S and -O2 options when compiling with GCC. As you can see from the assembly code, instructions that are not central to the actual task of the program (computing aX + Y) will also be simulated. This includes the instructions for generating the vectors X and Y, summing the elements of Y, and printing the sum. When I compiled the code with -S, I got about 350 lines of assembly code, with only about 10-15 lines for the actual daxpy loop.
Usually, when carrying out experiments to evaluate a design, one would like to look only at the statistics for the portion of the code that is most important. To do so, programs are typically annotated so that the simulator, on reaching an annotated portion of the code, carries out functions such as creating a checkpoint or outputting and resetting statistical variables.
You will edit the C++ code from the first part to output and reset stats just before the start of the DAXPY loop and just after it. For this, include the file util/m5/m5op.h in the program. You will find this file in the util/m5 directory of the gem5 repository. Use the function m5_dump_reset_stats() from this file in your program. This function outputs the statistical variables and then resets them. You can provide 0 as the value for the delay and the period arguments.
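The placement of the two calls can be sketched as follows. This is a rough outline, not a reference solution: the macro USE_GEM5_M5OPS, the function daxpy_region, and the counter stats_dump_calls are hypothetical names introduced here, and the stub only stands in for the real pseudo-instruction so that the sketch builds without gem5. The include path, and whether the header needs an extern "C" wrapper in C++, depend on your gem5 checkout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef USE_GEM5_M5OPS
// Assumption: the compiler is pointed at util/m5 (for example with -I),
// and the program is linked with the object file built from that directory.
#include "m5op.h"
#else
// Stand-in so the sketch builds without gem5. It only counts calls;
// it does NOT dump or reset any statistics and is not part of gem5.
static int stats_dump_calls = 0;
void m5_dump_reset_stats(std::uint64_t /*delay*/, std::uint64_t /*period*/) {
    ++stats_dump_calls;
}
#endif

// Computes y[i] = alpha * x[i] + y[i], bracketed by two stat dumps so that
// the loop gets its own set of statistics. Assumes x.size() >= y.size().
void daxpy_region(double alpha, const std::vector<double>& x,
                  std::vector<double>& y) {
    m5_dump_reset_stats(0, 0);  // closes the statistics of the setup code
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = alpha * x[i] + y[i];
    }
    m5_dump_reset_stats(0, 0);  // closes the statistics of the daxpy loop
}
```

With two calls, the run is split into three regions (before, inside, and after the loop), which is why three sets of statistics are expected.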
To provide the definition of m5_dump_reset_stats(), go to the directory util/m5 and edit the file Makefile.x86 in the following way:
diff --git a/util/m5/Makefile.x86 b/util/m5/Makefile.x86
--- a/util/m5/Makefile.x86
+++ b/util/m5/Makefile.x86
@@ -31,7 +31,7 @@
 AS=as
 LD=ld
 
-CFLAGS=-O2 -DM5OP_ADDR=0xFFFF0000
+CFLAGS=-O2
 OBJS=m5.o m5op_x86.o
 
 all: m5
Execute the command make -f Makefile.x86 in the directory util/m5. This will create an object file named m5op_x86.o. Link this file with the program for DAXPY. Now simulate the program again with the timing simple CPU. This time you should see three sets of statistics in the file stats.txt. Report the breakdown of instructions among the different op classes for the three parts of the program. Provide the fragment of the generated assembly code that starts with the first call to m5_dump_reset_stats(), ends with the second call to m5_dump_reset_stats(), and has the main daxpy loop in between.
Take a look at the file src/cpu/minor/MinorCPU.py. In the definition of MinorFU, the class for functional units, we define two quantities opLat and issueLat. From the comments provided in the file, understand how these two parameters are to be used. Also note the different functional units that are instantiated as defined in class MinorDefaultFUPool.
Assume that the issueLat and the opLat of the FloatSimdFU can vary from 1 to 6 cycles and that they always sum to 7 cycles. For each decrease in the opLat, we need to pay with a unit increase in issueLat. Which design of the FloatSimd functional unit would you prefer? Provide statistical evidence obtained through simulations of the annotated portion of the code.
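Before running the simulations, it can help to form an expectation with a back-of-the-envelope model. The sketch below is an assumption-laden simplification, not how gem5 computes timing: it treats opLat as the delay from issue to result, issueLat as the minimum gap between consecutive issues to the same unit, and ignores every other part of the pipeline. The function names are hypothetical. The simulation statistics, not this model, are the evidence the question asks for.

```cpp
#include <algorithm>
#include <cstdint>

// Idealized cycles to finish n independent operations on one functional
// unit: a new operation issues every issueLat cycles, and the last one
// completes opLat cycles after it issues.
std::uint64_t independent_cycles(std::uint64_t n, std::uint64_t opLat,
                                 std::uint64_t issueLat) {
    if (n == 0) return 0;
    return (n - 1) * issueLat + opLat;
}

// Idealized cycles for a chain of n operations where each one needs the
// previous result: the next issue waits for both the previous result
// (opLat) and the unit becoming free (issueLat).
std::uint64_t dependent_cycles(std::uint64_t n, std::uint64_t opLat,
                               std::uint64_t issueLat) {
    if (n == 0) return 0;
    return (n - 1) * std::max(opLat, issueLat) + opLat;
}
```

Evaluating both functions for opLat from 1 to 6 with issueLat = 7 - opLat gives a first guess to compare against the measured cycle counts of the annotated region.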
You can find a skeleton file that extends the minor CPU here <$urlbase}html/cpu.py>. If you use this file, you will have to modify your config scripts to work with it. Also, you'll have to modify this file to support the next part.
Turn in your assignment by sending an email message to Prof. David Wood david@cs.wisc.edu and Nilay Vaish nilay@cs.wisc.edu with the subject line: “CS752 Homework3”.
The email should contain the name and ID numbers of the student submitting the assignment. The files below should be attached as a zip file to the email.
A file named daxpy.cpp which is used for testing. This file should also include the pseudo-instructions (m5_dump_reset_stats()) as asked in part 2. Also provide a file daxpy.s with the fragment of the generated assembly code as asked for in part 2.
stats.txt and config.ini files for all the simulations.
A short report (200 words) on the questions asked.