configs: GPUFS: use multiple event queues for >1 CPU

The KVM CPU hangs if multiple event queues are not used when more than
one CPU is created. Since GPUFS relies primarily on the KVM CPU, support
for multiple event queues is needed. Some GPU libraries, such as AMD
Research's ATMI library, assume more than one CPU is present.

This changeset adds multi-CPU support and has been tested with up to
four CPUs.
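
For reference, the queue assignment follows gem5's usual
multi-event-queue pattern: each simulated CPU is given its own event
queue while its child objects stay on the main queue. The sketch below
is illustrative only; the standalone helper and its name are not part
of this change, which inlines the equivalent logic in system.py and
runfs.py:

    # Illustrative sketch of the per-CPU event queue mapping. The last
    # entry in system.cpu is the GPU shader, so it is skipped and left
    # on the main event queue.
    def assign_event_queues(system):
        for i, cpu in enumerate(system.cpu[:-1]):
            # Child objects (interrupt controller, TLBs, ...) stay on
            # the main event queue (index 0).
            for obj in cpu.descendants():
                obj.eventq_index = 0
            # The CPU core itself gets a dedicated queue so each KVM
            # CPU can run on its own host thread.
            cpu.eventq_index = i + 1

Running with parallel event queues also requires a synchronization
quantum on the Root object; runfs.py sets root.sim_quantum to 1e8 ticks
when host_parallel is enabled.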

Change-Id: Ia354e02209d0fa18195f3ad44f4fb1d58e93b5ca
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65131
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 944a46a..781ce8e 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -133,6 +133,11 @@
     that should not be changed by the user.
     """
 
+    # GPUFS is primarily designed to use the X86 KVM CPU. This model needs to
+    # use multiple event queues when more than one CPU is simulated. Force it
+    # on if that is the case.
+    args.host_parallel = args.num_cpus > 1
+
     # These are used by the protocols. They should not be set by the user.
     n_cu = args.num_compute_units
     args.num_sqc = int(math.ceil(float(n_cu) / args.cu_per_sqc))
@@ -149,6 +154,9 @@
         time_sync_period="1000us",
     )
 
+    if args.host_parallel:
+        root.sim_quantum = int(1e8)
+
     if args.script is not None:
         system.readfile = args.script
 
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 46b023f..a1b59ef 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -204,6 +204,15 @@
         for j in range(len(system.cpu[i].isa)):
             system.cpu[i].isa[j].vendor_string = "AuthenticAMD"
 
+    if args.host_parallel:
+        # To get the KVM CPUs to run on different host CPUs, specify a
+        # different event queue for each CPU.  The last CPU is a GPU
+        # shader and should be skipped.
+        for i, cpu in enumerate(system.cpu[:-1]):
+            for obj in cpu.descendants():
+                obj.eventq_index = 0
+            cpu.eventq_index = i + 1
+
     gpu_port_idx = (
         len(system.ruby._cpu_ports)
         - args.num_compute_units