cpu: Move fetch stats from simple and minor to base

This summarizes a series of changes to move general Simple, Minor,
O3 CPU stats to BaseCPU. This commit focuses on moving numBranches
from SimpleCPU to the FetchCPUStats in the BaseCPU, and
numFetchSuspends from MinorCPU into FetchCPUStats.  More general
information about this relation chain is below

1. Summary:
Moved general CPU stats found across Simple, Minor, and O3 CPU models
into BaseCPU through new stat groups. The stat groups are
FetchCPUStats, ExecuteCPUStats, and CommitCPUStats. Implemented the
committedControl stat vector found in MinorCPU for Simple and O3 CPU.
Implemented the numStoreInsts stat found in SimpleCPU for O3CPU. IPC
and CPI stats are now tracked at the core and thread level in BaseCPU
and are made universal for simple, minor, o3, and kvm CPUs. Duplicate
stats across the models are merged into a single stat in BaseCPU under
the same stat name. This change does not implement every general level
stat moved to BaseCPU for every model.

2. Stat API Changes
a. SimpleCPU:
statExecutedInstType vector unified into committedInstType
numCondCtrlInsts unified into committedControl::isControl

b. O3CPU:
i. Fetch Stage
branches in fetch unified into with numBranches
rate renamed to fetchRate
insts unified into with numInsts

ii. Execute Stage
Regfile stats unified into base with use of Simple's stat naming
numRefs in IEW unified into numMemRefs
numRate from IEW renamed to instRate

iii. Commit Stage
committedInsts is renamed to numInstsNotNOP
committedOps is renamed to numOpsNotNOP
instsCommitted is unified into numInsts
opsCommitted is unified into numOps
branches is unified into committedControl::isControl
floating is unified into numFpInsts
integer is unified into numIntInsts
loads is unified into numLoadInsts
memRefs is renamed to numMemRefs
vectorInstructions is unified into numVecInsts

3. Details:
Created three stat groups in BaseCPU. FetchCPUStats track statistics
related to the fetch stage. ExecuteCPUStats track statistics related
to the execute stage. CommitCPUStats track statistics related to the
commit stage.

There are three vectors in Base that store unique pointers to per
thread instances of these stat groups. The stat group pointer for
thread i is accessible at index i of one of these vectors. For example,
stat numCCRegReads of the execute stage for thread 0 can be accessed
with executeStats[0]->numCCRegReads. The stats.txt output will print the
thread ID of the stat group. For example, numVecRegReads on thread 0
of a single core prints as
"board.processor.cores.core.executeStats0.numVecRegReads".
NOTE: Multithreading in gem5 is untested. Therefore per thread stats
output in stats.txt is not currently guaranteed to be correctly
formatted.

For FetchCPUStats, the stats moved from  SimpleCPU are numBranches
and numInsts. From MinorCPU, the stat moved is numFetchSuspends. From
O3CPU, the stats moved are from the O3 fetch stage: Stat branches is
unified into numBranches, stat rate is renamed to fetchRate in Base,
stat insts is unified into numInsts, stat icacheStallCycles keeps the
same name in Base.

For ExecuteCPUStats, the stats moved from SimpleCPU are
dcacheStallCycles, numCCRegReads, numCCRegWrites,
numFpAluAccesses, numFpRegReads, numFpRegWrites, numIntAluAccesses,
numIntRegReads, numIntRegWrites, numMemRefs, numMiscRegReads,
numMiscRegWrites, numVecAluAccesses, numVecPredRegReads,
numVecPredRegWrites, numVecRegReads, numVecRegWrites. The stat moved
from MinorCPU is numDiscardedOps. From O3, the Regfile stats in CPU are
unified into the reg stats in Base and use the names found originally
in SimpleCPU. From O3 IEW stage, numInsts keeps the same name in
Base, numBranches is unified into numBranches in base, numNop keeps
the same name in Base, numRefs is unified into numMemRefs in Base,
numLoadInsts and numStoreInsts are moved into Base, numRate is renamed
to instRate in base.

For CommitCPUStats, the stats moved from SimpleCPU are
numCondCtrlInsts, numFpInsts, numIntInsts, numLoadInsts, numStoreInsts,
numVecInsts. The stats moved from MinorCPU are numInsts,
committedInstType, and committedControl. statExecutedInstType of
SimpleCPU is unified with committedInstType of MinorCPU. Implemented
committedControl stats from MinorCPU in Simple and O3 CPU. In MinorCPU,
this stat was a 2D vector, where the first dimension is the thread ID.
In base it is now a 1D vector that is tied to a thread ID via the
commitStats vector that the object is accessible through. From the O3
commit stage, committedInsts is renamed to numInstsNotNOP, committedOps
is renamed to numOpsNotNOP, instsCommitted is unified into numInsts,
opsCommitted is renamed to numOps, committedInstType is unified into
committedInstType from Minor, branches is removed because it duplicates
committedControl::IsControl, floating is unified into numFpInsts,
interger is unified into numIntInsts, loads is unified into
numLoadInsts, numStoreInsts is implemented for tracking in O3, memRefs
is renamed to numMemRefs, vectorInstructions is unified into
numVecInsts. Note that numCondCtrlInsts of Simple is unified into
committedControl::IsCondCtrl.

Implemented IPC and CPI tracking inside BaseCPU.
In BaseCPU::BaseCPUStats, numInsts and numOps track per CPU core
committed instructions and operations.
In BaseCPU::FetchCPUStats, numInsts and numOps track per thread
fetched instructions and operations.
In BaseCPU::CommitCPUStats, numInsts tracks per thread executed
instructions.
In BaseCPU::CommitCPUStats, numInsts and numOps track per thread
committed instructions and operations.
In BaseSimpleCPU, the countInst() function has been split into
countInst(), countFetchInst(), and countCommitInst(). The stat count
incrementation step of countInst() has been removed and delegated to the
other two functions. countFetchInst() increments numInsts and numOps
of the FetchCPUStats group for a thread. countCommitInst() increments
the numInsts and numOps of the CommitCPUStats group for a thread and
of the BaseCPUStats group for a CPU core. These functions are called
in the appropriate stage within timing.cc and atomic.cc. The call to
countInst() is left unchanged. countFetchInst() is called in
preExecute(). countCommitInst() is called in postExecute().
For MinorCPU, only the commit level numInsts and numOps stats have been
implemented.
IPC and CPI stats have been added to BaseCPUStats (core level) and
CommitCPUStats (thread level). The formulas for the IPC and CPI stats
in CommitCPUStats are set in the BaseCPU constructor, after the
CommitCPUStats stat group object has been created. These replace IPC,
CPI, totalIpc, and totalCpi stats in O3.

Replaced committedInsts stats of KVM CPU with commitStats.numInsts
of BaseCPU. This results in IPC and CPI printing in stats.txt for
KVM simulations.

This change does not implement most general stats found in one or two
model for all others.

Jira Ticket: https://gem5.atlassian.net/browse/GEM5-1304

Change-Id: I3c852f8dba3268c71b7a3415480fb63d8dc30cb7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66031
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Tested-by: kokoro <noreply+kokoro@google.com>
diff --git a/src/cpu/base.cc b/src/cpu/base.cc
index d2c0a78..1d29339 100644
--- a/src/cpu/base.cc
+++ b/src/cpu/base.cc
@@ -191,6 +191,11 @@
     modelResetPort.onChange([this](const bool &new_val) {
         setReset(new_val);
     });
+    // create a stat group object for each thread on this core
+    fetchStats.reserve(numThreads);
+    for (int i = 0; i < numThreads; i++) {
+        fetchStats.emplace_back(new FetchCPUStats(this, i));
+    }
 }
 
 void
@@ -827,4 +832,18 @@
     hostOpRate = simOps / hostSeconds;
 }
 
+BaseCPU::
+FetchCPUStats::FetchCPUStats(statistics::Group *parent, int thread_id)
+    : statistics::Group(parent, csprintf("fetchStats%i", thread_id).c_str()),
+    ADD_STAT(numBranches, statistics::units::Count::get(),
+             "Number of branches fetched"),
+    ADD_STAT(numFetchSuspends, statistics::units::Count::get(),
+             "Number of times Execute suspended instruction fetching")
+
+{
+    numBranches
+        .prereq(numBranches);
+
+}
+
 } // namespace gem5
diff --git a/src/cpu/base.hh b/src/cpu/base.hh
index 084d9b9..d6e5d38 100644
--- a/src/cpu/base.hh
+++ b/src/cpu/base.hh
@@ -43,6 +43,7 @@
 #define __CPU_BASE_HH__
 
 #include <vector>
+#include <memory>
 
 #include "arch/generic/interrupts.hh"
 #include "base/statistics.hh"
@@ -676,6 +677,21 @@
     const Cycles pwrGatingLatency;
     const bool powerGatingOnIdle;
     EventFunctionWrapper enterPwrGatingEvent;
+
+  public:
+    struct FetchCPUStats : public statistics::Group
+    {
+        FetchCPUStats(statistics::Group *parent, int thread_id);
+
+        /* Total number of branches fetched */
+        statistics::Scalar numBranches;
+
+        /* Number of times fetch was asked to suspend by Execute */
+        statistics::Scalar numFetchSuspends;
+
+    };
+
+    std::vector<std::unique_ptr<FetchCPUStats>> fetchStats;
 };
 
 } // namespace gem5
diff --git a/src/cpu/minor/execute.cc b/src/cpu/minor/execute.cc
index 5eaaf58..323ae29 100644
--- a/src/cpu/minor/execute.cc
+++ b/src/cpu/minor/execute.cc
@@ -1054,7 +1054,7 @@
             DPRINTF(MinorInterrupt, "Suspending thread: %d from Execute"
                 " inst: %s\n", thread_id, *inst);
 
-            cpu.stats.numFetchSuspends++;
+            cpu.fetchStats[thread_id]->numFetchSuspends++;
 
             updateBranchData(thread_id, BranchData::SuspendThread, inst,
                 resume_pc, branch);
diff --git a/src/cpu/minor/stats.cc b/src/cpu/minor/stats.cc
index 64d4c47..e9ca562 100644
--- a/src/cpu/minor/stats.cc
+++ b/src/cpu/minor/stats.cc
@@ -52,8 +52,6 @@
     ADD_STAT(numDiscardedOps, statistics::units::Count::get(),
              "Number of ops (including micro ops) which were discarded before "
              "commit"),
-    ADD_STAT(numFetchSuspends, statistics::units::Count::get(),
-             "Number of times Execute suspended instruction fetching"),
     ADD_STAT(quiesceCycles, statistics::units::Cycle::get(),
              "Total number of cycles that CPU has spent quiesced or waiting "
              "for an interrupt"),
diff --git a/src/cpu/minor/stats.hh b/src/cpu/minor/stats.hh
index 1ab81f4..524d20f 100644
--- a/src/cpu/minor/stats.hh
+++ b/src/cpu/minor/stats.hh
@@ -68,9 +68,6 @@
     /** Number of ops discarded before committing */
     statistics::Scalar numDiscardedOps;
 
-    /** Number of times fetch was asked to suspend by Execute */
-    statistics::Scalar numFetchSuspends;
-
     /** Number of cycles in quiescent state */
     statistics::Scalar quiesceCycles;
 
diff --git a/src/cpu/simple/base.cc b/src/cpu/simple/base.cc
index 768f63e..b2a11fd 100644
--- a/src/cpu/simple/base.cc
+++ b/src/cpu/simple/base.cc
@@ -396,7 +396,7 @@
     }
 
     if (curStaticInst->isControl()) {
-        ++t_info.execContextStats.numBranches;
+        ++fetchStats[t_info.thread->threadId()]->numBranches;
     }
 
     /* Power model statistics */
diff --git a/src/cpu/simple/exec_context.hh b/src/cpu/simple/exec_context.hh
index 0f20763..d4bb017 100644
--- a/src/cpu/simple/exec_context.hh
+++ b/src/cpu/simple/exec_context.hh
@@ -152,8 +152,6 @@
                        "ICache total stall cycles"),
               ADD_STAT(dcacheStallCycles, statistics::units::Cycle::get(),
                        "DCache total stall cycles"),
-              ADD_STAT(numBranches, statistics::units::Count::get(),
-                       "Number of branches fetched"),
               ADD_STAT(numPredictedBranches, statistics::units::Count::get(),
                        "Number of branches predicted as taken"),
               ADD_STAT(numBranchMispred, statistics::units::Count::get(),
@@ -203,9 +201,6 @@
             numIdleCycles = idleFraction * cpu->baseStats.numCycles;
             numBusyCycles = notIdleFraction * cpu->baseStats.numCycles;
 
-            numBranches
-                .prereq(numBranches);
-
             numPredictedBranches
                 .prereq(numPredictedBranches);
 
@@ -297,8 +292,6 @@
         statistics::Scalar dcacheStallCycles;
 
         /// @{
-        /// Total number of branches fetched
-        statistics::Scalar numBranches;
         /// Number of branches predicted as taken
         statistics::Scalar numPredictedBranches;
         /// Number of misprediced branches