dev-arm: Improper translation slot release in SMMUv3

The SMMUv3SlaveInterface uses xlateSlotsRemaining to model a limit
on the number of translation requests it can receive from the
master device.

Patch

https://gem5-review.googlesource.com/c/public/gem5/+/19308/2

moved the resource acquire/release into the SMMUTranslationProcess
constructor/destructor, so that signalDrainDone is called from a
single place.
While this is convenient, it breaks the original implementation,
which freed the slot AFTER a translation had completed, but
BEFORE the final memory access (with the translated PA) was performed.
In other words, xlateSlotsRemaining only models translation slots
and should be released once the PA has been produced.

This patch fixes the mismatch by restoring the resource release in
the right place (while keeping the acquire in the constructor)
and by adding a pendingMemAccesses counter, which tracks complete
device memory requests (translation + final access) and is used by
the draining logic.
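
The intended lifetime of the two counters can be sketched with the
standalone toy model below. ToyInterface, ToyTranslation,
completeTranslation() and the drained flag are hypothetical names
used only for illustration; they are not gem5 code, and the real
signalDrainDone only matters while the interface is draining.

```cpp
#include <cassert>

// Hypothetical stand-in for SMMUv3SlaveInterface. Only the two
// counters discussed in this message are modelled.
struct ToyInterface
{
    explicit ToyInterface(unsigned slots)
        : xlateSlots(slots), xlateSlotsRemaining(slots) {}

    // Simplification: just record that drain completion was signalled.
    void signalDrainDone() { drained = true; }

    const unsigned xlateSlots;        // configured number of slots
    unsigned xlateSlotsRemaining;     // free translation slots
    unsigned pendingMemAccesses = 0;  // translation + final access
    bool drained = false;
};

// Hypothetical stand-in for SMMUTranslationProcess.
class ToyTranslation
{
  public:
    explicit ToyTranslation(ToyInterface &i) : ifc(i)
    {
        // Acquire a translation slot and open a memory access.
        assert(ifc.xlateSlotsRemaining > 0);
        ifc.xlateSlotsRemaining--;
        ifc.pendingMemAccesses++;
    }

    // Called once the PA is known: the slot is free again, but the
    // final memory access is still outstanding.
    void completeTranslation() { ifc.xlateSlotsRemaining++; }

    ~ToyTranslation()
    {
        // The whole request (translation + final access) is done.
        assert(ifc.pendingMemAccesses > 0);
        ifc.pendingMemAccesses--;
        if (ifc.pendingMemAccesses == 0)
            ifc.signalDrainDone();
    }

  private:
    ToyInterface &ifc;
};
```

In this model the slot becomes available as soon as
completeTranslation() runs, while drain completion is only signalled
when the last ToyTranslation object is destroyed.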

Change-Id: I708fe2d0b6c96ed46f3f4f9a0512f8c1cc43a56c
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Adrian Herrera <adrian.herrera@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/20260
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
diff --git a/src/dev/arm/smmu_v3_slaveifc.cc b/src/dev/arm/smmu_v3_slaveifc.cc
index fec480d..0ed6c4d 100644
--- a/src/dev/arm/smmu_v3_slaveifc.cc
+++ b/src/dev/arm/smmu_v3_slaveifc.cc
@@ -67,6 +67,7 @@
     portWidth(p->port_width),
     wrBufSlotsRemaining(p->wrbuf_slots),
     xlateSlotsRemaining(p->xlate_slots),
+    pendingMemAccesses(0),
     prefetchEnable(p->prefetch_enable),
     prefetchReserveLastWay(
         p->prefetch_reserve_last_way),
diff --git a/src/dev/arm/smmu_v3_slaveifc.hh b/src/dev/arm/smmu_v3_slaveifc.hh
index 5759a8f..3e03ae4 100644
--- a/src/dev/arm/smmu_v3_slaveifc.hh
+++ b/src/dev/arm/smmu_v3_slaveifc.hh
@@ -83,6 +83,7 @@
 
     unsigned wrBufSlotsRemaining;
     unsigned xlateSlotsRemaining;
+    unsigned pendingMemAccesses;
 
     const bool prefetchEnable;
     const bool prefetchReserveLastWay;
diff --git a/src/dev/arm/smmu_v3_transl.cc b/src/dev/arm/smmu_v3_transl.cc
index d7d5768..429cc2b 100644
--- a/src/dev/arm/smmu_v3_transl.cc
+++ b/src/dev/arm/smmu_v3_transl.cc
@@ -87,16 +87,20 @@
     // Decrease number of pending translation slots on the slave interface
     assert(ifc.xlateSlotsRemaining > 0);
     ifc.xlateSlotsRemaining--;
+
+    ifc.pendingMemAccesses++;
     reinit();
 }
 
 SMMUTranslationProcess::~SMMUTranslationProcess()
 {
     // Increase number of pending translation slots on the slave interface
-    ifc.xlateSlotsRemaining++;
-    // If no more SMMU translations are pending (all slots available),
+    assert(ifc.pendingMemAccesses > 0);
+    ifc.pendingMemAccesses--;
+
+    // If no more SMMU memory accesses are pending,
     // signal SMMU Slave Interface as drained
-    if (ifc.xlateSlotsRemaining == ifc.params()->xlate_slots) {
+    if (ifc.pendingMemAccesses == 0) {
         ifc.signalDrainDone();
     }
 }
@@ -1232,6 +1236,7 @@
 
 
     smmu.translationTimeDist.sample(curTick() - recvTick);
+    ifc.xlateSlotsRemaining++;
     if (!request.isAtsRequest && request.isWrite)
         ifc.wrBufSlotsRemaining +=
             (request.size + (ifc.portWidth-1)) / ifc.portWidth;
@@ -1279,6 +1284,8 @@
 void
 SMMUTranslationProcess::completePrefetch(Yield &yield)
 {
+    ifc.xlateSlotsRemaining++;
+
     SMMUAction a;
     a.type = ACTION_TERMINATE;
     a.pkt = NULL;