dev-amdgpu: Perform frame writes atomically
The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.
The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, it is possible
that writes can be queued in the DMA device long enough that another
read for a queued address arrives. This happens very deterministically
with the AtomicSimpleCPU and causes GPUFS to break with that CPU.
This change makes writes to the frame BAR atomic the same as reads. This
avoids that problem and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matthew Poremba <matthew.poremba@amd.com>
Reviewed-by: Matthew Poremba <matthew.poremba@amd.com>
(cherry picked from commit 079fc47dc202ffe7c77e1e94bb1d5e0ee38d1816)
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72079
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3260d05..d1058f1 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -349,6 +349,22 @@
}
nbio.writeFrame(pkt, offset);
+
+ /*
+ * Write the value to device memory. This must be done functionally
+ * because this method is called by the PCIDevice::write method which
+ * is a non-timing write.
+ */
+ RequestPtr req = std::make_shared<Request>(offset, pkt->getSize(), 0,
+ vramRequestorId());
+ PacketPtr writePkt = Packet::createWrite(req);
+ uint8_t *dataPtr = new uint8_t[pkt->getSize()];
+ std::memcpy(dataPtr, pkt->getPtr<uint8_t>(),
+ pkt->getSize() * sizeof(uint8_t));
+ writePkt->dataDynamic(dataPtr);
+
+ auto system = cp->shader()->gpuCmdProc.system();
+ system->getDeviceMemory(writePkt)->access(writePkt);
}
void
@@ -489,8 +505,6 @@
switch (barnum) {
case FRAMEBUFFER_BAR:
- gpuMemMgr->writeRequest(offset, pkt->getPtr<uint8_t>(),
- pkt->getSize(), 0, nullptr);
writeFrame(pkt, offset);
break;
case DOORBELL_BAR: