arch-gcn3: Implement large ds_read/write instructions

This implements the 96 and 128b ds_read/write instructions in a similar
fashion to the 3 and 4 dword flat_load/store instructions.

These instructions are treated as reads/writes of 3 or 4 dwords, instead
of as a single 96b/128b memory transaction, due to the limitations of
the VecOperand class used in the amdgpu code.

In order to handle treating the memory transaction as multiple dwords,
the patch also adds in new initMemRead/initMemWrite functions for ds
instructions. These are similar to the functions used in flat
instructions for the same purpose.

Change-Id: I0f2ba3cb7cf040abb876e6eae55a6d38149ee960
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/48342
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Alex Dutu <alexandru.dutu@amd.com>
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
3 files changed