dev-amdgpu: Implement SDMA atomic packet

SDMA atomic packets are used in conjunction with RLC queues in SDMA for
synchronization similar to how HSA signals are used with BLIT kernels
when SDMA is disabled. Implement a skeleton of the SDMA atomic packet
methods as well as the atomic add64 operation.
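
As a rough illustration of what the add64 operation amounts to, the
sketch below does a read-modify-write on a 64-bit location in a plain
byte buffer. It is not the code in this change: AtomicPktSketch,
readTarget, and atomicAdd64Sketch are hypothetical names, the packet
fields are assumed, and the simulator's DMA callbacks are replaced by
direct buffer accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the fields of an SDMA atomic packet that an
// add64 operation needs; the real packet layout is not reproduced here.
struct AtomicPktSketch
{
    uint64_t addr;     // byte offset of the 64-bit target in the buffer
    uint64_t srcData;  // operand added to the target
};

// Read the current 64-bit value at the target offset.
uint64_t
readTarget(const std::vector<uint8_t> &mem, uint64_t addr)
{
    assert(addr + sizeof(uint64_t) <= mem.size());
    uint64_t value = 0;
    std::memcpy(&value, mem.data() + addr, sizeof(value));
    return value;
}

// Add the operand to the target and write the sum back. Returns the
// value that was stored before the add.
uint64_t
atomicAdd64Sketch(std::vector<uint8_t> &mem, const AtomicPktSketch &pkt)
{
    uint64_t old_value = readTarget(mem, pkt.addr);
    uint64_t new_value = old_value + pkt.srcData;  // wraps modulo 2^64
    std::memcpy(mem.data() + pkt.addr, &new_value, sizeof(new_value));
    return old_value;
}
```

Applying the same packet twice to a zeroed buffer with srcData set to
5 leaves 10 at the target offset.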

The atomic add operation appears to be the only atomic operation used
in ROCm, so this implementation should cover the cases ROCm exercises.
See:

https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/rocm-4.2.x/src/core/runtime/amd_blit_sdma.cpp#L880

Change-Id: I62cc337f2ffe590bdb947b48053760ee8b3a6f32
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/63174
Reviewed-by: Matt Sinclair <mattdsinclair@gmail.com>
Maintainer: Matt Sinclair <mattdsinclair@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>