gpu-compute, arch-gcn3: Change how waitcnts are implemented

Use a single counter per memory operation type, and increment it
at issue rather than at execute.
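
A minimal sketch of the counting scheme described above, for
illustration only. The names WaitCounters, MemOpType, onIssue,
onComplete and satisfied are hypothetical and are not taken from the
changed files; the three counter types are assumed from the vmcnt,
lgkmcnt and expcnt fields of the GCN3 s_waitcnt instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only: none of these names come from gem5.
enum class MemOpType : std::size_t
{
    VectorMem = 0, // assumed to map to vmcnt
    Lgkm = 1,      // assumed to map to lgkmcnt
    Export = 2,    // assumed to map to expcnt
    Count = 3
};

class WaitCounters
{
  public:
    // Called when an instruction is issued, so a waitcnt that follows
    // already sees the operation as outstanding.
    void onIssue(MemOpType t) { ++counts[idx(t)]; }

    // Called when the memory operation completes.
    void
    onComplete(MemOpType t)
    {
        assert(counts[idx(t)] > 0);
        --counts[idx(t)];
    }

    // A waitcnt is satisfied once the outstanding count for the type
    // is at or below the requested threshold.
    bool
    satisfied(MemOpType t, int threshold) const
    {
        return counts[idx(t)] <= threshold;
    }

  private:
    static std::size_t
    idx(MemOpType t)
    {
        return static_cast<std::size_t>(t);
    }

    // One counter per memory operation type, all starting at zero.
    std::array<int, static_cast<std::size_t>(MemOpType::Count)> counts{};
};
```

With this shape, an operation that has been issued but has not yet
executed still blocks a waitcnt of 0 on its type.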

Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3
8 files changed