gpu-compute, arch-gcn3: Change how waitcnts are implemented

Use single counters per memory operation type and increment them
upon issue, not execute.

Change-Id: I6afc0b66b21882538ef90a14a57a3ab3cc7bd6f3