mem-ruby: Account for misaligned accesses in GPUCoalescer

Previously, we assumed that the maximum number of requests an
instruction could issue was equal to the number of threads active for
that instruction.
However, a thread whose access crosses a cache line boundary is
misaligned and must request both cache lines.

This patch takes that into account by checking the status vector for
each thread in the instruction to determine the number of requests.
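
For illustration only, the counting idea can be sketched as below. This
is not the code in this patch: LaneAccess, linesTouched and
countRequests are hypothetical names, and the sketch derives the
per-thread count from an address and size instead of reading the
status vector. It assumes each access is non-empty and no larger than
one cache line.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-thread access record; not a gem5 type.
struct LaneAccess {
    bool active;          // thread is active for this instruction
    std::uint64_t addr;   // starting byte address of the access
    std::uint32_t size;   // access size in bytes, assumed > 0
};

// Number of cache lines covered by a single access.
// An aligned access yields 1; one that crosses a line boundary yields 2
// (under the assumption that size <= lineSize).
inline unsigned
linesTouched(std::uint64_t addr, std::uint32_t size, std::uint64_t lineSize)
{
    const std::uint64_t first = addr / lineSize;
    const std::uint64_t last = (addr + size - 1) / lineSize;
    return static_cast<unsigned>(last - first + 1);
}

// Total requests for an instruction: sum over active threads only,
// rather than assuming one request per active thread.
inline unsigned
countRequests(const std::vector<LaneAccess> &lanes, std::uint64_t lineSize)
{
    unsigned total = 0;
    for (const auto &lane : lanes) {
        if (lane.active)
            total += linesTouched(lane.addr, lane.size, lineSize);
    }
    return total;
}
```

With 64-byte lines, an 8-byte access at address 60 spans bytes 60-67
and therefore counts as two requests, while a 4-byte access at address
0 counts as one.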
Tested-by: kokoro <email@example.com>
Reviewed-by: Matt Sinclair <firstname.lastname@example.org>
Reviewed-by: Matthew Poremba <email@example.com>
Maintainer: Matt Sinclair <firstname.lastname@example.org>
1 file changed