blob: f7dc8f2a7d557bf433dba07921c08d1176284e01 [file] [log] [blame]
Please note: The IS code in this directory known as is.c is derived
from the serial version of the NPB2.3 parallel IS. Although for
the serial version it is completely unnecessary to have any notion
of buckets at all in order to correctly solve the specified NPB1 IS
benchmark problem, the buckets seem to be very beneficial in
parallel versions, including the OpenMP version.
Default setting is
#define USE_BUCKETS
i.e., buckets turned on! To switch it off, simply comment out
the line.
The OpenMP version uses the "dynamic" schedule to improve load
balance during key sorting. Sometime, the use of the "static,1"
(or cyclic) schedule may yield better performance. Both options
are acceptable. The default setting is "dynamic". To choose
the cyclic option, define the line:
#define SCHED_CYCLIC
Here some notes inherited from NPB2.3-serial:
Nevertheless, it is possible to turn on bucketing via #ifdef'ed code.
Then, the sort first rearranges the keys into buckets by range (the
bucket's ranges evenly subdivide the total key range), and then
ranks the contents of each bucket. This results in key transfers
first into contiguous elements of buckets. This is relatively
cache efficient, since there are a relatively small number of buckets.
Then the key counting that occurs accesses contiguous array elements.
Once again, accesses reuse cache lines efficiently. Finally, the
accumulation of key multiplicities (the key count) which gives the key
ranks also reuses cache line efficiently.
But using the buckets more than doubles the amount of computational
work that must be performed. On machines with very large caches, the
aforementioned benefits may not exist, and the extra processing looks
expensive. These examples apply to both CLASS A and B problems:
SP2-66MhzWN: 50% speedup with buckets
SGI Indy5000: 50% slowdown with buckets
SGI O2000: 400% slowdown with buckets (Wow!)
It is a conjecture that cache access is the underlying mechanism
causing these variations.
Note: If reporting timing results, either of these modes may be used
without penalty.