###########################################
# Modification History of NPB3.x #
# ------------------------------ #
# NPB development team #
# NASA Ames Research Center #
# npb@nas.nasa.gov #
# http://www.nas.nasa.gov/Software/NPB/ #
###########################################
------------------------------------------------------
Changes in NPB3.3.1
(NPB3.3-SER, NPB3.3-OMP, NPB3.3-MPI )
------------------------------------------------------
[17-Feb-09]
This is a bug-fix release of NPB3.3.
1. All versions
- sys/setparams.c: fixed a problem in dealing with quoted (") flags
from make.def when producing npbparams.h for C.
- CG: ensure 'implicit none' used in all subroutines.
2. MPI version
- Additional timers can be used for profiling purposes, similar
to those already included in the OMP and SER versions.
- LU:
* code clean up (suggested by Rob Van der Wijngaart)
> avoid using MPI_ANY_SOURCE in exchange_*.f, which might
alter performance in some cases.
> delete references to sethyper and 'icomm*', which are
no longer used since NPB2.2.
* change the low-bound limit on the sub-domain size in subdomain.f
from 4 to 3 in order to increase allowable process counts.
* allow process counts that are not a power of two.
- FT: fix a non-portable way of broadcasting input parameters
(pointed out by Art Lazanoff)
- BT: include 'btio_cleanup' as part of the I/O timing
3. OMP and SER versions
- DC: fix access to out-of-bound array elements in adc.c
Reported by Per Larsen of Denmark <pl@imm.dtu.dk>
- UA: fix the use of uninitialized array 'sje' in mortar_vertex() by
adding "call nr_init[_omp](sje,4*6*nelt,0)" in the main program.
- MG, UA: include additional timers for profiling purposes.
- Executables now use ".x" as a name extension
------------------------------------------------------
Changes in NPB3.3
(NPB3.3-SER, NPB3.3-OMP, NPB3.3-MPI )
------------------------------------------------------
[02-Aug-07]
1. New features and improvements
- The Class E problem has been introduced in seven of the benchmarks
(BT, SP, LU, CG, MG, FT, and EP) in all three implementations.
- The Class D problem has been added to the IS benchmark in all
three implementations. It requires compiler support for a
64-bit "long" type in C. The MPI version of IS now allows runs
up to 1024 processes.
- The Bucket Sort option (USE_BUCKETS) has been added to
the OpenMP version of IS and made the default.
- Introduced the "twiddle" array in the OpenMP FT benchmark,
which has been used in the MPI and SER versions and seems
to improve performance for larger problem sizes.
- Merged vector codes for the BT and LU benchmarks into
the release.
- Updates to BTIO (MPI/BT with IO subtypes):
* added I/O stats (I/O timing, data size written, I/O data rate)
* added an option for interleaving reads between writes through
the inputbt.data file. Although the data file size would be
smaller as a result, the total amount of data written is still
the same.
- Made documents more consistent throughout different versions
(README and README.install).
2. Bug fixes
- MPI/FT: fixed a verification failure for cases where NX/=NY
and the 2D decomposition is used. The bug occurred at least
for (Class D, NPROCS=2048) and (Class B, NPROCS=512).
Also fixed an output formatting problem that occurred when
the number of processes >= 1000.
- MPI/SP: fixed a performance regression due to improper
padding of array dimensions.
- MPI/IS: minor fix to support large processor counts (>=512).
- OMP/UA: fixed a race condition in mason.f, avoided the use
of the LASTPRIVATE directive.
- OMP/LU: minor fix in data flushing for pipelining.
- DC: There are a number of fixes -
* fixed segmentation fault in both OMP and SER versions
caused by accessing zero-length array elements.
Reported by Jeff Odom <jodom@cs.umd.edu>.
* fixed a race in reporting benchmark timing in the OMP version
* fixed the use of the timer in the OMP version, which limited
the number of threads to 64. The limit on the number of threads
is now raised to MAX_NUMBER_OF_TASKS (=256).
* made the benchmark output consistent with other NPBs.
- fixed the use of an uninitialized variable in MPI/sys/setparams.c.
setparams in all three versions was updated to deal with
a make.def that contains carriage-return characters ('\r').
- SER/FT: added 'implicit none' wherever it was missing.
- SER/IS: fixed missing variable declarations for the Bucket
Sort option (when USE_BUCKETS is defined).
3. Others
- The default value for collbuf_nodes in the BT I/O benchmark
is now set to 0, indicating no file hints will be used.
The setting can be changed by using the "inputbt.data" file.
- The hyperplane version of LU (LU-HP) is no longer included
in the distribution.
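
The carriage-return handling mentioned for setparams above can be
illustrated with a minimal sketch. This is not the actual code from
sys/setparams.c; the helper name strip_line_ending is hypothetical, and
the sketch only assumes that each make.def line is read into a
NUL-terminated buffer before parsing.

```c
#include <string.h>

/* Hypothetical helper (not taken from sys/setparams.c): remove any
 * trailing '\n' and '\r' characters so that a make.def saved with
 * CRLF line endings parses the same way as one saved with LF. */
void strip_line_ending(char *line)
{
    size_t n = strlen(line);

    /* Walk backwards over line-ending characters only. */
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r')) {
        line[--n] = '\0';
    }
}
```

Applied to a line such as "CC = cc\r\n", the helper leaves "CC = cc",
so a stray '\r' does not end up inside a flag value.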
------------------------------------------------------
Changes in NPB3.2.1
(NPB3.2-SER, NPB3.2-OMP, NPB3.2-MPI )
------------------------------------------------------
[27-Jul-05]
This is a bug-fix release of NPB3.2.
1. MPI version
- sys/setparams.c: removed a duplicated statement for writing
FT parameters and made an invalid SUBTYPE an error condition.
The 'duplicated statement' problem was fixed in NPB3.2 (See
the note below). However, during the final updating process,
the fix was left out, even though the log file was updated.
- BT: included SUBTYPE=EPIO in the I/O verification.
- LU: bcast_inputs.f: fixed wrong data type (dp_type) used for
communicating integers (nx0,ny0,nz0) with the correct type
MPI_INTEGER.
- MG: fixed a mis-calculation of parameter "nr" in globals.h
that caused run-time failure for NPROCS >= 512
(reported by Donald Ferry of Cray). Expanded the limit to
131072 processes and added error-checking code.
The use of MPI_ANY_SOURCE for MPI_Irecv inside subroutine
ready() could cause MPI_Wait to return a message meant for
the wrong k. The problem is fixed with nbr(axis,-dir,k)
in place of MPI_ANY_SOURCE in the call to MPI_Irecv
(reported and suggested by Hideo Saito).
2. OpenMP version
- EP: use THREADPRIVATE for working array storage. It should not
change performance but makes some compilers happier.
- LU: add variable "v" to FLUSH to ensure solution data properly
flushed for pipeline. This change is needed according to
the OpenMP 2.5 standard.
- IS: reorganized working buffers so that the count for key
population could be more naturally performed. This version
uses much less stack space.
- UA: implemented atomic updates with locks in order to achieve
better scaling on those systems that have an inefficient
(or even buggy) ATOMIC implementation.
------------------------------------------------------
Changes in NPB3.2
(NPB3.2-SER, NPB3.2-OMP, NPB3.2-MPI )
------------------------------------------------------
[07-Jan-05]
1. DC version in NPB3.2-SER was converted to C from C++
(CLASSES S, W, A, B).
sys/setparams.c file was changed appropriately.
2. OpenMP version of DC was added to NPB3.2-OMP.
3. Data Traffic benchmark DT was added to NPB3.2-MPI.
[24-May-04]
All versions:
- use assumed shape "(*)" declaration in CG
- fixed the use of an uninitialized variable in EP
- avoid using integer array for assumed shape dimensions in FT
- fix in UA:
* fix the reference to file "inputua.data"
* avoid overindexing
* avoid reference to out-of-bound array elements
* change declaration "real*8" to "double precision"
OMP version:
- explicitly added "SCHEDULE(STATIC)" to the OMP version
- use the "omp_get_wtime()" function for timer if available
- removed the call to "getenv" for portability
- change in UA:
* implemented an alternative approach for atomic update
MPI version:
- removed a duplicated declaration in FT (from setparams.c)
- removed a duplicated declaration in BT/full_mpiio.f
- fixed a missing "NPROCS=" in sys/suite.awk
------------------------------------------------------
Changes in NPB3.1
(NPB3.1-MPI, NPB3.1-SER, NPB3.1-OMP)
------------------------------------------------------
[22-Apr-04] NPB3.1-MPI
Merged the NPB2.4-MPI branch into NPB3.1 with the following changes.
- Optimized the BT memory usage. The new version uses about 1/3 of
the memory used in NPB2.x.
- Fixed a bug in CG for running on a large number of processes
- Redefined the Class W size in MG so that the verification value
will not be too small. (see below for SER & OMP versions)
- Use the relative errors for verification in both CG and MG
- Fixed a race in 'make suite'
[08-Apr-04] NPB3.1-SER and NPB3.1-OMP
The following changes are made in both NPB3.1-SER and NPB3.1-OMP.
1. Added the Class D problem
- verification values taken from NPB2.4-MPI
- modified variables to accommodate the larger problem size
2. Improvements for LU and LU-HP:
- reduced the memory usage for the 'tv' variable in LU and LU-HP
- a more efficient memory access for variables "a,b,c,d" in LU-HP
- a dummy iteration added before the time step loop for consistency
with other benchmarks
3. Improvement and fix in MG:
- verification in MG now uses the relative error
(instead of the absolute error). This will avoid incorrect
verification for small reference values.
- redefined the class size for Class W so that the verification
value will not be too small.
In version 3.0 and earlier: 64x64x64, 40 iters
New size in version 3.1 : 128x128x128, 4 iters
- fixed incorrect verification values for Classes A and C.
4. CG:
- use relative error for verification
- clean up codes for matrix initialization (makea).
The new code uses about 1/2 the memory of the previous version.
5. Fixed makefile related issues
- fixed dependence on make.def for files in common.
- fixed a race in 'make suite'
- added 'LU-HP' as a valid benchmark option in makefiles
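
The move from absolute to relative error in the MG and CG verification
(items 3 and 4 above) can be illustrated with a small sketch. The
function names and the tolerance used in the usage note are assumptions
for illustration, not the benchmark's actual verification code.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical absolute-error check: passes whenever the difference is
 * below epsilon, even if the reference value is itself far smaller
 * than epsilon. */
bool verify_absolute(double computed, double reference, double epsilon)
{
    return fabs(computed - reference) <= epsilon;
}

/* Hypothetical relative-error check: scales the difference by the
 * reference value, so small reference values are still checked
 * meaningfully. Assumes reference != 0. */
bool verify_relative(double computed, double reference, double epsilon)
{
    return fabs((computed - reference) / reference) <= epsilon;
}
```

With an assumed tolerance of 1.0e-8, a computed value of 2.0e-10
against a reference of 1.0e-10 passes the absolute check although it is
off by a factor of two. The relative check rejects it.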
The following changes are made in NPB3.1-OMP.
1. Included a hyper-plane version of the LU benchmark: LU-HP
- based on the serial version
2. The dummy 'omp_lib_dum' library is no longer used for compilation
without an OpenMP compiler. Conditional compilation is now used.
3. Parallelization of the initialization part of MG.
It improves the turn-around time quite a bit for the larger
classes, such as class D.
4. Parallelize codes for matrix initialization (makea) in CG.
The new code uses about 2/3 the memory of the version in NPB3.0-OMP.
5. Code clean up in SP so that the structure is more consistent
with the serial version.
------------------------------------------------------
Changes in NPB2.x MPI version
------------------------------------------------------
Changes in 2.4.1
- fixed error in BT/Makefile (replaced "==" with "=")
- added stub function accumulate_norms in BT/btio.f
- changed type of Class B verification constants in BT/verify.f from
single to double precision
Changes in 2.4
- Added I/O benchmark (subtype of BT).
- Added Class D for all benchmarks except IS.
- Reduced size of tabulated exponentials in FT.
- Made minor changes to FT to prevent integer overflow for class D on
systems with 32-bit integers. FT class D will not run on small
numbers of processors anymore.
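
As an illustration of the kind of overflow mentioned for FT class D, the
sketch below checks whether a total grid-point count fits in a signed
32-bit integer. The log does not say which quantity overflowed, so this
is only one plausible case. The helper name is hypothetical, and the
Class D grid of 2048 x 1024 x 1024 used in the usage note is an
assumption stated here, not taken from this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if nx*ny*nz fits in a signed
 * 32-bit integer. The product is formed in 64-bit arithmetic so the
 * check itself cannot overflow for realistic grid sizes. */
bool grid_fits_int32(int64_t nx, int64_t ny, int64_t nz)
{
    int64_t total = nx * ny * nz;

    return total <= INT32_MAX;
}
```

A 2048 x 1024 x 1024 grid has 2^31 points, one more than the largest
signed 32-bit value, so the check fails for it, while a 512 x 512 x 512
grid (2^27 points) passes.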
------------------------------------------------------
Changes in non-MPI versions of NPB (previously PBN3.0)
(NPB3.0-SER, NPB3.0-HPF, NPB3.0-OMP, NPB3.0-JAV)
------------------------------------------------------
[01-Mar-99] Initial Beta Release.
[06-Apr-99] Based on report from Charles Grassl and Ramesh Menon (SGI).
1. NPB-SER, FT: file auxfnct.f -
lines 74 and 75 were interchanged:
double complex u0(d1+1,d2,d3), tmp(maxdim)
integer d1,d2,d3
2. NPB-OMP: The OpenMP standard requires reduction variables to be scalars;
thus, changes were made to remove the use of array variables for reduction.
Relevant modifications in EP, CG, LU, SP, and BT
3. NPB-OMP: Remove compiler warnings of "Referenced scalar variables
use defaults" by declaring explicitly as shared.
Relevant modifications in FT, LU, and BT
4. NPB-OMP, README.openmp: Explicitly spell out the requirement of
the static scheduling (setenv OMP_SCHEDULE "static").
[05-Oct-99] NPB3.0-non-MPI Beta Release (02)
General change to all (NPB-SER, NPB-HPF, NPB-OMP) -
1. Update header information for all benchmarks.
2. Allow continuation lines in 'make.def' (modification done
in sys/setparams.c).
Change made in NPB-OMP -
1. 'print_results' now prints Number-Of-Threads and Mflops/s/thread.
The printed number is the number of threads active during the run, which
may not be the same as the number requested.
2. An initial data touch loop for array A is added in CG.
3. 'CRITICAL' section is used for reduction with array.
Relevant changes in EP, CG, LU, SP, and BT.
4. Reconfigure 'make.def' such that 'omp_lib_dum' can be activated
from the file for no directive compilation.
5. The "!$OMP END DO" seems needed before "!$OMP MASTER" in rhs.f
for both BT and SP for some f90 compilers.
6. "SCHEDULE(STATIC)" is used for the pipeline in LU to ensure
compliance with the OMP standard.
Change made in NPB-HPF -
1. 'print_results' now prints Number-Of-Processes and Mflops/s/process.
2. Use more consistent output format (via print_results).
3. More consistent makefiles (via config/make.def).
[04-Apr-00] NPB3.0-non-MPI Beta Release (03)
Change made in NPB-OMP -
1. The OpenMP-C version of IS has been added, including more timers.
2. 'cprint_results' includes Number-Of-Threads and Mflops/s/thread.
Change made in NPB-SER -
1. More timers included in IS.
NPB-JAV has been included in NPB3.0-non-MPI.
[31-May-01] NPB3.0-non-MPI Beta Release (04)
Change made in NPB-OMP -
1. NPB-OMP/LU: Failure in verification for number of threads greater
than the problem size is now fixed.
2. If OMP_NUM_THREADS is unset, the printout will report "unset"
instead of "1"
3. NPB-OMP/IS: Allocating work_buff on the stack seems to cause problems
for large problem sizes (CLASS C). "work_buff" is now allocated
by "malloc" on the heap for CLASS C.
4. NPB-OMP/IS: Reported by <RaeLyn.Crowell@compaq.com> - potential
synchronization problem could arise due to the use of "static"
variables inside "randlc()". Declarations of these static variables
were moved out of randlc() and put in the threadprivate directive.
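
The stack-versus-heap change for work_buff in item 3 above can be
sketched as follows. The type name key_type and the function
alloc_work_buffer are hypothetical and only illustrate the idea; they
are not taken from the IS source.

```c
#include <stdlib.h>

/* Hypothetical key type for illustration. */
typedef int key_type;

/* Hypothetical helper: allocate a zero-initialized work buffer on the
 * heap instead of declaring a large automatic (stack) array, which can
 * exceed the stack limit for large problem sizes. Returns NULL on
 * failure. The caller releases the buffer with free(). */
key_type *alloc_work_buffer(size_t num_keys)
{
    /* calloc also guards against overflow in num_keys * sizeof(key_type). */
    return calloc(num_keys, sizeof(key_type));
}
```

A caller would check the result for NULL before use and call free()
once the buffer is no longer needed.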
General change to all (NPB-SER, NPB-HPF, NPB-OMP) -
1. Cleanup in makefiles
[28-Aug-02] The Official NPB3.0 Release
Change made in all -
1. Fixed a bogus verification for "NaN".
2. Name change from "PBN3.0" to "NPB3.0". Updated all the banners.
3. NPB-SER/FT: use a derived version from NPB2.3-serial.
4. NPB-HPF/FT: use a consistent printing format.