 Some explanations on the MPI implementation of NPB 3.3 (NPB3.3-MPI)
 ----------------------------------------------------------------------

 NPB-MPI is a sample MPI implementation based on NPB2.4 and NPB3.0-SER.
 This implementation contains all eight original benchmarks (seven in
 Fortran: BT, SP, LU, FT, CG, MG, and EP; one in C: IS) as well as the
 DT benchmark, written in C, which was introduced in NPB3.2-MPI.

 For changes between versions, see the Changes.log file
 included in the upper directory of this distribution.

 This version has been tested, among others, on an SGI Origin3000 and
 an SGI Altix.  For problem reports and suggestions on the implementation,
 please contact

    NAS Parallel Benchmark Team
    npb@nas.nasa.gov


 CAUTION *********************************
 When running the I/O benchmark, one or more data files will be written
 in the directory from which the executable is invoked. They are not
 deleted at the end of the program. A new run will overwrite the old
 file(s). If not enough space is available in the user partition, the
 program will fail. For classes C and D the disk space required is
 3 GB and 135 GB, respectively.
 *****************************************
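
    As a rough pre-flight check for the space requirement above, the
    sketch below compares the free space in the launch directory with
    the figures quoted for classes C and D. It is only an illustration:
    the helper names are hypothetical, "GB" is assumed to mean 10**9
    bytes, and classes for which this file quotes no figure are not
    checked at all.

```python
import shutil

# Disk space quoted in this README for the I/O benchmark, per class.
# Assumption: "GB" means 10**9 bytes; other classes have no quoted figure.
REQUIRED_BYTES = {"C": 3 * 10**9, "D": 135 * 10**9}


def enough_space(cls, free_bytes):
    """Return True if free_bytes covers the quoted requirement for cls."""
    required = REQUIRED_BYTES.get(cls.upper())
    if required is None:
        return True  # no figure quoted in the README, so nothing to check
    return free_bytes >= required


def check_launch_dir(cls, path="."):
    """Check the directory the executable would be started from."""
    free = shutil.disk_usage(path).free
    return enough_space(cls, free)


if __name__ == "__main__":
    for cls in ("C", "D"):
        print(cls, "ok" if check_launch_dir(cls) else "not enough free space")
```

    Because old output files are overwritten rather than accumulated,
    the check only needs to cover a single run.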


 1. Compilation

    NPB3-MPI uses the same directory tree as NPB3-SER (and NPB2.x) does.
    Before compilation, one needs to check the configuration file
    'make.def' in the config directory and modify the file if necessary.
    If it does not (yet) exist, copy 'make.def.template' or one of the
    sample files in the NAS.samples subdirectory to 'make.def' and
    edit the content for site- and machine-specific data.  Then run

        make <benchmark-name> NPROCS=<number> CLASS=<class> \
          [SUBTYPE=<type>] [VERSION=VEC]

    where <benchmark-name>  is "bt", "cg", "dt", "ep", "ft", "is",
                               "lu", "mg", or "sp"
          <number>          is the number of processes
          <class>           is "S", "W", "A", "B", "C", "D", or "E"

    Classes C, D and E are not available for DT.
    Class E is not available for IS.

    The "VERSION=VEC" option is used for selecting the vectorized
    versions of BT and LU.
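
    The command-line rules in this section can be collected in a small
    helper that assembles the make arguments and rejects combinations
    this file rules out. This is a sketch, not part of the distribution:
    make_args is a hypothetical name, rejecting VERSION=VEC for
    benchmarks other than BT and LU is an assumption drawn from the
    sentence above, and no per-benchmark constraints on the process
    count are checked.

```python
BENCHMARKS = {"bt", "cg", "dt", "ep", "ft", "is", "lu", "mg", "sp"}
CLASSES = {"S", "W", "A", "B", "C", "D", "E"}
SUBTYPES = {"full", "simple", "fortran", "epio"}
# Class restrictions stated in this README.
UNAVAILABLE = {"dt": {"C", "D", "E"}, "is": {"E"}}


def make_args(benchmark, nprocs, cls, subtype=None, vec=False):
    """Build the argument list for one 'make' invocation (hypothetical helper)."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark}")
    if cls not in CLASSES:
        raise ValueError(f"unknown class: {cls}")
    if cls in UNAVAILABLE.get(benchmark, set()):
        raise ValueError(f"class {cls} is not available for {benchmark}")
    if nprocs < 1:
        raise ValueError("number of processes must be positive")
    args = ["make", benchmark, f"NPROCS={nprocs}", f"CLASS={cls}"]
    if subtype is not None:
        # SUBTYPE applies only when making the BT I/O benchmark.
        if benchmark != "bt" or subtype not in SUBTYPES:
            raise ValueError(f"invalid subtype {subtype} for {benchmark}")
        args.append(f"SUBTYPE={subtype}")
    if vec:
        # Assumption: VERSION=VEC is rejected for anything but BT and LU.
        if benchmark not in {"bt", "lu"}:
            raise ValueError("VERSION=VEC applies only to bt and lu")
        args.append("VERSION=VEC")
    return args


print(" ".join(make_args("bt", 4, "A", subtype="full")))
```

    The returned list can be passed to subprocess.run(args, check=True)
    from the directory that contains the top-level Makefile.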

    Only when making the I/O benchmark:
          <benchmark-name>  is "bt"
          <number>, <class> as above
          <type>            is "full", "simple", "fortran", or "epio"

    Three parameters not used in the original BT benchmark are present in
    the I/O benchmark. Two are set by default in the file BT/bt.f;
    changing them is optional. The third is set in make.def and must
    be specified.

    bt.f: collbuf_nodes: number of processes used to buffer data before
                         writing to file in the collective buffering mode
                         (<type> is "full").
          collbuf_size:  size of buffer (in bytes) per process used in
                         collective buffering

    make.def: -DFORTRAN_REC_SIZE: Fortran I/O record length in bytes. This
                         is a system-specific value. It is part of the
                         definition string of variable CONVERTFLAG. Syntax:
                         "CONVERTFLAG = -DFORTRAN_REC_SIZE=n", where n is
                         the record length.

    When <type> is "full" or "simple", the code must be linked with an
    MPI library that provides the subset of I/O routines defined in MPI-2.


    Class D for IS (Integer Sort) requires a compiler/system in which
    the C "long" type is 64 bits wide.  For example, the SGI MIPS
    compiler for the SGI Origin with the "-64" compilation flag and
    the Intel compiler for IA64 are known to work.
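
    A quick way to see how wide "long" is on a given machine is to ask
    Python's ctypes module, as sketched below. Note the caveat: this
    reports the width for the ABI that the Python interpreter itself
    was built with, which is assumed (not guaranteed) to match the
    compiler and flags configured in make.def. The function names are
    hypothetical.

```python
import ctypes


def c_long_bits():
    """Width in bits of the C 'long' type for the ABI Python was built with."""
    return ctypes.sizeof(ctypes.c_long) * 8


def long_is_64_bit():
    """True when 'long' is 64 bits wide, the stated requirement for IS class D."""
    return c_long_bits() == 64


print("C long is", c_long_bits(), "bits")
```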


    The above procedure allows you to build one benchmark
    at a time. To build a whole suite, you can type "make suite".
    Make will look in the file "config/suite.def" for a list of
    executables to build. The file contains one line per specification,
    with comments preceded by "#". Each line contains the name
    of a benchmark, the class, and the number of processors, separated
    by spaces or tabs. config/suite.def.template contains an example
    of such a file.
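
    The file format described above is simple enough to parse in a few
    lines, for example to list what "make suite" would be asked to
    build. The sketch below is illustrative only: parse_suite is a
    hypothetical helper, the sample content is invented rather than
    copied from suite.def.template, "#" is assumed to start a comment
    anywhere on a line, and any fields after the third are ignored.

```python
def parse_suite(text):
    """Parse suite.def-style text into (benchmark, class, nprocs) tuples."""
    specs = []
    for raw in text.splitlines():
        # Assumption: "#" starts a comment anywhere on the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) < 3:
            raise ValueError(f"incomplete specification: {raw!r}")
        name, cls, nprocs = fields[0], fields[1], int(fields[2])
        specs.append((name, cls, nprocs))
    return specs


# Invented sample content, for illustration only.
example = """\
# benchmark  class  nprocs
ft  S  1
mg\tA\t4
"""
for spec in parse_suite(example):
    print(spec)
```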


    The benchmarks have been designed so that they can be run
    on a single processor without an MPI library. A few "dummy"
    MPI routines are still required for linking. For convenience
    such a library is supplied in the "MPI_dummy" subdirectory of
    the distribution. It contains the include files mpif.h and
    mpi.f, which must be used as well. The dummy library is built and
    linked automatically, and paths to the include files are defined,
    by inserting the line "include ../config/make.dummy" into the
    make.def file (see example in make.def.template). Make sure to
    read the warnings in the README file in "MPI_dummy". The use of
    the library is fragile and can produce unexpected errors.


    ================================

    The "RAND" variable in make.def
    --------------------------------

    Most of the NPBs use a random number generator. In two of the NPBs (FT
    and EP) the computation of random numbers is included in the timed
    part of the calculation, and it is important that the random number
    generator be efficient.  The default random number generator package
    provided is called "randi8" and should be used where possible. It has
    the following requirements:

    randi8:
      1. Uses integer*8 arithmetic. Compiler must support integer*8.
      2. Uses the Fortran 90 IAND intrinsic. Compiler must support IAND.
      3. Assumes overflow bits are discarded by the hardware. In particular,
         it assumes that the lowest 46 bits of a*b are always correct,
         even if the result a*b is larger than 2^64.
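
    The third requirement can be illustrated with a short model of
    wraparound arithmetic. Python integers never overflow, so the
    sketch below simulates a 64-bit register by masking the product to
    64 bits, then checks that the low 46 bits match those of the exact
    product. The operands are example values only, and the code models
    the assumption rather than testing what a particular compiler or
    machine actually does.

```python
MASK64 = (1 << 64) - 1  # what a 64-bit register keeps after overflow
MASK46 = (1 << 46) - 1  # the low 46 bits the generator relies on


def low46_wrapped(a, b):
    """Low 46 bits of a*b after the product is truncated to 64 bits."""
    return ((a * b) & MASK64) & MASK46


def low46_exact(a, b):
    """Low 46 bits of the exact, unbounded product."""
    return (a * b) & MASK46


# Example operands only, chosen so that a*b exceeds 2**64.
a = 5**13
b = (1 << 45) + 12345
assert a * b > 1 << 64
assert low46_wrapped(a, b) == low46_exact(a, b)
```

    The two results agree because discarding bits 64 and above cannot
    change bits 0 through 45.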

    Since randi8 may not work on all machines, we supply the following
    alternatives:

    randi8_safe
      1. Uses integer*8 arithmetic.
      2. Uses the Fortran 90 IBITS intrinsic.
      3. Does not make any assumptions about overflow. Should always
         work correctly if compiler supports integer*8 and IBITS.

    randdp
      1. Uses double precision arithmetic (to simulate integer*8 operations).
         Should work with any system with support for 64-bit floating
         point arithmetic.

    randdpvec
      1. Similar to randdp but written to be easier to vectorize.
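
    To show how double precision arithmetic can stand in for integer*8
    operations, as the randdp entry describes, the sketch below
    computes (a * x) mod 2^46 using floats only, by splitting each
    operand into 23-bit halves so that every intermediate value stays
    exactly representable. It illustrates the general technique and is
    not copied from the randdp source; the function name and the
    operand values are assumptions made for the example.

```python
R23, T23 = 2.0**-23, 2.0**23
R46, T46 = 2.0**-46, 2.0**46


def mul_mod_2_46(a, x):
    """Return (a * x) mod 2**46 using only double precision arithmetic.

    a and x are floats holding integers in [0, 2**46).
    """
    # Split a and x into 23-bit halves: a = 2**23 * a1 + a2.
    a1 = float(int(R23 * a))
    a2 = a - T23 * a1
    x1 = float(int(R23 * x))
    x2 = x - T23 * x1
    # Cross terms, reduced mod 2**23 (the a1*x1 term vanishes mod 2**46).
    t1 = a1 * x2 + a2 * x1
    z = t1 - T23 * float(int(R23 * t1))
    # Recombine and reduce mod 2**46; every intermediate stays below 2**53.
    t3 = T23 * z + a2 * x2
    return t3 - T46 * float(int(R46 * t3))


# Cross-check against exact integer arithmetic with example values.
a, x = float(5**13), 314159265.0
for _ in range(1000):
    expected = (int(a) * int(x)) % 2**46
    x = mul_mod_2_46(a, x)
    assert int(x) == expected
```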


 2. Execution

    The executable is named <benchmark-name>.<class>.<nprocs>[.<suffix>],
    where <suffix> is "fortran_io", "mpi_io_simple", "ep_io", or
    "mpi_io_full".
    The executable is placed in the bin subdirectory (or in the directory
    BINDIR specified in make.def, if you've defined it). The method for
    running the MPI program depends on your local system.
    When any of the I/O benchmarks is run (non-empty subtype), one or
    more output files are created and placed in the directory from which
    the program was started. These are not removed automatically and
    will be overwritten the next time an I/O benchmark is run.
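
    The naming rule above can be written down as a small helper, which
    is handy when a script has to locate the executables it has just
    built. This is a sketch with a hypothetical function name. The
    mapping from each SUBTYPE value to a suffix is inferred from the
    similarity of the names and is not stated explicitly in this file.

```python
import os

# Assumption: each SUBTYPE value maps to the similarly named suffix.
SUFFIX_FOR_SUBTYPE = {
    "full": "mpi_io_full",
    "simple": "mpi_io_simple",
    "fortran": "fortran_io",
    "epio": "ep_io",
}


def executable_path(benchmark, cls, nprocs, subtype=None, bindir="bin"):
    """Return the expected path of a built executable (hypothetical helper)."""
    name = f"{benchmark}.{cls}.{nprocs}"
    if subtype is not None:
        name += "." + SUFFIX_FOR_SUBTYPE[subtype]
    return os.path.join(bindir, name)


print(executable_path("cg", "B", 8))
print(executable_path("bt", "A", 4, subtype="full"))
```

    Pass bindir explicitly if BINDIR is defined in make.def.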

    To enable additional timers in several benchmarks at runtime, create
    a dummy file 'timer.flag' in the working directory before executing
    a benchmark.
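
    Creating the flag file from a driver script takes only a few
    lines, sketched below. The helper names are hypothetical. The file
    is left empty on the reading that a "dummy" file needs no content,
    and the temporary directory stands in for the real working
    directory.

```python
from pathlib import Path
import tempfile


def enable_timers(workdir):
    """Create an empty 'timer.flag' in workdir and return its path."""
    flag = Path(workdir) / "timer.flag"
    flag.touch()  # a dummy file, so it is left empty
    return flag


def remove_timer_flag(workdir):
    """Delete 'timer.flag' from workdir if it is present."""
    flag = Path(workdir) / "timer.flag"
    if flag.exists():
        flag.unlink()


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    flag = enable_timers(workdir)
    print(flag.name, "created:", flag.exists())
    remove_timer_flag(workdir)
    print(flag.name, "still present:", flag.exists())
```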