
 Some info about the MG benchmark
 (Note: this info applies to the parallel version and mostly concerns
 the processor decomposition.  Info not concerning the decomposition
 still applies to the serial version.)
 ================================

 'mg_demo' demonstrates the capabilities of a very simple multigrid
 solver in computing a three dimensional potential field.  This is
 a simplified multigrid solver in two important respects:

   (1) it solves only a constant coefficient equation,
   and that only on a uniform cubical grid,

   (2) it solves only a single equation, representing
   a scalar field rather than a vector field.

 We chose it for its portability and simplicity, and expect that a
 supercomputer which can run it effectively will also be able to
 run more complex multigrid programs at least as well.

      Eric Barszcz                         Paul Frederickson
      RIACS
      NASA Ames Research Center            NASA Ames Research Center

 ========================================================================
 Running the program:  (Note: also see parameter lm information in the
                        two sections immediately below this section)

 The program may be run with or without an input deck (called "mg.input").
 The following describes a few things about the input deck if you want to
 use one.

 The four lines below are the "mg.input" file required to run a
 problem of total size 256x256x256 for 4 iterations (Class "A"),
 assuming the use of 8 processors:

    8 = top level
    256 256 256 = nx ny nz
    4 = nit
    0 0 0 0 0 0 0 0 = debug_vec
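 The deck is four lines of integers, each followed by an "= ..." label.
 The following is a minimal Python sketch of reading such a deck, assuming
 the values always come before the "=" on each line, as in the examples in
 this README.  The function name read_mg_input and the dictionary keys are
 hypothetical and are not part of the benchmark, which reads the deck in
 its own Fortran code.

```python
def read_mg_input(text):
    """Parse a four-line mg.input deck (hypothetical helper).

    Assumes each non-blank line holds integers first, followed by an
    '= description' label, as in the README examples.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 4:
        raise ValueError("mg.input needs four non-blank lines")

    def values(line):
        # Keep only the part before '=' and convert it to integers.
        return [int(tok) for tok in line.split("=")[0].split()]

    nx, ny, nz = values(lines[1])
    return {
        "top_level": values(lines[0])[0],
        "nx": nx,
        "ny": ny,
        "nz": nz,
        "nit": values(lines[2])[0],
        "debug_vec": values(lines[3]),
    }


deck = """
    8 = top level
    256 256 256 = nx ny nz
    4 = nit
    0 0 0 0 0 0 0 0 = debug_vec
"""
print(read_mg_input(deck))
```

 Leading indentation and blank lines are ignored, so the deck can be
 pasted exactly as it appears above.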

 The first line of input indicates how many multi-grid levels will be
 applied to each processor's subpartition.  Presuming that 8 processors
 are solving this problem (recall that the number of processors is
 specified to MPI as a run parameter, and the code obtains it at run time
 via an MPI subroutine call), a 2x2x2 processor grid is formed, and thus
 each partition on a processor is of size 128x128x128.  Therefore, a
 maximum of 8 multi-grid levels may be used.  These are of size
 128,64,32,16,8,4,2,1, with the coarsest level being a single point on
 a given processor.
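 The level sizes come from halving the partition edge until a single
 point remains, so a power-of-2 edge n gives log2(n) + 1 levels.  A small
 Python sketch of that halving, assuming a power-of-2 edge; the name
 level_sizes is hypothetical:

```python
def level_sizes(n):
    """Edge length at each multi-grid level, finest first, down to 1."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    sizes = []
    while n >= 1:
        sizes.append(n)
        n //= 2  # each coarser level halves the edge length
    return sizes


print(level_sizes(128))  # the 8 levels of a 128x128x128 partition
```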


 Next, consider the same size problem but running on 1 processor.  The
 following "mg.input" file is appropriate:

     9 = top level
     256 256 256 = nx ny nz
     4 = nit
     0 0 0 0 0 0 0 0 = debug_vec

 Since this processor must solve the full 256x256x256 problem, this
 permits 9 multi-grid levels (256,128,64,32,16,8,4,2,1), resulting in
 a coarsest multi-grid level of a single point on the processor.


 Next, consider the same size problem but running on 2 processors.  The
 following "mg.input" file is required:

     8 = top level
     256 256 256 = nx ny nz
     4 = nit
     0 0 0 0 0 0 0 0 = debug_vec

 The algorithm for partitioning the full grid onto a power-of-2 number
 of processors starts by splitting the last dimension of the grid
 (z dimension) in 2: the problem is now partitioned onto 2 processors.
 Next, the middle dimension (y dimension) is split in 2: the problem is now
 partitioned onto 4 processors.  Next, the first dimension (x dimension) is
 split in 2: the problem is now partitioned onto 8 processors.  Next, the
 last dimension (z dimension) is split again in 2: the problem is now
 partitioned onto 16 processors.  This z, y, x cycle of splits is repeated
 until all of the processors have been allocated.
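 The splitting order can be sketched in Python as follows.  This is an
 illustration of the z, y, x cycle described above, not the benchmark's
 own routine; the names processor_grid and partition_size are
 hypothetical, and the sketch assumes a power-of-2 processor count and
 grid dimensions that divide evenly.

```python
def processor_grid(nprocs):
    """Processor counts (px, py, pz) after splitting z, y, x, z, ... in turn."""
    if nprocs < 1 or nprocs & (nprocs - 1):
        raise ValueError("nprocs must be a power of 2")
    grid = [1, 1, 1]              # processors along x, y, z
    axis = 2                      # the first split is along z
    while grid[0] * grid[1] * grid[2] < nprocs:
        grid[axis] *= 2           # split the partitions in 2 along this axis
        axis = (axis - 1) % 3     # z -> y -> x -> z -> ...
    return tuple(grid)


def partition_size(nx, ny, nz, nprocs):
    """Grid points held by each processor along x, y, z."""
    px, py, pz = processor_grid(nprocs)
    return (nx // px, ny // py, nz // pz)


for p in (1, 2, 4, 8, 16):
    print(p, processor_grid(p), partition_size(256, 256, 256, p))
```

 For the 256x256x256 problem this yields 256x256x128 partitions on 2
 processors, 256x128x128 on 4, 128x128x128 on 8 and 128x128x64 on 16,
 matching the cases discussed in this section.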

 Thus, to run the above problem on 2 processors, the grid partitioning
 algorithm will allocate the two processors across the last dimension,
 creating two partitions each of size 256x256x128.  Each point on the
 coarsest multi-grid level must be surrounded by a cube of grid points.
 Therefore, each of the two processor partitions will contain 2x2x1 = 4
 coarsest multi-grid level points, each surrounded by a cube of grid points
 of size 128x128x128, indicated by a top level of 8.
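 The count of coarsest-level points per partition is the product, over
 the three dimensions, of the partition edge divided by the cube edge.
 A minimal Python sketch, assuming (as the examples here imply) that a
 top level of L means a cube edge of 2**(L-1) grid points; the name
 coarsest_points is hypothetical:

```python
def coarsest_points(local_dims, top_level):
    """Number of coarsest-level points in one processor's partition."""
    edge = 2 ** (top_level - 1)  # e.g. top level 8 -> 128x128x128 cubes
    count = 1
    for d in local_dims:
        if d % edge:
            raise ValueError(f"dimension {d} is not a multiple of {edge}")
        count *= d // edge
    return count


print(coarsest_points((256, 256, 128), 8))  # the 2-processor case
```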


 Next, consider the same size problem but running on 4 processors.  The
 following "mg.input" file is required:

     8 = top level
     256 256 256 = nx ny nz
     4 = nit
     0 0 0 0 0 0 0 0 = debug_vec

 The partitioning algorithm will create 4 partitions, each of size
 256x128x128.  Each partition will contain 2 coarsest multi-grid level
 points, each surrounded by a cube of grid points of size 128x128x128,
 indicated by a top level of 8.


 Next, consider the same size problem but running on 16 processors.  The
 following "mg.input" file is required:

     7 = top level
     256 256 256 = nx ny nz
     4 = nit
     0 0 0 0 0 0 0 0 = debug_vec

 On each node a partition of size 128x128x64 will be created.  A maximum
 of 7 multi-grid levels (64,32,16,8,4,2,1) may be used, resulting in each
 partition containing 4 coarsest multi-grid level points, each surrounded
 by a cube of grid points of size 64x64x64, indicated by a top level of 7.


 Note that non-cubic problem sizes may also be considered:

 The four lines below are the "mg.input" file appropriate for running a
 problem of total size 256x512x512 for 20 iterations, assuming the
 use of 32 processors (note: this is NOT a class C problem):

     8 = top level
     256 512 512 = nx ny nz
     20 = nit
     0 0 0 0 0 0 0 0 = debug_vec

 The first line of input indicates how many multi-grid levels will be
 applied to a particular subpartition.  Presuming that
 32 processors are solving this problem, a 2x4x4 processor grid is
 formed, and thus each partition on a processor is of size 128x128x128.
 Therefore, a maximum of 8 multi-grid levels may be used.  These are of
 size 128,64,32,16,8,4,2,1, with the coarsest level being a single
 point on a given processor.
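 Every example in this README follows one rule: the top level is
 log2(shortest partition edge) + 1.  The Python sketch below applies that
 rule, which is inferred from these examples rather than taken from the
 benchmark source, to produce a suggested "mg.input" deck.  The name
 mg_input_for is hypothetical; the sketch assumes a power-of-2 processor
 count and power-of-2 grid dimensions that divide evenly.

```python
def mg_input_for(nx, ny, nz, nprocs, nit, debug_vec=(0,) * 8):
    """Return suggested mg.input text for a given grid and processor count."""
    if nprocs < 1 or nprocs & (nprocs - 1):
        raise ValueError("nprocs must be a power of 2")
    # Split the processors along z, y, x, z, ... as described above.
    grid = [1, 1, 1]
    axis = 2
    while grid[0] * grid[1] * grid[2] < nprocs:
        grid[axis] *= 2
        axis = (axis - 1) % 3
    local = []
    for n, p in zip((nx, ny, nz), grid):
        if n % p:
            raise ValueError(f"{n} points cannot be split evenly over {p}")
        local.append(n // p)
    # The cube around each coarsest point is bounded by the shortest local
    # edge; for a power-of-2 edge, bit_length() equals log2(edge) + 1.
    top_level = min(local).bit_length()
    return "\n".join([
        f"{top_level} = top level",
        f"{nx} {ny} {nz} = nx ny nz",
        f"{nit} = nit",
        " ".join(str(v) for v in debug_vec) + " = debug_vec",
    ])


print(mg_input_for(256, 512, 512, 32, 20))
```

 For 256x512x512 on 32 processors this reproduces the deck shown above,
 with a top level of 8; for the 256x256x256 problem it gives 9, 8, 8, 8
 and 7 on 1, 2, 4, 8 and 16 processors respectively.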