GENERAL INFORMATION:
The LU program factors a dense matrix into the product of a lower
triangular and an upper triangular matrix. The factorization uses
blocking to exploit temporal locality on individual submatrix elements.
The algorithm used in this implementation is described in
Woo, S. C., Singh, J. P., and Hennessy, J. L. The Performance Advantages
of Integrating Block Data Transfer in Cache-Coherent Multiprocessors.
Proceedings of the 6th International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS-VI),
October 1994.
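For readers unfamiliar with blocked LU, the following C sketch shows the general right-looking structure: factor a diagonal block, update the row and column panels, then update the trailing submatrix. It is not the SPLASH-2 code. It works on a plain row-major array, omits pivoting (an assumption that is only safe for inputs such as diagonally dominant matrices), and the name lu_blocked is hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: in-place right-looking blocked LU of an n x n
   row-major matrix with block size bs, WITHOUT pivoting (assumption).
   Afterwards the strict lower triangle holds L (unit diagonal implied)
   and the upper triangle, including the diagonal, holds U. */
static void lu_blocked(double *a, int n, int bs)
{
    for (int k = 0; k < n; k += bs) {
        int kend = (k + bs < n) ? k + bs : n;   /* last block may be short */

        /* 1. Factor the diagonal block A[k:kend, k:kend] in place. */
        for (int p = k; p < kend; p++) {
            assert(a[p * n + p] != 0.0);          /* zero pivot: no pivoting here */
            for (int i = p + 1; i < kend; i++) {
                a[i * n + p] /= a[p * n + p];
                for (int j = p + 1; j < kend; j++)
                    a[i * n + j] -= a[i * n + p] * a[p * n + j];
            }
        }

        /* 2. Row panel: U12 = L11^-1 * A12 (forward substitution, unit L). */
        for (int p = k; p < kend; p++)
            for (int i = p + 1; i < kend; i++)
                for (int j = kend; j < n; j++)
                    a[i * n + j] -= a[i * n + p] * a[p * n + j];

        /* 3. Column panel: L21 = A21 * U11^-1. */
        for (int i = kend; i < n; i++)
            for (int p = k; p < kend; p++) {
                a[i * n + p] /= a[p * n + p];
                for (int j = p + 1; j < kend; j++)
                    a[i * n + j] -= a[i * n + p] * a[p * n + j];
            }

        /* 4. Trailing submatrix: A22 -= L21 * U12. This sketch sweeps it
              row by row; a tuned version would tile it into bs x bs block
              updates so each block stays in cache while it is reused. */
        for (int i = kend; i < n; i++)
            for (int p = k; p < kend; p++) {
                double l = a[i * n + p];
                for (int j = kend; j < n; j++)
                    a[i * n + j] -= l * a[p * n + j];
            }
    }
}
```

The factored diagonal block and the two panels are each reused across many trailing-submatrix updates, which is where blocking gains its temporal locality.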
Two implementations are provided in the SPLASH-2 distribution:
(1) Non-contiguous block allocation
This implementation (contained in the non_contiguous_blocks
subdirectory) stores the matrix to be factored in a
two-dimensional array. This data structure prevents blocks from
being allocated contiguously, but keeps the implementation
conceptually simple.
(2) Contiguous block allocation
This implementation (contained in the contiguous_blocks
subdirectory) stores the matrix to be factored as an array
of blocks. This data structure allows each block to be allocated
contiguously and entirely in the local memory of the processor that
"owns" it, thus improving data locality.
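To make the difference between the two layouts concrete, here is a C sketch of where element (i, j) lives in each. The helper names idx_2d and idx_blocked are hypothetical, and ordering the blocks row-major (and the elements row-major within each block) is an assumption; the SPLASH-2 sources may use a different ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Non-contiguous layout: one row-major n x n array. The elements of one
   block row are adjacent, but consecutive rows of a block are n apart. */
static size_t idx_2d(int n, int i, int j)
{
    return (size_t)i * n + j;
}

/* Contiguous-block layout (assumed ordering): an nb x nb array of blocks,
   nb = n / bs, each bs x bs block stored row-major in its own bs*bs slot,
   so every block occupies one unbroken range of memory. */
static size_t idx_blocked(int n, int bs, int i, int j)
{
    assert(bs > 0 && n % bs == 0);
    int nb = n / bs;
    int I = i / bs, J = j / bs;          /* block coordinates */
    int ii = i % bs, jj = j % bs;        /* offset inside the block */
    size_t block = (size_t)I * nb + J;   /* blocks ordered row-major */
    return block * bs * bs + (size_t)ii * bs + jj;
}
```

With the base problem (n = 512, B = 16) and 8-byte doubles, consecutive rows of one block are 512 elements (4 KB) apart in the two-dimensional array, whereas in the block layout the whole 16x16 block spans 2 KB of consecutive memory, which is what allows it to be placed in a single processor's local memory.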
These programs work under both the Unix FORK and SPROC models.
RUNNING THE PROGRAM:
For usage instructions, see the comment at the top of the file lu.C,
or run the application with the "-h" command-line option.
Three parameters may be specified on the command line: the matrix size,
the number of processors, and the block size. Normally only the matrix
size and the number of processors are changed. We suggest keeping the
block size at B=16, since this value works well in practice. If it is
changed, the new value should be reported in any results that are
presented.
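A minimal sketch of how the three parameters might be parsed with POSIX getopt. The flag letters (-n, -p, -b), the struct, and the default of one processor are assumptions for illustration only; the authoritative option list is the comment at the top of lu.C or the "-h" output. The defaults n = 512 and B = 16 follow the base problem size given in this README.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>   /* POSIX getopt, optarg */

/* Hypothetical parameter set for one run. */
struct lu_params {
    int n;   /* matrix dimension (n x n) */
    int p;   /* number of processors */
    int b;   /* block size B */
};

/* Parse assumed flags -nN, -pP, -bB. Returns 0 on success, or -1 on an
   unknown option or a non-positive value (out is left untouched). */
static int parse_lu_args(int argc, char **argv, struct lu_params *out)
{
    struct lu_params prm = { 512, 1, 16 };   /* README base problem, 1 proc */
    int c;
    while ((c = getopt(argc, argv, "n:p:b:")) != -1) {
        switch (c) {
        case 'n': prm.n = atoi(optarg); break;
        case 'p': prm.p = atoi(optarg); break;
        case 'b': prm.b = atoi(optarg); break;
        default:  return -1;                  /* getopt already printed why */
        }
    }
    if (prm.n <= 0 || prm.p <= 0 || prm.b <= 0)
        return -1;
    *out = prm;
    return 0;
}
```

Keeping B as a parsed value with a default of 16 makes it easy to print alongside results, so that a non-default block size is always reported.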
BASE PROBLEM SIZE:
The base problem size for a machine with up to 64 processors is a
512x512 matrix with a block size of B=16.
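The base problem's dimensions can be worked out directly. This sketch assumes double-precision elements and that B divides n evenly; the names lu_sizes and lu_problem_sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Derived sizes for an n x n matrix of doubles split into B x B blocks. */
struct lu_sizes {
    int blocks_per_dim;    /* n / B */
    int total_blocks;      /* (n / B)^2 */
    size_t block_bytes;    /* B * B * sizeof(double) */
    size_t matrix_bytes;   /* n * n * sizeof(double) */
};

static struct lu_sizes lu_problem_sizes(int n, int b)
{
    assert(b > 0 && n % b == 0);   /* assumption: B divides n */
    struct lu_sizes s;
    s.blocks_per_dim = n / b;
    s.total_blocks = s.blocks_per_dim * s.blocks_per_dim;
    s.block_bytes = (size_t)b * b * sizeof(double);
    s.matrix_bytes = (size_t)n * n * sizeof(double);
    return s;
}
```

For n = 512 and B = 16 this gives 32 blocks per dimension, 1024 blocks in total, 2 KB per block and a 2 MB matrix (with 8-byte doubles), so a 64-processor run has on average 16 blocks per processor.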
DATA DISTRIBUTION:
Our "POSSIBLE ENHANCEMENT" comments in the source code indicate where
and how one might distribute data. On the Stanford DASH multiprocessor,
data distribution has only a small impact on performance.
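One common way to decide which processor "owns" a block, and therefore where its data would be distributed, is a two-dimensional scatter (block-cyclic) assignment over a grid of processors. The sketch below assumes that scheme and a prow x pcol processor grid; the function names are hypothetical, and the scheme is not taken from the source's "POSSIBLE ENHANCEMENT" comments.

```c
#include <assert.h>

/* Assumed 2-D scatter assignment: processors form a prow x pcol grid and
   block (I, J) goes to grid position (I mod prow, J mod pcol). */
static int block_owner(int I, int J, int prow, int pcol)
{
    assert(prow > 0 && pcol > 0 && I >= 0 && J >= 0);
    return (I % prow) * pcol + (J % pcol);
}

/* Count the blocks of an nb x nb block grid owned by processor proc. With
   the contiguous-block layout, these are the blocks one would place in
   proc's local memory. */
static int blocks_owned(int nb, int prow, int pcol, int proc)
{
    int count = 0;
    for (int I = 0; I < nb; I++)
        for (int J = 0; J < nb; J++)
            if (block_owner(I, J, prow, pcol) == proc)
                count++;
    return count;
}
```

Spreading blocks cyclically in both dimensions keeps every processor supplied with work as the active trailing submatrix shrinks during the factorization; for the base problem on an 8x8 grid of 64 processors, each processor owns 16 of the 1024 blocks.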