
Preface

 

The development of the Parallel Linear Algebra Package (PLAPACK, pronounced PLAY-pack) infrastructure resulted in part from my frustration while teaching a graduate-level special-topics course, Parallel Techniques for Numerical Algorithms, at the University of Texas. While I could explain the high performance, scalable implementation of algorithms like matrix-matrix multiplication and Cholesky factorization without filling more than half a chalkboard, presenting actual parallel code for such operations required explaining subroutine calls with fifteen to thirty parameters. The natural blocking that can be used to describe such an algorithm, and that is required to attain high performance, did not translate well to code.

To get a sense of how linear algebra algorithms can be expressed naturally as blocked algorithms, we suggest the reader turn to the description of the Cholesky factorization algorithm in Section 1.2. Observe that the description never indexes individual elements explicitly, that it is inherently recursive, and that the parallel algorithm is stated without explicit reference to processor indices or communication. An ideal infrastructure for coding such algorithms should capture these same attributes.
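For concreteness, the following is a sketch of such a blocked (right-looking) Cholesky factorization; it is the standard formulation, stated here in generic notation rather than quoted from Section 1.2. Partition the symmetric positive definite matrix as
\[
A \rightarrow \left( \begin{array}{cc} A_{11} & \star \\ A_{21} & A_{22} \end{array} \right) ,
\]
where A_{11} is a small square block. The factorization then proceeds by computing
\[
A_{11} := L_{11} = \mbox{Chol}( A_{11} ) , \qquad
A_{21} := L_{21} = A_{21} L_{11}^{-T} , \qquad
A_{22} := A_{22} - L_{21} L_{21}^{T} ,
\]
after which the same process is applied, recursively, to the updated A_{22}. No individual element is indexed, and no processor or message appears anywhere in the description.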

The arrival of the Message-Passing Interface (MPI) has not only provided an ideal vehicle for the communication layer of parallel libraries, but has also introduced the parallel processing community to a convenient, object-based style of programming. One way MPI reduces the number of parameters in the calling sequences of its routines is through object-based programming: elaborate descriptions of the groups of processors within which an MPI communication is to occur are stored in a hidden data structure (an opaque object). Similarly, descriptions of distributed matrices and vectors in PLAPACK are stored in linear algebra objects. Furthermore, we use views, objects that are references into distributed matrices and vectors, to address sub-blocks of matrices and vectors. Through the use of views, a PLAPACK implementation becomes a line-by-line translation of a given blocked algorithm.
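To illustrate how views let code mirror a blocked algorithm, here is a small, self-contained C sketch. It operates on an ordinary in-core, column-major matrix rather than on distributed PLAPACK objects, and every type and routine name in it (View, subview, and so on) is hypothetical, invented for this example; it is not PLAPACK's calling sequence. The point is only that the blocked Cholesky factorization sketched above becomes a line-by-line translation once sub-blocks can be referenced without copying or explicit indexing.

#include <stdio.h>
#include <math.h>

/* Hypothetical example types and routines; NOT the PLAPACK API.      */
/* A view references a sub-block of a column-major matrix; no data    */
/* is copied.  ld is the leading dimension of the underlying array.   */
typedef struct { double *a; int ld, m, n; } View;

static View subview(View A, int i, int j, int m, int n)
{   /* m x n sub-block whose top-left element is A(i,j) */
    View S = { A.a + i + j * A.ld, A.ld, m, n };
    return S;
}

static void chol_unb(View A)          /* unblocked Cholesky, lower   */
{
    for (int j = 0; j < A.n; j++) {
        double d = sqrt(A.a[j + j * A.ld]);
        A.a[j + j * A.ld] = d;
        for (int i = j + 1; i < A.m; i++) A.a[i + j * A.ld] /= d;
        for (int k = j + 1; k < A.n; k++)
            for (int i = k; i < A.m; i++)
                A.a[i + k * A.ld] -= A.a[i + j * A.ld] * A.a[k + j * A.ld];
    }
}

static void trsm(View L, View B)      /* B := B * inv(L)^T           */
{
    for (int j = 0; j < L.n; j++)
        for (int i = 0; i < B.m; i++) {
            double s = B.a[i + j * B.ld];
            for (int k = 0; k < j; k++)
                s -= B.a[i + k * B.ld] * L.a[j + k * L.ld];
            B.a[i + j * B.ld] = s / L.a[j + j * L.ld];
        }
}

static void syrk(View A, View C)      /* C := C - A * A^T (lower)    */
{
    for (int j = 0; j < C.n; j++)
        for (int i = j; i < C.m; i++) {
            double s = 0.0;
            for (int k = 0; k < A.n; k++)
                s += A.a[i + k * A.ld] * A.a[j + k * A.ld];
            C.a[i + j * C.ld] -= s;
        }
}

void chol(View A, int b)              /* blocked, recursive Cholesky */
{
    if (A.n <= b) { chol_unb(A); return; }
    View A11 = subview(A, 0, 0, b, b);
    View A21 = subview(A, b, 0, A.m - b, b);
    View A22 = subview(A, b, b, A.m - b, A.n - b);
    chol_unb(A11);                    /* A11 := L11                  */
    trsm(A11, A21);                   /* A21 := A21 * inv(L11)^T     */
    syrk(A21, A22);                   /* A22 := A22 - A21 * A21^T    */
    chol(A22, b);                     /* recurse on the trailing view */
}

int main(void)
{
    enum { N = 4 };
    double a[N * N];                  /* symmetric positive definite */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i + j * N] = (i == j) ? 5.0 : 1.0;
    View A = { a, N, N, N };
    chol(A, 2);
    for (int i = 0; i < N; i++) {     /* print the factor L          */
        for (int j = 0; j <= i; j++) printf("%8.4f ", a[i + j * N]);
        printf("\n");
    }
    return 0;
}

Note that the body of chol reads exactly like the blocked description: views of the quadrants are created, each update is one call on a view, and the recursion falls out for free.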

There are a number of additional recent advances that have been incorporated into PLAPACK. One is the recognition that applications inherently view the distribution of a matrix differently than has been traditionally supported by parallel linear algebra libraries. PLAPACK uses an alternative view of matrix distribution, Physically Based Matrix Distribution, in which the distribution of matrices is induced by the distribution of vectors in a linear system of equations. It is this approach to matrix distribution that also naturally supports and exploits the collective communications required to perform the data duplications necessary to parallelize dense linear algebra algorithms. This in turn supports a systematic, layered approach for the parallel implementation of common matrix-vector and matrix-matrix operations, allowing a highly compact implementation of the PLAPACK infrastructure.
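A rough sketch of the induction, under simplifying assumptions: view the p processors as an r x c mesh, and partition the vectors in y = A x into subvectors y_i and x_j, each assigned to a single processor. Projecting this vector distribution onto the mesh, block A_{ij} of the conformally partitioned matrix is assigned to the processor at the intersection of the mesh row that owns y_i and the mesh column that owns x_j. Forming
\[
y_i := \sum_{j} A_{ij} x_j
\]
then requires each x_j to be duplicated within the mesh column holding the blocks A_{ij}, and the partial results to be summed within mesh rows; both are exactly the collective communications (broadcast, gather, reduce) alluded to above.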

PLAPACK can be used as a library; this requires only a rudimentary knowledge of the first few chapters of this book and access to the PLAPACK web page to see what higher level operations are available. It can also be used as a vehicle for the parallel implementation of new linear algebra algorithms, thus providing users with additional high level operations. In addition, its simplicity makes it an ideal vehicle for education in the area of high performance supercomputing. Thus, we target scientific application programmers, library developers, and novices alike.

We intend for PLAPACK to become an ``open'' infrastructure, with new subroutines added as they are developed by our own group as well as by others. For this purpose, we maintain an extensive web page at http://www.cs.utexas.edu/users/plapack/.

Despite the similarity of the names, PLAPACK should not be mistaken for a parallelization of LAPACK (generally pronounced L-A-pack) or a repackaging of ScaLAPACK (generally pronounced Sca-L-A-pack). LAPACK is a highly successful high performance dense linear algebra package for conventional (shared memory) supercomputers as well as workstations. It was written in FORTRAN and makes use of the Level 1, 2, and 3 Basic Linear Algebra Subprograms (BLAS) to attain high performance in a portable fashion. We view ScaLAPACK as an effort to port LAPACK to distributed memory computers, with an emphasis on maximizing code reuse through minimal change to all components of the LAPACK library. The PLAPACK approach to parallelizing linear algebra algorithms is somewhat more radical.


