
Forming $C = \alpha A B + \beta C$

 

For this operation to be well-defined, we require matrices A , B , and C to have dimensions $m \times k$, $k \times n$, and $m \times n$, respectively.

We start by noting that if we partition A and B by columns and rows, respectively,

\[
A = \left( \begin{array}{c|c|c|c} a_1 & a_2 & \cdots & a_k \end{array} \right)
\quad \mbox{and} \quad
B = \left( \begin{array}{c} b_1^T \\ b_2^T \\ \vdots \\ b_k^T \end{array} \right) ,
\]

then

\[
A B = a_1 b_1^T + a_2 b_2^T + \cdots + a_k b_k^T .
\]

Thus, a matrix-matrix multiply can be written as a sequence of rank-one updates.
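As a small numerical check of this identity, the following self-contained C program (a plain-C illustration, not PLAPACK code; the function name, row-major storage, and test matrices are our choices) compares the direct triple-loop product with the sum of the $k$ outer products:

```c
#include <stdlib.h>

/* Check numerically that A*B (direct triple loop) equals the sum of
   the k outer products a_p * b_p^T, where a_p is column p of A and
   b_p^T is row p of B.  Row-major storage; plain-C sketch. */
int products_agree(int m, int n, int k, const double *A, const double *B)
{
    double *direct = calloc((size_t)(m * n), sizeof *direct);
    double *rank1  = calloc((size_t)(m * n), sizeof *rank1);
    int ok = 1;

    /* direct product: C(i,j) = sum_p A(i,p) * B(p,j) */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                direct[i*n + j] += A[i*k + p] * B[p*n + j];

    /* same result accumulated as k successive rank-1 updates */
    for (int p = 0; p < k; p++)
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                rank1[i*n + j] += A[i*k + p] * B[p*n + j];

    for (int i = 0; i < m * n; i++)
        if (direct[i] != rank1[i])
            ok = 0;

    free(direct);
    free(rank1);
    return ok;
}
```

Because both loops add the same terms in the same order for each entry of C, the two results agree exactly, not merely to rounding error.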

Basic Parallel Algorithm:

We can implement the above sequence of rank-1 updates by using views, splitting off one column and row at a time: Let A and B be partitioned like

\[
A = \left( \begin{array}{c|c} a_1 & A_2 \end{array} \right)
\quad \mbox{and} \quad
B = \left( \begin{array}{c} b_1^T \\ \hline B_2 \end{array} \right) ,
\]

where $a_1$ and $b_1^T$ equal the first column of A and the first row of B , respectively. Then

\begin{eqnarray*}
\alpha A B + \beta C
& = & \alpha \left( \begin{array}{c|c} a_1 & A_2 \end{array} \right)
      \left( \begin{array}{c} b_1^T \\ \hline B_2 \end{array} \right) + \beta C \\
& = & \alpha A_2 B_2 + \left( \alpha a_1 b_1^T + \beta C \right) .
\end{eqnarray*}

So the update of C can be viewed as an initial scaling of C , followed by a rank-1 update with the first column of A and the first row of B . After this, the operation $C \leftarrow \alpha A_2 B_2 + C$ can be performed recursively, using the same approach.

This leads to the following algorithm:

    $C \leftarrow \beta C$
    while $A$ has columns left:
        $A = \left( a_1 \,|\, A_2 \right)$ ,  $B = \left( b_1^T ; B_2 \right)$
        $C \leftarrow \alpha a_1 b_1^T + C$
        $A = A_2$ ,  $B = B_2$

Here the ``$\leftarrow$'' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning. The resulting PLAPACK code is given in the figure below. When comparing the main loop in this code to that given earlier for the rank-1 update, notice that we have merely added a loop that creates appropriate views into matrices A and B .

[Figure: PLAPACK code for matrix-matrix multiplication via a sequence of rank-1 updates]
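The structure of this loop can be sketched in plain C (this is an illustration of the algorithm, not the PLAPACK code; the function name and row-major storage convention are our choices, and the "views" are simulated with pointer offsets rather than PLAPACK view objects):

```c
#include <stddef.h>

/* Unblocked algorithm: C <- beta*C once, then for p = 0..k-1 split off
   column p of A and row p of B and perform the rank-1 update
   C <- alpha * a_p * b_p^T + C.  Row-major storage; plain-C sketch. */
void gemm_rank1(int m, int n, int k, double alpha,
                const double *A, const double *B,
                double beta, double *C)
{
    for (int i = 0; i < m * n; i++)
        C[i] *= beta;                          /* initial scaling of C */

    for (int p = 0; p < k; p++) {
        const double *a1 = A + p;              /* "view": column p of A (stride k) */
        const double *b1 = B + (size_t)p * n;  /* "view": row p of B (contiguous) */
        for (int i = 0; i < m; i++)            /* rank-1 update */
            for (int j = 0; j < n; j++)
                C[i*n + j] += alpha * a1[(size_t)i * k] * b1[j];
    }
}
```

For example, with $m = n = k = 2$, A = [1 2; 3 4], B = [5 6; 7 8], $\alpha = 1$, and $\beta = 0$, this produces C = [19 22; 43 50].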

Blocked Algorithm:

In order to get better performance, it is advantageous to set up the algorithm to perform a sequence of rank-k updates instead: Let A and B be partitioned like

\[
A = \left( \begin{array}{c|c} A_1 & A_2 \end{array} \right)
\quad \mbox{and} \quad
B = \left( \begin{array}{c} B_1 \\ \hline B_2 \end{array} \right) ,
\]

where $A_1$ and $B_1$ are $m \times b$ and $b \times n$ matrices, respectively, for some block size $b$. Then

\begin{eqnarray*}
\alpha A B + \beta C
& = & \alpha \left( \begin{array}{c|c} A_1 & A_2 \end{array} \right)
      \left( \begin{array}{c} B_1 \\ \hline B_2 \end{array} \right) + \beta C \\
& = & \alpha A_2 B_2 + \left( \alpha A_1 B_1 + \beta C \right) .
\end{eqnarray*}

This leads to the following algorithm:

    $C \leftarrow \beta C$
    while $A$ has columns left:
        $A = \left( A_1 \,|\, A_2 \right)$ ,  $B = \left( B_1 ; B_2 \right)$ , with $A_1$ holding the next $b$ columns
        $C \leftarrow \alpha A_1 B_1 + C$
        $A = A_2$ ,  $B = B_2$

Again, the ``$\leftarrow$'' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning. This blocked code is given in the figure below. We will often refer to this approach to parallel matrix-matrix multiplication as a panel-panel variant, to indicate that the basic operation uses panels of each of the matrices (rank-k update).

[Figure: PLAPACK code for the blocked (panel-panel) matrix-matrix multiplication]
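A corresponding plain-C sketch of the blocked variant follows (again an illustration, not the PLAPACK code; the function name, the block-size parameter `nb`, and row-major storage are our choices):

```c
/* Blocked algorithm: C <- beta*C once, then for each panel of (up to) nb
   columns of A and nb rows of B, perform one rank-b update
   C <- alpha * A1 * B1 + C.  Row-major storage; plain-C sketch. */
void gemm_blocked(int m, int n, int k, int nb, double alpha,
                  const double *A, const double *B,
                  double beta, double *C)
{
    for (int i = 0; i < m * n; i++)
        C[i] *= beta;                          /* initial scaling of C */

    for (int p = 0; p < k; p += nb) {
        int b = (p + nb <= k) ? nb : k - p;    /* width of current panel */
        /* rank-b update with A1 = A(:, p:p+b-1) and B1 = B(p:p+b-1, :) */
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int q = 0; q < b; q++)
                    sum += A[i*k + (p + q)] * B[(p + q)*n + j];
                C[i*n + j] += alpha * sum;
            }
    }
}
```

Setting `nb = 1` recovers the unblocked rank-1 algorithm; larger `nb` performs fewer, larger updates, which is where the performance benefit of the blocked formulation comes from.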

Pipelined Algorithm:

When k is large, the above algorithm can be improved by introducing pipelining. This is accomplished by setting the natural flow of computation so that communication within rows proceeds to the right and communication within columns proceeds down. We will assume this has been done before entering the matrix multiplication routine, using the calls

PLA_Temp_set_comm_dir( template, PLA_TEMP_ROW, PLA_DIR_RIGHT );
PLA_Temp_set_comm_dir( template, PLA_TEMP_COL, PLA_DIR_DOWN );
For details on the benefits of pipelining, see [, ].



rvdg@cs.utexas.edu