
Level-3 BLAS implementation

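In order to take advantage of the higher performance of sequential level-3 BLAS, we rewrite the above algorithm in terms of matrix-matrix operations. Partition

$$
A = \begin{pmatrix} A_{11} & \star \\ A_{21} & A_{22} \end{pmatrix}
\quad \text{and} \quad
L = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix},
$$

where $A_{11}$ and $L_{11}$ are $b \times b$ sub-matrices and $\star$ marks the symmetric, unreferenced upper part of $A$. Now,

$$
\begin{pmatrix} A_{11} & \star \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} L_{11}^T & L_{21}^T \\ 0 & L_{22}^T \end{pmatrix}
=
\begin{pmatrix} L_{11} L_{11}^T & \star \\ L_{21} L_{11}^T & L_{21} L_{21}^T + L_{22} L_{22}^T \end{pmatrix}.
$$

This in turn yields the equations

$$
\begin{aligned}
L_{11} L_{11}^T &= A_{11}, \\
L_{21} &= A_{21} L_{11}^{-T}, \\
L_{22} L_{22}^T &= A_{22} - L_{21} L_{21}^T.
\end{aligned}
$$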
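As a small hand check of these equations, take $b = 1$ and the $2 \times 2$ symmetric positive definite matrix below (a hypothetical example chosen only so that every quantity is an integer):

$$
A = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}:
\quad L_{11} = \sqrt{4} = 2,
\quad L_{21} = 2 \cdot 2^{-1} = 1,
\quad L_{22} L_{22}^T = 5 - 1 \cdot 1 = 4 \;\Rightarrow\; L_{22} = 2,
$$

$$
L = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix},
\qquad
L L^T = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix} = A.
$$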

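We thus conclude that the following steps will implement the Cholesky factorization, overwriting the lower triangular portion of $A$ with $L$:

1. $A_{11} \leftarrow L_{11} = \mathrm{Chol}(A_{11})$
2. $A_{21} \leftarrow L_{21} = A_{21} L_{11}^{-T}$
3. $A_{22} \leftarrow A_{22} - L_{21} L_{21}^T$
4. Continue by computing, in the same way, the Cholesky factorization of the updated $A_{22}$.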

Notice that to compute $L_{11}$, a level-2 implementation of the Cholesky factorization can be used. The bulk of the computation is in the operation $A_{22} \leftarrow A_{22} - L_{21} L_{21}^T$, which is now a matrix-matrix operation (a symmetric rank-k update) and therefore attains the higher performance of level-3 BLAS.
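To see why the update dominates, one can count floating-point operations in a single step, using the standard leading-order counts, when the matrix still to be factored is $n \times n$, the block size is $b$, and $m = n - b$:

$$
\underbrace{\tfrac{1}{3} b^3}_{L_{11} = \mathrm{Chol}(A_{11})}
\; + \;
\underbrace{m b^2}_{L_{21} = A_{21} L_{11}^{-T}}
\; + \;
\underbrace{m^2 b}_{A_{22} \leftarrow A_{22} - L_{21} L_{21}^T}.
$$

For $b \ll n$ the last term dominates: summed over all $n/b$ steps it contributes approximately $n^3/3$ flops, essentially the entire cost of the factorization, while the level-2 factorizations of the diagonal blocks together contribute only about $(n/b) \cdot b^3/3 = n b^2/3$.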

The PLAPACK code for this level-3 BLAS based implementation is given in Figure 8.2. Notice that the block size $b$ must be determined. This block size should equal the width of $L_{21}$ that makes the symmetric rank-k update $A_{22} \leftarrow A_{22} - L_{21} L_{21}^T$ most efficient. The call

PLA_Environ_nb_alg( PLA_OP_SYM_PAN_PAN, template, &nb_alg );
queries the infrastructure for the optimal block size for the symmetric rank-k (panel-panel) update. A second optimization, which could also have been used for the level-2 BLAS right-looking variant, exploits the fact that the computation naturally flows to the right and down: the columns are completed from left to right, and rows drop out of the computation from top to bottom. The calls
PLA_Temp_get_comm_dir( template, PLA_DIR_TEMP_ROW, &old_dir_row );
PLA_Temp_get_comm_dir( template, PLA_DIR_TEMP_COL, &old_dir_col );
PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_ROW, PLA_DIR_RIGHT );
PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_COL, PLA_DIR_DOWN );

/* ... Cholesky factorization of A ... */

PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_ROW, old_dir_row );
PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_COL, old_dir_col );
query the current directions of the flow of computation, set them as appropriate for the Cholesky factorization, and reset them after the Cholesky factorization has completed.



rvdg@cs.utexas.edu