Quickstart into production¤
Info
This guide assumes you're already familiar with scaLAPACK and are looking to use the C wrappers provided by scaLAPACKe. If you're new to scaLAPACK, please check out the tutorial instead.
Currently, scaLAPACKe offers both a low-level and a middle-level interface to scaLAPACK, PBLAS, and BLACS. The main difference between the two is that the middle-level interface uses pass-by-value for arguments.
List of routines available in scaLAPACKe¤
Providing a complete list of routines here would be redundant. By design, scaLAPACKe mirrors the reference implementations of the various components of scaLAPACK, meaning it shares the same set of routines.
For quick reference, you can find the relevant lists here:
If you notice that a routine is missing, feel free to open an issue. 😊
Common Features and Caveats for Both Interfaces¤
-
Variables of Fortran type
INTEGER
andLOGICAL
are converted tolapack_int
in scaLAPACKe. This conversion supports the use of 64-bit integers (see ILP64 programming model). -
Following LAPACKe, variables of Fortran type
COMPLEX
andCOMPLEX*16
are converted tolapack_complex_float*
andlapack_complex_double*
, respectively. It is assumed throughout that the real and imaginary components are stored contiguously in memory, with the real component first. Thelapack_complex_float
andlapack_complex_double
macros can be either C99_Complex
types, a C struct defined type, or a custom complex type. -
Arrays are passed as pointers, not as pointers to pointers. They should be stored in Fortran style, i.e., column-major order: for an
MxN
matrix (whereM
is the number of rows andN
is the number of columns),A(i,j)
is accessed asA[i + j * LDA]
, whereLDA
is the leading dimension ofA
(typically the number of rows,M
). -
Similarly, strings are passed as pointers, not as pointers to pointers. Generally, only the first character of the string is significant.
-
When an argument (or return value) refers to a row or column number, 1-based indexing is used (i.e.,
1 <= i <= N
rather than0 <= i < N
).
Warning
Currently, scaLAPACKe does not provide a mechanism to switch from row-major to column-major storage, unlike LAPACKe. Implementing this would incur significant overhead due to the need to redistribute data across all processes. Therefore, it is recommended to use column-major arrays when working with scaLAPACKe functions.
Remember that memory locality is crucial for performance, so you should adapt your code when looping over array values:
// A is an MxN array
for (int j = 0; j < N; j++) { // loop over columns
for (int i = 0; i < M; i++) // loop over rows
// use `A[i + j * LDA]` for `A(i, j)`.
}
The low-level interface¤
The low-level interface requires four of the header files provided by scaLAPACKe: scalapack_config.h
(included by the 3 others), blacs.h
, pblas.h
, and scalapack.h
.
#include <blacs.h>
#include <pblas.h>
#include <scalapack.h>
This interface follows the customary naming convention for Fortran-C interfaces: the Fortran routine name is converted to lowercase, with an underscore (_
) appended at the end.
For example, the Fortran subroutine PDGEMM
becomes pdgemm_
.
All arguments from the original Fortran subroutine are retained and must be passed as pointers, rather than by value.
Example
The following code:
- Generates two matrices, \(A\) and \(I\), whith \(I_{ij} = \delta_{ij}\). \(A\) is filled with random numbers.
- It computes \(C = A\times I\) and then \(A = C \times I - A\). At the end of this operation, \(A_{ij} = 0\).
- It prints \(\max\{A_{ij}\}\) (which should be zero) and exits.
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <limits.h>
#include <math.h>
#include <stdio.h>
#include <blacs.h>
#include <pblas.h>
#include <scalapack.h>
lapack_int I_ZERO = 0, I_ONE = 1;
double D_ONE = 1.0, D_ZERO = 0.0, D_M_ONE = -1.f;
int main(int argc, char* argv[]) {
lapack_int N = 128, blk_size = 16;
lapack_int nprocs, ctx_sys, grid_M, grid_N, glob_i, glob_j;
lapack_int iam, loc_row, loc_col, loc_M, loc_N, loc_LD, info;
lapack_int desc_A[9];
double *loc_A, *loc_I, *loc_C; // loc_A, loc_I, and loc_C are NxN matrices
// seed
srand(time(NULL));
// initialize BLACS & system context
blacs_pinfo_(&iam, &nprocs);
blacs_get_(&I_ZERO, &I_ZERO, &ctx_sys);
// create a (grid_M x grid_N) grid
grid_M = sqrt((double) nprocs);
grid_N = nprocs / grid_M;
blacs_gridinit_(&ctx_sys, "R", &grid_M, &grid_N);
blacs_gridinfo_(&ctx_sys, &grid_M, &grid_N, &loc_row, &loc_col);
if(loc_row >= 0 && loc_col >= 0) { // if I'm in grid
loc_M = numroc_(&N, &blk_size, &loc_row, &I_ZERO, &grid_M);
loc_N = numroc_(&N, &blk_size, &loc_col, &I_ZERO, &grid_N);
loc_LD = loc_M;
loc_A = calloc(loc_M * loc_N, sizeof(double));
loc_I = calloc(loc_M * loc_N, sizeof(double));
loc_C = calloc(loc_M * loc_N, sizeof(double));
// fill arrays locally
for(lapack_int loc_j=1; loc_j <= loc_N; loc_j++) {
glob_j = indxl2g_(&loc_j, &blk_size, &loc_col, &I_ZERO, &grid_N);
for(lapack_int loc_i=1; loc_i <= loc_M; loc_i++) { // set loc_A[i,j]
glob_i = indxl2g_(&loc_i, &blk_size, &loc_row, &I_ZERO, &grid_M);
loc_A[(loc_j - 1) * loc_LD + (loc_i - 1)] = ((double) rand()) / INT_MAX;
loc_I[(loc_j - 1) * loc_LD + (loc_i - 1)] = (glob_i == glob_j);
}
}
// create descriptor for loc_A, I and loc_C
descinit_(desc_A,
&N, &N, &blk_size, &blk_size,
&I_ZERO, &I_ZERO, &ctx_sys,
&loc_LD, &info);
// compute loc_C = loc_A * loc_I
pdgemm_("N", "N", &N, &N, &N,
&D_ONE, loc_A, &I_ONE, &I_ONE, desc_A,
loc_I, &I_ONE, &I_ONE, desc_A,
&D_ZERO, loc_C, &I_ONE, &I_ONE, desc_A
);
// compute loc_A = loc_C * loc_I - loc_A
pdgemm_("N", "N", &N, &N, &N,
&D_ONE, loc_C, &I_ONE, &I_ONE, desc_A,
loc_I, &I_ONE, &I_ONE, desc_A,
&D_M_ONE, loc_A, &I_ONE, &I_ONE, desc_A
);
// fetch the maximum (i.e., the inf-norm)
double* work = (double*) calloc(loc_LD, sizeof(double));
double max = pdlange_("I", &N, &N, loc_A, &I_ONE, &I_ONE, desc_A, work);
if(iam == 0)
printf("%f\n", max);
free(loc_A);
free(loc_I);
free(loc_C);
free(work);
blacs_gridexit_(&ctx_sys);
}
blacs_exit_(&I_ZERO);
return EXIT_SUCCESS;
}
Notice that both loc_i
and loc_j
starts at one.
The middle-level interface¤
The middle-level interface requires all the headers and wrappers files provided by scaLAPACKe.
#include <scalapacke_blacs.h>
#include <scalapacke_pblas.h>
#include <scalapacke.h>
The naming convention for this interface is as follows: take the Fortran routine name, convert it to lowercase, and prepend it with SCALAPACKE_
.
If the routine requires an auxiliary workspace (WORK
, RWORK
, IWORK
, etc), the suffix _work
should be added.
For example, the Fortran routine PDGEMM
becomes SCALAPACKE_pdgemm
, and PDSYEV
becomes SCALAPACKE_pdsyev_work
.
The INFO
parameter found in the Fortran subroutine is omitted in scaLAPACKe.
Instead, the lapack_int
return value of the function is set to the value that INFO
would have returned.
All other arguments from the original Fortran subroutine are retained in the C interface.
Arguments are passed by value rather than by pointer when both of the following conditions are met:
- The argument is input-only.
- The argument is a scalar type (such as
INTEGER
,REAL
,DOUBLE
,COMPLEX
, orLOGICAL
).
Example
This is the same program as above, but written using the middle-level interface:
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <limits.h>
#include <math.h>
#include <stdio.h>
#include <scalapacke_blacs.h>
#include <scalapacke_pblas.h>
#include <scalapacke.h>
int main(int argc, char* argv[]) {
lapack_int N = 128, blk_size = 16;
lapack_int nprocs, sys_ctx, grid_ctx, grid_M, grid_N, glob_i, glob_j;
lapack_int iam, loc_row, loc_col, loc_M, loc_N, loc_LD, info;
lapack_int desc_A[9];
double *loc_A, *loc_I, *loc_C; // loc_A, loc_I, and loc_C are NxN matrices
// seed
srand(time(NULL));
// initialize BLACS & system context
SCALAPACKE_blacs_pinfo(&iam, &nprocs);
SCALAPACKE_blacs_get(0, 0, &sys_ctx);
// create a (grid_M x grid_N) grid
grid_M = sqrt((double) nprocs);
grid_N = nprocs / grid_M;
grid_ctx = sys_ctx;
SCALAPACKE_blacs_gridinit(&grid_ctx, "R", grid_M, grid_N);
SCALAPACKE_blacs_gridinfo(grid_ctx, &grid_M, &grid_N, &loc_row, &loc_col);
if(loc_row >= 0) { // if loc_I'm in grid
loc_M = SCALAPACKE_numroc(N, blk_size, loc_row, 0, grid_M);
loc_N = SCALAPACKE_numroc(N, blk_size, loc_col, 0, grid_N);
loc_LD = loc_M;
loc_A = calloc(loc_M * loc_N, sizeof(double));
loc_I = calloc(loc_M * loc_N, sizeof(double));
loc_C = calloc(loc_M * loc_N, sizeof(double));
// fill arrays locally
for(lapack_int loc_j=0; loc_j < loc_N; loc_j++) {
glob_j = SCALAPACKE_indxl2g(loc_j + 1, blk_size, loc_col, 0, grid_N);
for(lapack_int loc_i=0; loc_i < loc_M; loc_i++) { // set loc_A[i,j]
glob_i = SCALAPACKE_indxl2g(loc_i + 1, blk_size, loc_row, 0, grid_M) ;
loc_A[loc_j * loc_LD + loc_i] = ((double) rand()) / INT_MAX;
loc_I[loc_j * loc_LD + loc_i] = (glob_i == glob_j);
}
}
// create descriptor for loc_A, loc_I and loc_C
SCALAPACKE_descinit(desc_A,
N, N, blk_size, blk_size,
0, 0, grid_ctx,
loc_LD);
// compute loc_C = loc_A * loc_I
SCALAPACKE_pdgemm("N", "N", N, N, N,
1., loc_A, 1, 1, desc_A,
loc_I, 1, 1, desc_A,
.0, loc_C, 1, 1, desc_A
);
// compute loc_A = loc_C * loc_I - loc_A
SCALAPACKE_pdgemm("N", "N", N, N, N,
1., loc_C, 1, 1, desc_A,
loc_I, 1, 1, desc_A,
-1., loc_A, 1, 1, desc_A
);
// fetch the maximum (i.e., the inf-norm)
double* work = (double*) calloc(loc_LD, sizeof(double));
double max = SCALAPACKE_pdlange_work("I", N, N, loc_A, 1, 1, desc_A, work);
if(iam == 0)
printf("%f\n", max);
free(loc_A);
free(loc_I);
free(loc_C);
free(work);
SCALAPACKE_blacs_gridexit(grid_ctx);
}
SCALAPACKE_blacs_exit(0);
return EXIT_SUCCESS;
}