Quickstart into production¤

Info

This guide assumes you're already familiar with scaLAPACK and are looking to use the C wrappers provided by scaLAPACKe. If you're new to scaLAPACK, please check out the tutorial instead.

Currently, scaLAPACKe offers both a low-level and a middle-level interface to scaLAPACK, PBLAS, and BLACS. The main difference between the two is that the middle-level interface uses pass-by-value for arguments.

List of routines available in scaLAPACKe¤

Providing a complete list of routines here would be redundant. By design, scaLAPACKe mirrors the reference implementations of the various components of scaLAPACK, meaning it shares the same set of routines.

For quick reference, you can find the relevant lists here:

If you notice that a routine is missing, feel free to open an issue. 😊

Common Features and Caveats for Both Interfaces¤

Variables of Fortran type INTEGER and LOGICAL are converted to lapack_int in scaLAPACKe. This conversion supports the use of 64-bit integers (see ILP64 programming model).
Following LAPACKe, variables of Fortran type COMPLEX and COMPLEX*16 are converted to lapack_complex_float* and lapack_complex_double*, respectively. It is assumed throughout that the real and imaginary components are stored contiguously in memory, with the real component first. The lapack_complex_float and lapack_complex_double macros can be either C99 _Complex types, a C struct defined type, or a custom complex type.
Arrays are passed as pointers, not as pointers to pointers. They should be stored in Fortran style, i.e., column-major order: for an MxN matrix (where M is the number of rows and N is the number of columns), A(i,j) is accessed as A[i + j * LDA], where LDA is the leading dimension of A (typically the number of rows, M).
Similarly, strings are passed as pointers, not as pointers to pointers. Generally, only the first character of the string is significant.
When an argument (or return value) refers to a row or column number, 1-based indexing is used (i.e., 1 <= i <= N rather than 0 <= i < N).

Warning

Currently, scaLAPACKe does not provide a mechanism to switch from row-major to column-major storage, unlike LAPACKe. Implementing this would incur significant overhead due to the need to redistribute data across all processes. Therefore, it is recommended to use column-major arrays when working with scaLAPACKe functions.

Remember that memory locality is crucial for performance, so you should adapt your code when looping over array values:

// A is an MxN array
for (int j = 0; j < N; j++) { // loop over columns
    for (int i = 0; i < M; i++) // loop over rows
        // use `A[i + j * LDA]` for `A(i, j)`.  
}

The low-level interface¤

The low-level interface requires four of the header files provided by scaLAPACKe: scalapack_config.h (included by the 3 others), blacs.h, pblas.h, and scalapack.h.

#include <blacs.h>
#include <pblas.h>
#include <scalapack.h>

This interface follows the customary naming convention for Fortran-C interfaces: the Fortran routine name is converted to lowercase, with an underscore (_) appended at the end. For example, the Fortran subroutine PDGEMM becomes pdgemm_.

All arguments from the original Fortran subroutine are retained and must be passed as pointers, rather than by value.

Example

The following code:

Generates two matrices, \(A\) and \(I\), whith \(I_{ij} = \delta_{ij}\). \(A\) is filled with random numbers.
It computes \(C = A\times I\) and then \(A = C \times I - A\). At the end of this operation, \(A_{ij} = 0\).
It prints \(\max\{A_{ij}\}\) (which should be zero) and exits.

#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <limits.h>
#include <math.h>
#include <stdio.h>

#include <blacs.h>
#include <pblas.h>
#include <scalapack.h>

lapack_int I_ZERO = 0, I_ONE = 1;
double D_ONE = 1.0, D_ZERO = 0.0, D_M_ONE = -1.f;

int main(int argc, char* argv[]) {
    lapack_int N = 128, blk_size = 16;
    lapack_int nprocs, ctx_sys, grid_M, grid_N, glob_i, glob_j;
    lapack_int  iam, loc_row, loc_col, loc_M, loc_N, loc_LD, info;
    lapack_int desc_A[9];
    double *loc_A, *loc_I, *loc_C; // loc_A, loc_I, and loc_C are NxN matrices

    // seed
    srand(time(NULL));

    // initialize BLACS & system context
    blacs_pinfo_(&iam, &nprocs);
    blacs_get_(&I_ZERO, &I_ZERO, &ctx_sys);

    // create a (grid_M x grid_N) grid
    grid_M = sqrt((double) nprocs);
    grid_N = nprocs / grid_M;
    blacs_gridinit_(&ctx_sys, "R", &grid_M, &grid_N);
    blacs_gridinfo_(&ctx_sys, &grid_M, &grid_N, &loc_row, &loc_col);

    if(loc_row >= 0 && loc_col >= 0) { // if I'm in grid
        loc_M = numroc_(&N, &blk_size, &loc_row, &I_ZERO, &grid_M);
        loc_N = numroc_(&N, &blk_size, &loc_col, &I_ZERO, &grid_N);

        loc_LD =  loc_M;
        loc_A = calloc(loc_M * loc_N, sizeof(double));
        loc_I = calloc(loc_M * loc_N, sizeof(double));
        loc_C = calloc(loc_M * loc_N, sizeof(double));

        // fill arrays locally
        for(lapack_int loc_j=1; loc_j <= loc_N; loc_j++) {
            glob_j = indxl2g_(&loc_j, &blk_size, &loc_col, &I_ZERO, &grid_N);
            for(lapack_int loc_i=1; loc_i <= loc_M; loc_i++) { // set loc_A[i,j]
                glob_i = indxl2g_(&loc_i, &blk_size, &loc_row, &I_ZERO, &grid_M);
                loc_A[(loc_j - 1) * loc_LD + (loc_i - 1)] = ((double) rand()) / INT_MAX;
                loc_I[(loc_j - 1) * loc_LD + (loc_i - 1)] = (glob_i == glob_j);
            }
        }

        // create descriptor for loc_A, I and loc_C
        descinit_(desc_A,
                  &N, &N, &blk_size, &blk_size,
                  &I_ZERO, &I_ZERO, &ctx_sys,
                  &loc_LD, &info);

        // compute loc_C = loc_A * loc_I
        pdgemm_("N", "N", &N, &N, &N,
                &D_ONE, loc_A, &I_ONE, &I_ONE, desc_A,
                loc_I, &I_ONE, &I_ONE, desc_A,
                &D_ZERO, loc_C, &I_ONE, &I_ONE, desc_A
        );

        // compute loc_A = loc_C * loc_I - loc_A
        pdgemm_("N", "N", &N, &N, &N,
                &D_ONE, loc_C, &I_ONE, &I_ONE, desc_A,
                loc_I, &I_ONE, &I_ONE, desc_A,
                &D_M_ONE, loc_A, &I_ONE, &I_ONE, desc_A
        );

        // fetch the maximum (i.e., the inf-norm)
        double* work = (double*) calloc(loc_LD, sizeof(double));
        double max = pdlange_("I", &N, &N, loc_A, &I_ONE, &I_ONE, desc_A, work);

        if(iam == 0)
            printf("%f\n", max);

        free(loc_A);
        free(loc_I);
        free(loc_C);
        free(work);
        blacs_gridexit_(&ctx_sys);
    }

    blacs_exit_(&I_ZERO);
    return EXIT_SUCCESS;
}

Notice that both loc_i and loc_j starts at one.

The middle-level interface¤

The middle-level interface requires all the headers and wrappers files provided by scaLAPACKe.

#include <scalapacke_blacs.h>
#include <scalapacke_pblas.h>
#include <scalapacke.h>

The naming convention for this interface is as follows: take the Fortran routine name, convert it to lowercase, and prepend it with SCALAPACKE_. If the routine requires an auxiliary workspace (WORK, RWORK, IWORK, etc), the suffix _work should be added. For example, the Fortran routine PDGEMM becomes SCALAPACKE_pdgemm, and PDSYEV becomes SCALAPACKE_pdsyev_work.

The INFO parameter found in the Fortran subroutine is omitted in scaLAPACKe. Instead, the lapack_int return value of the function is set to the value that INFO would have returned. All other arguments from the original Fortran subroutine are retained in the C interface.

Arguments are passed by value rather than by pointer when both of the following conditions are met:

The argument is input-only.
The argument is a scalar type (such as INTEGER, REAL, DOUBLE, COMPLEX, or LOGICAL).

Example

This is the same program as above, but written using the middle-level interface:

#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <limits.h>
#include <math.h>
#include <stdio.h>

#include <scalapacke_blacs.h>
#include <scalapacke_pblas.h>
#include <scalapacke.h>

int main(int argc, char* argv[]) {
    lapack_int N = 128, blk_size = 16;
    lapack_int nprocs, sys_ctx, grid_ctx, grid_M, grid_N, glob_i, glob_j;
    lapack_int  iam, loc_row, loc_col, loc_M, loc_N, loc_LD, info;
    lapack_int desc_A[9];
    double *loc_A, *loc_I, *loc_C; // loc_A, loc_I, and loc_C are NxN matrices

    // seed
    srand(time(NULL));

    // initialize BLACS & system context
    SCALAPACKE_blacs_pinfo(&iam, &nprocs);
    SCALAPACKE_blacs_get(0, 0, &sys_ctx);

    // create a (grid_M x grid_N) grid
    grid_M = sqrt((double) nprocs);
    grid_N = nprocs / grid_M;
    grid_ctx = sys_ctx;
    SCALAPACKE_blacs_gridinit(&grid_ctx, "R", grid_M, grid_N);
    SCALAPACKE_blacs_gridinfo(grid_ctx, &grid_M, &grid_N, &loc_row, &loc_col);

    if(loc_row >= 0) { // if loc_I'm in grid
        loc_M = SCALAPACKE_numroc(N, blk_size, loc_row, 0, grid_M);
        loc_N = SCALAPACKE_numroc(N, blk_size, loc_col, 0, grid_N);

        loc_LD =  loc_M;
        loc_A = calloc(loc_M * loc_N, sizeof(double));
        loc_I = calloc(loc_M * loc_N, sizeof(double));
        loc_C = calloc(loc_M * loc_N, sizeof(double));

        // fill arrays locally
        for(lapack_int loc_j=0; loc_j < loc_N; loc_j++) {
            glob_j = SCALAPACKE_indxl2g(loc_j + 1, blk_size, loc_col, 0, grid_N);
            for(lapack_int loc_i=0; loc_i < loc_M; loc_i++) { // set loc_A[i,j]
                glob_i = SCALAPACKE_indxl2g(loc_i + 1, blk_size, loc_row, 0, grid_M) ;
                loc_A[loc_j * loc_LD + loc_i] = ((double) rand()) / INT_MAX;
                loc_I[loc_j * loc_LD + loc_i] = (glob_i == glob_j);
            }
        }

        // create descriptor for loc_A, loc_I and loc_C
        SCALAPACKE_descinit(desc_A,
                            N, N, blk_size, blk_size,
                            0, 0, grid_ctx,
                            loc_LD);

        // compute loc_C = loc_A * loc_I
        SCALAPACKE_pdgemm("N", "N", N, N, N,
                          1., loc_A, 1, 1, desc_A,
                          loc_I, 1, 1, desc_A,
                          .0, loc_C, 1, 1, desc_A
        );

        // compute loc_A = loc_C * loc_I - loc_A
        SCALAPACKE_pdgemm("N", "N", N, N, N,
                          1., loc_C, 1, 1, desc_A,
                          loc_I, 1, 1, desc_A,
                          -1., loc_A, 1, 1, desc_A
        );

        // fetch the maximum (i.e., the inf-norm)
        double* work = (double*) calloc(loc_LD, sizeof(double));
        double max = SCALAPACKE_pdlange_work("I", N, N, loc_A, 1, 1, desc_A, work);

        if(iam == 0)
            printf("%f\n", max);

        free(loc_A);
        free(loc_I);
        free(loc_C);
        free(work);

        SCALAPACKE_blacs_gridexit(grid_ctx);
    }

    SCALAPACKE_blacs_exit(0);
    return EXIT_SUCCESS;
}