Sparse matrix is a matrix populated primarily with zeros. Sparse matrices are stored using special data structures, which allow to store only nonzero elements and save a lot of memory and CPU time when working with such matrices.
Specialized data structures allow to solve computational problems which are impossible to solve when matrices are stored using "conventional", dense storage.
Usually it is linear systems with large number of variables, but almost empty matrix. Such system arise in many areas - from partial differential equations to large scale RBF interpolation problems. All these problems are impossible to solve without sparse matrices. Hence, sparse subpackage of the ALGLIB package contains data structures, which allow to store sparse matrices, and algorithms, which can work with them.
Below we will study different sparse storage formats employed by sparsematrix structure, and functions which can be used on this structure. We also briefly review algorithms which can be applied to sparse matrices.
Structure used for sparse matrix storage should provide following functionality:
There are several popular data structures which are used to store sparse matrices, each with its own benefits and drawbacks. ALGLIB package implements two such structures: hash table storage (also known as Dictionary of keys) and CRS (Compressed row storage).
Sparse matrix can be stored as hash table which maps keys (row/column indexes) to values of corresponding elements. Such storage format has following features:
You may see that hash table storage allows easy creation and modification of sparse matrices. We may access (read and write) elements in arbitrary order, we do not have to know table size in advance, we have fast random access to the table. However, this storage format is not good for and does not allow to use sparse matrix in linear algebra operations like matrix-vector product. You have to convert table to CRS format in order to do so.
Thus, hash table storage can be classified as intermediate representation which is used when you want to initialize sparse matrix.
CRS (Compressed Row Storage) is another representation which can be used to store sparse matrix. Matrix elements are ordered by rows (from top to bottom) and columns (within row - from left to right). Such storage format has following features:
You may see that CRS representation allows to use matrices in the linear algebra operations due to efficient storage format. However, hard initialization is weakness of such representation: a) you have to know in advance number of nonzero elements in the matrix, and b) you have to fill matrix in strict order (from left to right, from top to bottom).
Thus, CRS representation is good for numerical work, but is not good for easy initialization. Luckily, there exists hash table storage format, which is good for initialization, and can be converted to CRS.
Depending on problem being solved we can point out two ways of working with sparse matrix:
Note #1
Although hash table format allows to create matrix without knowing exact number of nonzero elements,
it is desirable to pass to constructor at least approximate, order-of-magnitude correct estimate.
It will help to avoid costly reallocation of hash table which occurs when table becomes nearly full.
In order to create matrix in a hash table storage format you should use sparsecreate function. This function creates empty (zero) M·N matrix. If you know at least approximate estimate of the nonzero elements count, you can pass it to the constructor function. Such estimate, if given, allows to reserve required amount of memory and avoid reallocation of internal hash table.
After matrix creation you can fill it by elements. sparseset function allows to modify matrix - call for nonzero element rewrites its contents, call for zero element inserts new entry in the hash table. sparseadd function allows to add value to the element - call for nonzero element increments its value, call for zero element inserts new entry in the table. sparse_d_1 example shows how to work with these functions.
After the initialization sparse matrix can be passed to one of the algorithms (say, sparse solver). Most algorithms require sparse matrix to be in the CRS format. In this case you should manually convert it with sparseconverttocrs function before passing elsewhere.
In order to create matrix in CRS format you should use sparsecreatecrs function. This function accepts following parameters: rows count M, columns count N, array NER which contains number of nonzero elements in each row. Thus, in order to create CRS matrix you must know exact number of nonzero elements in each row of the matrix. Such information often requires a lot of careful programming work, but as result memory requirements become approximately three times lower.
After creation of the CRS matrix you can start filling its elements. Unlike hash table storage, CRS representation requires you to fill elements in strictly specified order: row by row from top to bottom, within row - from left to right. The only function which can be used for initialization is sparseset. You can read sparse_d_2 example which shows how to create matrix in CRS format.
You can read matrix elements by means of two functions. First, you may use sparseget function, which allows to get element value (nonzero or zero) by its index. This function works for matrices in CRS and hash table formats. For hash table storage it has O(1) time complexity, for CRS format it has O(log(NonzeroElementsInRow)) running time (binary search within a row is used). The most important drawback of this function is that you can get value of any specific element (nonzero or zero), but you can't get a list of nonzero elements without scanning all M·N elements, zero or not.
Second option is to use sparseenumerate function. It allows to enumerate all nonzero elements, subsequently returning their indexes and values. It works for both hash table and CRS matrices, and in both cases only O(1) time is needed to get element. However, enumeration order is different for different formats. Hash-based matrices are enumerated in seemingly random order (one which is used by hash table), while CRS format guarantees that elements will be scanned from top row to the bottom one, and within row - from left to right.
ALGLIB package contains algorithms which use user-generated sparse matrices - linear solvers, for example. Different algorithms may have different requirements to sparse storage format. Thus, you have to study algorithm description and pass sparse matrix in correct format - one which is required by algorithm.
sparse subpackage contains functions for linear algebra operations with sparse matrix - different kinds of matrix-vector or matrix-matrix products are supported. These functions are require sparse matrix to be stored in CRS format because hash-based representation does not allow to perform efficient numerical operations. This subpackage includes following functions:
In addition to functions listed above, sparse subpackage contains:
This article is intended for personal use only.
C# source.
C++ source.
C++ source. MPFR/GMP is used.
GMP source is available from gmplib.org. MPFR source is available from www.mpfr.org.
FreePascal version.
Delphi version.
VB.NET version.
VBA version.
Python version (CPython and IronPython are supported).
ALGLIB® - numerical analysis library, 1999-2017. |