
Intro

Basic idea behind linear algebra: solving a large number of linear algebraic equations simultaneously.

There are N unknowns and M equations, forming a linear system:

A·x = b

where lowercase letters denote vectors, and the dot represents multiplication of two matrices, of a matrix and a vector, or the dot product of two vectors.

Nonsingular vs Singular

We are not guaranteed to find a solution to a given linear system Ax=b

  • row degeneracy - one or more of the M equations is a linear combination of other equations
  • column degeneracy - one or more of the N variables are interchangeable and cannot be distinguished (are non-unique)
  • for square matrices, these two conditions are equivalent

Having a singular system of equations means we can't find a good (unique) solution.

Other things preventing good solution:

  • Equations being equivalent to within roundoff error (approximately degenerate)
  • Accumulation of roundoff error during solution can swamp solution

Many linear equation solving packages dedicate a good chunk of their code to solving these problems

Sense of scale:

  • Linear equations with 20-50 unknowns are routine, can almost always be solved in single precision with straightforward routines
  • Linear equations with up to 1,000 unknowns require double precision
  • Larger sets of thousands or millions of equations can be solved when the coefficients are sparse (mostly zero), by using specialty sparse methods

Tasks of Computational Linear Algebra

Beyond simply solving Ax=b for the unknowns x, we may want to do other things:

  • Solve more than one matrix equation, A·x_j = b_j, where the matrix A does not change but the unknown/RHS vectors do change
  • Calculating the inverse of the matrix
  • Calculating the determinant of a square matrix
  • Calculating the condition number of a matrix

If M < N, or if M = N and the equations are degenerate, there are effectively fewer equations than unknowns (the underdetermined case). If fewer equations than unknowns:

  • Either no solution vector, or
  • More than one solution vector
  • If more than one solution, solution space consists of particular solution added to linear combination of N-M vectors (which are in the nullspace of the matrix A)
  • SVD can be used to solve this problem by finding these vectors (SVD finds the nullspace of a matrix A) - see the sketch below
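For example, a quick NumPy sketch (np.linalg.svd and np.linalg.pinv used as convenient stand-ins, not the Numerical Recipes routines) of getting a particular solution plus a nullspace basis for an underdetermined system:

import numpy as np

# Underdetermined system: 2 equations, 3 unknowns (M < N);
# the second row is a multiple of the first (degenerate)
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
b = np.array([6.0, 12.0])

# Full SVD: A = U·W·V^T
U, w, Vt = np.linalg.svd(A)

# Nullspace vectors: rows of Vt whose singular values are (numerically) zero
tol = max(A.shape) * np.finfo(float).eps * w.max()
rank = int(np.sum(w > tol))
nullspace = Vt[rank:]                # basis vectors z with A·z = 0

# A particular solution via the pseudoinverse; the general solution is
# x_particular plus any linear combination of the nullspace vectors
x_particular = np.linalg.pinv(A) @ b

print(nullspace)
print(x_particular, A @ x_particular)   # A·x_particular reproduces b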

If M > N, there are fewer unknowns than equations (the overdetermined case). If fewer unknowns than equations:

  • Think least squares - more data points than unknowns
  • Looking for best fit/compromise solution
  • Linear least-squares problem can be used to solve overdetermined linear system
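For example (a minimal NumPy sketch; np.linalg.lstsq is used here as a stand-in for a least-squares solver):

import numpy as np

# Overdetermined problem: 5 data points, 2 unknowns (slope and intercept)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix: columns are [x, 1], so we are fitting y ~ m*x + c
A = np.column_stack([x, np.ones_like(x)])

# Least-squares "best compromise" solution minimizes |A·coef - y|^2
coef, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
m, c = coef
print(m, c)   # best-fit slope and intercept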

Linear Algebra Software

Software:

  • BLAS is short for "Basic Linear Algebra Subprograms" and provides the low-level linear algebra operations that LAPACK is built from
  • LAPACK is the most common linear algebra package (LINPACK is old and was replaced by LAPACK)
  • ScaLAPACK is a version of LAPACK for parallel/distributed architectures
  • LAPACK routines are divided by precision, algorithm, and matrix structure (tridiagonal, banded, dense, etc.)
  • ATLAS is an implementation of BLAS (plus parts of LAPACK) that automatically tunes itself when it is compiled, to run as fast as possible on your system's architecture

Direct vs Iterative:

  • Major dividing line between routines
  • Direct - algorithms that execute in a predictable number of operations
  • Iterative - attempt to converge on the desired answer to within a desired tolerance, taking as many steps as necessary

Gauss-Jordan Elimination

Classic technique for solving linear equations: combine equations to cancel out unknowns, turning A into the identity matrix

Gauss-Jordan elimination produces the solution of the equations and the matrix inverse - this is an expensive extra thing to produce, not always desirable to have it

Deficiencies:

  • All right hand side values must be stored and manipulated at once (not feasible for very large systems of equations)
  • Gauss-Jordan is three times slower than alternative techniques that do not produce the matrix inverse

Advantages:

  • Straightforward
  • Solid
  • Pivoting

Pivoting:

  • Pivoting is just interchanging rows (partial pivoting) or rows and columns (full pivoting) to put desired elements on matrix diagonals
  • Partial pivoting is easier, full pivoting requires keeping track of permutations, and un-doing these when solution vector achieved

Storage requirements of method:

  • The matrix inverse of A is gradually built up in A, as original A is overwritten
  • Likewise solution vector x can gradually replace right-hand side vector b (solution vector is overwritten one row element at a time, each time an element is overwritten it will not be used again)

On the subject of row vs. column elimination strategy:

  • if we perform row operations, row transformations build right to left, so right hand side is transformed at each stage from one vector to another
  • if we perform column operations, the column operations build from left to right, so we have to remember each transformation at each stage, and apply them all in reverse order at the end
  • this means column operations are much less useful
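A minimal sketch of Gauss-Jordan elimination with partial pivoting (an illustration in NumPy, not the Numerical Recipes gaussj routine):

import numpy as np

def gauss_jordan(A, b):
    """Gauss-Jordan elimination with partial pivoting.
    A is reduced to the identity; the same row operations applied to I
    build up the inverse, and applied to b build up the solution x."""
    A = A.astype(float)
    n = A.shape[0]
    Ainv = np.eye(n)
    x = b.astype(float)
    for col in range(n):
        # Partial pivoting: swap in the row with the largest |pivot| in this column
        pivot_row = col + int(np.argmax(np.abs(A[col:, col])))
        if A[pivot_row, col] == 0.0:
            raise ValueError("matrix is singular")
        for mat in (A, Ainv):
            mat[[col, pivot_row]] = mat[[pivot_row, col]]
        x[[col, pivot_row]] = x[[pivot_row, col]]
        # Scale the pivot row so the pivot element becomes 1
        pivot = A[col, col]
        A[col] /= pivot
        Ainv[col] /= pivot
        x[col] /= pivot
        # Eliminate this column from every other row
        for row in range(n):
            if row != col:
                factor = A[row, col]
                A[row] -= factor * A[col]
                Ainv[row] -= factor * Ainv[col]
                x[row] -= factor * x[col]
    return x, Ainv

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x, Ainv = gauss_jordan(A, b)
print(x, Ainv @ b)   # both give the solution [0.8, 1.4]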

Gaussian Elimination with Back-Substitution

Like Gauss-Jordan, Gaussian elimination utilizes pivoting to eliminate elements from the matrix A and simplify it, but doesn't go as far as simplifying it to the identity matrix - simplifies it to an upper-triangular matrix

Triangular matrices make back-substitution easy - solving the last equation involves only one unknown, solving the next-to-last equation involves only one new unknown, and so on.

Represents midpoint between full elimination (Gauss-Jordan) and triangular decomposition (LU, QR)

The advantage of Gaussian elimination with back-substitution over Gauss-Jordan elimination is that it is faster in terms of the number of operations:

  • Inner loops of Gauss-Jordan elimination have one subtraction and one multiplication, executed N^3 and N^2·m times (where m is number of right hand sides)
  • Inner loops in Gaussian elimination are only executed N^3/3 times and N^2·m/2 times
  • Back-substitution of each RHS is N^2/2 executions of a similar loop (one mult plus one subt)
  • Gaussian elimination has a factor of three advantage over Gauss-Jordan, which erodes as m approaches N (number of RHS approaches number of equations/unknowns)

Calculation of inverse matrix is equivalent to the case of m = N (basically solving Ax=b, and letting b be each column of an identity matrix)

  • Gaussian elimination and back-subst. requires N^3/3 operations for matrix reduction, N^3/2 for RHS manipulation, and N^3/2 for the N back-substitutions, for a total of 4N^3/3 operations
  • Gauss-Jordan only requires N^3 operations for matrix inversion, so it seems to win out
  • However, because all but one of the RHS vector elements are 0, the operations count for Gaussian elimination is actually lower: the RHS manipulation drops from N^3/2 to N^3/6, and the total becomes N^3/3 + N^3/6 + N^3/2 = N^3
  • Gauss-Jordan matrix inversion costs N^3 operations, so for computing the inverse the two methods end up with identical operation counts

Note: the 1/6 comes from the sparseness of the RHS unit vectors - most of the multiply/subtract operations in the RHS manipulation act on elements that are known to still be zero and can be skipped, which reduces the RHS manipulation count from N^3/2 to N^3/6.

LU Decomposition

This is based on decomposing the matrix A into the product of two matrices, L and U:

A = L·U

where L is a lower triangular matrix and U is an upper triangular matrix.

To solve the linear system

A·x = (L·U)·x = L·(U·x) = b

we proceed in two steps: first solve

L·y = b

for the intermediate vector y, then solve

U·x = y

for x. The first equation is solved by forward substitution:

y_1 = b_1 / L_11
y_i = ( b_i - sum_{j=1..i-1} L_ij·y_j ) / L_ii,   i = 2, ..., N

and the second equation is solved by back substitution:

x_N = y_N / U_NN
x_i = ( y_i - sum_{j=i+1..N} U_ij·x_j ) / U_ii,   i = N-1, ..., 1

The advantage is that solving a linear system with a triangular matrix is much easier.

The forward and back substitutions each execute N^2/2 operations per RHS vector. If there are N RHS vectors (i.e., if we're explicitly inverting a matrix), and we take into account the reduced number of operations due to all but one element in each RHS being zero, the operation counts are:

For forward substitution: N^3/6 operations

For back substitution: N^3/2 operations

BUT, there is also the cost of performing the decomposition itself, which is not free. This can be done with Crout's Algorithm.

Adding the N^3/3 cost of the LU decomposition itself brings the total cost of inverting a matrix to N^3/3 + N^3/6 + N^3/2 = N^3, which is the same complexity class as Gaussian elimination. This highlights an important fact: LU decomposition is simply a modified version of Gaussian elimination.

The advantage of LU decomposition is that, once A has been decomposed into L and U, it remains unchanged and can be used to solve as many RHS as desired, one at a time. (Gauss-Jordan and Gaussian elimination both depend on manipulating the left and right sides simultaneously.)
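A short sketch of this re-use pattern, using SciPy's LU routines (scipy.linalg.lu_factor / lu_solve) as stand-ins:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 4.0, 2.0],
              [0.0, 2.0, 5.0]])

# Factor once: O(N^3) work, stored as the packed LU matrix plus pivot indices
lu, piv = lu_factor(A)

# Re-use the factorization for as many right-hand sides as desired: O(N^2) each
for b in (np.array([1.0, 0.0, 0.0]),
          np.array([0.0, 1.0, 0.0]),
          np.array([0.0, 0.0, 1.0])):
    x = lu_solve((lu, piv), b)   # forward substitution, then back substitution
    print(x)                     # columns of A^-1, since the b's are unit vectors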

Tridiagonal and Band-Diagonal Systems

Tridiagonal systems have nonzero elements only on the diagonal and the two adjacent off-diagonals (at most two nonzero off-diagonal elements in each row). This structure commonly arises from finite difference schemes.

For tridiagonal matrices, LU decomposition with forward and back substitution only takes O(N) operations, instead of the usual O(N^3)

To store tridiagonal systems, it is wasteful to store the entire matrix, so only three vectors are used. The algorithm to solve the system performs the decomposition and forward substitution for each element, then performs back-substitution. Pivoting is not performed, which means the algorithm will fail if a zero pivot appears on the diagonal. However, many applications that lead to tridiagonal matrices come with guarantees that prevent this from happening.

Diagonal dominance: a property that guarantees a zero pivot will not be encountered. This condition states:

|b_j| > |a_j| + |c_j|

where j indexes a row, the b elements are on the diagonal, and a and c are the off-diagonal elements.

Tridiagonal algorithm is more robust in practice than in theory - a rare occurrence.
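A minimal sketch of the tridiagonal solve (the Thomas algorithm), with a, b, c holding the sub-, main-, and super-diagonals as above:

import numpy as np

def tridiag_solve(a, b, c, r):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c, and right-hand side r, in O(N) operations.
    a[0] and c[-1] are unused. No pivoting is performed, so a zero
    pivot (which diagonal dominance rules out) causes failure."""
    n = len(b)
    cp = np.empty(n)   # modified super-diagonal
    rp = np.empty(n)   # modified right-hand side
    # Decomposition and forward substitution, one element at a time
    beta = b[0]
    if beta == 0.0:
        raise ValueError("zero pivot encountered")
    cp[0] = c[0] / beta
    rp[0] = r[0] / beta
    for i in range(1, n):
        beta = b[i] - a[i] * cp[i - 1]
        if beta == 0.0:
            raise ValueError("zero pivot encountered")
        cp[i] = c[i] / beta if i < n - 1 else 0.0
        rp[i] = (r[i] - a[i] * rp[i - 1]) / beta
    # Back substitution
    x = np.empty(n)
    x[-1] = rp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = rp[i] - cp[i] * x[i + 1]
    return x

# Diagonally dominant example: |b_j| > |a_j| + |c_j|
a = np.array([0.0, -1.0, -1.0, -1.0])
b = np.array([4.0, 4.0, 4.0, 4.0])
c = np.array([-1.0, -1.0, -1.0, 0.0])
r = np.array([5.0, 5.0, 5.0, 5.0])
print(tridiag_solve(a, b, c, r))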

Parallel Solution

To solve tridiagonal systems in parallel, the (large) matrix system can be permuted, and columns combined, to form a smaller tridiagonal system in the bottom-right part of the matrix.

The process starts by permuting the columns of A (and the solution vector) so that the vector (u0, u1, u2, u3, u4, u5, u6) would become the vector (u0, u2, u4, u6, u1, u3, u5). This leads to a banded diagonal matrix - a matrix with a single diagonal, plus two additional off-diagonal bands above and two off-diagonal bands below that diagonal.

Next, the even and odd rows are combined to eliminate nonzero elements in the lower left, which turns the bottom right corner of the matrix into a tridiagonal matrix.

Now the original tridiagonal linear system of size N has been reduced to a smaller tridiagonal system of size floor(N/2).

Visually: (figure TridiagonalParallelAlgorithm.jpg illustrates the permutation and row-combination steps that reduce the system to a smaller tridiagonal system)

Band Diagonal Systems

Tridiagonal systems have nonzero elements on the diagonal plus or minus one. Band diagonal systems are more general, and have m1 >= 0 nonzero elements below the diagonal, m2 >= 0 nonzero elements above the diagonal.

These linear systems can be solved very quickly using LU decomposition, with far less storage than a full N x N system.

Band-diagonal matrices are stored in compact form: the matrix is tilted 45 degrees clockwise, forming a compact matrix of size (m1 + 1 + m2) x N, with a few wasted elements at the ends. Algorithms should be tailored to this storage format.

It is not possible to store the LU decomposition of a band-diagonal matrix as compactly as the compact form of A, because Crout's method produces additional nonzero fill-in elements. The upper triangular factor U can be stored in a matrix with the same shape as the compact form of A, and the lower triangular factor L in a separate compact matrix of size N x m1. The diagonal elements of U are in the first column of U's compact storage.

Once matrix A is decomposed, the system can be solved for any number of RHS vectors and solved via back-substitution, as standard with LU systems.
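For illustration, SciPy's banded solver uses a compact storage of exactly this shape, (m1 + 1 + m2) x N, with one row per diagonal:

import numpy as np
from scipy.linalg import solve_banded

# Full 4 x 4 band-diagonal matrix with m1 = 1 sub-diagonal and m2 = 1 super-diagonal
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [2.0, 4.0, 1.0, 0.0],
              [0.0, 2.0, 4.0, 1.0],
              [0.0, 0.0, 2.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# Compact band storage, (m1 + 1 + m2) x N = 3 x 4, one row per diagonal.
# SciPy expects ab[u + i - j, j] == A[i, j]; unused corner entries are ignored.
ab = np.array([[0.0, 1.0, 1.0, 1.0],    # super-diagonal (first entry unused)
               [4.0, 4.0, 4.0, 4.0],    # main diagonal
               [2.0, 2.0, 2.0, 0.0]])   # sub-diagonal (last entry unused)

x = solve_banded((1, 1), ab, b)         # (m1, m2) = (1, 1)
print(np.allclose(A @ x, b))            # True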

Iterative Improvement

Iterative improvement is the concept of using a (slightly) wrong solution to the matrix equation to compute a residual, and use the residual to improve the solution.

We are attempting to solve the equation

A·x = b

Our slightly wrong solution x + δx solves the equation

A·(x + δx) = b + δb

Subtracting the first equation from the second gives

A·δx = δb

Now we can obtain the portion δx that our solution is off by, through the equation:

A·δx = A·(x + δx) - b

(the right hand side is known, since x + δx is our computed, slightly wrong solution). Finally, once we have the error δx, we subtract it from our slightly wrong solution to obtain an improved solution. (And this procedure can be repeated again, and again, until our numerical precision runs out.)

If we decompose A using LU decomposition, we get an extra computational savings from being able to re-use it for each step of the solution. Iterative improvement can be implemented as a function in an LU decomposition class, since it makes use of LU decomposition.

Iterative improvement is highly recommended as a follow-up step to solving the original matrix equation Ax = b. Each round of iterative improvement is only O(N^2) (a matrix-vector multiply, plus a forward and back substitution to solve the LU-decomposed system). This is cheap relative to the O(N^3) operations for solving the original matrix equation.

Iterative improvement applied repeatedly can be looked at as an iterative Newton-Raphson method for root-finding, applied to matrix inversion.
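A sketch of the whole procedure, re-using a SciPy LU factorization for each improvement pass (in practice the residual should ideally be accumulated in higher precision):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
N = 200
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)

lu, piv = lu_factor(A)          # O(N^3), done once
x = lu_solve((lu, piv), b)      # slightly wrong solution (roundoff error)

# Iterative improvement: each pass is O(N^2)
for _ in range(3):
    r = A @ x - b               # residual A·(x + δx) - b
    dx = lu_solve((lu, piv), r) # solve A·δx = r, re-using the LU decomposition
    x = x - dx                  # subtract the error from the solution

print(np.linalg.norm(A @ x - b))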

Singular Value Decomposition

SVD:

  • Powerful set of techniques for dealing with sets of equations that are singular or numerically near-singular
  • Can diagnose problems with matrix systems where LU or Gaussian elimination fails.
  • Method of choice for solving linear least-squares problems

Basic theorem of linear algebra: any M x N matrix A can be written as the product of three matrices:

A = U·W·V^T

  • U is an M x N column-orthogonal matrix
  • W is an N x N diagonal matrix with positive or zero elements (the singular values)
  • V^T is the transpose of an N x N orthogonal matrix V

M > N corresponds to the overdetermined situation (more equations than unknowns)

M < N corresponds to the underdetermined situation (fewer equations than unknowns)

Decomposition into A = U·W·V^T can always be done, no matter how singular the matrix is

W contains the singular values - it is conventional to return them in sorted, descending order
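A quick NumPy illustration of the decomposition:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # M = 3 > N = 2: overdetermined shape

# Reduced ("economy") SVD: U is M x N column-orthogonal, w holds the singular
# values in descending order, Vt is the transpose of an N x N orthogonal matrix
U, w, Vt = np.linalg.svd(A, full_matrices=False)

print(w)                                          # singular values, sorted descending
print(np.allclose(U @ np.diag(w) @ Vt, A))        # True: A = U·W·V^T
print(np.allclose(U.T @ U, np.eye(2)))            # True: columns of U are orthonormal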

Sparse Linear Systems

Relatively small number of matrix elements are nonzero

For the tridiagonal case, we were able to reduce the operation count from O(N^3) to O(N), and the storage from O(N^2) to O(N), by applying LU decomposition cleverly. The same thing can be done with other sparse linear systems.

Special sparse forms:

  • Tridiagonal
  • Band diagonal
  • Band triangular
  • Block diagonal
  • Block tridiagonal
  • Cyclic banded
  • Singly/doubly bordered block/band diagonal/triangular

PDEs in two or more dimensions naturally give rise to special sparse forms like these.

Sherman-Morrison Formula

When you have expended the work to invert a matrix A, and then make a small change to it, you can usually find the corresponding change in A inverse easily, if the change is of the form:

A  ->  A + u ⊗ v

(a rank-one update, where u and v are vectors and u ⊗ v is their outer product). If u is a unit vector e_i, then this is equivalent to adding the components of v to the ith row. If v is a unit vector e_j, this is equivalent to adding the components of u to the jth column. If both are unit vectors, this is equivalent to just changing element a_ij.

The Sherman-Morrison formula gives an expression for the inverse of the modified matrix:

(A + u ⊗ v)^{-1} = A^{-1} - ( (A^{-1}·u) ⊗ (v·A^{-1}) ) / ( 1 + v·A^{-1}·u )

The whole procedure requires only 3N^2 multiplies (and a like number of adds).

If every row is modified, you are back to an O(N^3) method.
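A NumPy sketch of the Sherman-Morrison update, checked against a direct inverse:

import numpy as np

rng = np.random.default_rng(1)
N = 5
A = rng.standard_normal((N, N))
Ainv = np.linalg.inv(A)           # already paid the O(N^3) price for this

u = rng.standard_normal(N)
v = rng.standard_normal(N)

# Sherman-Morrison: inverse of the rank-one-modified matrix A + u (x) v,
# using only O(N^2) work given Ainv
z = Ainv @ u                      # N^2 multiplies
w = v @ Ainv                      # N^2 multiplies
lam = 1.0 + v @ z                 # scalar v·A^-1·u
Ainv_new = Ainv - np.outer(z, w) / lam    # N^2 multiplies

print(np.allclose(Ainv_new, np.linalg.inv(A + np.outer(u, v))))   # True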

Cyclic Tridiagonal Systems

These occur commonly in finite difference schemes for differential equations with periodic BCs - tridiagonal matrices, with upper right and lower left components of matrix containing small number of elements

Use the Sherman-Morrison formula: first do the easier task of solving the tridiagonal system without the corner elements in O(N) operations, then treat the corner components as small (rank-one) changes whose effect on the inverse is handled by the formula.

Woodbury Formula

If there are multiple correction terms, need a form of the Sherman-Morrison formula for block matrices - this is the Woodbury Formula.
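For reference, the standard form of the Woodbury formula, with U and V holding the k correction vectors as columns (written in LaTeX):

\left(\mathbf{A} + \mathbf{U}\mathbf{V}^T\right)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}\left(\mathbf{1} + \mathbf{V}^T\mathbf{A}^{-1}\mathbf{U}\right)^{-1}\mathbf{V}^T\mathbf{A}^{-1}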

Inversion by Partitioning

A matrix that can be partitioned into a 2 x 2 matrix of sub-matrices P, Q, R, S, can have its inverse written as a 2 x 2 matrix of sub-matrices Ptilde, Qtilde, Rtilde, Stilde

Thus, the inverse of the matrix can be written in a simplified form, in terms of the sub-matrices and a smaller number of matrix inverses, reducing the overall cost and space requirements.
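Concretely, one standard form of the partitioned inverse (assuming the required inverses exist; written in LaTeX):

\begin{aligned}
\tilde{P} &= \left(P - Q\,S^{-1}R\right)^{-1} \\
\tilde{Q} &= -\left(P - Q\,S^{-1}R\right)^{-1}\left(Q\,S^{-1}\right) \\
\tilde{R} &= -\left(S^{-1}R\right)\left(P - Q\,S^{-1}R\right)^{-1} \\
\tilde{S} &= S^{-1} + \left(S^{-1}R\right)\left(P - Q\,S^{-1}R\right)^{-1}\left(Q\,S^{-1}\right)
\end{aligned}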

Sparse Matrix Data Structure

Data structures for sparse matrix: most obvious is to maintain two lists, one of nonzero values and one of corresponding locations.

Alternative is compressed column storage format (Harwell-Boeing format). This uses three vectors - val vector for nonzero values, row_idx for corresponding row indices of each value, and col_ptr for the locations in the other two arrays that start a particular column.

Example:

3.0   0.0   1.0   2.0   0.0
0.0   4.0   0.0   0.0   0.0
0.0   7.0   5.0   9.0   0.0
0.0   0.0   0.0   0.0   0.0
0.0   0.0   0.0   6.0   5.0

This 5 x 5 matrix has 9 nonzero elements. It would be stored using three vectors: one of length 6 (col_ptr, with N+1 entries), and two of length 9 (val and row_ix, one entry per nonzero):

val[0] = 3.0
val[1] = 4.0
val[2] = 7.0
val[3] = 1.0
val[4] = 5.0
val[5] = 2.0
val[6] = 9.0
val[7] = 6.0
val[8] = 5.0

row_ix[0] = 0
row_ix[1] = 1
row_ix[2] = 2
row_ix[3] = 0
row_ix[4] = 2
row_ix[5] = 0
row_ix[6] = 2
row_ix[7] = 4
row_ix[8] = 4

col_ptr[0] = 0
col_ptr[1] = 1
col_ptr[2] = 3  <-- column 2 starts at val[3]
col_ptr[3] = 5  <-- column 3 starts at val[5]
col_ptr[4] = 8
col_ptr[5] = 9
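The same matrix in SciPy's compressed sparse column format - its data / indices / indptr arrays correspond to the val / row_ix / col_ptr vectors above:

import numpy as np
from scipy.sparse import csc_matrix

dense = np.array([[3.0, 0.0, 1.0, 2.0, 0.0],
                  [0.0, 4.0, 0.0, 0.0, 0.0],
                  [0.0, 7.0, 5.0, 9.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 6.0, 5.0]])

A = csc_matrix(dense)
print(A.data)      # [3. 4. 7. 1. 5. 2. 9. 6. 5.]  <-- val
print(A.indices)   # [0 1 2 0 2 0 2 4 4]           <-- row_ix
print(A.indptr)    # [0 1 3 5 8 9]                 <-- col_ptr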

Conjugate Gradient Methods for Sparse Systems

Conjugate gradient methods provide general means of solving N x N linear systems Ax = b

Conjugate gradient methods reference A only through multiplication of A (or of its transpose) by a vector - this is what makes them well suited to sparse matrices

Simple conjugate gradient algorithm:

  • Solves Ax = b for a symmetric, positive definite matrix A, by minimizing the quantity:

f(x) = (1/2) x·A·x - b·x

which is minimized when this gradient is zero:

∇f = A·x - b = 0
  • Minimization is carried out by generating a succession of search directions and iteratively improving the solution along each direction (see the sketch after this list)
  • After N iterations, you arrive at minimizer over entire vector space
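A minimal sketch of the simple conjugate gradient algorithm for a symmetric, positive definite A (in practice you would use a library routine, e.g. scipy.sparse.linalg.cg):

import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Minimal conjugate gradient for symmetric, positive definite A.
    Only touches A through matrix-vector products, so A may be sparse."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x            # residual = minus the gradient of (1/2) x·A·x - b·x
    p = r.copy()             # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # step length along the search direction
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:       # halting criterion
            break
        p = r + (rs_new / rs_old) * p   # new search direction
        rs_old = rs_new
    return x

# Small symmetric, positive definite test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))   # True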

Ordinary conjugate gradient algorithm:

  • Minimizes arbitrary nonlinear functions.

Biconjugate gradient method:

  • Solves linear, not necessarily positive-definite, not necessarily symmetric, equations
  • This uses approach of finding sequence of vectors satisfying a biorthogonality condition and biconjugacy condition, as well as mutual orthogonality condition.
  • This uses an initial guess, computes a residual vector, and improves the estimate using the residual.
  • No guarantee the procedure will remain stable, but in practice instabilities are rare
  • Exact termination in at most N iterations only occurs in the absence of roundoff error - so in reality, you need to set a halting criterion

Generalized minimum residual (GMRES) method:

  • Generalization of biconjugate gradient method to non-symmetric matrices
  • Applies the biconjugate gradient approach to a preconditioned form of the linear equation
  • By preconditioning, we can turn a matrix A into one that's easier to solve, converges in fewer steps
  • Overall scheme: PBCG (preconditioned biconjugate gradient method)
  • Routine given in Num. Recipes originally based on work of Anne Greenbaum

Vandermonde Matrices

Special linear systems lead to more efficient/cheaper solutions, as with tridiagonal systems becoming O(N) instead of O(N^3).

Vandermonde matrices can be solved in O(N^2) operations - better than the general O(N^3) case, although not as good as the O(N) tridiagonal case

Vandermonde matrices:

  • Each column is an increasing power - column 0 is 0th power, column 1 is 1st power, column 2 is 2nd power, etc.
  • Occur in polynomial fitting, distribution reconstruction, etc.
  • Completely determined by N arbitrary numbers x_1, ..., x_N, one per row; the entries of row i are increasing powers of x_i
  • Method of solution is closely related to Lagrange's polynomial interpolation formula
  • We define the Lagrange basis polynomials P_j(x) = prod_{n != j} (x - x_n)/(x_j - x_n), and the components A_jk as the coefficients that arise when this product is multiplied out and the like terms are collected: P_j(x) = sum_k A_jk x^k
  • The matrix of coefficients A_jk is the exact inverse of the matrix of components x_i^k appearing in the A matrix
  • Warning: Vandermonde systems are notoriously ill-conditioned (hence, the reason Chebyshev fitting is so impressively accurate)
  • Because there exist higher order polynomials that are very good uniform fits to zero, roundoff error can introduce substantial spurious values into the leading (higher order) terms' coefficients
  • Always want to solve Vandermonde systems using double precision or higher
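A small NumPy sketch (np.vander builds the matrix with increasing powers, matching the column convention above); the last line shows how quickly the conditioning deteriorates:

import numpy as np

# Interpolation points and values: fit a cubic through 4 points
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])

# Vandermonde matrix: column k holds x**k (increasing powers)
V = np.vander(x, increasing=True)

# Solving V·c = y gives the polynomial coefficients c_0 + c_1 x + c_2 x^2 + ...
c = np.linalg.solve(V, y)
print(c)
print(np.allclose(V @ c, y))            # True

# Ill-conditioning grows quickly with polynomial order
print(np.linalg.cond(np.vander(np.linspace(0, 1, 15), increasing=True)))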

Toeplitz Matrices

Toeplitz matrices are uniquely determined by 2N - 1 numbers. Each band of a Toeplitz matrix is identical - the diagonal consists entirely of the same number; the diagonal one above the main diagonal consists entirely of a (different) same number; and so on.

Upper left to lower right diagonals are constant.

Levinson developed an algorithm for fast solution of the symmetric Toeplitz problem by a bordering method - a recursive procedure that starts with the 1-dimensional Toeplitz problem, then builds the (M+1)-dimensional Toeplitz solution from the M-dimensional one. In this way, M proceeds from 1 upward until the full N-dimensional solution is reached.

This method also generalizes to the non-symmetric case (due to Rybicki).

This computation can be done in-place.

If any diagonal elements are zero, this method will fail (pivoting is not allowed). This does not mean the matrix is singular, but it does mean this method will not work, and you should try a slower method like LU decomposition.

Levinson's method takes O(N^2) operations to solve. Newer, faster methods have come out that require O(N log^2 N) operations instead of the O(N^2) for Levinson's method, but these algorithms are more complicated. (See Bunch and de Hoog.)
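SciPy exposes a Levinson-recursion-based solver as scipy.linalg.solve_toeplitz; a quick sketch:

import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

# A symmetric Toeplitz matrix is determined by its first column
c = np.array([4.0, 1.0, 0.5, 0.25])
b = np.array([1.0, 2.0, 3.0, 4.0])

# O(N^2) Levinson-type solve, without ever forming the full matrix
x = solve_toeplitz(c, b)

# Check against a dense solve of the explicitly-built Toeplitz matrix
T = toeplitz(c)
print(np.allclose(T @ x, b))   # True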

Cholesky Decomposition

A linear system with a symmetric, positive definite matrix A can be solved using a more efficient triangular decomposition method.

Note: positive definite means v·A·v > 0 for all nonzero vectors v

Symmetric, positive-definite matrices occur in some applications

Cholesky decomposition is an efficient triangular decomposition

Rather than decomposing A into L·U, Cholesky decomposition decomposes A into

A = L·L^T

(the factor L is also referred to as the square root of A)

Cholesky decomposition is numerically stable without pivoting

Failure of Cholesky Decomposition indicates A is not positive definite (can actually use Cholesky decomposition as a test of whether a matrix is positive definite)
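A small NumPy sketch using Cholesky decomposition both as a factorization and as a positive-definiteness test:

import numpy as np

def is_positive_definite(A):
    """Use Cholesky decomposition as a test for positive definiteness:
    np.linalg.cholesky succeeds only if A is (numerically) positive definite."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])        # symmetric, positive definite

L = np.linalg.cholesky(A)         # lower triangular factor, A = L·L^T
print(np.allclose(L @ L.T, A))    # True
print(is_positive_definite(A))                    # True
print(is_positive_definite(np.array([[1.0, 2.0],  # symmetric but not positive definite
                                      [2.0, 1.0]])))   # False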

QR Decomposition

QR decomposition requires twice the number of operations of LU decomposition. Not the general go-to method, but useful for some purposes.

QR decomposition is accomplished by performing a series of Householder transformations

A Householder matrix is of the form P = 1 - 2·w ⊗ w, where w is a unit vector (|w| = 1)

Why use QR decomposition? If your numerical algorithm requires solving a succession of linear systems, each with a matrix A that is only slightly different from the last, LU decomposition will require O(N^3) operations each time. A QR decomposition can be updated with only O(N^2) operations.

QR decomposition is simple to update if A changes according to A -> A + s ⊗ t (a rank-one change)

Two phases:

  • Apply N-1 Jacobi rotations to reduce the (new) R matrix to an upper Hessenberg form
  • Apply N-1 Jacobi rotations to transform this upper Hessenberg matrix to the new R' matrix
  • The updated matrix Q' is the product of Q with these 2(N-1) Jacobi rotations
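A basic QR solve in NumPy/SciPy (this illustrates the decomposition itself, not the rank-one update procedure above):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])
b = np.array([4.0, 5.0, 6.0])

# A = Q·R with Q orthogonal and R upper triangular
Q, R = np.linalg.qr(A)

# Q is orthogonal, so A·x = b becomes R·x = Q^T·b,
# which is solved by back substitution
x = solve_triangular(R, Q.T @ b, lower=False)
print(np.allclose(A @ x, b))   # True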

The Big-O cost of Matrix Inversion

We've worked on the assumption that matrix inversion is an O(N^3) process, but this is not strictly speaking true.

Naively, doing a 2 x 2 matrix times a 2 x 2 matrix will take 8 multiplication operations

But if we are clever, we can do it in 7 multiplications (with more addition - b/c there is no free lunch)

This reduces the cost from O(N^3) to O(N^(log2 7)) ≈ O(N^2.807)

Fast matrix multiplication routines are more common in libraries like BLAS