
Iterative methods: Notation and a brief background

• Mathematical background: matrices, inner products and norms
• Linear systems of equations
• Iterative processes

Notation & Review of some linear algebra concepts

ä The set of all linear combinations of a set of vectors G = {a1, a2, . . . , aq} of Rn is a vector subspace called the linear span of G. Notation:
    span(G),   span{a1, a2, . . . , aq}
ä If the ai's are linearly independent, then each vector of span{G} admits a unique expression as a linear combination of the ai's. The set G is then called a basis.
ä Recall: a matrix represents a linear mapping between two vector spaces of finite dimensions n and m.
Transposition: If A ∈ Rm×n then its transpose is a matrix C ∈ Rn×m with entries
    cij = aji,  i = 1, . . . , n,  j = 1, . . . , m
Notation: AT.
Transpose conjugate: for complex matrices, the transpose conjugate matrix, denoted by AH, is more relevant: AH = (Ā)T.
ä We consider now only square matrices (m = n).
ä Spectral radius = the maximum modulus of the eigenvalues:
    ρ(A) = max_{λ∈Λ(A)} |λ|.
ä Trace of A = sum of the diagonal elements of A:
    tr(A) = Σ_{i=1}^n aii.
ä tr(A) = sum of all the eigenvalues of A counted with their multiplicities.
ä Recall that det(A) = product of all the eigenvalues of A counted with their multiplicities.
Example: Trace, spectral radius, and determinant of the matrix:
    A = [ 2  1
          3  0 ]
ä Recall: lim_{k→∞} A^k = 0 iff ρ(A) < 1.
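These quantities are easy to check numerically; a minimal sketch in Python/NumPy (the use of NumPy and this particular check are assumptions, not part of the slides):

```python
import numpy as np

# Trace, spectral radius, and determinant of A = [[2, 1], [3, 0]].
A = np.array([[2.0, 1.0],
              [3.0, 0.0]])

eigvals = np.linalg.eigvals(A)          # eigenvalues are 3 and -1
print("trace  :", np.trace(A))          # 2.0  (= sum of eigenvalues)
print("det    :", np.linalg.det(A))     # -3.0 (= product of eigenvalues)
print("rho(A) :", max(abs(eigvals)))    # 3.0  (spectral radius)

# Since rho(A) = 3 > 1, the powers A^k do NOT converge to zero:
print(np.linalg.norm(np.linalg.matrix_power(A, 20)))   # large
```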
Range and null space

ä Range:
    Ran(A) = {Ax | x ∈ Rn}
ä Null space:
    Null(A) = {x ∈ Rn | Ax = 0}
ä Range = linear span of the columns of A
ä Rank of a matrix:
    rank(A) = dim(Ran(A))
ä rank(A) = the number of linearly independent columns of A = the number of linearly independent rows of A.
ä A is of full rank if rank(A) = min{m, n}. Otherwise it is rank-deficient.
ä Rank+Nullity theorem for an m × n matrix:
    dim(Ran(A)) + dim(Null(A)) = n

Types of matrices (square)

• Symmetric matrices: AT = A.
• Hermitian matrices: AH = A.
• Skew-symmetric matrices: AT = −A.
• Skew-Hermitian matrices: AH = −A.
• Normal matrices: AH A = AAH.
• Nonnegative matrices: aij ≥ 0, i, j = 1, . . . , n (similar definitions for nonpositive, positive, and negative matrices).
• Unitary matrices: QH Q = I.
Note: if Q is unitary then Q−1 = QH.
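A small numerical illustration of rank and the rank+nullity relation (a sketch only; NumPy and the example matrix are assumptions, not from the slides):

```python
import numpy as np

# A 4x4 rank-deficient matrix: rows 2 and 4 are combinations of rows 1 and 3.
n = 4
A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],    # = 2 * row 1
              [0., 1., 0., 1.],
              [1., 3., 3., 5.]])   # = row 1 + row 3

rank = np.linalg.matrix_rank(A)
nullity = n - rank                  # rank + nullity = n
print("rank(A)     =", rank)        # 2
print("dim Null(A) =", nullity)     # 2

# A vector in Null(A): right singular vectors associated with the zero
# singular values span the null space.
_, s, Vt = np.linalg.svd(A)
x_null = Vt[-1]
print("||A x||     =", np.linalg.norm(A @ x_null))   # ~ 0
```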
Inner products and Norms

ä Inner product of two vectors x and y in Rn:
    (x, y) = x1y1 + x2y2 + · · · + xnyn
Notation: (x, y) or yT x
ä For complex vectors in Cn:
    (x, y) = x1ȳ1 + x2ȳ2 + · · · + xnȳn
Note: (x, y) = yH x
ä An important property: given A ∈ Cm×n, then
    (Ax, y) = (x, AH y)   ∀ x ∈ Cn, ∀ y ∈ Cm

Vector norms

Norms are needed to measure lengths of vectors and the closeness of two vectors. Examples of use: estimating the convergence rate of an iterative method; estimating the error of an approximation to a given solution; ...
ä A vector norm on a vector space X is a real-valued function on X which satisfies the following three conditions:
1. ‖x‖ ≥ 0, ∀ x ∈ X, and ‖x‖ = 0 iff x = 0.
2. ‖αx‖ = |α| ‖x‖, ∀ x ∈ X, ∀ α ∈ C.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖, ∀ x, y ∈ X.
ä Condition 3. is called the triangle inequality.
Example: Euclidean norm on X = Cn:
    ‖x‖2 = (x, x)^(1/2) = ( |x1|² + |x2|² + . . . + |xn|² )^(1/2)
ä Most common vector norms in numerical linear algebra:
    ‖x‖1 = |x1| + |x2| + · · · + |xn|,
    ‖x‖2 = ( |x1|² + |x2|² + · · · + |xn|² )^(1/2),
    ‖x‖∞ = max_{i=1,...,n} |xi|.
ä The Cauchy-Schwarz inequality (important) is:
    |(x, y)| ≤ ‖x‖2 ‖y‖2.

Convergence of vector sequences

A sequence of vectors x(k), k = 1, . . . , ∞, converges to a vector x with respect to the norm ‖.‖ if, by definition,
    lim_{k→∞} ‖x(k) − x‖ = 0
ä Important point: because all norms in Rn are equivalent, the convergence of x(k) w.r.t. a given norm implies convergence w.r.t. any other norm.
ä Notation: lim_{k→∞} x(k) = x
ä Note: x(k) converges to x iff each component xi(k) of x(k) converges to the corresponding component xi of x.
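A quick numerical check of the three common vector norms and of the Cauchy-Schwarz inequality (a minimal sketch; NumPy and the example vectors are assumptions, not from the slides):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0,  0.0, -1.0])

print("||x||_1   =", np.linalg.norm(x, 1))       # |1| + |-2| + |3| = 6
print("||x||_2   =", np.linalg.norm(x))          # sqrt(14)
print("||x||_inf =", np.linalg.norm(x, np.inf))  # 3

lhs = abs(np.dot(x, y))                          # |(x, y)|
rhs = np.linalg.norm(x) * np.linalg.norm(y)      # ||x||_2 ||y||_2
print(lhs <= rhs)                                # True (Cauchy-Schwarz)
```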
Matrix norms

ä Can define matrix norms by considering m × n matrices as vectors in Rmn. These norms satisfy the usual properties of vector norms, i.e.,
1. ‖A‖ ≥ 0, ∀ A ∈ Cm×n, and ‖A‖ = 0 iff A = 0
2. ‖αA‖ = |α| ‖A‖, ∀ A ∈ Cm×n, ∀ α ∈ C
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖, ∀ A, B ∈ Cm×n.
ä However, these will lack (in general) the right properties for composition of operators (product of matrices).
ä The case of ‖.‖2 viewed this way yields the Frobenius norm of matrices.
ä Given a matrix A in Cm×n, define the set of matrix norms
    ‖A‖p = max_{x ∈ Cn, x ≠ 0} ‖Ax‖p / ‖x‖p.
ä These norms satisfy the usual properties of vector norms (see above).
ä The matrix norm ‖.‖p is induced by the vector norm ‖.‖p.
ä Again, the important cases are p = 1, 2, ∞.
Consistency

ä A fundamental property is consistency:
    ‖AB‖p ≤ ‖A‖p ‖B‖p.
ä Consequence: ‖A^k‖p ≤ ‖A‖p^k
ä A^k converges to zero if any of its p-norms is < 1.
ä The Frobenius norm is defined by
    ‖A‖F = ( Σ_{j=1}^n Σ_{i=1}^m |aij|² )^(1/2).
ä Same as the 2-norm of the column vector in Cmn consisting of all the columns (respectively rows) of A.
ä This norm is also consistent [but not induced from a vector norm].

Important equalities:

    ‖A‖1 = max_{j=1,...,n} Σ_{i=1}^m |aij|,
    ‖A‖∞ = max_{i=1,...,m} Σ_{j=1}^n |aij|,
    ‖A‖2 = ρ(AH A)^(1/2) = ρ(A AH)^(1/2),
    ‖A‖F = Tr(AH A)^(1/2) = Tr(A AH)^(1/2).
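The equalities above can be verified numerically on a small matrix; a minimal sketch (NumPy and the example matrix are assumptions, not from the slides):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

rho = lambda M: max(abs(np.linalg.eigvals(M)))   # spectral radius

print(np.linalg.norm(A, 1),      max(abs(A).sum(axis=0)))     # max column sum
print(np.linalg.norm(A, np.inf), max(abs(A).sum(axis=1)))     # max row sum
print(np.linalg.norm(A, 2),      np.sqrt(rho(A.T @ A)))       # sqrt(rho(A^H A))
print(np.linalg.norm(A, 'fro'),  np.sqrt(np.trace(A.T @ A)))  # sqrt(Tr(A^H A))
```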
Positive-Definite Matrices

ä A real matrix is said to be positive definite if
    (Au, u) > 0 for all u ≠ 0, u ∈ Rn.
ä Let A be a real positive definite matrix. Then there is a scalar α > 0 such that
    (Au, u) ≥ α ‖u‖2²  for all u ∈ Rn.
ä Consider now the case of Symmetric Positive Definite (SPD) matrices.
ä Consequence 1: A is nonsingular.
ä Consequence 2: the eigenvalues of A are (real) positive.

A few properties of Symmetric Positive Definite matrices

ä Diagonal entries of A are positive. More generally, ...
ä Any diagonal block A(k : l, k : l), (k < l), is SPD.
ä For any n × k matrix X of rank k, the matrix XT AX is SPD.
ä The mapping
    x, y → (x, y)A ≡ (Ax, y)
is a proper inner product on Rn. The associated norm, denoted by ‖.‖A, is called the energy norm:
    ‖x‖A = (Ax, x)^(1/2) = (xT Ax)^(1/2)
ä A admits the Cholesky factorization A = LLT, where L is lower triangular.
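A small sketch of an SPD matrix, its Cholesky factor, and the energy norm (NumPy and the example matrix are assumptions, not from the slides):

```python
import numpy as np

# Symmetric, strictly diagonally dominant with positive diagonal -> SPD.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

L = np.linalg.cholesky(A)            # A = L L^T, L lower triangular
print(np.allclose(L @ L.T, A))       # True

print(np.linalg.eigvalsh(A) > 0)     # all eigenvalues (real) positive

x = np.array([1.0, -1.0, 2.0])
energy_norm = np.sqrt(x @ A @ x)     # ||x||_A = (x^T A x)^(1/2)
print(energy_norm)
```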
Iterative processes for linear systems

In contrast with "direct" methods (e.g., Gaussian elimination), iterative methods compute a sequence of approximations x(k) to the solution x. Ideally, a good approximation is obtained in a few iterations of the process. Convergence is measured by some norm.

Questions which arise:
ä Why not use a direct method [always works!]?
ä Under which condition(s) will the method converge?
ä When to stop?
ä Can we estimate costs?

Basic relaxation techniques

• Relaxation methods: Jacobi, Gauss-Seidel, SOR
• Basic convergence results
• Optimal relaxation parameter for SOR
• See Chapter 4 of the text for details.
Basic relaxation schemes

ä Relaxation schemes: methods that modify one component of the current approximation at a time.
ä Based on the decomposition A = D − E − F, with D = diag(A), −E = the strict lower part of A, and −F = its strict upper part.
ä Gauss-Seidel iteration for solving Ax = b: corrects the j-th component of the current approximate solution, so as to zero the j-th component of the residual, for j = 1, 2, · · · , n.
ä Gauss-Seidel iteration can be expressed as:
    (D − E)x(k+1) = F x(k) + b
ä Can also define a backward Gauss-Seidel iteration:
    (D − F)x(k+1) = Ex(k) + b
and a symmetric Gauss-Seidel iteration: a forward sweep followed by a backward sweep.
ä Over-relaxation is based on the decomposition:
    ωA = (D − ωE) − (ωF + (1 − ω)D)
→ successive over-relaxation (SOR):
    (D − ωE)x(k+1) = [ωF + (1 − ω)D]x(k) + ωb
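A componentwise sketch of the SOR sweep described above (with ω = 1 it reduces to Gauss-Seidel). NumPy, the function name, and the test matrix are assumptions, not from the slides:

```python
import numpy as np

def sor(A, b, omega=1.0, x0=None, maxit=200, tol=1e-10):
    """One-component-at-a-time relaxation: omega = 1 gives Gauss-Seidel."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(maxit):
        for j in range(n):
            # value that zeroes the j-th residual component, then relax by omega
            sigma = A[j, :] @ x - A[j, j] * x[j]
            x_gs = (b[j] - sigma) / A[j, j]
            x[j] = (1.0 - omega) * x[j] + omega * x_gs
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(sor(A, b, omega=1.0))          # Gauss-Seidel
print(np.linalg.solve(A, b))         # reference solution
```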
Iteration matrices

Jacobi, Gauss-Seidel, SOR, & SSOR iterations are of the form
    x(k+1) = M x(k) + f
with iteration matrices
    MJac  = D−1(E + F) = I − D−1A
    MGS   = (D − E)−1F = I − (D − E)−1A
    MSOR  = (D − ωE)−1(ωF + (1 − ω)D) = I − (ω−1D − E)−1A
    MSSOR = I − ω(2 − ω)(D − ωF)−1D(D − ωE)−1A

General convergence result

Consider the iteration:
    x(k+1) = Gx(k) + f
(1) Assume that ρ(G) < 1. Then I − G is non-singular and the iteration has a unique fixed point; the iteration converges to it for any f and any x(0).
(2) If the iteration converges for any f and any x(0), then ρ(G) < 1.

Example: Richardson's iteration
    x(k+1) = x(k) + α(b − Ax(k))
Assume Λ(A) ⊂ R. When does the iteration converge?
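The spectral-radius criterion can be explored numerically for Richardson's iteration, where G = I − αA. A minimal sketch (NumPy, the matrix, and the chosen α are assumptions, not from the slides):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # SPD: eigenvalues real and positive
b = np.array([1.0, 2.0])
alpha = 0.2

G = np.eye(2) - alpha * A            # iteration matrix of Richardson's method
print("rho(G) =", max(abs(np.linalg.eigvals(G))))   # < 1 here -> convergence

x = np.zeros(2)
for _ in range(100):
    x = x + alpha * (b - A @ x)      # x^(k+1) = x^(k) + alpha (b - A x^(k))
print(x)
print(np.linalg.solve(A, b))         # the two should agree
```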
A few well-known results

ä Jacobi and Gauss-Seidel converge for diagonally dominant matrices, i.e., matrices such that
    |aii| > Σ_{j≠i} |aij|,  i = 1, · · · , n
ä SOR converges for 0 < ω < 2 for SPD matrices.
ä The optimal ω is known in theory for an important class of matrices called 2-cyclic matrices or matrices with property A.
ä A matrix has property A if it can be (symmetrically) permuted into a 2 × 2 block matrix whose diagonal blocks are diagonal:
    P A PT = [ D1  E
               ET  D2 ]
ä Let A be a matrix which has property A. Then the eigenvalues λ of the SOR iteration matrix and the eigenvalues μ of the Jacobi iteration matrix are related by
    (λ + ω − 1)² = λ ω² μ²
ä The optimal ω for matrices with property A is given by
    ωopt = 2 / ( 1 + √(1 − ρ(B)²) )
where B is the Jacobi iteration matrix.
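A sketch of the formula in action for the 1-D Laplacian tridiag(−1, 2, −1), a standard example of a matrix with property A (NumPy and the specific test case are assumptions, not from the slides):

```python
import numpy as np

n = 20
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiag(-1, 2, -1)

D = np.diag(np.diag(A))
B = np.eye(n) - np.linalg.solve(D, A)        # Jacobi iteration matrix
rho_B = max(abs(np.linalg.eigvals(B)))

omega_opt = 2.0 / (1.0 + np.sqrt(1.0 - rho_B**2))
print("rho(B)    =", rho_B)
print("omega_opt =", omega_opt)

# Spectral radius of the SOR iteration matrix at omega_opt vs. omega = 1
E = -np.tril(A, -1); F = -np.triu(A, 1)      # A = D - E - F
M = lambda w: np.linalg.solve(D - w*E, w*F + (1 - w)*D)
print("rho(M_SOR(1))         =", max(abs(np.linalg.eigvals(M(1.0)))))
print("rho(M_SOR(omega_opt)) =", max(abs(np.linalg.eigvals(M(omega_opt)))))
```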
An observation: Introduction to Preconditioning

ä The iteration x(k+1) = M x(k) + f is attempting to solve (I − M)x = f. Since M is of the form M = I − P−1A, this system can be rewritten as
    P−1Ax = P−1b
where, for SSOR, we have
    PSSOR = (D − ωE)D−1(D − ωF)
referred to as the SSOR 'preconditioning' matrix.

In other words:
    Relaxation iteration  ⇐⇒  Preconditioned fixed-point iteration
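A minimal sketch of this observation using the simplest splitting, Jacobi, where M = I − P−1A with P = D: the relaxation sweep is exactly a fixed-point (Richardson-type) iteration applied to the preconditioned system (NumPy and the test problem are assumptions, not from the slides):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

P = np.diag(np.diag(A))              # Jacobi 'preconditioner' P = D
x = np.zeros(3)
for _ in range(100):
    x = x + np.linalg.solve(P, b - A @ x)   # x <- x + P^{-1}(b - A x)
print(x)
print(np.linalg.solve(A, b))         # same solution
```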