CS 450 - Lecture 3
Some notes and examples are from:
- Heath, M. T. (2018). Scientific computing: An introductory survey (Revised Second Edition). SIAM. https://doi.org/10.1137/1.9781611975581
Vector Norms
A vector norm should satisfy:
- $\lVert x \rVert \ge 0$
- $\lVert x \rVert = 0$ if and only if $x = 0$
- $\lVert \alpha x \rVert = |\alpha| \lVert x \rVert$ for all $\alpha \in \mathbb{R}$
- $\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert$ (triangle inequality)
A norm is uniquely determined by its unit ball, which is the set of all vectors $x$ such that $\lVert x \rVert = 1$.
$p$-norms are defined as:
$$ \lVert x \rVert_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} $$
- 1-norm: $\lVert x \rVert_1 = \sum_{i=1}^n |x_i|$. In 2D, the unit ball is a diamond.
- 2-norm: $\lVert x \rVert_2 = \sqrt{\sum_{i=1}^n |x_i|^2}$. In 2D, the unit ball is a circle.
- $\infty$-norm: $\lVert x \rVert_\infty = \max_{i=1}^n |x_i|$. In 2D, the unit ball is a square.
In general, for any vector $v \in \mathbb{R}^n$, we have:
$$ \lVert v \rVert_1 \ge \lVert v \rVert_2 \ge \lVert v \rVert_\infty $$
We also have the following inequalities:
- $\lVert v \rVert_1 \le \sqrt{n} \lVert v \rVert_2$
- $\lVert v \rVert_2 \le \sqrt{n} \lVert v \rVert_\infty$
So, any two of these norms differ by at most a constant factor, where the constant depends only on $n$ and not on $v$.
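As a quick numerical sanity check of the ordering and the equivalence bounds above, here is a minimal NumPy sketch; the vector `v` and its dimension are arbitrary random choices, not from the lecture:

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
v = rng.standard_normal(n)          # arbitrary test vector

norm1 = np.linalg.norm(v, 1)        # sum of absolute values
norm2 = np.linalg.norm(v, 2)        # Euclidean length
norminf = np.linalg.norm(v, np.inf) # largest absolute entry

# ordering: ||v||_1 >= ||v||_2 >= ||v||_inf
assert norm1 >= norm2 >= norminf
# equivalence bounds: ||v||_1 <= sqrt(n) ||v||_2 and ||v||_2 <= sqrt(n) ||v||_inf
assert norm1 <= np.sqrt(n) * norm2
assert norm2 <= np.sqrt(n) * norminf
print(norm1, norm2, norminf)
```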
Inner-Product Spaces
Inner-product $\langle x, y \rangle$ should satisfy:
- $\langle x, x \rangle \ge 0$
- $\langle x, x \rangle = 0$ if and only if $x = 0$
- $\langle x, y \rangle = \langle y, x \rangle$
- $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$
- $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ for all $\alpha \in \mathbb{R}$
Cauchy-Schwarz Inequality: $\lvert \langle x, y \rangle \rvert \le \sqrt{\langle x, x \rangle \langle y, y \rangle}$
For 2-norms, $\lvert x^T y \rvert \le \lVert x \rVert_2 \lVert y \rVert_2$.
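A small numerical check of Cauchy-Schwarz for the standard inner product $x^T y$; the vectors here are arbitrary random examples, and the tolerance only guards against floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# |<x, y>| <= ||x||_2 ||y||_2 for the standard inner product <x, y> = x^T y
lhs = abs(x @ y)
rhs = np.linalg.norm(x, 2) * np.linalg.norm(y, 2)
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```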
Matrix Norms
A matrix norm should satisfy:
- $\lVert A \rVert \ge 0$
- $\lVert A \rVert = 0$ if and only if $A = 0$
- $\lVert \alpha A \rVert = |\alpha| \cdot \lVert A \rVert$ for all $\alpha \in \mathbb{R}$
- $\lVert A + B \rVert \le \lVert A \rVert + \lVert B \rVert$ (triangle inequality)
Frobenius norm: $\lVert A \rVert_F = \sqrt{\sum_{i, j} a_{ij}^2}$
Operator norm: the maximum amplification of any nonzero input vector $x$ under the map $x \mapsto Ax$:
$$ \lVert A \rVert_p = \max_{x \neq 0} \frac{\lVert Ax \rVert_p}{\lVert x \rVert_p} = \max_{x, \lVert x \rVert_p = 1} \lVert Ax \rVert_p $$
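The operator norm can be estimated by sampling unit vectors and recording the largest amplification; the sketch below compares such a crude estimate with NumPy's built-in induced 2-norm, and also checks the Frobenius formula. The random matrix `A` and the sample count are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))       # arbitrary example matrix

# Frobenius norm: square root of the sum of squared entries
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((A**2).sum()))

# crude estimate of the induced 2-norm: maximize ||Ax||_2 over random unit vectors
samples = rng.standard_normal((3, 10000))
samples /= np.linalg.norm(samples, axis=0)     # normalize each column to unit 2-norm
estimate = np.linalg.norm(A @ samples, axis=0).max()

exact = np.linalg.norm(A, 2)   # NumPy computes the induced 2-norm (largest singular value)
print(estimate, exact)         # estimate <= exact, and close for enough samples
```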
Theorem (from quiz 3): $\lVert A \rVert_\infty$ is the maximum 1-norm of the rows of $A$.
Proof.
Assume $A$ is $n \times m$. Let $1 \le i \le n$ and $v \in \mathbb{R}^m$ be such that $\lVert v \rVert_\infty = 1$. Since $\max \{|v_1|, \ldots, |v_m|\} = 1$, we have
$$ |(A \cdot v)_i| \le \sum_{j=1}^m |A_{i,j}| = \lVert A_i \rVert_1, \tag{1} $$
where $A_i$ denotes the $i$-th row of $A$.
Therefore,
$$ \lVert A \rVert_\infty \le \max\{\lVert A_1 \rVert_1, \ldots, \lVert A_n \rVert_1\} \tag{2} $$
Now, we show that there exists a vector $v$ for which equality is attained.
Define vectors $v^1, \ldots, v^n$ such that $v^i_j = \text{sgn}(A_{i,j})$, where
$$ \text{sgn}(x) = \begin{cases} +1 \quad &x \ge 0 \\ -1 \quad &x < 0 \end{cases} $$
Then for all $1 \le i \le n$, $\lVert v^i \rVert_\infty = 1$ and
$$ (A \cdot v^i)_i = \sum_{j=1}^m |A_{i,j}| = \lVert A_i \rVert_1 \tag{3} $$
Choose $k = \text{argmax}_{i=1}^n \lVert A_i \rVert_1$. In case there are multiple such $k$, choose any.
Then using (1) and (3),
$$ \lVert A \cdot v^k \rVert_\infty = \lVert A_k \rVert_1 = \max\{\lVert A_1 \rVert_1, \ldots, \lVert A_n \rVert_1\} \tag{4} $$
Using (2) and (4), we conclude that $\lVert A \rVert_\infty$ is the maximum 1-norm of the rows of $A$.
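The proof suggests a direct numerical check: build the sign vector $v^k$ from the row with the largest 1-norm and confirm that it attains the bound. A minimal sketch, using an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))            # arbitrary n x m example

row_sums = np.abs(A).sum(axis=1)           # 1-norms of the rows
k = row_sums.argmax()

# the maximizing vector from the proof: v^k_j = sgn(A_{k,j})
v = np.where(A[k] >= 0, 1.0, -1.0)         # entries are +1 or -1, so ||v||_inf = 1
assert np.isclose(np.linalg.norm(A @ v, np.inf), row_sums[k])

# NumPy's induced infinity-norm agrees with the maximum row 1-norm
assert np.isclose(np.linalg.norm(A, np.inf), row_sums.max())
```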
Theorem (from quiz 3):
- $\lVert A \rVert_1$ is the maximum 1-norm of the columns of $A$.
- $\lVert A^T \rVert_1 = \lVert A \rVert_\infty$.
Note. The following properties hold for norms induced by $p$-norms, but may or may not hold for more general matrix norms (a numerical check follows the list):
- $\lVert AB \rVert_p \le \lVert A \rVert_p \cdot \lVert B \rVert_p$
- $\lVert Ax \rVert_p \le \lVert A \rVert_p \cdot \lVert x \rVert_p$
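A quick check of both properties for $p \in \{1, 2, \infty\}$ on arbitrary random matrices and vectors; the small tolerance only guards against floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

for p in (1, 2, np.inf):
    # submultiplicativity of induced norms: ||AB||_p <= ||A||_p ||B||_p
    assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12
    # consistency with vector norms: ||Ax||_p <= ||A||_p ||x||_p
    assert np.linalg.norm(A @ x, p) <= np.linalg.norm(A, p) * np.linalg.norm(x, p) + 1e-12
```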
Example. For the matrix
$$ A = \begin{bmatrix} 1 & 0 & 3 \\ -1 & 2 & 0 \\ 4 & 0 & 1 \end{bmatrix} $$
we have: $$ \begin{align*} \lVert A \rVert_1 &= \max\{6, 2, 4\} = 6 \\ \lVert A \rVert_\infty &= \max\{4, 3, 5\} = 5 \end{align*} $$
The 2-norm is the largest singular value of $A$, i.e., the square root of the largest eigenvalue of $A^T A$. We have:
$$ A^T A = \begin{bmatrix} 18 & -2 & 7 \\ -2 & 4 & 0 \\ 7 & 0 & 10 \end{bmatrix} $$
Its eigenvalues are $\lambda_1 \approx 22.22698, \lambda_2 \approx 6.33655, \lambda_3 \approx 3.43646$. So, $\lVert A \rVert_2 = \sqrt{\lambda_1} \approx 4.71$.
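The same numbers can be reproduced in NumPy; the eigenvalue route below mirrors the computation in the example, while `np.linalg.norm(A, 2)` computes the 2-norm directly from the singular values:

```python
import numpy as np

A = np.array([[ 1., 0., 3.],
              [-1., 2., 0.],
              [ 4., 0., 1.]])

print(np.linalg.norm(A, 1))       # 6.0  (maximum absolute column sum)
print(np.linalg.norm(A, np.inf))  # 5.0  (maximum absolute row sum)

# 2-norm: square root of the largest eigenvalue of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)
print(np.sqrt(eigvals.max()))     # approximately 4.71
print(np.linalg.norm(A, 2))       # same value
```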
Induced Matrix Norms
We defined $\lVert A \rVert_2 = \max_{\lVert x \rVert_2 = 1} \lVert Ax \rVert_2$.
Now the question is: what is $\min_{\lVert x \rVert_2 = 1} \lVert Ax \rVert_2$? The direction that $A$ amplifies the least is the direction that $A^{-1}$ amplifies the most (assuming $A$ is invertible).
So,
$$ \min_{\lVert x \rVert_2 = 1} \lVert Ax \rVert_2 = \frac{1}{\lVert A^{-1} \rVert_2} $$
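A short check of this identity: the smallest singular value of $A$ equals the minimum amplification, and it matches $1 / \lVert A^{-1} \rVert_2$. The random matrix below is assumed invertible, which holds generically:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))              # assumed invertible (generic random matrix)

sigma = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
min_amplification = sigma[-1]                # min over unit x of ||Ax||_2

assert np.isclose(min_amplification, 1.0 / np.linalg.norm(np.linalg.inv(A), 2))
```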
Matrix Condition Number
The matrix condition number bounds the worst-case amplification of the error in a matrix-vector product.
Let $x$ be a unit vector, and add some error $\delta x$ to it. In the worst case, $x$ is in the direction of the minimum amplification of $A$, and $\delta x$ is in the direction of the maximum amplification of $A$. That is, the solution shrinks and the error grows.
We want the condition number to capture this worst-case amplification of the error. So, we define it as $\kappa(A) = \lVert A \rVert \cdot \lVert A^{-1} \rVert$, that is, the ratio of the maximum amplification to the minimum amplification.
By definition, $\kappa(A) \ge 1$.
If $Q$ is square with $\kappa_2(Q) = 1$ and $\lVert Q \rVert_2 = 1$, then $Q$ is orthogonal, that is, $Q^TQ = I$. Conversely, if $Q$ is orthogonal, then $\lVert Qv \rVert_2 = \lVert v \rVert_2$ for all $v$.
Example (from quiz 3):
$$ \begin{align*} \kappa(2A) &= \lVert 2A \rVert \cdot \lVert (2A)^{-1} \rVert \\ &= 2 \lVert A \rVert \cdot \frac{1}{2} \lVert A^{-1} \rVert \\ &= \kappa(A) \end{align*} $$
The condition number is a measure of how close a matrix is to being singular. If $\kappa(A)$ is large, then $A$ is close to being singular. Whereas if $\kappa(A)$ is close to 1, then $A$ is far from being singular.
From the definition, $\kappa(A) = \kappa(A^{-1})$. This means that if $A$ is close to being singular, then $A^{-1}$ is equally close to being singular.
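These three properties ($\kappa(A) \ge 1$, invariance under scaling, and $\kappa(A) = \kappa(A^{-1})$) are easy to confirm numerically with `np.linalg.cond`; the matrix below is an arbitrary random example:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))          # assumed invertible (generic random matrix)

kappa = np.linalg.cond(A, 2)             # condition number in the 2-norm
assert kappa >= 1.0

# scaling does not change the condition number: kappa(2A) = kappa(A)
assert np.isclose(np.linalg.cond(2 * A, 2), kappa)

# A and A^{-1} are equally (ill-)conditioned: kappa(A^{-1}) = kappa(A)
assert np.isclose(np.linalg.cond(np.linalg.inv(A), 2), kappa)
```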
Condition Number Estimation
For the 1-norm or the $\infty$-norm, $\lVert A \rVert$ is easy to compute as the maximum absolute column sum or row sum, respectively. However, calculating $\lVert A^{-1} \rVert$ is more difficult.
If $z$ is a solution to $Az = y$, then
$$ \lVert z \rVert = \lVert A^{-1} y \rVert \le \lVert A^{-1} \rVert \lVert y \rVert $$
Therefore,
$$ \frac{\lVert z \rVert}{\lVert y \rVert} \le \lVert A^{-1} \rVert $$
Thus, each such ratio is a lower bound on $\lVert A^{-1} \rVert$, and maximizing $\dfrac{\lVert z \rVert}{\lVert y \rVert}$ over $y$ gives a reasonable estimate of $\lVert A^{-1} \rVert$.
Finding the true maximum can be expensive. One practical strategy is to try a few random vectors $y$ and take the largest ratio $\dfrac{\lVert z \rVert}{\lVert y \rVert}$.
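A minimal sketch of this estimation strategy in the 1-norm, using `np.linalg.solve` for the solves; the helper `estimate_condition` and its trial count are illustrative choices, not a standard library routine (production estimators use more careful schemes, e.g. Hager's method):

```python
import numpy as np

def estimate_condition(A, trials=20, rng=None):
    """Cheap lower-bound estimate of kappa_1(A) from a few random right-hand sides."""
    rng = rng or np.random.default_rng()
    best = 0.0
    for _ in range(trials):
        y = rng.standard_normal(A.shape[0])
        z = np.linalg.solve(A, y)                      # z = A^{-1} y
        best = max(best, np.linalg.norm(z, 1) / np.linalg.norm(y, 1))
    return np.linalg.norm(A, 1) * best                 # ||A||_1 * (estimate of ||A^{-1}||_1)

A = np.array([[ 1., 0., 3.],
              [-1., 2., 0.],
              [ 4., 0., 1.]])
print(estimate_condition(A))        # lower bound on kappa_1(A)
print(np.linalg.cond(A, 1))         # exact value, for comparison
```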
Singular Value Decomposition (SVD)
Any matrix $A$ can be decomposed as $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with non-negative entries.
If $A$ is invertible, then $A^{-1} = V \Sigma^{-1} U^T$.
Let $\sigma_{max} = \sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n = \sigma_{min}$ be the diagonal entries of $\Sigma$, sorted in descending order. Then $\lVert A \rVert_2 = \sigma_{max}$, and if $A$ is invertible, $\lVert A^{-1} \rVert_2 = \dfrac{1}{\sigma_{min}}$.
Therefore, $\kappa(A) = \dfrac{\sigma_{max}}{\sigma_{min}}$.
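Checking these relations against NumPy's SVD on an arbitrary random (hence generically invertible) matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))

U, sigma, Vt = np.linalg.svd(A)          # A = U diag(sigma) V^T, sigma descending
assert np.allclose(U @ np.diag(sigma) @ Vt, A)

assert np.isclose(np.linalg.norm(A, 2), sigma[0])                        # sigma_max
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / sigma[-1])  # 1 / sigma_min
assert np.isclose(np.linalg.cond(A, 2), sigma[0] / sigma[-1])
```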
Example. Prove that if $\kappa_2(A) = 1$ where $A$ is $n \times n$, then $A=\alpha Q$ for some orthogonal $Q$ and scalar $\alpha$.
Proof.
Since $\kappa_2(A) = 1$, we have
$$ \lVert A \rVert_2 = \dfrac{1}{\lVert A^{-1} \rVert_2} \tag{5} $$
Consider the SVD of $A$: $A = U \Sigma V^T$, where $\Sigma = \text{diag}(\sigma_{max}, \ldots, \sigma_{min})$. Then $\lVert A \rVert_2 = \sigma_{max}$ and $\lVert A^{-1} \rVert_2 = \dfrac{1}{\sigma_{min}}$. Putting this together with (5), we have $\sigma_{max} = \sigma_{min}$, so all singular values are equal. Let $\alpha$ denote this common value. Then $\Sigma = \alpha I$.
Then:
$$ \begin{align*} A &= U \Sigma V^T \\ &= U \cdot (\alpha I) \cdot V^T \\ &= \alpha U V^T \end{align*} $$
Since $U$ and $V^T$ are orthogonal, then $U V^T$ is also orthogonal. So, $A=\alpha Q$ for some scalar $\alpha$ and orthogonal $Q$.
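As a numerical illustration of the converse direction, one can build $A = \alpha Q$ from a random orthogonal $Q$ (obtained here via a QR factorization, an assumption of this sketch) and confirm that $\kappa_2(A) = 1$:

```python
import numpy as np

rng = np.random.default_rng(8)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q is orthogonal: Q^T Q = I
alpha = 3.7                                        # arbitrary nonzero scalar

A = alpha * Q
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.isclose(np.linalg.cond(A, 2), 1.0)
```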