CS 450 - Lecture 3

Some notes and examples are from:

Vector Norms

A vector norm $\lVert \cdot \rVert$ should satisfy:

  • $\lVert x \rVert \ge 0$, with $\lVert x \rVert = 0$ if and only if $x = 0$

  • $\lVert \alpha x \rVert = |\alpha| \cdot \lVert x \rVert$ for any scalar $\alpha$

  • $\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert$ (triangle inequality)

A norm is uniquely determined by its unit ball $\{x : \lVert x \rVert \le 1\}$, or equivalently by the boundary of that ball, the set of all vectors $x$ such that $\lVert x \rVert = 1$.

$p$-norms are defined as:

$$ \lVert x \rVert_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} $$

Unit balls: 1-norm (diamond) | 2-norm (circle) | ∞-norm (square)

In general, for any vector $v \in \mathbb{R}^n$, we have:

$$ \lVert v \rVert_1 \ge \lVert v \rVert_2 \ge \lVert v \rVert_\infty $$

We also have the following inequalities:

$$ \lVert v \rVert_2 \le \lVert v \rVert_1 \le \sqrt{n}\, \lVert v \rVert_2, \qquad \lVert v \rVert_\infty \le \lVert v \rVert_2 \le \sqrt{n}\, \lVert v \rVert_\infty, \qquad \lVert v \rVert_\infty \le \lVert v \rVert_1 \le n\, \lVert v \rVert_\infty $$

So any two of these norms differ by at most a constant factor, where the constant depends only on the dimension $n$.
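As a quick numerical illustration (a numpy sketch, not from the lecture; the vector is arbitrary), we can compute the three norms and check the ordering and the $\sqrt{n}$ bounds above:

```python
import numpy as np

v = np.array([3.0, -4.0, 1.0])   # arbitrary example vector
n = v.size

norm1 = np.linalg.norm(v, 1)          # sum of absolute values
norm2 = np.linalg.norm(v, 2)          # Euclidean norm
norminf = np.linalg.norm(v, np.inf)   # largest absolute entry

print(norm1, norm2, norminf)          # 8.0, ~5.10, 4.0
assert norm1 >= norm2 >= norminf           # ||v||_1 >= ||v||_2 >= ||v||_inf
assert norm1 <= np.sqrt(n) * norm2         # ||v||_1 <= sqrt(n) ||v||_2
assert norm2 <= np.sqrt(n) * norminf       # ||v||_2 <= sqrt(n) ||v||_inf
```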

Inner-Product Spaces

An inner product $\langle x, y \rangle$ should satisfy:

  • $\langle x, x \rangle \ge 0$, with $\langle x, x \rangle = 0$ if and only if $x = 0$

  • $\langle x, y \rangle = \langle y, x \rangle$ (symmetry, for real vector spaces)

  • $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$ (linearity)

Cauchy-Schwarz Inequality: $\lvert \langle x, y \rangle \rvert \le \sqrt{\langle x, x \rangle \langle y, y \rangle}$

For 2-norms, $\lvert x^T y \rvert \le \lVert x \rVert_2 \lVert y \rVert_2$.

Matrix Norms

A matrix norm should satisfy the same three properties as a vector norm:

  • $\lVert A \rVert \ge 0$, with $\lVert A \rVert = 0$ if and only if $A = 0$

  • $\lVert \alpha A \rVert = |\alpha| \cdot \lVert A \rVert$ for any scalar $\alpha$

  • $\lVert A + B \rVert \le \lVert A \rVert + \lVert B \rVert$

Frobenius norm: $\lVert A \rVert_F = \sqrt{\sum_{i, j} a_{ij}^2}$

Operator norm: Maximum amplification of any input vector $x$ to $Ax$:

$$ \lVert A \rVert_p = \max_{x \neq 0} \frac{\lVert Ax \rVert_p}{\lVert x \rVert_p} = \max_{x, \lVert x \rVert_p = 1} \lVert Ax \rVert_p $$

Theorem (from quiz 3): $\lVert A \rVert_\infty$ is the maximum 1-norm of the rows of $A$.

Proof.

Assume $A$ is $n \times m$. Let $1 \le i \le n$ and $v \in \mathbb{R}^m$ s.t. $\lVert v \rVert_\infty = 1$. Since $\max\{|v_1|, \ldots, |v_m|\} = 1$, each $|v_j| \le 1$, so

$$ |(A \cdot v)_i| = \left| \sum_{j=1}^m A_{i,j} v_j \right| \le \sum_{j=1}^m |A_{i,j}|\,|v_j| \le \sum_{j=1}^m |A_{i,j}| = \lVert A_i \rVert_1 \tag{1} $$

Therefore,

$$ \lVert A \rVert_\infty \le \max\{\lVert A_1 \rVert_1, \ldots, \lVert A_n \rVert_1\} \tag{2} $$

Now we show that there is a vector $v$ that attains equality.

Define vectors $v^1, \ldots, v^n$ such that $v^i_j = \text{sgn}(A_{i,j})$, where

$$ \text{sgn}(x) = \begin{cases} +1 \quad &x \ge 0 \\ -1 \quad &x < 0 \end{cases} $$

Then for all $1 \le i \le n$, $\lVert v^i \rVert_\infty = 1$ and

$$ (A \cdot v^i)_i = \sum_{j=1}^m |A_{i,j}| = \lVert A_i \rVert_1 \tag{3} $$

Choose $k = \text{argmax}_{i=1}^n \lVert A_i \rVert_1$. In case there are multiple such $k$, choose any.

Then using (1) and (3),

$$ \lVert A \cdot v^k \rVert_\infty = \lVert A_k \rVert_1 = \max\{\lVert A_1 \rVert_1, \ldots, \lVert A_n \rVert_1\} \tag{4} $$

Since $\lVert v^k \rVert_\infty = 1$, (4) shows that $\lVert A \rVert_\infty \ge \max\{\lVert A_1 \rVert_1, \ldots, \lVert A_n \rVert_1\}$. Combined with (2), we conclude that $\lVert A \rVert_\infty$ is the maximum 1-norm of the rows of $A$.
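A quick numpy check of the theorem and of the sign-vector construction used in the proof (an illustrative sketch; the matrix below is arbitrary):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  3.0]])

row_1norms = np.abs(A).sum(axis=1)     # 1-norms of the rows: [3, 4]
print(row_1norms.max())                # 4.0
print(np.linalg.norm(A, np.inf))       # 4.0, matching the theorem

# The maximizing vector from the proof: entrywise signs of the row
# with the largest 1-norm.
k = row_1norms.argmax()
v = np.where(A[k] >= 0, 1.0, -1.0)     # sgn applied to row k
print(np.linalg.norm(A @ v, np.inf))   # also 4.0
```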

Theorem (from quiz 3):

  • $\lVert A \rVert_1$ is the maximum 1-norm of the columns of $A$.

  • $\lVert A^T \rVert_1 = \lVert A \rVert_\infty$.

Note. The following properties hold for matrix norms induced by vector $p$-norms, but may or may not hold for more general matrix norms (a quick numerical check follows the list):

  • $\lVert AB \rVert_p \le \lVert A \rVert_p \cdot \lVert B \rVert_p$

  • $\lVert Ax \rVert_p \le \lVert A \rVert_p \cdot \lVert x \rVert_p$
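A small sanity check of both inequalities for the induced 1-, 2-, and ∞-norms (a numpy sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

for p in (1, 2, np.inf):
    # submultiplicativity: ||AB||_p <= ||A||_p ||B||_p
    assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12
    # consistency with vector norms: ||Ax||_p <= ||A||_p ||x||_p
    assert np.linalg.norm(A @ x, p) <= np.linalg.norm(A, p) * np.linalg.norm(x, p) + 1e-12
```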

Example. For the matrix

$$ A = \begin{bmatrix} 1 & 0 & 3 \\ -1 & 2 & 0 \\ 4 & 0 & 1 \end{bmatrix} $$

we have: $$ \begin{align*} \lVert A \rVert_1 &= \max\{6, 2, 4\} = 6 \\ \lVert A \rVert_\infty &= \max\{4, 3, 5\} = 5 \end{align*} $$

The 2-norm is the largest singular value of $A$, i.e., the square root of the largest eigenvalue of $A^T A$. We have:

$$ A^T A = \begin{bmatrix} 18 & -2 & 7 \\ -2 & 4 & 0 \\ 7 & 0 & 10 \end{bmatrix} $$

Its eigenvalues are $\lambda_1 \approx 22.22698, \lambda_2 \approx 6.33655, \lambda_3 \approx 3.43646$. So, $\lVert A \rVert_2 = \sqrt{\lambda_1} \approx 4.71$.
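The same computations in numpy (a sketch reproducing the example above):

```python
import numpy as np

A = np.array([[ 1.0, 0.0, 3.0],
              [-1.0, 2.0, 0.0],
              [ 4.0, 0.0, 1.0]])

print(np.linalg.norm(A, 1))       # 6.0, maximum absolute column sum
print(np.linalg.norm(A, np.inf))  # 5.0, maximum absolute row sum

# 2-norm: square root of the largest eigenvalue of A^T A
lam = np.linalg.eigvalsh(A.T @ A)
print(np.sqrt(lam.max()))         # ~4.71
print(np.linalg.norm(A, 2))       # same value (largest singular value)
```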

Induced Matrix Norms

We defined $\lVert A \rVert_2 = \max_{\lVert x \rVert_2 = 1} \lVert Ax \rVert_2$.

Now the question is: what is $\min_{\lVert x \rVert_2 = 1} \lVert Ax \rVert_2$? The direction that $A$ amplifies the least is exactly the direction that $A^{-1}$ amplifies the most (assuming $A$ is invertible).

So,

$$ \min_{\lVert x \rVert_2 = 1} \lVert Ax \rVert_2 = \frac{1}{\lVert A^{-1} \rVert_2} $$
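A quick numerical illustration of this identity (a numpy sketch, reusing the matrix from the example above; it assumes $A$ is invertible): random unit vectors are never amplified by less than $1/\lVert A^{-1} \rVert_2$.

```python
import numpy as np

A = np.array([[ 1.0, 0.0, 3.0],
              [-1.0, 2.0, 0.0],
              [ 4.0, 0.0, 1.0]])

bound = 1.0 / np.linalg.norm(np.linalg.inv(A), 2)   # the claimed minimum, ~1.85

rng = np.random.default_rng(0)
best = np.inf
for _ in range(10_000):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)                 # make x a unit vector
    best = min(best, np.linalg.norm(A @ x))

print(bound)   # ~1.85
print(best)    # approaches, but never falls below, the bound
```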

Matrix Condition Number

The matrix condition number bounds the worst-case amplification of the error in a matrix-vector product.

Let $x$ be a unit vector, and add some error $\delta x$ to it. In the worst case, $x$ is in the direction of the minimum amplification of $A$, and $\delta x$ is in the direction of the maximum amplification of $A$. That is, the solution shrinks and the error grows.

We want the condition number to capture this worst-case amplification of the error. So, we define it as: $\kappa(A) = \lVert A \rVert\ \cdot \lVert A^{-1} \rVert$. That is, the ratio of the maximum amplification to the minimum amplification.

By definition, $\kappa(A) \ge 1$.

If $Q$ is square with $\kappa_2(Q) = 1$ and $\lVert Q \rVert_2 = 1$, then $Q$ is orthogonal, i.e., $Q^T Q = I$. Conversely, if $Q$ is orthogonal, then $\lVert Qv \rVert_2 = \lVert v \rVert_2$ for all $v$.

Example (from quiz 3):

$$ \begin{align*} \kappa(2A) &= \lVert 2A \rVert \cdot \lVert (2A)^{-1} \rVert \\ &= 2 \lVert A \rVert \cdot \frac{1}{2} \lVert A^{-1} \rVert \\ &= \kappa(A) \end{align*} $$
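Numerically (a sketch using numpy's 2-norm condition number):

```python
import numpy as np

A = np.array([[ 1.0, 0.0, 3.0],
              [-1.0, 2.0, 0.0],
              [ 4.0, 0.0, 1.0]])

print(np.linalg.cond(A))      # 2-norm condition number of A
print(np.linalg.cond(2 * A))  # identical: the scalar 2 cancels in kappa
```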

The condition number measures how close a matrix is to being singular: if $\kappa(A)$ is large, then $A$ is close to singular, whereas if $\kappa(A)$ is close to 1, then $A$ is far from singular.

From the definition, $\kappa(A) = \kappa(A^{-1})$. This means that if $A$ is close to being singular, then $A^{-1}$ is equally close to being singular.

Condition Number Estimation

For the 1-norm and the ∞-norm, $\lVert A \rVert$ is easily calculated as the maximum absolute column sum or the maximum absolute row sum, respectively. However, calculating $\lVert A^{-1} \rVert$ is more difficult, since we would first need to compute $A^{-1}$.

If $z$ is a solution to $Az = y$, then

$$ \lVert z \rVert = \lVert A^{-1} y \rVert \le \lVert A^{-1} \rVert \lVert y \rVert $$

Therefore,

$$ \frac{\lVert z \rVert}{\lVert y \rVert} \le \lVert A^{-1} \rVert $$

Thus maximizing $\dfrac{\lVert z \rVert}{\lVert y \rVert}$ gives a reasonable estimate of $\lVert A^{-1} \rVert$.

Finding the true maximum can be expensive. One strategy is to try a few random vectors $y$ and take the maximum observed ratio $\dfrac{\lVert z \rVert}{\lVert y \rVert}$, as sketched below.
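A minimal numpy sketch of this strategy (the function name and the number of trials are illustrative, not from the lecture); it returns a lower bound on $\kappa_\infty(A)$:

```python
import numpy as np

def estimate_cond_inf(A, trials=5, seed=0):
    """Crude lower-bound estimate of the infinity-norm condition number."""
    rng = np.random.default_rng(seed)
    norm_A = np.linalg.norm(A, np.inf)      # cheap: max absolute row sum
    est_inv = 0.0
    for _ in range(trials):
        y = rng.standard_normal(A.shape[0])  # random right-hand side
        z = np.linalg.solve(A, y)            # z = A^{-1} y
        est_inv = max(est_inv, np.linalg.norm(z, np.inf) / np.linalg.norm(y, np.inf))
    return norm_A * est_inv

A = np.array([[ 1.0, 0.0, 3.0],
              [-1.0, 2.0, 0.0],
              [ 4.0, 0.0, 1.0]])
print(estimate_cond_inf(A))        # never exceeds the true value below
print(np.linalg.cond(A, np.inf))   # exact kappa_inf(A) for comparison
```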

Singular Value Decomposition (SVD)

Any matrix $A$ can be decomposed as $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with non-negative entries.

If $A$ is invertible, then $A^{-1} = V \Sigma^{-1} U^T$.

Let $\sigma_{max} = \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n = \sigma_{min}$ be the diagonal entries of $\Sigma$, sorted in descending order. Then $\lVert A \rVert_2 = \sigma_{max}$ and $\lVert A^{-1} \rVert_2 = \dfrac{1}{\sigma_{min}}$.

Therefore, $\kappa(A) = \dfrac{\sigma_{max}}{\sigma_{min}}$.
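In numpy (a sketch), the condition number can be read directly off the singular values:

```python
import numpy as np

A = np.array([[ 1.0, 0.0, 3.0],
              [-1.0, 2.0, 0.0],
              [ 4.0, 0.0, 1.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, largest first
print(sigma[0] / sigma[-1])                 # sigma_max / sigma_min
print(np.linalg.cond(A))                    # same value
```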

Example. Prove that if $\kappa_2(A) = 1$ where $A$ is $n \times n$, then $A=\alpha Q$ for some orthogonal $Q$ and scalar $\alpha$.

Proof.

Since $\kappa_2(A) = 1$, then

$$ \lVert A \rVert_2 = \dfrac{1}{\lVert A^{-1} \rVert_2} \tag{5} $$

Consider the SVD of $A$: $A = U \Sigma V^T$, where $\Sigma = \text{diag}(\sigma_{max}, \ldots, \sigma_{min})$. Then $\lVert A \rVert_2 = \sigma_{max}$ and $\lVert A^{-1} \rVert_2 = \dfrac{1}{\sigma_{min}}$. Putting this together with (5), we have $\sigma_{max} = \sigma_{min}$, so all singular values are equal. Let $\alpha$ denote this common value. Then $\Sigma = \alpha I$.

Then:

$$ \begin{align*} A &= U \Sigma V^T \\ &= U \cdot (\alpha I) \cdot V^T \\ &= \alpha U V^T \end{align*} $$

Since $U$ and $V^T$ are orthogonal, then $U V^T$ is also orthogonal. So, $A=\alpha Q$ for some scalar $\alpha$ and orthogonal $Q$.
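A quick numerical illustration of this result (a numpy sketch; obtaining $Q$ from a QR factorization of a random matrix is just one convenient way to produce an orthogonal matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # Q is orthogonal
alpha = 3.7
A = alpha * Q                                     # scalar multiple of an orthogonal matrix

print(np.linalg.cond(A))                 # 1.0 (up to rounding)
print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q^T Q = I
```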