Saturday, May 4, 2019

Why are V and U orthogonal in singular-value decomposition?

Singular-value decomposition is the process of factoring a matrix $A$ into $A = U\Sigma V^T$, where $\Sigma$ is a diagonal matrix that contains the singular values.

$V$ is a matrix whose columns are the eigenvectors of $A^TA$. A useful property of this decomposition is that $V$ and $U$ are both guaranteed to be orthogonal matrices. So why is this true?

First, it is helpful to know that symmetric matrices always have orthogonal eigenvectors for distinct eigenvalues. This can be shown:

For a symmetric matrix $B$ with eigenvectors $\vec{x}$ and $\vec{v}$ corresponding to eigenvalues $\lambda_1 \neq \lambda_2$:
$\lambda_1 (\vec{x} \cdot \vec{v}) = (B \vec{x}) \cdot \vec{v} = (\vec{x}^T B^T)\vec{v} = \vec{x}^T B \vec{v} = \lambda_2 \vec{x}^T \vec{v} = \lambda_2 (\vec{x} \cdot \vec{v})$

Since $\lambda_1 \neq \lambda_2$, the only way $\lambda_1 (\vec{x} \cdot \vec{v})$ can equal $\lambda_2 (\vec{x} \cdot \vec{v})$ is if $\vec{x} \cdot \vec{v} = 0$; the eigenvectors are orthogonal. This argument only works for distinct eigenvalues of $B$. How might you show that orthogonal eigenvectors can still be chosen when a symmetric matrix has a repeated eigenvalue?
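
If you'd like to see this numerically, here's a quick NumPy sketch; the matrix $B$ below is just a small made-up example, and `np.linalg.eigh` is one way to get the eigenvectors of a symmetric matrix:

```python
import numpy as np

# A small symmetric matrix with two distinct eigenvalues.
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is meant for symmetric matrices; the columns of `vecs` are its unit eigenvectors.
vals, vecs = np.linalg.eigh(B)
x, v = vecs[:, 0], vecs[:, 1]

print(vals)          # two distinct eigenvalues
print(np.dot(x, v))  # ~0: the eigenvectors are orthogonal
```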

This explains why $V$ is an orthogonal matrix: we will build $V$ from unit-length eigenvectors of $A^T A$, and $A^T A$ is symmetric.
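
As a quick sanity check (not part of the argument itself), here's a sketch that builds $V$ this way for an arbitrary made-up matrix $A$ and confirms that $V^T V = I$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # an arbitrary 4x3 example matrix

AtA = A.T @ A                     # A^T A is symmetric
vals, V = np.linalg.eigh(AtA)     # eigh returns unit-length eigenvectors as columns

print(np.allclose(V.T @ V, np.eye(3)))  # True: V is an orthogonal matrix
```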

So what happens when we multiply some vector from $V$ by $A$?
$A \vec{v} = \vec{u}$
$A^T A \vec{v} = A^T \vec{u}$
$\lambda \vec{v} = A^T \vec{u}$
$AA^T \vec{u} = \lambda A \vec{v} = \lambda \vec{u}$

So then (as long as $\lambda \neq 0$, so that $\vec{u}$ is not the zero vector) $\vec{u}$ is an eigenvector of $AA^T$, which is also symmetric, so we know its eigenvectors are orthogonal as well. Notice that $A^T A$ and $AA^T$ share the same nonzero eigenvalues; see the previous post for an illustration of this.
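
Here's a small numerical check of that claim, again with an arbitrary made-up $A$: take a unit eigenvector $\vec{v}$ of $A^T A$, form $\vec{u} = A \vec{v}$, and confirm it is an eigenvector of $AA^T$ with the same eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # an arbitrary example matrix

vals, V = np.linalg.eigh(A.T @ A)
lam = vals[-1]                    # the largest eigenvalue (nonzero for this generic A)
v = V[:, -1]                      # its unit eigenvector

u = A @ v

# u is an eigenvector of A A^T with the *same* eigenvalue:
print(np.allclose(A @ A.T @ u, lam * u))  # True
```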

We will put all the $\vec{u}$ vectors into a matrix called $U$, and factor out the length of each vector into the diagonal matrix $\Sigma$; these lengths are the singular values. We factor out the lengths so that each $\vec{u}$ will have length 1, making $U$ an orthogonal matrix.

We found that $A$ transforms the orthonormal columns of $V$ into the columns of $U \Sigma$, which are still mutually orthogonal (just no longer unit length).

$AV = U \Sigma$
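
To make the construction concrete, here's a sketch that builds $\Sigma$ and $U$ exactly this way; I use a square, generically full-rank example matrix so that none of the lengths being factored out are zero:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))       # square and (generically) full rank

vals, V = np.linalg.eigh(A.T @ A)     # V: unit eigenvectors of A^T A

AV = A @ V                            # columns are the u vectors, before normalization
sigmas = np.linalg.norm(AV, axis=0)   # the factored-out lengths: sigma_i = ||A v_i||
U = AV / sigmas                       # scale each column down to length 1
Sigma = np.diag(sigmas)

print(np.allclose(A @ V, U @ Sigma))    # True: AV = U Sigma
print(np.allclose(U.T @ U, np.eye(3)))  # True: U is an orthogonal matrix
```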

So why is this a desirable property for the singular value decomposition?

Well, here's one way that I think about it.

$A = U \Sigma V^T = \sigma_1 \vec{u}_1 \vec{v}_1^T + \dots + \sigma_n \vec{u}_n \vec{v}_n^T$

In this form, it becomes clear that each singular value corresponds to a rank 1 matrix, and these rank 1 matrices add up to form the complete $A$. In some sense, each singular value represents the "importance," or amount of influence, that its rank 1 matrix has on $A$.

Since the columns of $U$ and $V$ are unit vectors, every entry of each outer product $\vec{u}_i \vec{v}_i^T$ lies in the interval $[-1, 1]$. So the singular value multiplying each outer product tells us the "importance" of that particular rank 1 matrix, or how much influence it has on the matrix $A$.

Having $U$ and $V$ be orthogonal matrices gives us more confidence that the singular values express the actual importance of each rank 1 matrix, since we know its contribution will not be "undone" by one of the other rank 1 matrices.
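
Here's a sketch that checks these claims numerically for a made-up matrix: the rank 1 terms sum back to $A$, each outer product $\vec{u}_i \vec{v}_i^T$ has entries in $[-1, 1]$, and distinct rank 1 terms are orthogonal to one another under the Frobenius inner product, which is one way to see that no term "undoes" another:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))      # an arbitrary example matrix

U, sigmas, Vt = np.linalg.svd(A)     # NumPy hands back V^T directly

# The rank 1 pieces sigma_i * u_i v_i^T ...
terms = [sigmas[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(sigmas))]

# ... add up to the complete A:
print(np.allclose(sum(terms), A))                         # True

# Each outer product u_i v_i^T has entries in [-1, 1]:
print(np.all(np.abs(np.outer(U[:, 0], Vt[0, :])) <= 1))   # True

# Distinct rank 1 terms don't "undo" each other: their Frobenius
# inner product (the sum of their elementwise products) is zero.
print(np.isclose(np.sum(terms[0] * terms[1]), 0.0))       # True
```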

To show what I mean by "undone," let me illustrate adding two non-orthogonal basis vectors to express a point $P$.



[Figure: the point $P$ expressed as a combination of two non-orthogonal basis vectors $\vec{v}_1$ and $\vec{v}_2$, with the redundant region shaded red]

Note the area in red that I've marked with a sad face. Since $\vec{v}_1$ and $\vec{v}_2$ are not orthogonal, there is some redundancy in the directions they encode. As a consequence, both $\vec{v}_1$ and $\vec{v}_2$ need larger coefficients to express the point $P$, and those coefficients are less meaningful because part of each vector's magnitude only "cancels out" part of the other's, meaninglessly inflating the coefficients.

On the other hand, here is the same point $P$ expressed using an orthogonal basis:


[Figure: the same point $P$ expressed using an orthogonal basis]

Now neither $\vec{v}_1$ nor $\vec{v}_2$ has any of its magnitude cancelling out the other's; each coefficient tells us information purely about $P$. In this case both coefficients decreased, but that won't necessarily happen when switching to an orthogonal basis. The point is that with an orthogonal basis, the coefficients carry real meaning, so we can make better-informed choices when interpreting them.
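
Here's a tiny numerical version of the same picture; the point $P$ and the 15-degree angle between the skewed basis vectors are just values I picked to exaggerate the effect:

```python
import numpy as np

P = np.array([1.0, 2.0])   # an arbitrary point

# Two unit basis vectors only 15 degrees apart -- far from orthogonal.
v1 = np.array([1.0, 0.0])
v2 = np.array([np.cos(np.radians(15)), np.sin(np.radians(15))])

# Coefficients that express P in the skewed basis: solve [v1 v2] c = P.
c_skewed = np.linalg.solve(np.column_stack([v1, v2]), P)
print(c_skewed)   # roughly [-6.5, 7.7]: large, opposite-signed, partly cancelling

# The same point in the standard orthonormal basis: coefficients are just dot products.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.array([P @ e1, P @ e2]))   # [1. 2.]
```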

In the case of singular value decomposition, it is important to know that the coefficients actually tell us how much each rank 1 matrix contributes to the final matrix. One application of this is image compression: if we know certain rank 1 matrices don't contribute much to the final image, we can discard them, so the image requires less information to encode with little sacrifice to its visual quality.
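
Here's a minimal sketch of that idea; the `compress` helper and the random stand-in "image" are my own illustrative choices, not anything specific to a real image format:

```python
import numpy as np

def compress(image, k):
    """Rebuild a grayscale image (2-D array) from only its k largest singular values."""
    U, sigmas, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :k] @ np.diag(sigmas[:k]) @ Vt[:k, :]

rng = np.random.default_rng(4)
image = rng.random((64, 64))   # a stand-in for real pixel data

approx = compress(image, k=10)

# Storing the truncated factors takes k * (64 + 64 + 1) numbers instead of 64 * 64.
print(np.linalg.norm(image - approx) / np.linalg.norm(image))  # relative error
```

A random array like this is close to a worst case, of course; real photos tend to have rapidly decaying singular values, which is why dropping most of the rank 1 terms barely affects how they look.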






