Monday, April 22, 2019

Why do ATA and AAT have the same eigenvalues?

Why is it that $A^TA$ and $AA^T$ have the same non-zero eigenvalues? A symbolic proof is not hard to find, but as usual, I prefer to find a way to visualize it in order to gain a better mathematical intuition.

Let $\vec{x}$ be an eigenvector of $A^TA$ with a non-zero eigenvalue $\lambda$.


We start with vector $\vec{x}$. $A$ transforms $\vec{x}$ into some arbitrary vector $A\vec{x}$. This is multiplied by $A^T$ resulting in $A^TA\vec{x}$. But remember, we defined $\vec{x}$ as an eigenvector of $A^TA$, so by definition $A^TA\vec{x} = \lambda \vec{x}$.

Now we're almost back to where we started, except $\vec{x}$ is being multiplied by a scalar! So if $\lambda \vec{x}$ undergoes another linear transformation, the result will be $\lambda$ times the transformation of $\vec{x}$.

So what if we choose to multiply $\lambda \vec{x}$ by $A$? We get $\lambda A \vec{x}$. But to get to this point, we multiplied $A \vec{x}$ by $AA^T$.

This means that $A \vec{x}$ is an eigenvector of $AA^T$ with eigenvalue $\lambda$! (And since $\lambda \neq 0$, the vector $A\vec{x}$ cannot be zero, so it really is an eigenvector.) This is the same eigenvalue that we found by multiplying $\vec{x}$ by $A^TA$!
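If you want to check this numerically, here is a minimal numpy sketch (the matrix shape and the random seed are arbitrary choices for illustration):

```python
import numpy as np

# Pick a random rectangular A, take an eigenvector x of A^T A,
# and confirm that A x is an eigenvector of A A^T with the same eigenvalue.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))          # 4x3, so A^T A is 3x3 and A A^T is 4x4

AtA = A.T @ A
lam, vecs = np.linalg.eigh(AtA)      # A^T A is symmetric, so eigh applies
x = vecs[:, -1]                      # eigenvector for the largest eigenvalue
lam_max = lam[-1]

Ax = A @ x
print(np.allclose(AtA @ x, lam_max * x))        # True: x is an eigenvector of A^T A
print(np.allclose(A @ A.T @ Ax, lam_max * Ax))  # True: A x is an eigenvector of A A^T
```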

Update: This is actually true for any matrices $AB$ and $BA$, not only a matrix and its transpose; the same argument goes through with $A^T$ replaced by $B$. Thanks to reddit user etzpcm for pointing this out!
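The general statement can be checked the same way; again, just a sketch with arbitrary random matrices:

```python
import numpy as np

# For random A (5x3) and B (3x5), the non-zero eigenvalues of
# AB (5x5) and BA (3x3) should agree.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(3, 5))

eig_AB = np.linalg.eigvals(A @ B)   # 5 eigenvalues of the 5x5 product (two are ~0)
eig_BA = np.linalg.eigvals(B @ A)   # 3 eigenvalues of the 3x3 product

def nonzero(e):
    """Keep only the (numerically) non-zero eigenvalues, in a canonical order."""
    return np.sort_complex(e[np.abs(e) > 1e-9])

print(np.allclose(nonzero(eig_AB), nonzero(eig_BA)))   # True
```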

Friday, April 19, 2019

Exploring the Dot Product

Let $\vec{v}_1 = \left[ \begin{matrix} a\\ b \end{matrix} \right]$ and $\vec{v}_2 = \left[ \begin{matrix} c\\ d \end{matrix} \right]$.

It is often taught that the dot product $\vec{v}_1 \cdot \vec{v}_2$ can be interpreted as projecting $\vec{v}_2$ onto $\vec{v}_1$ (or vice versa), calling the projected vector $\vec{v}_2'$, and then multiplying the lengths of $\vec{v}_2'$ and $\vec{v}_1$. Or equivalently,

$\vec{v}_1 \cdot \vec{v}_2 = \cos{\theta}\mid\vec{v}_1\mid\mid\vec{v}_2\mid = \mid\vec{v}_1\mid\mid\vec{v}_2'\mid$
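As a quick sanity check, here is a small numpy sketch (the example vectors are arbitrary) showing that the component formula $ac + bd$ agrees with $\cos{\theta}\mid\vec{v}_1\mid\mid\vec{v}_2\mid$, where $\theta$ is measured independently from each vector's angle to the $x$-axis:

```python
import numpy as np

a, b = 2.0, 1.0      # v1 = [a, b]
c, d = -1.0, 3.0     # v2 = [c, d]

theta = np.arctan2(d, c) - np.arctan2(b, a)          # angle between the two vectors
lhs = a * c + b * d                                  # component formula
rhs = np.hypot(a, b) * np.hypot(c, d) * np.cos(theta)  # |v1||v2|cos(theta)
print(np.isclose(lhs, rhs))    # True
```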


But where did this interpretation come from? Why does multiplying each component of the vectors and adding them correspond to projecting then multiplying the lengths?

It helps me to think of it this way: the dot product is a measure of similarity between two vectors. Of course, their magnitudes play a role in the product as well, but that is not the point. In the case of projection, you trash the component of $\vec{v}_2$ that is orthogonal to $\vec{v}_1$, and you are only left with the part of $\vec{v}_2$ that is similar to $\vec{v}_1$.

Multiplying their corresponding components works in much the same way. You break each vector into its orthogonal components, $\vec{v}_1 = \vec{v}_a + \vec{v}_b$ and $\vec{v}_2 = \vec{v}_c + \vec{v}_d$, where $\vec{v}_a = \left[ \begin{matrix} a\\ 0 \end{matrix} \right]$, $\vec{v}_b = \left[ \begin{matrix} 0\\ b \end{matrix} \right]$, and similarly for $\vec{v}_c$ and $\vec{v}_d$. Then you multiply each component with the corresponding orthogonal component of the other vector.

$\vec{v}_1 \cdot \vec{v}_2 = (\vec{v}_a + \vec{v}_b) \cdot (\vec{v}_c + \vec{v}_d)$

$=\vec{v}_a \cdot \vec{v}_c + \vec{v}_a \cdot \vec{v}_d + \vec{v}_b \cdot \vec{v}_c + \vec{v}_b \cdot \vec{v}_d$

$=\vec{v}_a \cdot \vec{v}_c + 0 + 0 + \vec{v}_b \cdot \vec{v}_d$

$=\vec{v}_a \cdot \vec{v}_c + \vec{v}_b \cdot \vec{v}_d$

$=ac + bd$
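Here is the same expansion carried out numerically, a small sketch with arbitrary example vectors, showing that the cross terms really do vanish:

```python
import numpy as np

a, b = 2.0, 1.0
c, d = -1.0, 3.0

v_a, v_b = np.array([a, 0.0]), np.array([0.0, b])   # v1 = v_a + v_b
v_c, v_d = np.array([c, 0.0]), np.array([0.0, d])   # v2 = v_c + v_d

print(np.dot(v_a, v_d), np.dot(v_b, v_c))           # 0.0 0.0  (cross terms vanish)
print(np.dot(v_a + v_b, v_c + v_d))                 # 1.0      (full dot product)
print(np.dot(v_a, v_c) + np.dot(v_b, v_d))          # 1.0      (= ac + bd)
```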

This component-by-component multiplication can be visualized nicely:


In the image, it becomes apparent that multiplying each component of the two vectors amounts to multiplying their "similarity" in the horizontal direction and in the vertical direction, then adding the two to get the "total similarity." This summation of the similarity in both directions is exactly the same as the projection method, because projecting one vector onto another keeps only the "similar" components between the vectors.

It used to puzzle me how the computation of the dot product corresponds to the geometric interpretation, but this idea of similarity between the vectors has helped me gain some intuition. It also makes sense that the dot product of two orthogonal vectors is 0, since they share no similarity in the direction they point.

Often when multiplication is involved, it can be helpful to visualize with areas. So I will leave you with this last visualization of the dot product, in which the areas of the two outer boxes sum to the area of the inner one.


Let me know if you have any other visualizations or ideas to help with gaining an intuition for the dot product.

Wednesday, April 17, 2019

Double Angle

Next time you come across $\sin{2\theta}$,
Forget about your trig sheet of unexplained data.
Recall how to rotate a vector by some degrees

$\left[\begin{matrix}\cos{\theta} & -\sin{\theta}\\ \sin{\theta} & \cos{\theta}\end{matrix}\right]$


And to do it again, we'll need two of these.

$\left[\begin{matrix}\cos{\theta} & -\sin{\theta}\\ \sin{\theta} & \cos{\theta}\end{matrix}\right]\left[\begin{matrix}\cos{\theta} & -\sin{\theta}\\ \sin{\theta} & \cos{\theta}\end{matrix}\right]$

Let us find the matrix composition,

$\left[\begin{matrix}\cos^{2}{\theta}-\sin^{2}{\theta} & -2\sin{\theta}\cos{\theta}\\ 2\sin{\theta}\cos{\theta} & \cos^{2}{\theta}-\sin^{2}{\theta}\end{matrix}\right]$

to guide our vector to its final position.

But wait!
Rotating by the same angle twice
is rotating by $2\theta$ but less concise!

$\left[\begin{matrix}\cos{2\theta} & -\sin{2\theta}\\ \sin{2\theta} & \cos{2\theta}\end{matrix}\right] = \left[\begin{matrix}\cos^{2}{\theta}-\sin^{2}{\theta} & -2\sin{\theta}\cos{\theta}\\ 2\sin{\theta}\cos{\theta} & \cos^{2}{\theta}-\sin^{2}{\theta}\end{matrix}\right]$

Hmm... So that means

$\cos{2\theta} = \cos^{2}{\theta} - \sin^{2}{\theta}$
$\sin{2\theta} = 2\sin{\theta}\cos{\theta}$
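And if you'd like to see the composition fall out numerically, here is a small sketch (the angle and the helper name `rotation` are just my own illustrative choices):

```python
import numpy as np

def rotation(t):
    """2x2 matrix that rotates a vector counter-clockwise by angle t."""
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

theta = 0.7
# Rotating twice by theta is the same matrix as rotating once by 2*theta.
print(np.allclose(rotation(theta) @ rotation(theta), rotation(2 * theta)))  # True
```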