Motivation

Mainly inspired by this post by Daniela Witten when she was the author of the Women in stat twitter account. I will collect here tricks and ideas behind the SVD.

So yesterday I asked you all what you wanted to hear about from me this week, and one answer stood out from all the others: the SINGULAR VALUE DECOMPOSITION (SVD).https://t.co/9Nyqp2Kr9k

1/
— Women in Statistics and Data Science (@WomenInStat) July 21, 2020

The Model

$X=UDV^T$

$X \in \mathbb{R}^{n \times p}$, w.log $n > p$ in the statsitics context we can think of X as the data matrix, here in most cases we indeed have more observations then variables. $U$ is an $n \times p$ matrix, and $D$ and $V$ have dimension $p \times p.

decomposiition is unuqie

$U$ and $V$ are orthorgonal so it holds that $U^T U = I$, $V V^T = I$ and $V^T V = I$ (she writes $U U^T \neq I$ why? need to check this) the columns of $U$ and $V$ are called the left and right singular vectors

$D$ is diagonal with nonnegative and decreasing elements: $D_{11} \geq D_{22} \geq \ldots \geq D_{pp} \geq 0$ these are called singular values

In words the idea behind SVD is as follows. Given some matrix Recall that this is a mapping which takes a p dimensional vector as input and and transforms it into a n dimensional vector. Graphically it stretches the vector and rotates it. The SVD decomposes this operations in its seperate parts. To do this we take a orthonormal basis in the p dimensional space, apply the X mapping to get a new basis in n.

number non zero singular values is the rank

high colinearity if d_11 / d_pp is "large"

computing svds

$X'X=V \Sigma^2 V'$

$X'XV=V \Sigma^2$ (eigenvalue problem) -> $\lambda_j=\sigma_j^2$ and eigenvectors are $V$

$X X'=U=U \Sigma^2$

Intuition

Rewrite as $XV=UD$. Remember $U$ and $V$ are orthonormal basis. So X acts on its orthonormal system by stretching and rotating it. "XV \approx U$ captures the rotaing of the basis (up to to stretching) finally we stretch it to the correct position by ultipling with the singular values

Missing Value imputation

See gives a recipe to fill in missing values, assuming they are missing at random. The procedure is as follows:

Fill in missing values with mean of column
calc svd
get rank k approximation
replace the entries at the missing values with the rank k approxmation
repeat until convergence

Helper Functions

def rank_k_approx(A, k):
    """A is matrix to be approximated."""

    k = k -1 #for python indexing, if i want rank 1 approx need to index with 0


    U, D, V_T = svd(A,
                    full_matrices=False,
                    compute_uv=True,
                    hermitian=False)


    return U[:, :k] @ np.diag(D)[:k, :k] @ V_T[:k, :]

def svd_impute(df, k):

    df_work = df.copy()

    #impute with mean
    df_work.fillna(df_work.mean(), axis = 0, inplace = True)

    err = []
    while True:

        approx = rank_k_approx(df_work, k)

        approx = pd.DataFrame(approx)

        df_work_proposal = df.combine_first(approx)

        current_error = np.linalg.norm(df_work_proposal - df_work, ord = 'fro')

        err.append(current_error)


        if current_error < .005:

            return df_work_proposal, err

        else:

            df_work = df_work_proposal

Plot for the Blog Post

Sources

https://www.youtube.com/watch?v=EokL7E6o1AE&list=PLtnewl6Gh3udJIG9w1EqHqqTG0tH0YSCx&index=10&t=69s

Singular Value Decomposition