# Singular Value Decomposition (SVD): Tutorial Using Examples in R

If you have ever looked with any depth at statistical computing for multivariate analysis, there is a good chance you have come across the *singular value decomposition (SVD)*. It is a workhorse for techniques that decompose data, such as *correspondence analysis* and *principal components analysis*. In this post I explain, at an intuitive level, how it works. I demonstrate this using examples in R. If you have not come across the SVD before, skip this post! It is only for that rare connoisseur, who has heard of it, wants to understand it a bit better, but is averse to lots of maths.

# A singular value decomposition case study in R

The table below shows the *standardized residuals* from a *contingency table* showing the relationship between education and readership of a newspaper. The R code used to generate the table is below. More about this data and R code, and why it is interesting, will be available in my forthcoming post about the maths of correspondence analysis.

education.by.readership = matrix(c(5, 18, 19, 12, 3, 7, 46, 29, 40, 7, 2, 20, 39, 49, 16),
    nrow = 5,
    dimnames = list(
        "Level of education" = c("Some primary", "Primary completed", "Some secondary", "Secondary completed", "Some tertiary"),
        "Category of readership" = c("Glance", "Fairly thorough", "Very thorough")))
O = education.by.readership / sum(education.by.readership)
E = rowSums(O) %o% colSums(O)
Z = (O - E) / sqrt(E)

# How to compute the SVD

The table above is a *matrix* of numbers. I am going to call it **Z**. The singular value decomposition is computed using the svd function. The following code computes the singular value decomposition of the matrix **Z** and assigns it to a new object called SVD, which contains one vector, d, and two matrices, u and v. The vector, d, contains the *singular values*. The first matrix, u, contains the *left singular vectors*, and v contains the *right singular vectors*. The left singular vectors represent the rows of the input table, and the right singular vectors represent its columns.

SVD = svd(Z)
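To see the shape of what svd returns, we can inspect the three components. This snippet is just illustrative; the dimensions follow from **Z** being a 5 x 3 matrix:

```r
# d is a vector of 3 singular values; u is 5 x 3; v is 3 x 3
length(SVD$d)  # 3
dim(SVD$u)     # 5 3
dim(SVD$v)     # 3 3
```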

# Recovering the data

The singular value decomposition (SVD) has four useful properties. The first is that the two matrices and the vector can be “multiplied” together to re-create the original input data, **Z**. In the data we started with (**Z**), we have a value of -0.064751 in the 5th row, 2nd column. We can work this out from the results of the SVD by multiplying each element of d by the corresponding elements of the 5th row of u and the 2nd row of v.

That is: -0.064751 = 0.2652708*0.468524*(-0.4887795) + 0.1135421*(-0.0597979)*0.5896041 + 0*(-0.6474922)*(-0.6430097)

This can be achieved in R using the code:

sum(SVD$d * SVD$u[5, ] * SVD$v[2, ])
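As a quick sanity check, we can confirm that this sum matches the original entry of **Z** (up to floating-point error):

```r
# The reconstructed value should equal Z[5, 2] up to rounding error
all.equal(sum(SVD$d * SVD$u[5, ] * SVD$v[2, ]), Z[5, 2], check.attributes = FALSE)
```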

Better yet, if we want to recompute the whole table of numbers at once, we can use a bit of matrix algebra:

SVD$u %*% diag(SVD$d) %*% t(SVD$v)
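Again as an illustrative check, the largest absolute difference between the reconstruction and **Z** is essentially zero:

```r
# The reconstruction matches Z up to floating-point error
max(abs(SVD$u %*% diag(SVD$d) %*% t(SVD$v) - Z))
```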

Now, at first glance this property may not seem so useful. Indeed, it does not even seem very clever. We started with a table of 15 numbers. Now, we have one vector and two tables, containing a total of 27 numbers. We seem to be going backwards!

# Reducing the data

The second useful property of the SVD relates to the values in d. They are sorted in descending order (ties are possible). Why is this important? Take a look at the last value in d: 2.71825390754254E-17. In reality, this is 0 (floating-point arithmetic rarely produces an exact 0). When recovering the data, we can ignore the last value of d, and also the last column of each of u and v, as their values are multiplied by 0 and thus are irrelevant. Now, we only have 18 numbers to look at. This is still more than the 15 we started with.
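We can verify that dropping the last singular value and the last columns of u and v still recovers **Z**. This snippet is just a sketch of that check:

```r
# Keep only the first two singular values and the first two columns of u and v
Z2 = SVD$u[, 1:2] %*% diag(SVD$d[1:2]) %*% t(SVD$v[, 1:2])
# The difference from Z is still essentially zero
max(abs(Z2 - Z))
```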

The values of d tell us the relative importance of each of the columns in u and v in describing the original data. We can compute the *variance* in the original data (**Z**) that is explained by each column by first squaring the values in d, and then expressing these as proportions. If you run the following R code, it shows that the first *dimension* explains 85% of the variance in the data.

variance.explained = prop.table(svd(Z)$d^2)

So, if we are happy to ignore 15% of the information in the original data, we only need to look at the first value of d, the first column of u, and the first column of v. Now we have to look at fewer than half the numbers that we started with.
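A rank-1 approximation built from just those first pieces captures that 85%. The snippet below is illustrative; it uses the outer product operator %o% to multiply the first column of u by the first column of v:

```r
# Rank-1 approximation from the first singular value and singular vectors
Z1 = SVD$d[1] * SVD$u[, 1] %o% SVD$v[, 1]
# Proportion of the variance in Z that it captures (approximately 0.85)
sum(Z1^2) / sum(Z^2)
```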

Halving the number of numbers to consider may not seem like a sufficient benefit. However, the bigger the data set, the bigger the saving. For example, if we had a table with 20 rows and 20 columns, we might only need to look at the first couple of columns, considering only 10% of the number of values that we started with. This is the basic logic of techniques like principal components analysis and correspondence analysis. In addition to reducing the number of values we need to look at, this also allows us to chart the values, which saves more time. There is rarely a good way to chart 20 columns of data, but charting 2 columns is usually straightforward.

# Two more properties

The third property of the SVD is that the rows of u represent the row categories of the original table, and the rows of v represent the column categories. The fourth property is that the columns of u are orthogonal to each other, and the columns of v are orthogonal to each other. With these two properties combined, we end up with considerable simplicity in future analyses. For example, this allows us to compute uncorrelated principal components in principal components analysis and to produce plots of correspondence analysis. I will walk through this in detail in my forthcoming post on the maths of correspondence analysis.
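We can check the orthogonality directly. For columns of a matrix, crossprod computes all the pairwise dot products at once, and for the singular vectors returned by svd this should be (numerically) the identity matrix:

```r
# Dot products between all pairs of columns of u: off-diagonals should be 0
crossprod(SVD$u)
# The same holds for the columns of v
crossprod(SVD$v)
```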

All the *R* code in this post has been run using Displayr. Anyone can explore SVD and the R code used in this post by logging into Displayr.

#### Author: Tim Bock

Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q www.qresearchsoftware.com, a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.

Could you show how to confirm that the columns of u are orthogonal to each other? For example, the correlation between the first and second column is approximately 0 (cor(SVD$u[,1], SVD$u[,2])), but the correlation between the first and the third column is -0.127 (cor(SVD$u[,1], SVD$u[,3])).

Same happens with columns of v.

Ah, yes, that is quite surprising, isn’t it? In lots of common situations, variables that are orthogonal are also uncorrelated; for example, this is true in PCA, and if I type the question into Google it even tells me that ‘Uncorrelated means orthogonal’. However, where the variables’ means are not 0, you can have variables that are orthogonal but correlated. This is the case here. If you sign into Displayr and look at the document, you will see I have shown that they are orthogonal on the fourth page, but that they are correlated on the fifth. If you google ‘The American Statistician Linearly Independent, Orthogonal, and Uncorrelated Variables’ you will find a nice paper on the topic.
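To make the distinction concrete, the following sketch checks both properties for the first and third columns of u: the dot product (orthogonality) is numerically zero, while the correlation is not, because cor centers each column at its mean first:

```r
# Orthogonal: the dot product of columns 1 and 3 of u is (numerically) zero
sum(SVD$u[, 1] * SVD$u[, 3])
# Correlated: the sample correlation is not zero, because the column means are not zero
cor(SVD$u[, 1], SVD$u[, 3])
mean(SVD$u[, 1])
mean(SVD$u[, 3])
```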

Cheers,

Tim

Thanks for the clarification! I was not aware of this distinction.

Thank you for a very straightforward and clear explanation.

cheers, Emil