# R.

Correspondence analysis is a popular tool for visualizing the patterns in large tables. To many practitioners it is probably a black box. Table goes in, chart comes out. In this post I explain the mathematics…

If you have ever looked with any depth at statistical computing for multivariate analysis, there is a good chance you have come across the singular value decomposition (SVD). It is a workhorse for techniques that decompose data, such as correspondence analysis and principal…

In an earlier post I discussed how to avoid overfitting when using Support Vector Machines. This was achieved using cross-validation. In cross-validation, prediction accuracy is maximized by varying the cost parameter. Importantly, prediction accuracy is…

In this post I explore two different methods for computing the relative importance of predictors in regression: Johnson’s Relative Weights and Partial Least Squares (PLS) regression. Both techniques solve a problem with Multiple Linear Regression, which can perform poorly when there are correlations…

Partial Least Squares (PLS) is a popular method for relative importance analysis in fields where the data typically includes more predictors than observations. Relative importance analysis is a general term applied to any technique used for…

This post describes the single biggest time-saving technique that I know about for highlighting significant results on a table. The table below, which shows the results of a MANOVA, illustrates the trick. The coloring…

Gradient boosting is a technique attracting attention for its prediction speed and accuracy, especially with large and complex data. Don’t just take my word for it: the chart below shows the rapid growth of Google…

This post discusses a number of options that are available in R for analyzing data from MaxDiff experiments, using the package flipMaxDiff. For a more detailed explanation of how to analyze MaxDiff, and what the outputs…

This post shows how to use correspondence analysis to compare sub-groups. It focuses on one of the most interesting types of sub-groups: data at different points in time. Such data is variously known as trend, tracking, longitudinal, or time series data. The end-goal…

Correspondence analysis is a popular data science technique. It takes a large table and turns it into a seemingly easy-to-read visualization. Unfortunately, it is not quite as easy to read as most people assume. In How…

Creating the experimental design for a max-diff experiment is easy in R. This post describes how to create and check a max-diff experimental design. If you are not sure what this is, it would be best to read A beginner’s guide to max-diff first.
