## R.

In my sordid past, I was a data science consultant. One thing about data science that they don’t teach you at school is that senior managers…

If you have tried to communicate research results and data visualizations using R, there is a good chance you will have come across one of…

Google Trends shows the changes in the popularity of search terms over a given time (i.e., number of hits over time). It can be used…

The seven steps to create an online choice simulator are detailed in the post. In addition, an interactive example of a choice simulator is provided…

If you’ve ever wanted a deeper understanding of what’s going on behind the scenes of correspondence analysis, then this post is for you. Correspondence analysis…

If you have ever looked with any depth at statistical computing for multivariate analysis, there is a good chance you have come across the singular value decomposition…

In an earlier post I discussed how to avoid overfitting when using Support Vector Machines. This was achieved using cross-validation. In cross-validation, prediction accuracy is…

In this post I explore two different methods for computing the relative importance of predictors in regression: Johnson’s Relative Weights and Partial Least Squares (PLS) regression. Both techniques solve a problem with…

Partial Least Squares (PLS) is a popular method for relative importance analysis in fields where the data typically includes more predictors than observations. Relative importance analysis…

This post describes the single biggest time-saving technique that I know about for highlighting significant results on a table. The table below, which shows…

Gradient boosting is a technique attracting attention for its prediction speed and accuracy, especially with large and complex data. Don’t just take my word for…

This post discusses a number of options that are available in R for analyzing data from MaxDiff experiments, using the package flipMaxDiff. For a more detailed…

This post shows how to use correspondence analysis to compare sub-groups. It focuses on one of the most interesting types of sub-groups: data at different points…

Correspondence analysis is a popular data science technique. It takes a large table, and turns it into a seemingly easy-to-read visualization. Unfortunately, it is not quite…

Creating the experimental design for a max-diff experiment is easy in R. This post describes how to create and check a max-diff experimental design. If you are not sure what this is, it would be best to read A beginner’s guide to max-diff first.

You can take your correspondence analysis plots to the next level by including images. Better still, you don’t need to paste in the images after…

The rhtmlLabeledScatter R package on GitHub attempts to solve three challenges with labeled scatter plots: readability with large numbers of labels and bubbles, and the use…
