Using Displayr
30 March 2017 | by Tim Bock

Assigning Respondents to Clusters/Segments in New Data Files in Displayr

Once you have created segments or clusters, it is often useful to assign people in other data sets to the segments (this is also known as segment tagging and scoring). For example, you may want to tag a customer database with predicted segment memberships. Or, you may want to assign respondents in a tracker to segments. When doing this, there are two basic approaches:

  1. You can assign people to segments in the new data file using the same variables as used when forming the segments, or,
  2. You can predict segment membership based on a different set of variables.

Before proceeding with either of these approaches, it is a good idea to take a copy of your project and make your changes in the copy.

The basic principle underlying both approaches is that you create a model in one data set and then import a revised data set, while making sure that the model does not update to reflect the new data. You then use the existing model to make predictions in the new data set, with the variables in the new data as inputs.

 


 

Assigning people to segments in the new data file using the same variables

The best way to do this depends on whether we have used latent class analysis (Insert > Groups/Segments (Analysis)) or k-means cluster analysis (Insert > More (Analysis) > Segment > K-Means Cluster Analysis).

 


 

Segments formed using latent class analysis

A three-segment latent class solution is shown below. This has been based on a sample size of 400. To allocate people in a new data file using these segments:

  1. Click on the data set in the Data Tree.
  2. Press Update in the Object Inspector and select the new data file. You will see some warnings. Ignore them (i.e., do not follow the suggestion about modifying the segments, as this will re-estimate the segments using the new data file).
  3. The Groups/Segments … variable, which is in the Data Tree, has now automatically been updated, allocating people in the new data file to the segments.
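Displayr performs this allocation for you, so no code is needed. For intuition only, here is a minimal sketch in R of how respondents in new data can be allocated to existing latent classes by taking the class with the highest posterior probability while the model parameters stay fixed. Everything in it is an assumption made for illustration: the class sizes, the item probabilities, the two binary items, and the helper `allocate` are hypothetical, and this is not Displayr's implementation.

```r
# Hypothetical, already-estimated parameters of a three-class model with two
# binary (0/1) items. The numbers are made up for illustration.
class_sizes <- c(0.5, 0.3, 0.2)
# p_yes[k, j] = probability that item j equals 1 for a member of class k
p_yes <- matrix(c(0.9, 0.8,
                  0.2, 0.7,
                  0.1, 0.1),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("item1", "item2")))

# Hypothetical helper: posterior class probabilities for new respondents,
# computed with the parameters held fixed (nothing is re-estimated).
allocate <- function(newdata, class_sizes, p_yes) {
  x <- as.matrix(newdata[, colnames(p_yes), drop = FALSE])
  post <- matrix(NA_real_, nrow(x), length(class_sizes))
  for (k in seq_along(class_sizes)) {
    # Likelihood of each respondent's answers under class k
    lik <- apply(x, 1, function(r) prod(ifelse(r == 1, p_yes[k, ], 1 - p_yes[k, ])))
    post[, k] <- class_sizes[k] * lik
  }
  post <- post / rowSums(post)  # normalise so each row sums to 1
  list(segment = apply(post, 1, which.max), posterior = post)
}

# Three respondents from a "new data file"
new_data <- data.frame(item1 = c(1, 0, 0), item2 = c(1, 1, 0))
result <- allocate(new_data, class_sizes, p_yes)
result$segment
```

The point of the sketch is that the new respondents' answers are the only input that changes; the class sizes and item probabilities come from the original estimation.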

 


 

Segments formed using k-means

 

A three-cluster k-means solution is shown above. To allocate people in a new data file using these segments:

  • Click on the k-means solution and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector).
  • Take a copy of line 2 of the code. In my example, it looks like this:
 
kmeans = KMeans(data.frame(understand, shop, key, value, interested), 
  • Click on the data set in the Data Tree.
  • Press Update in the Object Inspector and select the new data file.
  • From the Ribbon, select Insert > R (Variables) > Numeric Variable.
  • In the R Code box in the Object Inspector, paste in the copied code, and modify it so that it looks like this (the key bits to retain from your pasted code are kmeans or whatever it has been changed to and the variable names):
 
predict(kmeans, newdata = data.frame(understand, shop, key, value, interested))
  • Give the variable an appropriate Name and Label.
  • Change the Structure of the variable to Mutually exclusive categories (Nominal) (this setting is found in the Object Inspector under Properties > Inputs).
  • Press Value attributes (below Structure) and enter any labels you desire and press OK.
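To see the idea behind these steps outside Displayr, the following is a minimal sketch in base R. It makes several assumptions: `stats::kmeans` stands in for Displayr's `KMeans`, the data are simulated, and `assign_to_clusters` and `simulate` are hypothetical helpers written for this example. It allocates each new respondent to the nearest existing cluster centre without re-estimating the clusters.

```r
# Illustrative sketch: stats::kmeans stands in for Displayr's KMeans, and all
# data below are simulated. The variable names mirror the example in the text.
set.seed(123)
vars <- c("understand", "shop", "key", "value", "interested")

# Hypothetical helper: simulate respondents around three separated profiles.
simulate <- function(n_per_group) {
  profiles <- c(1, 3, 5)
  x <- do.call(rbind, lapply(profiles, function(m) {
    matrix(rnorm(n_per_group * length(vars), mean = m, sd = 0.5),
           ncol = length(vars))
  }))
  colnames(x) <- vars
  as.data.frame(x)
}

original <- simulate(100)  # data used to form the clusters
new_data <- simulate(20)   # plays the role of the new data file

# Fit once on the original data; this object is not re-estimated afterwards.
fit <- kmeans(original, centers = 3, nstart = 10)

# Hypothetical helper: allocate each row to the nearest existing centroid,
# using squared Euclidean distance.
assign_to_clusters <- function(centers, newdata) {
  x <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  d <- matrix(NA_real_, nrow(x), nrow(centers))
  for (k in seq_len(nrow(centers))) {
    d[, k] <- rowSums(sweep(x, 2, centers[k, ])^2)
  }
  apply(d, 1, which.min)
}

new_segments <- assign_to_clusters(fit$centers, new_data)
table(new_segments)
```

Because only `fit$centers` is used, the cluster definitions stay exactly as they were when the segments were formed, which is why the Automatic option needs to be unchecked before the data file is updated.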

 


 

Predict segment membership using a different set of variables

In this scenario, segments have been formed and then a predictive model is used to predict segment membership on either:

  • A completely different set of variables (e.g., demographics, or some other data available in a customer database).
  • A subset of the variables used to create the segments. (Tip: if you are building a predictive model based on exactly the same variables as used to create segments, you are making a mistake, and should instead use the approach described in the previous section).

 

The output above is from a multinomial logit (MNL) model (Insert > More (Analysis) > Regression > Multinomial Logit), predicting segment membership based on firmographics. The goal now is to predict segment membership in a new data file that contains the same predictor variables.

  • Click on the model output and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector).
  • Take a copy of the line of code that looks similar to this (with different variable names):
 
glm = Regression(segmentsGXVYS ~ q1 + q2 + q3 + q4 + q5,
  • Click on the data set in the Data Tree.
  • Press Update in the Object Inspector and select the new data file.
  • From the Ribbon, select Insert > R (Variables) > Numeric Variable.
  • In the R Code box in the Object Inspector, paste in the copied code, and modify it so that it looks like this (the key bits to retain from your pasted code are glm or whatever it has been changed to and the variable names):
 
predict(glm, newdata = data.frame(q1, q2, q3, q4, q5))
  • Give the variable an appropriate Name and Label.
  • Change the Structure of the variable to Mutually exclusive categories (Nominal) (this setting is found in the Object Inspector under Properties > Inputs).
  • Press Value attributes (below Structure) and enter any labels you desire and press OK.
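The same workflow can be sketched in plain R. This is an illustration under stated assumptions, not what Displayr runs: `nnet::multinom` (from the `nnet` package, which ships with standard R installations) stands in for Displayr's `Regression`, the data and the `segment` variable are simulated, and the predictor names `q1` to `q5` simply mirror the example above.

```r
# Illustrative sketch: nnet::multinom stands in for Displayr's Regression,
# and all data are simulated.
library(nnet)
set.seed(42)

# Simulated original data: five predictors and a known three-level segment.
n <- 300
train <- data.frame(q1 = rnorm(n), q2 = rnorm(n), q3 = rnorm(n),
                    q4 = rnorm(n), q5 = rnorm(n))
score <- train$q1 + rnorm(n, sd = 0.5)
train$segment <- cut(score, breaks = c(-Inf, -0.5, 0.5, Inf),
                     labels = c("A", "B", "C"))

# Fit the multinomial logit once, on the original data only.
fit <- multinom(segment ~ q1 + q2 + q3 + q4 + q5, data = train, trace = FALSE)

# A "new data file" containing the same predictor variables.
new_data <- data.frame(q1 = rnorm(50), q2 = rnorm(50), q3 = rnorm(50),
                       q4 = rnorm(50), q5 = rnorm(50))

# Predicted segment for each new respondent, plus the class probabilities.
predicted <- predict(fit, newdata = new_data, type = "class")
probs <- predict(fit, newdata = new_data, type = "probs")
table(predicted)
```

The prediction comes back as a categorical variable, which is why the steps above change the Structure of the new variable to Mutually exclusive categories (Nominal).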

Author: Tim Bock

Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q (www.qresearchsoftware.com), a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.

