For Q users, we will shortly upgrade the Q weighting tool to work exactly like what I am showing you here in Displayr.
We will start by looking at whether to weight or not.
Then we will look at how weights work.
Most of our time will be spent discussing how to create weights.
At the end we will touch on statistical testing and some more exotic topics.
We weight data when there is a discrepancy between results and facts.
For example, 12% of US adults live in California.
If our survey shows 25%, we have a discrepancy.
And, we've also got a bit of jargon. The variable in our data that is used to capture the result that we compare with the fact is referred to as the adjustment variable. So, state is our adjustment variable.
We need the discrepancy to be caused by having interviewed too many or too few people in population subgroups.
In this example, we've interviewed too many Californians, or, equivalently, too few non-Californians.
And, it needs to be the case that the discrepancies mean that if we don't fix them, other results will be inaccurate.
For example, if we have a study on the environment, and have too many Californians in it, this will cause the overall study to be biased.
We now know when to weight, but what is a weight?
Let's start by thinking of a very simple survey with only 5 people in it.
We asked them what brand they purchased last.
2 said Coke
3 said Pepsi.
So, as a percentage, Coke is 40%.
OK, now let's say we also asked them how many of each brand they purchased. The first Coke buyer said 10, the second 5, and so on.
What's Coke's percentage now? That's right, it's 75%.
To use the jargon, we have now done a weighted calculation, where the number purchased is the weight variable, and each of its 5 values is a weight.
Now, just a little math thing to keep in your head. If we replace these 5 weights each with the number 1, our weighted calculation will give the same answer as our unweighted calculation. You'll soon understand why I'm making this point.
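To make that arithmetic concrete, here is a minimal Python sketch of the cola example. Only the quantities 10 and 5 come from the example above. The three Pepsi quantities are assumed purely for illustration, chosen so that the weighted share lands on the 75% quoted.

```python
# Hypothetical data: only the first two quantities (10 and 5) come from the talk.
# The three Pepsi quantities are assumed so that the total matches the 75% quoted.
brands = ["Coke", "Coke", "Pepsi", "Pepsi", "Pepsi"]
quantities = [10, 5, 2, 2, 1]


def weighted_share(brands, weights, brand):
    """Share of the total weight that belongs to respondents who chose `brand`."""
    total = sum(weights)
    chosen = sum(w for b, w in zip(brands, weights) if b == brand)
    return chosen / total


# Giving everybody a weight of 1 is the same as an unweighted calculation.
unweighted = weighted_share(brands, [1] * len(brands), "Coke")
# Using the number purchased as the weight gives the weighted calculation.
weighted = weighted_share(brands, quantities, "Coke")

print(f"Unweighted Coke share: {unweighted:.0%}")
print(f"Weighted Coke share:   {weighted:.0%}")
```

Swapping the list of 1s for the quantities is the only change between the two calculations, which is the point to keep in your head.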
In this simple example, our weight variable is the data from a question in the survey.
However, our focus today is on how to create new weights to rectify discrepancies between facts and results, so let's look into that.
How to create weights
We will work our way through a seven step process for creating weights.
The first step is to find facts that you can compare to your survey results.
Here's an example of the kind of facts we can use. Have a read.
Earlier I talked about how we focus on discrepancies that are caused by having interviewed too many or few people in a group in the population.
Can you spot any facts that don't meet this criterion?
Household income is probably one. If your survey finds that the average household income is $100,000, but the census tells us it is $62,000, it is possible that we have just interviewed too many rich people.
But it is also possible that the discrepancy is caused by people telling the government a lower number than they tell us in the survey. That is, the discrepancy may be an example of measurement error, so we wouldn't want to weight by such data unless we had no choice.
So, any facts with relatively high measurement error should not be used as targets when weighting.
And we have some more jargon. The facts are known as Targets.
But, note that we can often do clever things to make facts less susceptible to measurement error. For example, while there will be a high measurement error if we ask people how many gallons of Coke they consume a year, we can convert this to market share and remove most of the measurement error.
Create a preliminary weight
Step three is to create a preliminary weight.
We will start with a very simple example to explore how it all works.
This page is a bit busy, so bear with me.
We've got a survey of 10 people.
80% are female. That is, 8 of the 10 people.
80% of people tell us their favourite celebrity is Billie Eilish.
With the gender data, we have facts at hand, and know that 50% should be male and 50% female.
On the bottom right we have a crosstab that shows us that all the women preferred Billie, and all the men like Jennifer. This is important. It tells us that the over-representation of women is also skewing our estimate of preference for Billie versus J.Lo. If we fix the over-representation of women in the survey, we will end up changing the results.
So, we need to be weighting.
Here's the same info from the previous page, but I've added the raw data here. So, we can see what each individual said.
When we don't explicitly weight data, we are implicitly assuming that each person has a weight of 1.
The question we need to solve is, what weights should we use to remove the discrepancy between the results and the facts?
One simple solution is this.
Enter 0 for first 6.
We set the weight as 0 for six of the women.
OK, so we have now created a preliminary weight.
Check that the preliminary weight ...
So, step 4 is to check that the preliminary weight does fix the discrepancy between facts and results.
I've used weighted calculations of the results, and it is now showing 50% Male and 50% female. So, we have removed the discrepancy.
Check that the .. Changes key conclusions
Now, we want to check that the key conclusions have changed.
Before, Billie had 80% preference share. After weighting, she and J.Lo are tied at 50%. So, that's all worked.
Does this mean we are all set and are finished?
We need to take time to look at the variation between the weights.
A man of influence
If you follow polling, you will know that this issue is a pretty serious one. Here's a snippet from the New York Times in 2016, which I'll give you a chance to read.
Remember how weighted calculations work. If we give a much higher weight to one person, we are in essence giving much more credence to that person's data. And, that can cause a lot of results to become very, very odd. So, we need to take care.
Here we have given some respondents a weight of 0. Others have a weight of 1, so they are in practice infinitely more influential in the analysis.
The key number to look at here is the effective sample size. The weighting we have used basically throws out the data from 6 people, and it is as if we only have a sample of 4 people. To use some more jargon, our effective sample size becomes only 4!
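Here is a small Python sketch of those two checks on the toy data. It assumes the effective sample size is computed with Kish's approximation, the square of the sum of the weights divided by the sum of the squared weights. The talk does not state the formula, but this one reproduces the 4 quoted above.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Toy example: 8 women and 2 men, with six of the women given a weight of 0.
sex = ["F"] * 8 + ["M"] * 2
preliminary = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Weighted share of women: the weight on women divided by the total weight.
female_weight = sum(w for s, w in zip(sex, preliminary) if s == "F")
female_share = female_weight / sum(preliminary)

ess_unweighted = effective_sample_size([1] * len(sex))
ess_preliminary = effective_sample_size(preliminary)

print(f"Weighted share of women: {female_share:.0%}")
print(f"Effective sample size, unweighted:  {ess_unweighted:g}")
print(f"Effective sample size, preliminary: {ess_preliminary:g}")
```

The preliminary weight fixes the discrepancy, but only by discarding most of the sample.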
So, our final step is we want to create some better weights, where better means we want to reduce the variation.
There is no correct level of variation. The ratio of maximum to minimum can be even 100 in some situations.
The trick is to make sure that when it is large, it is large because we have a good reason. To use some more jargon, this is the classic tradeoff of statistics, between precision and bias.
Let's get Displayr to automatically create some weights.
Click + in Data Sets > Toy Example > Weight
Our adjustment variable is sex
Adjustment variable: Sex
Our targets are 50% and 50%
Enter 50% and 50%
Displayr shows us in the bottom right corner the same diagnostics we used before. Remember, before we had weights of 1 and 0. Displayr has computed weights between 0.625 and 2.5, where the ratio of the maximum to the minimum is 4, which is much less than the infinity we had before.
Note also that the effective sample size has increased, and applying this weight will be like having a sample size of 6.4, which is better than the sample size of 4 we had before. But, it's not as good as if we had a sample of 10 where no weighting was required. That is, weighting is going to cost us some precision in our results. Usually, weighting increases the amount of sampling error.
+ New weight
Let's have a look at the weights.
Drag across Weight:Sex and release into raw data as 3rd variable
Properties > APPEARANCE > Increase decimals by 2
So each of the females has a weight of 0.625, and each of the men 2.5.
Note that these are actually the same numbers as the discrepancies we calculated earlier. That's not a coincidence.
With a single categorical adjustment variable, the discrepancy is the same as the weight.
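As a rough Python sketch of that relationship, each category's weight can be computed as its target proportion divided by its proportion in the sample. This is an illustration on the toy data rather than Displayr's own code, and the effective sample size again assumes Kish's approximation.

```python
from collections import Counter

# Toy example: 8 women and 2 men, with targets of 50% each.
sex = ["F"] * 8 + ["M"] * 2
targets = {"F": 0.5, "M": 0.5}

counts = Counter(sex)
n = len(sex)

# Observed proportion of each category in the sample.
observed = {category: counts[category] / n for category in targets}

# With one categorical adjustment variable, weight = target / observed.
category_weight = {c: targets[c] / observed[c] for c in targets}
weights = [category_weight[s] for s in sex]

ratio = max(weights) / min(weights)
ess = sum(weights) ** 2 / sum(w * w for w in weights)

print(category_weight)
print(f"Max/min ratio: {ratio:.2f}")
print(f"Effective sample size: {ess:.1f}")
```

On the toy data this gives 0.625 for each woman and 2.5 for each man, a ratio of 4, and an effective sample size of 6.4.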
Let's now apply the weight to our tables and check it still works
Select two results
Weight: Weight: Sex
Great. Our results now match the facts.
And, Billie and J.Lo are tied at 50%.
In the example we just did, our adjustment variable is categorical.
We can also have a numeric adjustment variable.
Here we will use height as the adjustment variable.
The key variable we are focused on here is whether people will or will not pay $200 extra for more leg room on a flight.
We have a small discrepancy, with our survey result showing an average height of 166.3 CM, compared to the fact that the average height is 168.
But, looking at the table on the right, we see that people that are willing to pay more are much taller. So, this tells us that weighting will change the results.
So, as with before, the unweighted results are equivalent to having a weight of 1 for each of the 10 respondents.
One simple way of increasing the average height is to increase the weight we assign to the tallest person.
Let's give them a weight of 2.
OK. Now the result is higher than the fact.
Let's try 1.5.
OK, what about 1.75?
OK. We've done it. And note that our key result has changed a little, with 53% of people now estimated to pay $200.
Let's look at the diagnostics. Our maximum weight is 1.75 times as big as our smallest one. Let's use Displayr's algorithm rather than trying to guess. Remember this number: 1.75.
+ > New weight > Height: 168
Now, here we get a maximum that is 1.4317 times bigger than the minimum. This is smaller than the 1.75 we got by trial and error, so the algorithm has done a better job.
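For the curious, here is one way a weight for a numeric target can be found, sketched in Python. It assumes weights of the form exp(lambda x height), which is one common formulation of raking-style calibration, and searches for lambda by bisection. It is not necessarily what Displayr does. The ten heights are hypothetical, chosen only so that their mean is 166.3, so the ratio it prints will not match the 1.4317 on screen.

```python
import math

# Hypothetical heights in cm (the talk's raw data is not reproduced); mean is 166.3.
heights = [150, 155, 160, 162, 165, 168, 170, 172, 178, 183]
target_mean = 168.0


def tilted_weights(values, lam):
    """Weights proportional to exp(lam * value), rescaled to average 1."""
    centre = sum(values) / len(values)  # centring keeps the exponent small
    raw = [math.exp(lam * (v - centre)) for v in values]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Bisection on lam. Assumes the target is above the unweighted mean and is
# reachable with lam between 0 and 1, which holds for these assumed heights.
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2
    if weighted_mean(heights, tilted_weights(heights, mid)) < target_mean:
        lo = mid
    else:
        hi = mid

weights = tilted_weights(heights, (lo + hi) / 2)

print(f"Weighted mean height: {weighted_mean(heights, weights):.2f}")
print(f"Max/min ratio: {max(weights) / min(weights):.4f}")
```

Because every respondent's weight moves a little, rather than one person's weight moving a lot, the variation in the weights stays small.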
There are a few other options we can apply to tweak things. By default, an algorithm called raking has been used. We can change the algorithm and see if that improves things. And, I would stress, this is just trial and error. There's no correct algorithm. So, remember our best is 1.4317.
This gives us the same answer.
Linear. OK this is a bit better.
We can also manually specify minimum and maximum values.
Minimum weight: 0.9.
Note that we have changed the minimum, but the ratio actually got worse, as the maximum rose to compensate.
Let's try and reduce the maximum to 1.1
Maximum weight: 1.1
Mmm.. So 1.1 is not possible.
Playing around with the minimum and maximum is called trimming. It can be useful, but is usually pretty marginal.
We will look at some better strategies shortly.
Here our algorithm is attempting to find weights that fit our targets and our specified minimums and maximums.
Sometimes researchers recode the final variable to force it to be trimmed, causing a discrepancy between the survey results and facts. I am not a fan of this as it’s hard to explain the resulting weight to users. I prefer to instead change my targets explicitly, as I will soon illustrate.
Drag across Weight: Height (CM) as third variable.
So, the estimated weights increase a bit with height. They are correlated.
OK, so let's now apply the weight to the tables
Weight: Weight: Height (CM)
So, the weight makes the result match the facts.
And our best estimate is that 56% of people will pay $200.
In commercial work, it is very common to use age, gender, and geography to weight.
In countries where race is a factor, like the US and South Africa, race is commonly used as a weighting variable.
In countries where class is important, like the UK, socio-economic status is used.
Here we will do a more realistic example. We are weighting some US data by age, gender, and region.
If you look at the results and the facts, you will see we have some big discrepancies.
For example, 18 to 29 year old females in the survey are only 0.7%, but they are 1.8% in the population.
Let's create a weight.
+ > Weight >
I've got the targets here in a spreadsheet. They are projections from the US census.
Ah, we get an error. Our problem relates to DC. We don't have any respondents in some of the cells here, such as 18 to 29 males in DC. If there is nobody, you can't give them a weight.
As you can see, we have a lot of 0s here.
There are a few strategies for solving this.
The simplest is to merge categories, combining DC with the South.
OK, this has worked.
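The check and the fix can be sketched in Python as follows. The respondents and target proportions here are hypothetical and far smaller than the real study. The idea is simply to find cells that have a target but no respondents, then fold DC into the South in both the data and the targets.

```python
from collections import Counter

# Hypothetical respondents as (region, age group); nobody is 18-29 in DC.
respondents = [
    ("South", "18-29"), ("South", "30+"), ("South", "30+"),
    ("DC", "30+"), ("West", "18-29"), ("West", "30+"),
]
# Hypothetical target proportions for each cell; they sum to 1.
targets = {
    ("South", "18-29"): 0.15, ("South", "30+"): 0.35,
    ("DC", "18-29"): 0.02, ("DC", "30+"): 0.03,
    ("West", "18-29"): 0.15, ("West", "30+"): 0.30,
}


def empty_cells(respondents, targets):
    """Cells with a positive target but no respondents cannot be weighted."""
    counts = Counter(respondents)
    return [cell for cell, t in targets.items() if t > 0 and counts[cell] == 0]


def merge_region(cell, old, new):
    """Relabel region `old` as `new`, leaving the age group alone."""
    region, age = cell
    return (new if region == old else region, age)


# Merge DC into the South in the data and add its targets to the South's.
merged_respondents = [merge_region(c, "DC", "South") for c in respondents]
merged_targets = {}
for cell, t in targets.items():
    key = merge_region(cell, "DC", "South")
    merged_targets[key] = merged_targets.get(key, 0.0) + t

print("Empty before merging:", empty_cells(respondents, targets))
print("Empty after merging: ", empty_cells(merged_respondents, merged_targets))
```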
Note that our maximum weight is 3.34 times our smallest weight. And, our effective sample size is 94.47%.
+ New Weight
Currently our unweighted result tells us that 31% of people will say yes to whether gay marriage should be legalized.
Let's apply the weight to the results and see what happens.
Select the top two tables:
Weight: Weight: Age by Gender by Region
OK, so the targets now match, except for DC and South.
And, the Yes for marriage changes to 32%, so the weighting is not doing much.
Now, when we created our weight, we merged DC into South as it solved a problem.
But, this may have been an unacceptable solution. Let's say our client was the District of Columbia. They aren't going to be happy if the results they see are always wrong for DC. How can we solve this problem?
Cells versus rims
Let's introduce some jargon.
Here's a table.
The NET row and columns are known as the rims of the table. Inside the rims are the cells. With the weighting we just did, we used the cells. But we can also use rims. This is called rim weighting.
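Rim weighting is commonly computed by raking, also known as iterative proportional fitting: adjust the weights to match one rim, then the next, and repeat until both match. Here is a minimal Python sketch on hypothetical data with two rims, gender and region. It illustrates the idea and is not Displayr's implementation.

```python
def rake(rows, rim_targets, iterations=100):
    """Iterative proportional fitting over the rims.

    rows: list of dicts, one per respondent.
    rim_targets: {variable: {category: target proportion}}.
    Every category needs at least one respondent.
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for variable, targets in rim_targets.items():
            total = sum(weights)
            # Current weighted total in each category of this rim.
            current = {category: 0.0 for category in targets}
            for row, w in zip(rows, weights):
                current[row[variable]] += w
            # Scale each respondent so this rim matches its targets.
            weights = [
                w * targets[row[variable]] * total / current[row[variable]]
                for row, w in zip(rows, weights)
            ]
    return weights


def weighted_share(rows, weights, variable, category):
    chosen = sum(w for row, w in zip(rows, weights) if row[variable] == category)
    return chosen / sum(weights)


# Hypothetical sample of 10 respondents.
rows = (
    [{"gender": "F", "region": "South"}] * 5
    + [{"gender": "F", "region": "DC"}] * 1
    + [{"gender": "M", "region": "South"}] * 2
    + [{"gender": "M", "region": "DC"}] * 2
)
# Hypothetical rim targets; no targets are given for the cells.
rim_targets = {
    "gender": {"F": 0.5, "M": 0.5},
    "region": {"South": 0.9, "DC": 0.1},
}

weights = rake(rows, rim_targets)
print(f"Female: {weighted_share(rows, weights, 'gender', 'F'):.1%}")
print(f"DC:     {weighted_share(rows, weights, 'region', 'DC'):.1%}")
```

Because only the rims are targeted, an empty cell such as young men in DC no longer stops a weight from being created, as long as each rim category has some respondents.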
+ Additional adjustment variable set
Note that there is a pretty large difference between our smallest and largest weight, with the ratio of the two being 5.77.
A lot of people ask what the cutoff is. They want a rule, such as that the ratio should always be less than 5 or something. Such rules would be nice in that they would be easy, but it's the wrong way to think about it.
It's always a judgment call. As we will see, the reason we are getting this large discrepancy is because of DC. If we don't need to have separate targets for DC, then we can reduce this. But, if we really need to treat DC as special, then we need to live with this relatively large discrepancy.
+ New Weight
Exploring the distribution of the weights
Drag across weight: Age by Gender + Region
Chart > Histogram
We can see that we have weights up to a little past 4, but most are near 1.
Drag across again, and choose Min, Median, Max and percentiles
So, nearly all the weights are very close to 1.
Drag across Age + Gender + Region
Crosstab by weight: Age by Gender + Region
Here we are showing the average weight in each of the categories. As we expected, it's DC that has the high weights and which is causing the spread in the distribution. If we can live without a specific weight for DC, we can reduce the amount of variation.
Weights and statistical testing
This is probably the single most common mistake in market research.
Remember our earlier example. We have two men in the study. Each has a weight of 2.5. So, our weighted sample size is 5.
The sample size is integral to how statistical testing works. The bigger the sample size, the more likely we are to find things significant.
But, most data analysis software makes a mistake when weights are used, and confuses the weighted sample size with the actual sample size.
There are exceptions. I'm sure you won't be surprised to learn that Q and Displayr work fine with weights.
If using R, you can get the correct answers if you use the survey package.
If using SPSS, you get the correct answers if you use the Complex Samples module.
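To see why the mistake matters, here is a rough Python sketch using the two men from the toy example. It compares the standard error of a proportion computed with the weighted sample size against one computed with Kish's effective sample size. This is only an approximation for illustration; the tools named above use more careful variance estimators, which are not reproduced here. The proportion of 0.5 is assumed.

```python
import math


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def proportion_se(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


men_weights = [2.5, 2.5]  # the two men from the toy example
p = 0.5                   # hypothetical proportion, chosen for illustration

weighted_n = sum(men_weights)                     # what naive software uses
actual_n = len(men_weights)                       # how many men we interviewed
effective_n = effective_sample_size(men_weights)

naive_se = proportion_se(p, weighted_n)
better_se = proportion_se(p, effective_n)

print(f"Weighted n: {weighted_n:g}, actual n: {actual_n}, effective n: {effective_n:g}")
print(f"Standard error using weighted n:  {naive_se:.3f}")
print(f"Standard error using effective n: {better_se:.3f}")
```

Treating the weighted sample size of 5 as if it were real makes the standard error look smaller than it should, so results are more likely to be flagged as significant when they are not.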
There are lots of different types of weights
Today we have focused on what is known in the statistical literature as post stratification. It's by far the most common and generally useful approach to weighting.
The example at the beginning where we looked at cola was an example of a volumetric weight.
But, there are lots of other types of weights.
These are the most common problems that come up in support tickets.
Weighting for tracking studies
If you want to create weights for tracking, there's an option so that you can enter a set of targets and have a new set of weights recomputed for every wave.
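The idea of recomputing weights for every wave can be sketched in Python like this. It assumes a single categorical adjustment variable and the same targets in every wave, and the two waves of data are hypothetical. It is an illustration of the idea, not the option's actual implementation.

```python
from collections import Counter, defaultdict


def weights_by_wave(waves, categories, targets):
    """Compute target / observed separately within each wave."""
    counts = defaultdict(Counter)
    for wave, category in zip(waves, categories):
        counts[wave][category] += 1
    sizes = Counter(waves)
    return [
        targets[category] * sizes[wave] / counts[wave][category]
        for wave, category in zip(waves, categories)
    ]


# Hypothetical tracking data: two waves of four respondents each.
waves = [1, 1, 1, 1, 2, 2, 2, 2]
sex = ["F", "F", "F", "M", "F", "M", "M", "M"]
targets = {"F": 0.5, "M": 0.5}

weights = weights_by_wave(waves, sex, targets)

for wave in sorted(set(waves)):
    in_wave = [(s, w) for v, s, w in zip(waves, sex, weights) if v == wave]
    female = sum(w for s, w in in_wave if s == "F") / sum(w for _, w in in_wave)
    print(f"Wave {wave}: weighted share of women = {female:.0%}")
```

Each wave ends up matching the targets on its own, even though the waves were skewed in opposite directions.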
If you want to report the weighted sample size, rather than percentages, that's called using expansion weights. These are the settings.
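As a small Python sketch of the idea, expansion weights can be obtained by rescaling the weights so that they sum to the size of the population rather than the size of the sample. This uses the toy weights from earlier, and the population size of 1,000 is hypothetical.

```python
# Toy example weights: 8 women at 0.625 and 2 men at 2.5.
sex = ["F"] * 8 + ["M"] * 2
weights = [0.625] * 8 + [2.5] * 2
population_size = 1000  # hypothetical population size

# Rescale so the weights sum to the population instead of the sample.
scale = population_size / sum(weights)
expansion_weights = [w * scale for w in weights]

# Weighted counts now read as numbers of people in the population.
men_in_population = sum(w for s, w in zip(sex, expansion_weights) if s == "M")
women_in_population = sum(w for s, w in zip(sex, expansion_weights) if s == "F")

print(f"Estimated men:   {men_in_population:.0f}")
print(f"Estimated women: {women_in_population:.0f}")
```

Percentages are unchanged by the rescaling; only the counts change.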