In May, Kate and I went to AAPOR’s 70th Annual Conference in Hollywood, FL.  Kate did a more timely job of summarizing our learnings, but now that things have had some time to settle, I thought I’d discuss an issue that came up in several presentations, most memorably in Andy Peytchev’s presentation on Weighting Adjustments Using Substantive Survey Variables.  The issue is deciding which variables to use for weighting.  (And if I butcher the argument here, the errors are my own.)

Let’s take it from the top.  If your survey sample looks exactly like the population from which it was drawn, everything is peachy and there is no need for weighting.

Most of the time, however, survey samples don’t look exactly like the populations from which they were drawn.  A major reason for this is non-response bias, which arises because some types of people are less likely to take the survey than other types of people.  To correct for this, and to make sure that the survey results reflect the attitudes and beliefs of the actual population and not just the responding sample, we weight the survey responses up or down according to whether they come from a group that is over- or under-represented among the respondents.

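So, it seems like the way to choose weighting variables would be to look for variables where the survey sample differs from the population, right?  Not so fast.  First we have to think about what weighting “costs” the margin of error for your survey.  Weights, in this situation, reflect how far each group in the sample is over- or under-represented.  The more the weights vary from respondent to respondent, the more they expand the margin of error.  A common rule of thumb (Kish’s design effect) is that unequal weights inflate the variance of an estimate by a factor of 1 + CV², where CV is the coefficient of variation of the weights, so the margin of error grows by the square root of that factor.  In other words, the precision of your estimates declines as your weighting effect increases.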
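To put rough numbers on that cost, here is a minimal Python sketch of Kish’s approximate design effect for unequal weighting, deff = n·Σw² / (Σw)², which is the same as 1 + CV² of the weights. The function names and example weights are illustrative assumptions, not anything from the conference talks; the sketch also assumes simple random sampling apart from the weighting and a 95% confidence level (z = 1.96).

```python
import math


def kish_design_effect(weights):
    # Kish's approximation: n * sum(w^2) / (sum(w))^2, equivalently 1 + CV^2 of the weights
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    # Number of equally weighted respondents that would give the same precision
    return len(weights) / kish_design_effect(weights)


def margin_of_error(p, n, z=1.96):
    # Margin of error for a proportion p with n respondents under simple random sampling
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    equal = [1.0] * 1000
    uneven = [0.5] * 500 + [1.5] * 500  # hypothetical weights for 1,000 respondents
    for label, w in [("equal weights", equal), ("uneven weights", uneven)]:
        deff = kish_design_effect(w)
        n_eff = effective_sample_size(w)
        print(f"{label}: deff={deff:.2f}, n_eff={n_eff:.0f}, MOE={margin_of_error(0.5, n_eff):.3f}")
```

With 1,000 respondents, giving half of them a weight of 0.5 and the other half a weight of 1.5 yields a design effect of 1.25 and an effective sample size of 800, so the margin of error is about 12% wider (√1.25 ≈ 1.118) than it would be with no weighting.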

What does that mean for selecting weighting variables?  It means you don’t want to do any unnecessary weighting.  Recall, the purpose of weighting is to ensure that survey results reflect the views of the population.  Let’s say the purpose of your survey is to measure preferences for dogs vs. cats in your population.  Before doing any weighting, you look to see whether the proportion of dog lovers varies by age or gender or marital status or educational attainment (to keep it simple, let’s pretend you don’t have any complicated response biases, such as all of the men in your survey being under 45).  If you find that marital status is correlated with preferences for dogs vs. cats, but age and gender and educational attainment aren’t, then you may want to weight your data by marital status, but not by the other variables.
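The “look to see whether preferences vary” step amounts to a simple cross-tab. Below is a minimal Python sketch on a hypothetical toy sample (the field names gender, marital and prefers_dog, and all of the counts, are made up for illustration): it computes the share of dog lovers within each level of a candidate variable and the largest gap between levels. It is purely descriptive; with real data you would also want to check whether a gap is statistically meaningful, for example with a chi-square test.

```python
from collections import defaultdict


def share_by_group(records, variable, outcome):
    # Share of respondents whose `outcome` is True within each level of `variable`
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r[variable]] += 1
        hits[r[variable]] += int(bool(r[outcome]))
    return {level: hits[level] / totals[level] for level in totals}


def largest_gap(shares):
    # Difference between the highest and lowest group share
    return max(shares.values()) - min(shares.values())


def expand(cells):
    # Turn (attributes, count) pairs into one record per hypothetical respondent
    return [dict(attrs) for attrs, count in cells for _ in range(count)]


# Hypothetical toy sample: dog preference differs by marital status but not by gender
toy_sample = expand([
    ({"gender": "F", "marital": "married", "prefers_dog": True}, 30),
    ({"gender": "M", "marital": "married", "prefers_dog": True}, 30),
    ({"gender": "F", "marital": "married", "prefers_dog": False}, 20),
    ({"gender": "M", "marital": "married", "prefers_dog": False}, 20),
    ({"gender": "F", "marital": "single", "prefers_dog": True}, 10),
    ({"gender": "M", "marital": "single", "prefers_dog": True}, 10),
    ({"gender": "F", "marital": "single", "prefers_dog": False}, 20),
    ({"gender": "M", "marital": "single", "prefers_dog": False}, 20),
])

if __name__ == "__main__":
    for candidate in ("gender", "marital"):
        shares = share_by_group(toy_sample, candidate, "prefers_dog")
        print(candidate, shares, f"gap={largest_gap(shares):.2f}")
```

On this toy data, 50% of both women and men prefer dogs (a gap of zero), while 60% of married respondents do versus about 33% of single respondents, so marital status is the only candidate worth weighting on for this question.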

This makes sense, right?  If men and women don’t differ in their opinions on this topic, then it doesn’t matter whether you have a disproportionate number of women in your sample.  If you weight on gender when you don’t need to, you unnecessarily expand the margin of error for your survey without improving the accuracy of your results.  On the other hand, if married people have different preferences from single people and your sample is skewed toward married people, weighting on marital status still increases your margin of error, but that cost is offset by removing a bias from your results.
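To see this trade-off in numbers, here is a minimal Python sketch of simple cell weighting (each respondent’s weight is the population share of their group divided by that group’s share of the sample), applied to one variable at a time, with Kish’s approximate design effect reported next to each estimate. Everything in it is hypothetical: a sample of 200 that is 70% married and 60% women, a population assumed to be split 50/50 on both, and men and women who prefer dogs at the same rate.

```python
from collections import Counter


def cell_weights(records, variable, population_shares):
    # Hypothetical helper: weight = population share of the respondent's group / sample share of that group
    n = len(records)
    counts = Counter(r[variable] for r in records)
    return [population_shares[r[variable]] / (counts[r[variable]] / n) for r in records]


def weighted_share(records, weights, outcome):
    # Weighted proportion of respondents for whom `outcome` is True
    return sum(w for r, w in zip(records, weights) if r[outcome]) / sum(weights)


def kish_deff(weights):
    # Kish's approximate design effect from unequal weighting
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def build_sample():
    # Hypothetical sample of 200: 70% married, and 60% women within every cell,
    # so gender is unrelated to dog preference while marital status is related to it
    records = []
    cells = [(True, True, 80), (True, False, 60), (False, True, 20), (False, False, 40)]
    for married, prefers_dog, size in cells:
        n_female = size * 3 // 5
        for i in range(size):
            records.append({"married": married, "female": i < n_female, "prefers_dog": prefers_dog})
    return records


if __name__ == "__main__":
    sample = build_sample()
    fifty_fifty = {True: 0.5, False: 0.5}  # assumed population shares
    schemes = {
        "unweighted": [1.0] * len(sample),
        "weighted on gender": cell_weights(sample, "female", fifty_fifty),
        "weighted on marital status": cell_weights(sample, "married", fifty_fifty),
    }
    for label, w in schemes.items():
        print(f"{label}: dog share={weighted_share(sample, w, 'prefers_dog'):.3f}, deff={kish_deff(w):.3f}")
```

In this made-up sample, weighting on gender leaves the dog-preference estimate at 50% while raising the design effect to about 1.04, so it costs precision and buys nothing. Weighting on marital status costs more (a design effect of about 1.19), but it moves the estimate from 50% to about 45% by correcting for the overrepresented, dog-loving married respondents.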

The bottom line:  choose weighting variables that are correlated with your variables of interest as well as your non-response bias.

And that’s one to grow on!  (This post felt reminiscent of an ’80s PSA, right?)