In a recent post, I talked about the problem of professional respondents, and specifically people who cheat to earn their incentive.  At the end of the post, I posed the question, “what can we do?”

Here I provide some basics on how to ensure the quality of your online data.

Survey Design

  1. Screeners. Screeners shouldn’t broadcast the type of respondent needed to qualify for the survey.
  2. Design. Is your survey engaging so people don’t want to “speed” through it?  Once respondents become bored, they’ll hurry to finish and/or lack focus to answer your questions accurately.
  3. Engagement. Does your survey make the respondent feel like they’re contributing?  That they’re able to tell you what they really think?  Are all the possible answer choices present so respondents don’t become frustrated that they cannot answer?
  4. Length. Is your survey sufficiently “short”?  Longer surveys will cause respondents not to complete the survey, or worse, to speed through it.  What is sufficiently short, of course, depends on the type of respondent and the subject matter.
  5. Experience. Does the survey provide a good user experience?  Instructions should be clear, layout clean, and repetition of questions kept to a minimum.
  6. Reality checks. Include questions telling respondents which answer to select to test that they’re reading the question.  For example, “Please check answer choice two in this question.”
  7. Consistency and opposite wording. Do respondents answer similar questions consistently?  Ask two similar (though sometimes reversed) questions – often at different spots in the survey.
  8. Red herrings. Does a respondent indicate they have done something or seen something that does not exist?  Include nonexistent choices in your response options.
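The reality-check, consistency, and red-herring items above can all be scored after the fact. Here is a minimal sketch in Python; the column names (“attention_check”, “brands_used”, and so on), the planted brand “Brandex”, and the thresholds are all hypothetical stand-ins for whatever your own survey uses.

```python
def flag_response(resp: dict) -> list[str]:
    """Return a list of quality flags for one completed survey."""
    flags = []

    # Reality check: the respondent was told to pick answer choice 2.
    if resp.get("attention_check") != 2:
        flags.append("failed_attention_check")

    # Red herring: "Brandex" is a nonexistent brand planted in the choices.
    if "Brandex" in resp.get("brands_used", []):
        flags.append("selected_red_herring")

    # Consistency: a 1-5 agreement item and its reverse-worded twin
    # should sum to roughly 6; a large deviation suggests inattention.
    pair_sum = resp.get("q_likes_product", 0) + resp.get("q_dislikes_product_rev", 0)
    if abs(pair_sum - 6) > 2:
        flags.append("inconsistent_pair")

    return flags
```

No single flag here is proof of cheating on its own; the value is in combining them, as discussed under data cleaning below.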

Sample Development (as it mostly relates to using panels)

  1. Joining. How was the panel developed?  Can anyone join?  (The answer should be no.)  The panel should recruit by invitation rather than allowing anyone to join or using a snowball method where friends of members can join.  Additionally, the panel should be recruited as “randomly” as possible for the given population.
  2. Frequency. How often do the panel members participate in surveys?  (Someone who takes too many surveys is presumed to be a problem, though exactly how many is too many isn’t currently known.)
  3. House cleaning. How is the panel “cleaned”?  Does the panel filter for duplicate information (people who are registered twice)?  Do they remove known cheaters?
  4. Overlap. If you’re using multiple panels for one project, is there overlap?  Can you filter for duplicates?
  5. Personalized. How personalized is your invitation?  This can help the respondent know their contribution is important.
  6. Tokens or other personal codes. Does your invitation only allow a respondent to take the survey once and prevent him/her from passing the survey on to others? (See another recent post on this issue here.)  Some type of an individual code should be provided, ideally hidden so the respondent cannot alter the code.
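To make the token idea in item 6 concrete, here is one possible sketch: a signed, single-use token embedded in each invitation link. The secret key, member IDs, and in-memory “database” are placeholders for illustration; a real panel would store redeemed tokens server-side.

```python
import hashlib
import hmac

# Placeholder secret; in practice, load this from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"

used_tokens = set()  # in practice, a database table of redeemed tokens


def make_token(member_id: str) -> str:
    """Build a tamper-evident token to embed in the survey link."""
    sig = hmac.new(SECRET_KEY, member_id.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{member_id}.{sig}"


def redeem(token: str) -> bool:
    """Accept a token once: verify the signature, then burn it."""
    member_id, _, sig = token.partition(".")
    expected = hmac.new(SECRET_KEY, member_id.encode(), hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(sig, expected):
        return False  # altered or forged token
    if token in used_tokens:
        return False  # this invitation has already been used
    used_tokens.add(token)
    return True
```

Because the signature is derived from a secret the respondent never sees, the code can’t be altered to take the survey twice or passed along in modified form.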

Data Cleaning

  1. Compare survey responses. Are different submissions exactly the same in your database?  While identical responses are not necessarily evidence enough to discard the data – depending on the survey, two people may plausibly respond the same way – combined with the other tests below, identically completed surveys may make the case for discarding data.
  2. Digital fingerprinting. Was the same machine used to take multiple surveys?  Digital fingerprinting can vary on what data is collected, but often includes at a minimum browser settings, IP address, and view settings.
  3. Speed. How quickly did the respondent finish?  If the respondent took exceptionally little time to complete the survey (as often determined by the time it took other survey takers), then the survey should be flagged.
  4. Patterns. Are there pretty pictures in your data?  Data should be checked for straight-lining.  (Asking divergent questions, as noted above, can also help.)
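The speed and straight-lining checks above lend themselves to simple automation. A minimal sketch, assuming you have completion times per respondent and per-grid answers; the “one third of the median” cutoff is illustrative, not a standard.

```python
import statistics


def flag_speeders(durations: dict[str, float]) -> set[str]:
    """Flag respondents far faster than the median completion time.

    `durations` maps respondent IDs to completion time in seconds.
    """
    median = statistics.median(durations.values())
    return {rid for rid, secs in durations.items() if secs < median / 3}


def is_straight_liner(grid_answers: list[int]) -> bool:
    """True if every item in a rating grid got the identical answer."""
    return len(set(grid_answers)) == 1
```

As with the survey-design flags, neither check alone should disqualify a response; a fast respondent who passes the reality checks and shows varied answers may simply be a quick reader.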

This is by no means a comprehensive list of tools to prevent and catch cheaters – with the dynamic nature of the Internet, what works today will likely fail to protect your data tomorrow.

What else have you done to ensure high quality data?
