Sign up for the Corona Observer quarterly e-newsletter

Stay current with Corona Insights by signing up for our quarterly e-newsletter, The Corona Observer.  In it, we share with our readers insights into current topics in market research, evaluation, and strategy that are relevant regardless of your position or field.

View our recent newsletter from July 2017 here.





We hope you find it to be a valuable resource. Of course, if you wish to unsubscribe, you may do so at any time via the link at the bottom of the newsletter.

 


The drama of measuring strategic progress

Every week from August through January, millions of Americans tune in three nights a week to watch modern-day gladiators battle over a single objective – strategically progressing an oblong ball down the field in 10-yard increments towards the finish line at the 100-yard mark (the end zone, for the growing number of people not tuning in). We can use the word “strategically” here because the offensive team’s endeavor involves an objective (moving the ball into the end zone); a scope or domain (inside the touch lines and based on field position); and an advantage the offense will try to utilize (a dual-threat QB, for instance).

As we know at Corona from our strategic consulting experience, any effective strategy needs to have these three aspects clearly delineated to succeed.  Even the best strategies, though, can fail when it comes to measuring the progress of their execution. This is as true in football as it is in organizational strategic planning. What organizations and football teams don’t always account for when measuring progress toward a strategic plan’s execution is the human experience—the engagement side of strategy execution.

In the case of measuring the strategic progress of an NFL team advancing (or not) towards the end zone, the engagement comes in the drama of the struggle between the two teams and the theatrics that go into measuring each team’s progress. With all of its resources as an organization, why else would the NFL continue to use measurement techniques—like having a referee “eyeball” the spot of the ball and then trotting out crews of men with 10-yard-long chains to verify said spot—that are technically neither precise nor accurate if not to engage viewers in the theater of strategy execution, of collectively progressing towards a clearly defined objective?

The lesson is not that precision and accuracy are unimportant—my coworkers would be especially unhappy with that conclusion.  Instead, it is that organizations should be imaginative and intentional not only in how they wish to engage customers and employees around an inspiring strategy, but also in how they can incorporate measurement techniques and naturally occurring data that encourage customers and employees to experience the drama that goes into measuring an organization’s progress towards accomplishing its strategic objective(s).

This gels with some of the research on millennials, who, in addition to being a substantial set of consumers both now and into the future of the American economy, exhibit several characteristics indicating their receptiveness to being engaged throughout the process of strategic planning, including in the measurement of its execution. As previously noted by Corona’s resident millennials expert, millennials often seek to be engaged in nearly every aspect of an organization – from co-creating products to prioritizing the experience of an organization over even the products themselves (pdf).

Though this might be an off year for the NFL, now is an ideal time for organizations to consider taking a page out of the NFL’s playbook and make an effort to engage customers and employees in the drama of measuring strategic progress.


Measuring Reactions to Your Ideas

Market research can be painful sometimes.  You may have poured your heart and soul into an idea and feel it’s really good, only to put it in front of your customers and hear all the things they hate about it.  But it’s better to know in advance than to find out after you’ve spent a ton of money and risked your brand equity for your idea.

It may not be as sexy as measuring customer satisfaction, prioritizing product features, or helping you optimize your pricing strategies, but sometimes market research is simply necessary to make sure that you haven’t overlooked something important when developing a product, service, or marketing campaign.  No matter how much you try to put yourself in the shoes of your customers, it is impossible to be 100% sure that your own background and experiences allow you to fully understand the perspectives of customers who come in a huge variety of shapes and sizes.

In our own work, we frequently partner with advertising agencies to help inform and evaluate ad campaigns and media before launch.  Considering the enormous amount of money required to reach a wide audience (through television, radio, online ads, etc.), it just makes sense to devote a small part of your budget to running the campaign by a variety of people in your audience to make sure you know how people might react.

In some cases, what you learn might be fairly minor.  You might not have even noticed that your ad lacks diversity.  You might not have noticed that your ad makes some people feel uncomfortable.  Or perhaps, your own world view has given you a blind spot to the fact that your ad makes light of sensitive issues, such as religion, major tragedies, or even date rape.

Unfortunately, we saw an example of this issue in Denver recently, where a local coffee chain’s attempt at humor infuriated the local neighborhood with a sign that read, “Happily gentrifying the neighborhood since 2014.”  From the perspective of someone less engaged in the neighborhood, you can understand what they were getting at – that good coffee was a sign of progress in the natural development of a thriving city.

However, the statement completely misses the fact that gentrification often results in people being forced from the homes they have lived in for years and the destruction of relationships across an entire neighborhood.  In this particular case, the coffee shop was located directly in the middle of a neighborhood that has been struggling with gentrification for the past decade or more, and tensions were already high.  The ad was like throwing gasoline on a fire and has resulted in protests, graffiti, and even temporary closure of the store.

It’s certainly easy to blame the company, the ad agency, and anyone else who didn’t see that this campaign would be a bad idea.  However, the reality is that all of us have blind spots around sensitive issues, and no matter how much we feel like we understand people of different backgrounds, there will always be a chance we’ve missed something.

So, please, for the sake of your own sanity and that of your customers, do some research before you launch a marketing campaign.  At a minimum, run your ad by some people who might see it just to see how they react.  And if you want a more robust evaluation of your campaign, which can help to ensure that your advertising dollars have the biggest impact possible, we can probably help.


How do you measure the value of an experience?

When I think about the professional development I did last week, I would summarize it this way: an unexpected, profound experience.

I was given the opportunity to attend RIVA moderator training and I walked away with more than I ever could have dreamed I would get. Do you know that experience where you think back to your original expectations and you realize just how much you truly didn’t understand what you would get out of something? That was me, as I sat on a mostly-empty Southwest plane (156 seats and yet only 15 passengers) flying home. While you can expect a RIVA blog to follow, I was struck by the following thought:

What does it mean to understand the impact your company, product, or service has on your customers?

I feel like I was born and raised to think quantitatively. I approach what I do with as much logic as I can (sometimes this isn’t saying much…). When I think about measuring the impact a company, product, or service has on its customers, my mind immediately jumps to numbers – e.g., who the customers are (demographically) and how satisfied they are with it. But am I really measuring impact? I think yes and no. I’m measuring an impersonal impact, one that turns people into consumers and percentages. The other kind of impact, largely missed in quantitative research, is the impact on the person.

If I were to fill out a satisfaction or brand loyalty survey for RIVA, I would almost be unhappy that I couldn’t convey my thoughts and feelings about the experience. I don’t want them to know just that I was satisfied. I want them to understand how profound this experience was for me. When they talk to potential customers about this RIVA moderator class, I want them to be equipped with my personal story. If they listen and understand what I say to them, I believe they would be better equipped to sell their product.

This is one of the undeniable and extremely powerful strengths of qualitative research. Interviews, focus groups, anything that allows a researcher to sit down and talk to people produces some of the most valuable data that can be collected. We can all think of a time when a friend or family member had such a positive experience with some company, product, or service that they just couldn’t help but gush about it. Qualitative research ensures that the value of that feedback is captured and preserved. If you want to truly understand who is buying your product or using your service, I cannot stress the importance of qualitative research enough.


New Case Study: Summit County Health

We are excited to share a new case study about our work with Summit County, CO. This case study is a great one for us to share because it showcases both the research and the resulting use of the findings to inform public information campaigns. Furthermore, the topic of the research, marijuana use and safety, is a relatively new area for public health research as legalization in various forms expands. Corona is proud to be a leader in this space and, more importantly, to be informing so many public campaigns.

You can view the case study here.

This research was also recently presented at the APHA Annual Conference in Atlanta, GA.


Questions? Conversation killers?

What happened to me? When did my conversation style stray into a revolving game of 20 Questions (a 1940s-era TV show)? I’m a big believer in the power of questions, but too much is too much. There, I’ve said it.

I suspect working with a bunch of wicked smart and curious researchers is part of the problem. I’m surrounded by people intent on determining the right questions to ask. And they like to think of questions by posing questions.

What happens when a question becomes the default? Have I lost my ability to speak in statements? To declare what I believe or think or know?

When something becomes an unintentional habit – a settled or regular tendency or practice, especially one that is hard to give up – it’s time to establish new patterns.

And thus I’ve commenced the somewhat clunky experience of rephrasing questions into statements. If you experience me fumbling about with words a bit more than normal, you’ll know I’m in the midst of behavior change.

Hey, I wonder if anyone has done any research on this?

PS – If 20 questions aren’t enough I can point to a list of 200 conversation starters.


Measurement Ideas in Evaluation

Kate Darwent and I are just back from the annual conference of the American Evaluation Association (AEA), which was held in Washington, DC this year.  We attended talks on a wide variety of topics and sat in on business meetings for two interest groups (Needs Assessment and Independent Consulting).  Below, I discuss some of the current thinking around two very different aspects of measuring outcomes in evaluation.

Selecting Indicators from a Multitude of Possibilities

One session that I found particularly interesting focused on how to select indicators for an evaluation – specifically, what criteria should be used to decide which indicators to include. (This is a recurring topic of interest for me; I mentioned the problem of too many indicators in a long-ago blog post, here.) In evaluation work, indicators are the measures of desired outcomes.  Defining an indicator involves operationalizing variables, or finding a specific piece of data that will indicate whether an outcome has been achieved.  For example, if we want to measure whether a program increases empathy, we have to choose a specific survey question, scale, or behavior that we will use to measure empathy at baseline and again after the intervention to see if scores go up over that period.  For any given outcome there are many possible indicators, and as a result, it is easy to get into a situation known as “indicator proliferation.”  At AEA, a team from Kimetrica gave a talk proposing a set of criteria for selecting indicators – eight criteria that, if used, would result in indicators serving each of the five common evaluation paradigms. Their criteria feel intuitively reasonable to me; if you like them, here’s the reference so you can give the authors full credit for their thinking: Watkins, B., & van den Heever, N. J. (2017, November). Identifying indicators for assessment and evaluation: A burdened system in need of a simplified approach. Paper presented at the meeting of the American Evaluation Association, Washington, DC.  Their proposed criteria are:

  1. Comparability over time
  2. Comparability over space, culture, projects
  3. Standardized data collection/computation
  4. Intelligibility
  5. Sensitivity to context
  6. Replicability/objectivity
  7. Scalability and granularity
  8. Cost for one data point

Propensity Score Matching

In a very different vein is the issue of how best to design an evaluation and analysis plan so that outcomes can be causally attributed to the program.  The gold standard for this is a randomized control trial, but in many situations that’s impractical, if not impossible, to execute.  As a result, there is much thinking in the evaluation world about how to statistically compensate for a lack of random assignment of participants to treatment or control groups.

This year, there were a number of sessions on propensity score matching, a statistical technique used to select a control group that best matches a treatment group that was not randomly assigned.  For example, suppose we are evaluating a program that was offered to students who were already assigned to a particular health class, and we want to find other students in their grade who match them at baseline on important demographic and academic variables.  Propensity score matching can be used to find that set of best-matched students (i.e., “the controls”) from the other kids in the grade who weren’t in the class with the program, so we can compare them to the students who got the program (i.e., “the treatment group”).

Propensity score matching is not a particularly new idea, but there are a variety of ways to execute it, and like all statistical techniques, it requires some expertise to implement appropriately.  A number of sessions at the conference provided tutorials and best practices for using this analysis method.
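
The details vary by implementation, but a minimal sketch of the mechanics might look like the following (Python with pandas and scikit-learn; the column names, covariates, and simple greedy one-to-one matching are assumptions for illustration, not a prescription). In practice you would also check covariate balance after matching and consider calipers or other matching schemes; the point here is only to show where the propensity score comes from and how the comparison group gets built.

  # A minimal propensity score matching sketch: assumed column names and a
  # simple greedy 1:1 nearest-neighbor match without replacement.
  import pandas as pd
  from sklearn.linear_model import LogisticRegression

  def match_controls(df, treatment_col, covariates):
      """Return (treated, matched_controls) based on estimated propensity scores."""
      # 1. Model the probability of being in the program from baseline covariates.
      model = LogisticRegression(max_iter=1000)
      model.fit(df[covariates], df[treatment_col])
      df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

      treated = df[df[treatment_col] == 1]
      controls = df[df[treatment_col] == 0].copy()

      # 2. Greedily pair each treated case with the closest remaining control score.
      matches = []
      for _, row in treated.iterrows():
          closest = (controls["pscore"] - row["pscore"]).abs().idxmin()
          matches.append(controls.loc[closest])
          controls = controls.drop(closest)  # match without replacement
      return treated, pd.DataFrame(matches)

  # Hypothetical usage: a file with baseline covariates and a 0/1 program flag.
  # students = pd.read_csv("students.csv")
  # treated, matched = match_controls(students, "in_program", ["age", "baseline_score", "grade"])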

In our work, one of the biggest challenges to using this method is simply the need to get data on demographic and outcome measures for non-participants, let alone all of the variables that are relevant to the probability of being a program participant.  But even assuming the necessary data can be obtained, it is still important to be aware that there are many options for how to develop and use propensity scores in an outcome analysis, and there is some controversy about the effectiveness and appropriateness of various methods.  On top of it all, the process of finding a balanced model feels a lot like p-hacking, as does the potential for trying scores from multiple propensity score models in the prediction model.  So, although it’s a popular method, users need to understand its assumptions and limitations, and do their due diligence to ensure they’re using it appropriately.

~

All in all, we had an interesting learning experience at AEA this year and brought back some new ideas to apply to our work. Attending professional conferences is a great way to stay on top of developments in the field and get energized about our work.


Breaking down the wall between quant and qual

Recently we had a project involving a large survey with numerous open-end questions. Taking the divide-and-conquer approach, it was all hands on deck to quickly code the thousands of responses. As a qualitative researcher, I found coding survey responses to be a somewhat foreign process, and I often caught myself overthinking both my codes and the nuanced content of responses. When I had finished, I typed up a summary of my findings and even pulled out a few ‘rock star’ quotes that illustrated key trends and takeaways. The experience left me wondering—why is content analysis of survey open-ends not more common? It is qualitative data, after all.

Simply put, the purpose of content analysis is to elicit themes or content from a body of written or other recorded media. Like many qualitative approaches, it does not produce numerical measurements; rather, content analysis identifies patterns and trends in the data. Incorporating qualitative analysis techniques such as content analysis into traditionally quantitative studies better contextualizes survey results and produces greater insights.

Imagine a classic branding survey where participants are asked sentiment questions such as ‘What is your impression of Brand X?’ Often, these questions are designed as Likert scales with defined categories (e.g., very positive, somewhat positive, neutral, etc.). While this provides general insight into attitudes and impressions of the brand, it does not necessarily highlight the broader insights or implications of the research findings. When Corona does a brand survey, we regularly ask an open-end question as a follow-up for qualitative content analysis, such as ‘What specifically about Brand X do you find appealing?’ or, conversely, ‘What specifically about Brand X do you find unappealing?’ Including a qualitative follow-up provides additional framing for the quantitatively designed Likert scale question and augments insights. Additionally, if the survey shows sizeable negative sentiment towards a brand, incorporating qualitatively designed open-ends can uncover issues or problems that were unknown prior to the research, and perhaps outside of the original research scope.
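
To make the mechanics concrete, here is a minimal sketch (Python; the themes, keywords, and sample answers are invented for illustration) of tagging open-end responses, such as answers to ‘What specifically about Brand X do you find unappealing?’, into themes and tallying them. Real content analysis relies on a human-developed codebook and analyst judgment; keyword matching here only illustrates the tag-and-count step.

  # Minimal open-end coding sketch: keyword-based tagging into assumed themes.
  from collections import Counter

  THEMES = {
      "price": ["expensive", "price", "cost", "overpriced"],
      "service": ["rude", "service", "staff", "wait"],
      "quality": ["broke", "cheap", "quality", "flimsy"],
  }

  def code_response(text):
      """Return the set of themes whose keywords appear in one open-end response."""
      text = text.lower()
      return {theme for theme, words in THEMES.items() if any(w in text for w in words)}

  def tally_themes(responses):
      """Count how many responses touch each theme."""
      counts = Counter()
      for r in responses:
          counts.update(code_response(r))
      return counts

  # Hypothetical usage with a few invented open-end answers:
  answers = [
      "Way too expensive for what you get",
      "The staff were rude and the wait was long",
      "Feels cheap and the quality has gone downhill",
  ]
  print(tally_themes(answers))  # e.g., Counter({'price': 1, 'service': 1, 'quality': 1})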

Historically, quantitative and qualitative research have been bifurcated, both in design and in analysis. However, hybrid approaches such as the one described above are quickly gaining ground, and their true value is being realized. Based on our experience here at Corona, for content analysis to be done effectively in a quantitative-dominant survey, it is best for this to be decided early in the research design phase.

A few things to keep in mind when designing open-ended questions for content analysis:

  • Clearly define research objectives and goals for the open-end questions that will be qualitatively analyzed.
  • Construct questions with these objectives in mind and incorporate phrasing that invites nuanced responses.
  • Plainly state your expectations for responses and, if possible, institute character minimums or maximums as needed.

In addition to the points mentioned above, it is important to note that there are some avoidable pitfalls. First off, this method is best suited for surveys with a smaller sample size, preferably under 1,000 respondents. Also, the survey itself must not be too time intensive. It is well known that surveys which extend beyond 15 to 20 minutes often lead to participants dropping out or not fully completing the survey. Keep these time limits in mind and be selective about the number of open-ends to include. Lastly, it is important to keep the participant engaged in the survey. If multiple open-ends are incorporated into the survey, phrase the questions differently or ask them about different topics in an effort to keep participants from feeling as though they are repeating themselves.

In an ideal world, quantitative and qualitative approaches could meld together seamlessly, but we all know this isn’t an ideal world. Time constraints, budgets, and research objectives are just a handful of reasons why a hybrid approach such as the one discussed here may not be the right choice. When it is, though, hybrid approaches give participants an opportunity to think more deeply about the topic at hand and can also create a sense of active engagement between the participant and the end client. In other words—participants feel like their voice is being heard, and the end client gains a better understanding of their customers.


The Four Cornerstones of Survey Measurement: Part 2

Part Two: Reliability and Validity

The first blog in this series argued that precision, accuracy, reliability, and validity are key indicators of good survey measurement.  It described precision and accuracy and how the researcher aims to balance the two based on the research goals and desired outcome.  This second blog will explore reliability and validity.

Reliability

In addition to precision and accuracy (and non-measurement factors such as sampling, response rate, etc.), the ability to be confident in findings relies on the consistency of survey responses. Consistent answers to a set of questions designed to measure a specific concept (e.g., an attitude) or behavior are probably reliable, although not necessarily valid.  Picture an archer shooting arrows at a target, each arrow representing a survey question and where they land representing the question answers. If the arrows consistently land close together, but far from the bull’s-eye, we would still say the archer was reliable (i.e., the survey questions were reliable). But being far from the bull’s-eye is problematic; it means the archer didn’t fulfill his intentions (i.e., the survey questions didn’t measure what they were intended to measure).

One way to increase survey measurement reliability (specifically, internal consistency) is to ask several questions that are trying to “get at” the same concept. A silly example is Q1) How old are you? Q2) How many years ago were you born? Q3) For how many years have you lived on Earth? If the answers to these three questions agree, we have high reliability.
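
For readers who like to see the arithmetic behind internal consistency, a common summary statistic is Cronbach’s alpha: alpha = (k / (k - 1)) * (1 - sum of the item variances / variance of the total score). Below is a minimal sketch in Python with NumPy, using made-up answers to the three “age” questions above; the data and function name are purely illustrative.

  # Cronbach's alpha for a set of items meant to measure the same concept.
  # Rows = respondents, columns = survey items (made-up data for illustration).
  import numpy as np

  def cronbach_alpha(items: np.ndarray) -> float:
      """items: 2-D array, shape (n_respondents, k_items)."""
      k = items.shape[1]
      item_variances = items.var(axis=0, ddof=1)      # variance of each question
      total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
      return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

  # Three "age" questions answered by five respondents; near-identical answers
  # should yield an alpha close to 1 (high internal consistency).
  responses = np.array([
      [25, 25, 25],
      [34, 34, 35],
      [47, 47, 47],
      [52, 53, 52],
      [61, 61, 61],
  ])
  print(round(cronbach_alpha(responses), 3))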

The challenge with achieving high internal reliability is the lack of space on a survey to ask similar questions. Sometimes, we ask just one or two questions to measure a concept. This isn’t necessarily good or bad; it just illustrates the inevitable trade-offs when balancing all indicators.  To quote my former professor Dr. Ham, “Asking just one question to measure a concept doesn’t mean you have measurement error, it just means you are more likely to have error.”

Validity

Broadly, validity represents the accuracy of generalizations (not the accuracy of the answers). In other words, do the data represent the concept of interest? Can we use the data to make inferences, develop insights, and recommend actions that will actually work? Validity is the most abstract of the four indicators, and it can be evaluated on several levels.

  • Content validity: Answers from survey questions represent what they were intended to measure.  A good way to ensure content validity is to precede the survey research with open-ended or qualitative research to develop an understanding of all top-of-mind aspects of a concept.
  • Predictive or criterion validity: Variables should be related in the expected direction. For example, ACT/SAT scores have been relatively good predictors of how students perform later in college: the higher the score, the better the student tends to do.  Therefore, the questions asked on the ACT/SAT, and how they are scored, have high predictive validity (a toy check of this idea is sketched after this list).
  • Construct validity: There should be an appropriate link between the survey question and the concept it is trying to represent.  Remember that concepts and constructs are just that: conceptual. Surveys don’t measure concepts, they measure variables that try to represent concepts.  The extent to which the variable effectively represents the concept of interest demonstrates construct validity.
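
As a toy check of predictive validity, the snippet below (Python with NumPy; all numbers are invented for the example) correlates a set of admissions-test scores with later college GPAs. A solidly positive correlation is the “related in the expected direction” evidence described above.

  # Toy predictive-validity check: does the test score relate to the later outcome?
  import numpy as np

  test_scores = np.array([21, 24, 26, 28, 30, 33, 35])          # invented admissions scores
  college_gpa = np.array([2.6, 2.9, 3.0, 3.2, 3.4, 3.6, 3.8])   # invented later GPAs

  r = np.corrcoef(test_scores, college_gpa)[0, 1]
  print(f"correlation between score and GPA: {r:.2f}")  # strongly positive => evidence of predictive validity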

High validity suggests greater generalizability; measurements hold up regardless of factors such as race, gender, geography, or time.  Greater generalizability leads to greater usefulness because the results have broader use and a longer shelf-life.  If you are investing in research, you might as well get a lot of use out of it.

This short series described four indicators of good measurement.  At Corona Insights, we strive to maximize these indicators while realizing and balancing the inevitable tradeoffs. Research survey design is much more than a list of questions; it’s more like a complex and interconnected machine, and we are the mechanics working hard to get you back on the road.


Keeping it constant: 3 things to keep in mind with your trackers

When conducting a program evaluation or customer tracker (e.g., brand, satisfaction, etc.), we are often collecting input at two different points in time and then measuring the difference. While the concept is straightforward, the challenge is keeping everything as consistent as possible so we can say that the actual change is NOT a result of how we conducted the survey.

Because we can be math nerds sometimes, take the following equation:
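
  Questions asked + How we ask them + Who we ask + Actual change = Survey results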

A change to any part of the equation to the left of the equal sign will result in changes to your results. Our goal then is to keep all the survey components consistent so any change can be attributed to the thing you want to measure.

These include:

  1. Asking the same questions
  2. Asking them the same way (i.e. research mode)
  3. And asking them to a comparable group

Let’s look at each of these in more detail.

Asking the same questions

This may sound obvious, but it’s too easy to have slight (or major) edits creep into your survey. The problem is, we then cannot say if the change we observed between survey periods is a result of actual change that occurred in the market, or if the change was a result of the changing question (i.e., people interpreted the question slightly differently).

Should you never add or change a question? Not necessarily. If the underlying goal of that question has changed, then it may need to be updated to get you the best information going forward. Sure, you may not be able to compare it looking back, but getting the best information today may outweigh the goal of measuring change on the previous question.

If you are going to change or add questions to the survey, try to keep them at the end of the survey so the experience of the first part of the survey is similar.

Asking them the same way

Just as changing the actual question can cause issues in your tracker, changing how you’re asking them can also make an impact. Moving from telephone to online, from in-person to self-administered, and so on can cause changes due to how respondents understand the question and other social factors. For instance, respondents may give more socially desirable answers when talking to a live interviewer than they will online. Reading a question yourself can lead to a different understanding of the question than when it is read to you.

 

Similarly, training your data collectors with consistent instructions and expectations makes a difference for research via live interviewers as well. Just because the mode is the same (e.g., intercept surveys, in-class student surveys, etc.) doesn’t mean it’s being implemented the same way.

Asking a comparable group

Again, this may seem obvious, but small changes in who you are asking can impact your results. For instance, if you’re researching your customers, and one survey only gets feedback from customers who have contacted your help line while another surveys a random sample of all customers, the two groups, despite both being customers, are not in fact the same. The ones who have contacted your help line likely had different experiences – good or bad – that the broader customer base may not have.

~

So, that’s all great in theory, but we recognize that real life sometimes gets in the way.

For example, one of the key issues we’ve seen is with changing survey modes (i.e., asking them the same way) and who we are reaching (i.e., asking a comparable group). Years ago, many of our public surveys were done via telephone. It was quick and reached the majority of the population at a reasonable budget. As cell phones became more dominant and landlines started to disappear, we could have held the mode constant, but the group we were reaching would have changed as a result. Our first adjustment was to include cell phones along with landlines. This increased costs significantly, but it brought us back closer to reaching the same group as before while keeping the overall mode the same (i.e., interviews via telephone).

Today, depending on the exact audience we’re trying to reach, we commonly combine modes, meaning we may do phone (landline + cell), mail, and/or online all for one survey. This increases our coverage (http://www.coronainsights.com/2016/05/there-is-more-to-a-quality-survey-than-margin-of-error/), though it does introduce other challenges, as we may have to ask questions a little differently between survey modes. But in the end, we feel it’s a worthy tradeoff to get a quality sample of respondents. When we have to change modes midway through a tracker, we work to diminish the possible downsides while drawing on the strengths of each mode to improve our sampling accuracy overall.


The Four Cornerstones of Survey Measurement: Part 1

Part One: Precision and Accuracy

Years ago, I worked in an environmental lab where I measured the amount of silt in water samples by forcing the water through a filter, drying the filters in an oven, then weighing the filters on a calibrated scale. I followed very specific procedures to ensure the results were precise, accurate, reliable, and valid; the cornerstones of scientific measurement.

As a social-science researcher today, I still use precision, accuracy, reliability, and validity as indicators of good survey measurement. The ability of decision makers to draw useful conclusions and make confident data-driven decisions from a survey depends greatly on these indicators.

To introduce these concepts, I’ll use the metaphor of figuring out how to travel from one destination to another, say from your house to a new restaurant you want to try. How would you find your way there? You probably wouldn’t use a desktop globe to guide you; it’s not precise enough. You probably wouldn’t use a map drawn in the 1600s; it wouldn’t be accurate. You probably shouldn’t ask a friend who has a horrible memory or sense of direction; their help would not be reliable. What you would likely do is “Google it,” which is a valid way most of us get directions these days.

This two-part blog will unpack the meaning within these indicators. Let’s start with precision and accuracy. Part two will cover reliability and validity.

Precision

Precision refers to the granularity of data and estimates. Data from an open-ended question that asked how many cigarettes someone smoked in the past 24 hours would be more precise than data from a similar closed-ended question that listed a handful of categories, such as 0, 1-5, 6-10, 11-15, 16 or more. The open-ended data would be more precise because it would be more specific, more detailed. High precision is desirable, all things being equal, but there are often “costs” associated with increasing precision, such as increased time to take a survey, and those costs might outweigh the benefit of greater precision.
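
To picture what gets lost when answers are collapsed into categories, here is a small sketch (Python with pandas; the answers are invented) that bins exact open-ended counts into the closed-ended categories above. Once the data are binned, the exact values can no longer be recovered; that lost detail is the precision being traded away.

  # Collapsing precise open-ended answers into closed-ended categories.
  import pandas as pd

  exact_counts = pd.Series([0, 3, 8, 12, 15, 22], name="cigarettes_last_24h")  # invented answers

  bins = [-1, 0, 5, 10, 15, float("inf")]
  labels = ["0", "1-5", "6-10", "11-15", "16 or more"]
  binned = pd.cut(exact_counts, bins=bins, labels=labels)

  print(pd.DataFrame({"exact": exact_counts, "category": binned}))
  # The categorical column is less precise: a "16 or more" could be 16 or 60,
  # and the mean can only be estimated, not computed, from the binned data.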

Accuracy

Accuracy refers to the degree that the data are true. If someone who smoked 15 cigarettes in the past 24 hours gave the answer ‘5’ to the open-ended survey question, the data generated would be precise but not accurate. There are many possible reasons for this inaccuracy. Maybe the respondent truly believed they only smoked five cigarettes in the past 24 hours, or maybe they said five because that’s what they thought the researcher wanted to hear. Twenty-four hours may have been too long of a time span to remember all the cigarettes they smoked, or maybe they simply misread the question. If they had answered “between 1 and 20,” the data would have been accurate, because it was true, but it wouldn’t have been very precise.

Trade-offs

Many times, an increase in precision can result in a decrease in accuracy, and vice versa. Decision makers can be confident in accurate data, but it might not be useful. Precise data typically give researchers more utility and flexibility, especially in analysis. But what good is flexible data if there is little confidence in its accuracy? Good researchers will strive for an appropriate balance between precision and accuracy, based on the research goals and desired outcomes.

Now that we have a better understanding of precision and accuracy, the second blog in this series will explore reliability and validity.