RADIANCE BLOG

Category: Analytics

Co-creating Insights through Participatory Research

“We both know some things; neither of us knows everything. Working together we will both know more and we will both learn more about how to know”

~Patricia Maguire, in Doing Participatory Research

Do you need to hear from more than the usual suspects?  Do you want your research to engage and empower people, rather than just study them like lab rats? Are you willing to step out of your comfort zone to create transformational research that provokes action?

If you answered yes to these questions, you might be interested in embarking on participatory research…and Corona can help!

Participatory research is a collaborative research approach that generates shared knowledge.  The intention is to research with and for participants, rather than about them, and the process is as valuable as the results.

At its heart, participatory research involves engaging with a group of people, typically those who have experienced disenfranchisement, alienation, or oppression. Researchers are participants and participants are researchers; the research questions, methodologies, and analyses are co-created. Embedded in the process are cycles of new questions, reflections, negotiations, and research adjustments. In participatory research, knowledge and understanding are generated rather than discovered.

Language and context are keys to success. The language of participatory research can be informal, personal, and relative to the situation. Safe spaces are created so that participants and researchers can speak freely and honestly, allowing for greater authenticity and reflection of reality. The contexts of the research, including the purpose, geography, and even funding source and sponsors, are made overt and are relevant to the interpretation.

Participatory research is not the most efficient process; it takes extra time to mutually align project goals and specify research questions.  Additionally, participatory research does not assume that the results are unbiased.  Indeed, it asserts that social research cannot avoid the bias that too often manifests unconsciously and goes unacknowledged. Instead, participatory researchers describe and accept their biases, drawing conclusions through this lens.

Why conduct participatory research?  One reason is that the risks are mutual and the results benefit the participants just as much as they benefit the research conductor/sponsor. Results can also provoke changes such as increased equity, community empowerment, and social emancipation. When done appropriately, participatory research gives a strong and authentic voice to the participants, and hopefully, a greater awareness of their situation will lead to positive transformational changes.


Subpopulations in Research

As I’m sure you know, we do a lot of survey research here at Corona. When we provide the results, we try to build the most complete picture for our clients, and that means looking at the data from every which way possible. One of the most effective ways to do this is by looking at subpopulations.

What is a subpopulation?

A subpopulation is essentially a fraction or part of the overall pool of the population you are surveying. A subpopulation can be defined many ways. For example, some of the most common subpopulations to examine in research are gender (e.g. male and female), age (e.g. <35, 35-54, 55+), race/ethnicity, location, etc.  You can effectively define a subpopulation using whatever criteria you like; for instance, you can have a subpopulation that is based on what type of dessert is preferred – those who like cake and those (heathens) who don’t like cake.

What does it mean to have subpopulations?

When you examine survey results by subpopulations, at a basic level respondents are simply split into the subpopulations or groups (commonly called breakouts) you defined. After being broken into these groups, the results for the survey are compiled for each individual group separately. For example, take the following survey question:

  1. About how many hours a week do you watch sports?
    1. 1 hour or less
    2. 2 to 4 hours
    3. 5 to 7 hours
    4. 8 hours or more

The results would typically have two components: top-level results (results compiled for all respondents to the survey) and breakouts (results by group for any subpopulations that have been defined). For the above example question, the results might look something like this:

In this completely made-up example, you can see the benefit of having subpopulations. While 21 percent of overall respondents watched five to seven hours of sports a week, you can see that male respondents accounted for a hefty chunk, as 26 percent of males watch that much sports, compared to only 16 percent of females. Breaking out questions by subpopulations allows you to more closely examine data and assists in finding those gems of information.
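For readers who work with raw responses, the same top-level and breakout compilation can be sketched in a few lines of Python. This is only an illustration: the respondent records are made up, and `percentages` and `breakout` are hypothetical helper names rather than any tool Corona uses.

```python
from collections import Counter, defaultdict

# Hypothetical respondent records: (gender, answer to the hours question).
responses = [
    ("male", "5 to 7 hours"),
    ("male", "2 to 4 hours"),
    ("female", "1 hour or less"),
    ("female", "5 to 7 hours"),
    ("male", "8 hours or more"),
    ("female", "2 to 4 hours"),
    ("female", "2 to 4 hours"),
    ("male", "5 to 7 hours"),
]

def percentages(answers):
    """Return {answer: percent of respondents}, rounded to one decimal."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}

def breakout(records):
    """Compile top-level results plus results for each subpopulation."""
    by_group = defaultdict(list)
    for group, answer in records:
        by_group[group].append(answer)
    results = {"overall": percentages(answer for _, answer in records)}
    for group, answers in by_group.items():
        results[group] = percentages(answers)
    return results

results = breakout(responses)
for group, table in results.items():
    print(group, table)
```

Each respondent is counted once in the overall results and once in their own group, which is why a breakout can reveal a difference that the top-level number hides.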

Getting the most out of your survey

Being prepared to utilize subpopulations in your survey analysis means putting your best foot forward and maximizing your investment. Many subpopulations are constructed using questions commonly asked in surveys (gender, age, etc.), but some questions might not otherwise be asked without the foresight of planning to break respondents into subpopulations. For example, a nonprofit might be building a questionnaire to survey their patrons on their messaging; by simply asking if a respondent has donated to the organization, they can examine survey results of donors separately from all patrons. The survey can now not only better inform messaging for the organization overall, but also allow them to better target and communicate with donors specifically.

Conducting a survey can be a challenging experience, so the more you can get out of a single survey, the better. The next time you are designing a survey, ask around your workplace to see if a few questions can be added to better utilize the information you’re collecting. Now you’re one step closer to conducting the perfect survey!


Does This Survey Make Sense?

It’s pretty common for Corona to combine qualitative and quantitative research in a lot of our projects.  We will often use qualitative work to inform what we need to ask about in quantitative phases of the research, or use qualitative research to better understand the nuances of what we learned in the quantitative phase.  But did you know that we can also use qualitative research to help design quantitative research instruments through something called cognitive testing?

The process of cognitive testing is actually pretty simple, and we treat it a lot like a one-on-one interview.  To start, we recruit a random sample of participants who would fit the target demographic for the survey.  Then, we meet with the participants one-on-one and have them go through the process of taking the survey.  We then walk through the survey with them and ask specific follow-up questions to learn how they are interpreting the questions and find out if there is anything confusing or unclear about the questions.

In a nutshell, the purpose of cognitive testing is to understand how respondents interpret survey questions and to ultimately write better survey questions.  Cognitive testing can be an effective tool for any survey, but is particularly important for surveys on topics that are complicated or controversial, or when the survey is distributed to a wide and diverse audience.  For example, you may learn through cognitive testing that the terminology you use internally to describe your services is not widely used or understood by the community.  In that case, we will need to simplify the language that we are using in the survey.  Or, you may find that the questions you are asking are too specific for most people to know how to answer, in which case the survey may need to ask higher-level questions or include a “Don’t Know” response option on many questions.  It’s also always good to make sure that the survey questions don’t seem leading or biased in any way, particularly when asking about sensitive or controversial topics.

Not only does cognitive testing allow us to write better survey questions, but it can also help with analysis.  If we have an idea of how people are interpreting our questions, we have a deeper level of understanding of what the survey results mean.  Of course, our goal is to always provide our clients with the most meaningful insights possible, and cognitive testing is just one of the many ways we work to deliver on that promise.


Predictable Unknowns

Have you ever needed to know what the future will look like?

To create great strategic plans, our clients need to understand what their operating environment will look like in five, ten, or thirty years.  They want to know how the population, jobs, markets, homes, and infrastructure are expected to change. We help these clients by providing reliable projections, often through analysis of preexisting data.  Although we have no crystal ball that tells us exactly what the future holds, we can point clients in the right direction.  Here are a few ways we look at trends and projections to help solve our clients’ problems.

Patterns from the Past

We frequently commence research projects by reviewing the current population profile and looking for patterns from the past that show how we got here.  A common way we do this is by mining demographic data from the U.S. Census. We access tons of demographic estimates across a wide variety of geographies, such as zip codes, census tracts, towns, cities, counties, metro areas…you get the idea.  The amount of demographic information available is amazing.  While examining demographics is a cost-effective way to start to understand an area or population, there are critical limitations to demographics.  Data are a year or two old by the time they are available to the public. More importantly, there is a problem with assuming the future will resemble the past. Demographics can get us started, but when we want to peer into the future, we move to other sources.

Forecasting the Future

Several data sources project key variables such as population, jobs, age profiles, homes, and transportation.  A good source for population projections in Colorado is the State Demography Office.  From this website, we can align previously collected population data with future projections to provide a nice continuation from past, to current, to future population trends.  Further, we can break apart the population trend with age profiles that show changes by generation.  We can create such analyses at the state or county levels or any region comprised of counties.  For example, below is the population of the Denver Metropolitan Statistical Area (MSA), which is comprised of ten counties (Adams, Arapahoe, Broomfield, Clear Creek, Denver, Douglas, Elbert, Gilpin, Jefferson, and Park).  You can see the rate of growth in Denver Metro is projected to steadily slow, although remain positive, from 2015 to 2050.
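A slowing-but-positive growth rate like the one described here comes down to simple arithmetic on the projection series. As a rough sketch, the average annual (compound) growth rate between two points can be computed as below. The population figures are placeholders chosen for illustration, not State Demography Office numbers, and `annual_growth_rate` is a hypothetical helper name.

```python
def annual_growth_rate(start_pop, end_pop, years):
    """Compound average annual growth rate between two population counts."""
    return (end_pop / start_pop) ** (1 / years) - 1

# Placeholder series (year -> population); NOT actual projections.
series = {2015: 1_000_000, 2020: 1_100_000, 2025: 1_180_000, 2030: 1_240_000}

years = sorted(series)
rates = []
for first, last in zip(years, years[1:]):
    rate = annual_growth_rate(series[first], series[last], last - first)
    rates.append(rate)
    print(f"{first}-{last}: {rate:.2%} per year")
```

With these placeholder values every rate is positive but each is smaller than the one before it, which is the shape of trend the paragraph describes.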

Sometimes our clients are more interested in understanding the future of job growth, including how many jobs are expected, what type of jobs, and where they will be located.  We use a few different sources to answer these questions.  If we are working in Colorado, we pull down job forecasts by county or region.  For example, here is the forecast for total jobs and job growth rate for Larimer County, Colorado.

Other times, our clients would like more detail than total jobs.  We pull occupation forecast data from the Colorado Department of Labor and Employment.  This website provides current and projected occupations by various geographies including counties and metro areas.  For example, a law school marketing department might be interested in projections of the number of lawyers working in various areas in Colorado.  The following table shows that the growth rate of lawyers is expected to be slightly higher in Denver-Aurora Metropolitan Area than in Boulder or Colorado Springs.

These are just a few examples of how we have helped our clients look at the past as well as understand what the future might look like. Of course, many clients have questions that are not so easily answered by secondary data that is already available.  In these cases, we build our own models to measure and predict all sorts of estimates, such as demand for child care, business relocation, and commuting patterns.

If you need to understand what the future might bring to your organization, give us a call and we will see how Corona can help solve your problem.


The Race to the Rockies – Colorado Migration Part 1

I admit, I am one of many in the horde of people who have recently migrated to Colorado. Indeed, there are tens of thousands of us moving here each year at one of the highest rates in the country. But who really is “us”? Who are the people moving into Colorado in droves? This blog will be part 1 of a 2-part blog series exploring who is moving into Colorado. In this first blog, we’ll be looking at generational migration patterns over time and migration by race and ethnicity.

Generational Movement in Colorado

Utilizing the Census Bureau’s Population Estimates for 2010 to 2015, I broke this question down by two basic demographics, age and sex. In the following graph, generations were grouped roughly using Pew Research Center’s definition for each generation (see note 1 at the end of this post). Each generation has net migration (those moving to Colorado minus those who left) graphed from 2011 through 2015.

Unsurprisingly, we see that net migration has been positive for each generation since 2013. Most recently, each generation in Colorado had a net positive increase of 16,000 or more. Millennials have been moving in at the highest rate, with over 30,000 having moved into Colorado between 2014 and 2015. Baby Boomers also prefer moving into Colorado rather than out, with a net migration over 25,000.

As a non-native, what I find most interesting is how this has changed over time. From 2010 to 2012 we see more Generation Xers moving out of Colorado than in, with a net loss of nearly 10,000 in 2011. It wasn’t until 2013 that we saw more moving in, with a large uptick (over 10,000 more) in 2015. Also unexpected was the increase of nearly 40,000 Baby Boomers that occurred in 2011.

Millennials, on the other hand, have been consistently moving into Colorado, with 2015 seeing a strong increase in the number moving into the state. They are also the only generation that shows a substantial difference between genders, with about 4,000 more males moving into Colorado than females in 2015. Needless to say, as a male Millennial who has moved into Colorado, I am thankful to already be married.

Race and Ethnicity Movement in Colorado

Using the same data source (Population Estimates), I looked at those moving into Colorado from 2010 to 2015 by race and Hispanic/non-Hispanic ethnicity.

The total percentage change in population from 2010 to 2015 was 8.5%, making Colorado the third ranking state by population growth rate since 2010. When looking at percentages, many Native Hawaiian and Other Pacific Islanders appear to be moving into the state, though 2015 saw an increase of only about 2,000 since 2010. Many Asians have also been moving into the state, with a 22 percent increase equating to about 30,000 new residents. Those who identify as multi-racial have moved into Colorado at similar numbers.

Colorado also has a large Hispanic population. In fact, we are one of nine states that have a Hispanic population of over 1 million. Between 2010 and 2015, we saw an increase in Hispanic population of just over 12 percent, with an additional 125,000. The Hispanic population currently represents approximately 21% of Colorado’s total population.

Now that we have a better idea of the age, race, and ethnicity of those moving into Colorado, we can get a better idea of some of the characteristics behind our newest residents. In my second and final blog on the topic, I will explore these various characteristics to help complete the picture of these new Coloradans.

1 Due to the data available in the Population Estimates tables, some generations in the graph include ages +/- 1 or 2 years from Pew’s definition, and the Silent Generation was combined with the Greatest Generation. The graph also doesn’t include those 19 and younger, though the age cutoff between Millennials and the following generation has not yet been determined.


Do you have kids? Wait – let me restate that.

Karla Raines and I had dinner last week with another couple who share our background and interest in social research.  We were talking about the challenges of understanding the decisions of other people if you don’t understand their background, and how we can have biases that we don’t even realize.

It brought me back to the topic of how we design and ask questions on surveys, and my favorite example of unintentional background bias on the part of the designer.

A common question, both in research and in social conversations, is the ubiquitous, “Do you have kids?”  It’s an easy question to answer, right?  If you ask Ward and June Cleaver, they’ll immediately answer, “We have two, Wally and Beaver”.  (June might go with the more formal ‘Theodore’, but you get the point.)

When we ask the question in a research context, we’re generally asking it for a specific reason.  Children often have a major impact on how people behave, and we’re usually wondering if there’s a correlation on a particular issue.

But ‘do you have kids’ is a question that may capture much more than the classic Wally and Beaver household.  If we ask that question, the Cleaver family will answer ‘yes’, but so will a 75-year-old who has two kids, even if those kids are 50 years old and grandparents themselves.  So ‘do you have kids’ isn’t the question we want to ask in most contexts.

What if we expanded the question to ‘do you have children under 18’?  It gets a bit tricky here if we put ourselves in the minds of respondents, and this is where our unintentional background bias may come into play.  Ward and June will still answer yes, but what about a divorced parent who doesn’t have custody?  He or she may accurately answer yes, but there’s not a child living in their home.  Are we capturing the information that we think we’re capturing?

And what about a person who’s living with a boyfriend and the boyfriend’s two children?  Or the person who has taken a foster child into the home?  Or the grandparent who is raising a grandchild while the parents are serving overseas?  Or the couple whose adult child is temporarily back home with her own kids in tow?

If we’re really trying to figure out how children impact decisions, we need to observe and recognize the incredible diversity of family situations in the modern world, and how that fits into our research goal.  Are we concerned about whether the survey respondent has given birth to a child?  If they’re a formal guardian of a child?  If they’re living in a household that contains children, regardless of the relationship?

The proper question wording will depend on the research goals, of course.  We often are assessing the impact of children within a household when we ask these questions, so we find ourselves simply asking, “How many children under the age of 18 are living in your home?”, perhaps with a follow-up about the relationship where necessary.  But it’s easy to be blinded by our own life experiences when designing research, and the results can lead to error in our conclusions.

So the next time you’re mingling at a party, we suggest not asking “Do you have kids”, and offer that you should instead ask, “How many children under the age of 18 are living in your home?”  It’s a great conversation starter and will get you much better data about the person you’re chatting with.


Car vs. Bike

I like to ride my bike whenever I get a chance.  I ride to the store, to the park, to take my son to preschool, and sometimes just for fun.  While I’ve never been in an accident with a moving car, I’ve witnessed several bike vs. car accidents, and it’s something I want to avoid.

Do you know what Denver neighborhoods have the most bike vs. car accidents?  I wanted to find out.  Luckily, we can use special mapping software to analyze existing accident data to determine where these types of accidents are statistically more or less likely to happen.  Here’s how I did it.

  1. I downloaded all traffic accidents in the City and County of Denver from the last five years (accessed here).
  2. I filtered to all hit-and-run accidents involving a bike, a total of 345 incidents in their database.
  3. I mapped the accident locations using our mapping software.
  4. I added the City and County of Denver boundary to the map (I excluded the DIA neighborhood because it is disproportionately large compared to the area of bikeable road).
  5. To find locations around Denver with statistically higher or lower clusters of bike vs. car hit-and-runs, I ran a hot-spot analysis using location point-counts and a fish-net grid.
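For the curious, the last step can be sketched in plain Python. The post does not name the statistic its mapping software used; the Getis-Ord Gi* z-score is a common choice for hot-spot analysis and is assumed here. The coordinates, grid size, and function names (`fishnet_counts`, `gi_star`) are made up for illustration and are not the Denver data or the software's API.

```python
import math
from collections import Counter

def fishnet_counts(points, cell_size, n_cols, n_rows):
    """Count the points falling in each cell of a regular (fishnet) grid."""
    counts = Counter()
    for x, y in points:
        col, row = int(x // cell_size), int(y // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(col, row)] += 1
    # Include empty cells so every grid cell has a count.
    return {(c, r): counts.get((c, r), 0)
            for c in range(n_cols) for r in range(n_rows)}

def gi_star(counts, n_cols, n_rows):
    """Getis-Ord Gi* z-score per cell, using the 3x3 neighborhood (self included)."""
    n = len(counts)
    values = list(counts.values())
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = {}
    for (c, r) in counts:
        neighbors = [(c + dc, r + dr) for dc in (-1, 0, 1) for dr in (-1, 0, 1)
                     if 0 <= c + dc < n_cols and 0 <= r + dr < n_rows]
        w = len(neighbors)  # binary weights, so sum(w) == sum(w^2) == w
        local_sum = sum(counts[cell] for cell in neighbors)
        denom = s * math.sqrt((n * w - w * w) / (n - 1))
        scores[(c, r)] = (local_sum - mean * w) / denom if denom else 0.0
    return scores

# Made-up accident locations in arbitrary grid units, clustered in one corner.
points = ([(0.5, 0.5)] * 4 + [(1.5, 0.5)] * 3 + [(0.5, 1.5)] * 3
          + [(1.5, 1.5)] * 4 + [(4.5, 4.5), (3.5, 2.5)])

counts = fishnet_counts(points, cell_size=1, n_cols=5, n_rows=5)
scores = gi_star(counts, n_cols=5, n_rows=5)

# Cells with z > 1.96 are treated as hot spots (roughly 95% confidence).
hot_spots = sorted(cell for cell, z in scores.items() if z > 1.96)
print(hot_spots)
```

A large positive z-score means a cell and its neighbors hold more accidents than the citywide average would predict, a large negative score marks a cold spot, and everything in between is the neutral category.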

The result was a large hot spot of accidents in Central Denver, stretching from Baker neighborhood to Sunnyside, and from Sloans Lake to Colorado Blvd.  It’s clear that a lot of accidents happen on Broadway/Lincoln Street and a lot are on Colfax Avenue.  Many Denver neighborhoods fall into the neutral category (light yellow), meaning accidents happen here, but we cannot find any clusters where accidents are statistically more or less likely.  If you look at the edges of the city limits, we find a handful of neighborhoods overlapping cold-spot clusters.  Specifically, Fort Logan (just south of Bear Valley), Hampden, University Hills, North Stapleton, and Gateway/Green Valley Ranch are all neighborhoods where hit-and-run bike vs. car accidents are statistically uncommon, according to this analysis.

While it’s helpful to know that central Denver has a lot more accidents than the surrounding neighborhoods, this isn’t surprising considering a lot more bikes and cars are simultaneously navigating around Denver’s core.  I wanted to get more specific and useful insights, so I re-ran the hotspot analysis focusing just on the area east of Federal, south of 49th, west of Colorado, and north of Alameda.  Downtown Denver still lights up as a hot spot for hit-and-run bike accidents, but again not to the scale that I found useful.  So I zoomed in once more, this time exploring the area within Blake, Speer, 6th, and Downing.

The result: the greatest concentration of these accidents happens in the Central Business District (especially along 20th Street) and east-west along Colfax Avenue and 16th Street.  If I were in charge of reducing the number of car-bike accidents in Denver, I would prioritize these two areas.  Of course this analysis can’t suggest what type of actions would reduce accidents (e.g., new rules, more enforcement, better education), but it gives us a place to start.

I am interested in biking, so this dataset was of interest to me.  What interests you and your organization?  Do you want to know if your challenges (e.g., crime, complaints, reduced sales) or opportunities (e.g., compliments, desired behavior, brand recognition) are really clustered? If so, give us a call and we can discuss how mapping may be a good way to gain a new perspective and help answer your important questions.


Nonprofit Data Heads

Here at Corona, we gather, analyze, and interpret data for all types of nonprofits.  While some of our nonprofit clients are a little data shy, many are data-heads like us!  Indeed, several nonprofits (many of which we have worked for or partnered with) have developed amazing websites full of easy-to-access datasets.

Here are 4 of my favorite nonprofit data sources…check them out!!

The Data Initiative at the Piton Foundation

Not only do they sponsor Mile High Data Day, but the Piton Foundation produces a variety of user-friendly data interfaces.  I really like the creative ways they allow website visitors to explore data–not just static pie and bar charts. Instead, their interface is dynamic and extremely customizable. While their community facts tool pulls most (but not all) of its data from the US Census, this tool is very easy and fun to use.  Further, they have already defined and labeled neighborhoods across the Denver Metro area, making it easy for users to compare geographies without trying to aggregate census tract or block group numbers. This is an invaluable feature for data users who don’t have access to GIS. I also appreciate the option to display margin of error on bar charts when it’s available.

Highlights:

  • Easy to use from novice to expert data user
  • Data available by labeled neighborhood
  • 7-County Denver Metro focus

OpenColorado

With over 1,500 datasets, OpenColorado is a treasure trove of raw data.  While this site doesn’t have a fancy user interface, it does provide access to data in many different file types, making it a great website for the intermediate to advanced data user with access to software such as GIS, AutoCAD, or Google Earth.  Most data on OpenColorado is from Front Range cities (e.g., Arvada, Boulder, Denver, Westminster) and counties (e.g., Boulder, Denver, Clear Creek), but unfortunately it is far from a comprehensive list, so you’d need to look elsewhere if you’re searching for information from Arapahoe County, for example.

There are over 200 datasets specific to the City and County of Denver.  I opened a few that caught my eye, including the City’s “Checkbook” dataset that shows every payment made from the City (by City department) to payees by year.  I give kudos to Denver and OpenColorado for facilitating this type of fiscal transparency.  I also downloaded a dataset (CSV) of all Denver Police pedestrian and vehicle stops for the past four years, which included the outcome of each stop along with the address, latitude and longitude.  For a GIS user, this is especially helpful if you want to search for patterns of police activity compared to other social and geographic factors.  Even without access to spatial software, this dataset is useful because it includes neighborhood labels.  I created a quick pivot table in Excel to see the top ten neighborhoods for cars being towed (so don’t park your car illegally in these neighborhoods).
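The same kind of pivot-table count can be reproduced outside Excel. Below is a minimal sketch in Python; the column names (`neighborhood`, `outcome`), the outcome label, and the sample rows are all hypothetical stand-ins, since the actual layout of the Denver file is not described here.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for the downloaded CSV; column names and rows are hypothetical.
sample_csv = """neighborhood,outcome
Neighborhood A,Vehicle Towed
Neighborhood A,Citation Issued
Neighborhood B,Vehicle Towed
Neighborhood A,Vehicle Towed
Neighborhood C,Warning
Neighborhood B,Vehicle Towed
Neighborhood C,Vehicle Towed
Neighborhood A,Vehicle Towed
"""

def top_neighborhoods(csv_file, outcome, n=10):
    """Count rows with the given outcome per neighborhood; return the top n."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row["neighborhood"] for row in reader
                     if row["outcome"] == outcome)
    return counts.most_common(n)

top_towed = top_neighborhoods(io.StringIO(sample_csv), "Vehicle Towed")
print(top_towed)
```

To run it against a real download, pass an open file handle in place of the `io.StringIO` object and adjust the two column names to match the file's header row.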

Highlights:

  • Tons of raw data
  • Various file types, including shapefiles and geodatabases that are compatible with GIS, and KML files that are compatible with Google Earth
  • Search for data by geography, tags, or custom search words

Kids Count from the Colorado Children’s Campaign

Kids Count is a well-respected data resource for all things kids.  Each year, the Colorado Children’s Campaign (disclaimer, they are also our neighbor, working just two floors below us) produces the Kids Count in Colorado report, which communicates important child well-being indicators and indices statewide and by county when available.  The neat thing about Kids Count is that it’s also a national program, so you can see how indicators in a specific county compare to the state and nation. In addition to the full report available as a PDF, you can also interact with a state map and point and click to access a summary of indicators by county.  Mostly, their data is not available in raw form, but their report does explain how they calculated their estimates and provides tons of contextual information that makes their key findings much more insightful.

Highlights:

  • Compare county data to state and national trends
  • Reports include easy to understand analysis and interpretation of data
  • Learn about trends over time and across demographic groups

Outdoor Foundation

If you’re looking for information about outdoor recreation of any type in any state, there is probably an Outdoor Foundation report that has the data you’re seeking.  Based in Boulder, Colorado, the Outdoor Foundation’s most common reports communicate studies of participation rates by activity type, both at a top level and also by selected activity types such as camping, fishing, and paddle sports (haven’t yet heard of stand-up paddle boarding?  It’s one of the fastest-growing activities in terms of participation).  The top-line reports show trends over the past ten years, while the more detailed Participation Reports break out participation, and other factors such as barriers to participation, by various demographics.  Multiple other special reports, focusing on topics such as youth and technology, round out what’s available from this site.

The participation and special reports are helpful, but I’m most impressed with the Recreation Economy reports, which are available nationwide and within each state.  These reports estimate the economic contribution of outdoor recreation, including jobs supported, tax revenue, and retail sales.  For example, the outdoor recreation economy supported about 107,000 jobs in Colorado in 2013.  Unfortunately, the raw data is not available for further analysis, but the summary results are still interesting and helpful.


Your Baby Is Increasingly Special and Unique, Apparently

It seems like when I’m in the mall and hear parents talking to their kids, I hear unusual names more and more often.  I’ve been developing a theory that parents are enjoying creativity more and valuing tradition less when that birth certificate rolls around, so in keeping with Corona Insights tradition, I thought I’d explore it a little more with some data analysis.  Off I went to the Social Security Administration website to put together a database of names.

I took a look at the most popular baby names in 2014, and compared them with those of 2004, 1994, 1984, and so on, all the way back to 1884.  Are unusual names more common in 2014?  It was straightforward to analyze, even if it meant sifting through a lot of data.

First, I looked at the 30 most popular names in each decade, and compared them to the total number of babies born.  If there’s a trend toward giving babies more unusual names, then we would expect a smaller concentration of babies with the most common names.

And wow, is that true, particularly for girls.  Let’s examine female names first.

If we look at the 30 most common female baby names, they constituted 41 percent of baby girl names in 1884.  There was some variation over the next 70 years but not much, ranging from 36 to 43 percent.  In 1954, the figure still stood at 40 percent for the girls destined to duck under their desks in the Cold War.  (As an important methodological note, recognize that these aren’t the same 30 names that were most common in 1884 – I adjusted the top 30 in each decade to reflect the most popular names of each particular decade.  This holds true throughout the analysis – I’m not tracking the popularity of a specific set of names, but rather I’m examining the likelihood of parents following popular trends in naming.)
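For anyone who wants to repeat this kind of calculation, here is a minimal sketch in Python. It assumes input lines in a `name,sex,count` layout and uses the sum of the listed counts as the denominator, which may differ from the total-births figure used in this analysis. The sample rows are made up rather than real SSA figures, and `parse_names` and `top_n_share` are hypothetical helper names.

```python
def parse_names(lines, sex):
    """Parse 'name,sex,count' lines, keeping only one sex."""
    counts = {}
    for line in lines:
        name, line_sex, count = line.strip().split(",")
        if line_sex == sex:
            counts[name] = counts.get(name, 0) + int(count)
    return counts

def top_n_share(name_counts, n):
    """Share of babies whose name is among the n most common in this data."""
    ranked = sorted(name_counts.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)

# Made-up rows in the 'name,sex,count' layout; not real SSA figures.
sample = ["Ann,F,50", "Bea,F,30", "Cleo,F,15", "Dot,F,5", "Al,M,60", "Bo,M,40"]

girls = parse_names(sample, "F")
share_top2 = top_n_share(girls, 2)
print(f"Top 2 names cover {share_top2:.0%} of girls")
```

Because the ranking is recomputed from whatever year's data is passed in, running the function once per year re-ranks the top names each time, in line with the methodological note above.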

But then something happened.  By 1964, the figure had declined to 32 percent.  It stayed roughly at that level until 1994, when it dropped further to 24 percent.  And since then, it has declined dramatically to 18 percent in 2004 and 16 percent in 2014.  The most common female names in 2014 are not very widespread.

If the most common names are less widely used, the next question is what other names are being used.  Are parents merely spreading their wings a little to other relatively well-recognized names, or are they pushing the boundaries of names?  To test this, I broadened my analysis and looked at the 100 most common female names.  In 1884, the most common 100 names covered 70 percent of girls born that year.  Moving forward in time, we see a pattern very similar to the one we saw for the top 30.  The figure declined slightly through 1954 (65%), and then those hippies from the 60s started becoming parents.  The figure dropped to 58% by 1964 and 51% by 1974, and it continues to decline.  In 2014, the top 100 female names covered only 31 percent of births.

So how much dispersion do we actually have here?  Let’s look at the top 500 female names in each decade.  Most of us probably couldn’t even come up with 500 different names, so surely they’re covering almost the entire female population, right?

Well, that certainly used to be the case.  In 1884, the top 500 names covered 90 percent of the female baby population, and sure enough, it follows the same pattern as my earlier analyses.  The figure floated between 87 and 89 percent up until 1954, with remarkable consistency.  After all, who can’t find a favorite name among the top 500?

A lot of modern people, apparently.  The figure dropped to 85 percent in 1964, 75 percent in 1974, and currently stands at 58 percent.  Think about that for a moment.  Forty-two percent of girls today have a name that does not fall into the top 500 most common names of their decade.

How does such a phenomenon happen?  One might speculate that this is due to a trend for adopting spelling variants.  Evelyn, for example, has branched into both Evelyn and Evelynn.  While I suspect that this is a significant factor, it does not appear to be the main one.  Instead, what we see among our top 500 names for 2014 is that many names appear to be newly created, or at least were exceedingly rare in past decades, because they’ve never appeared on a top-500 list until now.  Names like Brynlee and Cataleya and Myla and Phoenix have replaced more standard names.

Another theory that I can’t confirm at this point is that perhaps the United States has more diverse immigration these days, which could be producing a greater diversity of baby names.

Now let’s take a look at male names.

The first thing we see is that male names have historically been compressed relative to female names.  Looking across all of the decades since 1884, there are 1,286 male names that have placed in the top 500 in popularity, while there are 1,601 female names.  So are male names still more concentrated among fewer options?  We’ll repeat the analysis we just did for female names.
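Counting how many distinct names have ever made a top-500 list is a simple set union across decades.  The sketch below assumes each decade's names are already sorted from most to least popular.  The function name and the tiny rankings are hypothetical and only show the mechanics.

```python
def names_ever_in_top(rankings_by_decade, n):
    """Return the set of names that placed in the top n in any decade.

    rankings_by_decade maps a year to a list of names sorted from most
    to least popular for that year.
    """
    ever = set()
    for ranked_names in rankings_by_decade.values():
        ever.update(ranked_names[:n])
    return ever


# Hypothetical toy rankings, NOT real data.
rankings = {
    1884: ["John", "William", "James"],
    1954: ["Michael", "Robert", "John"],
    2014: ["Noah", "Liam", "Daxton"],
}

print(len(names_ever_in_top(rankings, 3)))
```

A smaller union for one gender means the same names keep reappearing on its lists decade after decade, which is what "compressed" refers to here.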

If we look at the 30 most common male baby names, they constituted 56 percent of baby boy names in 1884.  Per our earlier observation, this is much more concentrated than the 41 percent that we saw for females.  Similar to female trends, though, the proportion was relatively stable for decades afterwards, still standing at 54 percent in 1954.

The proportion began dropping in the 1960s, but was more stable than female names.  By 1964, the figure had declined gracefully to 51 percent, then 46 and 45 percent in the 1970s and 1980s.  The major decentralization for boys began in earnest in 1994, when the figure dropped to 35 percent, then 25 percent in 2004 and 20 percent in 2014, which isn’t notably higher than the female figure at this point.

An interesting difference by gender occurs when we examine the top 100 male names.  Whereas the dispersion of female names increased only slightly through the 1950s, the dispersion of male names actually decreased over that era as a whole.  In other words, the 100 most common names became slightly more concentrated for boys from 1884 to 1954.  Names first became more dispersed through the 1920s, but the trend then reversed.  The proportion of boys with top-100 names dropped from 74 percent to 69 percent between 1884 and 1924, then rose back to 76 percent by 1954.  Perhaps during hard times of depression and war, parents get more conservative when naming boys.  Or maybe mothers working on World War II assembly lines became enamored with mass production.

However, from 1954 on, male names paralleled the diffusion of female names, dropping steadily to only 42 percent today.  This is still more concentrated than the 31 percent figure for females, but is far lower today than at any time in the past 130 years.

Finally, we look at the top 500 male names.  Have males had the same dispersal as females?

Contrary to other findings, male names were actually slightly more dispersed among the top 500 than female names in 1884.  The 500 most popular male baby names constituted 89 percent of births, compared to 90 percent for females.  But this discrepancy didn’t last long.  While the top 500 female names dispersed slightly from 1884 through 1954, male names actually converged, reaching a high point of 94 percent in 1954.  So while parents were practicing more creativity in female names over this period, they were becoming less daring with male names, choosing more often to follow popular trends.

However, creativity took hold soon thereafter.  Male convergence dropped slightly to 93 percent by 1964, then dropped steadily to a figure of 71 percent in 2014.  So again, parents are increasingly choosing uncommon names for their babies in modern times, though to a much greater extent with girls (58 percent) than with boys.  As with the girls, these boys’ names appear to be a combination of new spellings and also new names that have never before shown up in the top 500, names such as Daxton and Finnegan and Kasen.

This is all well and interesting, but what does it mean?

I’m first interested in the differences for women versus men.  Why do parents feel greater freedom to give a female child an uncommon name?  Do they feel a greater need to make a female child stand out from the crowd, and if so, why?  Are males better situated to succeed with a more traditional name, or do more men simply get named after their fathers or other family members?  Is the difference sexism in a very indirect form, or is there some logical reason?  I’m at a loss to come up with a logical reason that doesn’t reflect different attitudes toward girl babies than boy babies, but I’d love to hear your theories.

While the level of standardization differs between males and females, though, the patterns are moving in the same direction, and doing so strongly.  Why are babies – both boys and girls – increasingly likely to be given uncommon names?  One can surmise that it describes a society where individualism is being sought out more and more.  It may also point toward a lesser desire or obligation to pass down family names and a lesser emphasis on tradition.  So are we increasingly a nation of creative individualists or are we increasingly lost and rootless?  Or both?


DIY Tools: Network Graphing

Analyzing Corona’s internal data for our annual retreat is one of my great joys in life.  (It’s true – I know, I’m a strange one.)  For the last few years I’ve included an analysis of teamwork at Corona.  Our project teams form organically around interests, strengths, and capacity, so over the course of a year most of us have worked with everyone else at the firm on a project or two, and because of positions and other specializations some pairs work together more than others.  Visualizing this teamwork network is useful for thinking about efficiencies that may have developed around certain partnerships, and thinking about cross-training needs, and so on.  The reason I’m describing this is that I’ve tried out a few software tools in the course of this analysis that others might find useful for their data analysis (teamwork or otherwise).

For demonstration purposes, I’ve put together a simple example dataset with counts of shared projects.  In reality, I prefer to use other metrics like hours worked on shared projects because our projects are not all of equal size, and I might have worked with someone on one big project where we spent 500 hours each on it, and meanwhile I worked on 5 different small projects with another person where we logged 200 hours total.

But to keep it simple here, I start with a fairly straightforward dataset.  I have three columns: the first two are the names of pairs of team members (e.g., Beth – Kate, though I’m using letters here to protect our identities), and the third column has the number of projects that pair has worked on together in the last year.  To illustrate:

My dataset contains all possible staff pairs.  We have 10 people on staff, so there are 45 pairs.  I want to draw a network graph where each person is a vertex (or node), and the edge (or line) between them is thicker or thinner as a function of either the count of shared projects or the hours on shared projects.
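If you would rather build the list of pairs programmatically than by hand, here is a minimal Python sketch.  The letters stand in for staff names, and the project counts are invented for illustration.  Any pair without a recorded count is assumed to have zero shared projects.

```python
from itertools import combinations

# Ten staff members, identified by letter.
staff = list("ABCDEFGHIJ")

# Every unordered pair of distinct people: 10 * 9 / 2 pairs.
pairs = list(combinations(staff, 2))
print(len(pairs))

# Hypothetical counts of shared projects for a few pairs.
shared_projects = {("A", "B"): 4, ("A", "C"): 1, ("B", "C"): 2}

# One row per pair: Name 1, Name 2, Count of Shared Projects.
edges = [(a, b, shared_projects.get((a, b), 0)) for a, b in pairs]

for row in edges[:3]:
    print(row)
```

Each row has the same three-column shape as the dataset described above, so the list can be written out to a spreadsheet and used as the edge list for the graph.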

This year I used Google Fusion Tables to create the network graph.  This is a free web application from Google.  I start by creating a fusion table and importing my data from a Google spreadsheet.  (You can also import an Excel file from your computer or start with a blank fusion table and enter your data there.)  The new file opens with two tabs at the top – one called Rows that looks just like the spreadsheet I imported and the other called Cards that looks like a bunch of notecards each containing the info in one row of data.  To create the chart, I click the plus button to the right of those tabs and select “Add chart”.  In the new tab I select the network graph icon in the lower left, and then ask to show the link between “Name 1” and “Name 2” and weight by “Count of Shared Projects”.  It looks like this:

There are a few things I don’t love about this tool.  First, it doesn’t seem to be able to show recursive links (from me back to me, for example).  We have a number of projects that are staffed by a single person, and being able to add a weighted line indicating how many projects I worked on by myself would be helpful.  As it is, those projects aren’t included in the graph (I tried including rows in the dataset where Name 1 and Name 2 are the same, but to no avail).  As a result, the bubble sizes (indicating total project counts) for senior staff tend to be smaller on average, because more senior people have more projects where they work alone, and those projects aren’t represented.  Also, the tool doesn’t have options for 2D visualizations, so if you need a static image you are stuck with something like the above which is quite messy.
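To see how much the missing solo projects shrink a person's bubble, you can total the weights per person yourself, with and without the rows where Name 1 and Name 2 are the same.  This is a rough sketch with made-up numbers.  It assumes a node's size is simply the sum of the weights on its rows, which may not be exactly how the tool sizes its bubbles.

```python
from collections import defaultdict


def node_totals(edges):
    """Sum the edge weights touching each person.

    edges is a list of (name_1, name_2, weight) rows.  A row where both
    names match is a solo project and is counted once for that person.
    """
    totals = defaultdict(int)
    for a, b, weight in edges:
        totals[a] += weight
        if b != a:
            totals[b] += weight
    return dict(totals)


# Hypothetical rows; ("A", "A", 3) means A staffed 3 projects alone.
edges = [("A", "B", 4), ("A", "C", 1), ("B", "C", 2), ("A", "A", 3)]

with_solo = node_totals(edges)
without_solo = node_totals([row for row in edges if row[0] != row[1]])

print(with_solo)
print(without_solo)
```

In this toy example, person A has the largest total once solo work is counted but falls behind person B when it is dropped, which is the same distortion described above for senior staff.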

However, the interactive version is quite fun as you can click and drag the nodes to spin the 3D network around and highlight the connections to a particular person.

Another tool option that I’ve used in the past (and that is able to show recursive links and 2D networks) is an Excel template called NodeXL.  You can download the template from their website – you’ll need to install it (which requires a restart of your computer) – and then to use it just open your Windows start menu and type NodeXL. Instructions here.  I had some difficulties using it with Office 2016, but in Office 2013 it worked quite well.

If you try these out, share your examples with us!