RADIANCE BLOG


Big data is not required for big insights

 

You’ve probably heard a lot about Big Data. Big Data is going to change the world. Big Data is going to change how organizations are run. Big Data is going to clean our garage and walk our dog.

Big Data vs. Small/Medium Data

And maybe Big Data will do that, for big organizations. If you’re Coke or Fermilab or the National Security Agency, your products or services or spying naturally produce a lot of data. Tapping into and harvesting massive streams of continuously created data, which is the hallmark of Big Data, is a natural thing to do.

But for many of us who work at small and medium organizations, Big Data is an abstraction at best. We simply don’t have massive, ongoing data streams that we can dive into to learn about our markets, our products or services, our clients, or our organization. We’re not big enough to have Big Data. But that doesn’t mean we can’t learn from the principles behind this phenomenon and use them to our advantage.

The hype around Big Data is the data itself: massive, previously unattainable and unimaginable rivers of data pouring through your world. But the philosophy behind Big Data is actually more important. It’s about looking around to identify where those data flows are in your own environment and then tapping into them to gain insight. You don’t need Big Data to do that. It works just as well with Medium Data or Small Data, especially if you’re a medium or small organization. We too can tap into and harvest data; it just flows in smaller quantities at our scale.

Three sources from which to harvest data

So how can we start this harvesting? What can we collect? There are three main sources to consider, though we’ll concentrate mostly on the third one.

First, you can harvest data that already exists outside your organization and is updated regularly. For example, there are lots of federal surveys and data collection efforts out there, and they’re very cost-effective to retrieve if you know about them. The right ones can help you understand your environment.

Second, you can create data via ongoing special efforts, such as conducting a regular survey or instituting a special data collection effort that is not part of your daily operations. This is a bit of a different concept from harvesting data, but still falls within the realm of a streaming source of data you can use for analysis.

But the third concept is the core of where Small Data can help you. It’s the implementation of a system to collect and harvest on an ongoing basis the data that we produce in our daily operations. Or more precisely, it’s data that we do or could produce easily in our daily operations.

Focus on the third

Thinking about that third concept, we all have opportunities to gather data on a daily basis. Most likely, we already do to some extent, even if it’s as simple as our client names or time sheets. So we’re already in the habit of creating data. But how are we using that data? As examples, I’m always surprised by the number of organizations that record their clients’ ZIP codes but then never use that data to examine their clients’ demographic and geographic makeup. I’m also surprised by the number of nonprofits that don’t do research on their donor databases to identify their demographic sweet spots. This data is often collected but seldom analyzed and leveraged to its full extent.
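As a minimal sketch of what that first step might look like, the Python below tallies clients by ZIP code to show where they are concentrated. The client records and field names are made up for illustration; a real list would come from your own client database.

```python
from collections import Counter

# Hypothetical client records; in practice these come from your own client list.
clients = [
    {"name": "Client A", "zip": "80202"},
    {"name": "Client B", "zip": "80202"},
    {"name": "Client C", "zip": "80301"},
    {"name": "Client D", "zip": "80202"},
    {"name": "Client E", "zip": "80015"},
]

def clients_by_zip(records):
    """Return (zip, count, share) tuples, largest ZIP codes first."""
    counts = Counter(r["zip"] for r in records)
    total = sum(counts.values())
    return [(z, n, n / total) for z, n in counts.most_common()]

for zip_code, count, share in clients_by_zip(clients):
    print(f"{zip_code}: {count} clients ({share:.0%})")
```

Once the counts exist, they can be matched against publicly available demographic data for the same ZIP codes, which is where the real insight tends to come from.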

Beyond harvesting data that already exists, is there other data that we can efficiently build into our routines that can add value, either in understanding our clients, serving our clients, or improving our internal operations and efficiency? My company, for example, began tracking the origins of our consulting engagements a few years ago, and it has been very effective in identifying both the inefficient and the effective means of marketing. Our minor investment in that effort has paid for itself many times over.

There is value in data. We all know that. The key, of course, is to manage the process so you’re gathering valuable data in an efficient manner and then actually using it to your benefit. If you think about evaluating a program, a general rule of thumb is that 5 to 15 percent of the budget should be invested in evaluation, depending on the size of the program. If you would make that investment in a program, why not follow the same rule for your organization as a whole? It may pay off handsomely.

This blog was originally posted on CausePlanet.org, “Where nonprofit leaders get smarter faster”.


The Power of Numbers

Numbers are an interesting thing. We all have an innate sense of quantities, but numbers are a culturally agreed upon format for representing those quantities. When we are trying to convey quantitative information to other people, the choice between “7 days” vs. “1 week” or “100 out of 300” vs. “1 out of 3” often feels like an aesthetic preference. However, the way we represent quantitative information to other people can have a large impact on how people think about what those numbers represent. As data scientists, it is our job to be aware of the ways in which people represent quantities to ensure that survey respondents’ perception of numbers accurately represents reality.

In creating survey instruments, I am always acutely aware of these factors. At the annual meeting of the Society for Personality and Social Psychology that I recently attended, I was fascinated by new research by Andrew White and Virginia S.Y. Kwan on the matter. They hypothesized that a small number of large units (e.g. 1 week) feels more distant and less probable than a large number of small units (e.g. 7 days). In one study, they described a fictional disease and then told people that someone in their community had been infected. The location of the infected person was described either as 1 mile away or as 5,280 feet away. They then asked people how willing they were to receive the vaccination for the disease. People were much more willing to receive the vaccine when the location of the infected person was described as 5,280 feet than when it was described as 1 mile. This research is fascinating because the two conditions described the exact same amount of physical distance; however, the units used to describe the distance changed people’s perception of the threat.

I would hypothesize that the reverse is also true. That is, people might use large numbers of small units to describe things that feel closer and more probable and, on the other hand, might use small numbers of large units to describe things that feel far away and less probable. So if someone thinks a deadline is urgent, she might describe it as only 14 days away, instead of describing it as two weeks away. When designing surveys and analyzing people’s responses, it is important to understand how numbers can convey more than just quantitative information. As data scientists, we design research methods that capture not only data, but also the nuances of quantitative information.


Part 2: The NFL’s Talent Pool and Expansion

In our previous post, we concluded that if you average the ratios over each decade, you end up with an average talent pool of about 9.63 million people per team, which is almost exactly the current ratio. The fact that we have 32 teams right now therefore means that the league’s expansion has merely kept pace with long-term population growth. The league isn’t overexpanding and it isn’t underexpanding in terms of keeping the talent pool constant.

Using an average talent pool of 9.627 million people per team to keep our on-field talent consistent, we see that new teams should be added each decade as the American population grows, as shown below.

 

Year    Population     Teams    Population Per Team
1920    106,021,537    13        8,155,503
1930    123,202,624    11       11,200,239
1940    132,164,569    10       13,216,457
1950    151,325,798    13       11,640,446
1960    179,323,175    21        8,539,199
1970    203,302,031    26        7,819,309
1980    226,542,199    28        8,090,793
1990    248,709,873    28        8,882,495
2000    281,421,906    31        9,078,126
2010    308,745,538    32        9,648,298
2020    341,387,000    35        9,627,086
2030    373,504,000    39        9,627,086
2040    405,655,000    42        9,627,086
2050    439,010,000    46        9,627,086

To keep the talent level consistent, we should add 3 new teams by 2020, 4 more new teams by 2030, 3 more teams by 2040, and 4 more teams by 2050.  This would add 14 new teams by 2050, giving us a 46-team league.
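For readers who want to check the arithmetic, here is a rough Python sketch that reproduces the team counts from the table above. It assumes the target ratio is the simple average of the ten per-decade ratios from 1920 to 2010 and that the team count is rounded to the nearest whole team; the function and variable names are ours, chosen for illustration.

```python
# Census population and number of teams, taken from the table above.
history = {
    1920: (106_021_537, 13),
    1930: (123_202_624, 11),
    1940: (132_164_569, 10),
    1950: (151_325_798, 13),
    1960: (179_323_175, 21),
    1970: (203_302_031, 26),
    1980: (226_542_199, 28),
    1990: (248_709_873, 28),
    2000: (281_421_906, 31),
    2010: (308_745_538, 32),
}

# Projected populations, also from the table above.
projections = {
    2020: 341_387_000,
    2030: 373_504_000,
    2040: 405_655_000,
    2050: 439_010_000,
}

# Average the per-decade ratios (not total population over total teams).
ratios = [population / teams for population, teams in history.values()]
target = sum(ratios) / len(ratios)

def teams_needed(population, people_per_team):
    """Number of teams that keeps the talent pool per team near the target."""
    return round(population / people_per_team)

print(f"Target talent pool: {target:,.0f} people per team")
previous = history[2010][1]
for year, population in projections.items():
    teams = teams_needed(population, target)
    print(f"{year}: {teams} teams ({teams - previous} new)")
    previous = teams
```

Note that for the projected years, the last column of the table shows the target ratio itself rather than the projected population divided by the rounded number of teams.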

Now, where should those teams go?

No authoritative body develops state-level population projections for every state, so we cheated a little bit. We examined state populations in 2000 and 2010, and applied that growth rate to each subsequent decade. This allowed us to develop some very rough projections of the population of each state for the years 2020, 2030, 2040, and 2050, assuming that current growth rates persist over the next 40 years.  (This is unlikely, but come on, this is a blog post, not a dissertation.)
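The projection step can be sketched in a few lines of Python. This is one reading of the method, assuming the 2000 to 2010 growth rate compounds from decade to decade; the function name and the sample figures are hypothetical, not actual state data.

```python
def project_population(pop_2000, pop_2010, decades=4):
    """Extend the 2000-2010 growth rate unchanged for each later decade."""
    growth = pop_2010 / pop_2000  # decade-over-decade growth factor
    projected = {}
    population = pop_2010
    for step in range(1, decades + 1):
        population *= growth  # compound the same rate one more decade
        projected[2010 + 10 * step] = round(population)
    return projected

# Hypothetical state: 1,000,000 residents in 2000 and 1,100,000 in 2010.
for year, population in project_population(1_000_000, 1_100_000).items():
    print(year, f"{population:,}")
```

Because the rate compounds, a fast-growing state pulls away quickly over four decades, which is one reason these projections should be treated as very rough.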

We then lumped the states (and Washington DC and Puerto Rico) into 16 regions to better account for regional fan bases. This conglomeration was arbitrary, but is likely not overly controversial.  For each of the sixteen regions, we then calculated the number of current teams and the number of teams that the region should have based on population, and added teams to the areas that were most underrepresented. This model therefore takes into account the number of teams already present, the current population, and population growth trends.
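One way to picture the allocation is the Python sketch below. It assumes a simple greedy rule: each new team goes to whichever region is furthest below the number of teams its population would justify. The post does not spell out the exact rule, so treat this as an illustration; the region names and populations are placeholders, not the real sixteen regions.

```python
def allocate_new_teams(regions, people_per_team, new_teams):
    """Give each new team to the region furthest below its population-based share.

    regions maps a region name to (population, current_teams).
    """
    teams = {name: current for name, (_, current) in regions.items()}
    awarded = []
    for _ in range(new_teams):
        # Shortfall = teams the population would justify minus teams already there.
        shortfall = {
            name: population / people_per_team - teams[name]
            for name, (population, _) in regions.items()
        }
        winner = max(shortfall, key=shortfall.get)
        teams[winner] += 1
        awarded.append(winner)
    return awarded

# Hypothetical regions and populations, for illustration only.
regions = {
    "Region A": (30_000_000, 2),
    "Region B": (12_000_000, 0),
    "Region C": (20_000_000, 3),
}
print(allocate_new_teams(regions, 9_627_086, 3))
```

Recomputing the shortfall after every award matters: a region that just received a team drops down the list, so the next team can go elsewhere.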

So what do we see as we look into our demographic crystal ball?  Well, a proper expansion of the NFL should be as follows, and we’ll let the NFL determine how to account for having an odd number of teams.

The list below includes the region that would get the new team, and the likely city location if the new franchise is placed in the largest metro area.

Year 2020 – Add three teams, one each in:

California – Los Angeles (Does this surprise anybody?)

The Great Plains (OK, KS, NE, SD, ND) – Oklahoma City is the largest metro area in the region.  This is probably a surprise, but this area has no team and a pretty good-sized population.  (And for you geographically impaired readers, the Kansas City Chiefs are located in Missouri, not Kansas.)

Texas – San Antonio is up, as the Cowboys lose a little of their fan base.

Year 2030 – Add four teams, one each in:

Texas again – It should go to Fort Worth based on demographics, but given their proximity to Dallas, would that happen?  If not, Austin would be the next city in line.

The desert SW (NV, UT, AZ, NM) – Las Vegas is the largest city without a team.  And can you imagine going to a road game in Vegas?  Talk about a home field advantage.

The Atlantic South (NC, SC, GA) – Raleigh is the largest city without a team.  Move over, Panthers; there’s a new team in the Carolinas.

The deep South (AR, AL, MS, LA) – Birmingham is the largest city without a team, and come on, this is a town that would support an NFL team.

Year 2040 – Add three teams, one each in:

The Coastal Pacific (AK, HI, WA, OR) – Portland is the obvious site based on size.

California – Riverside is the largest metro area without a team at this point, so it’s time for the Inland Empire to join the NFL.

The Atlantic South (NC, SC, GA) – Yep, another one.  This area is growing fast.  Columbia, SC, barely beats out Greensboro, NC for this coveted franchise.

Year 2050 – Add four teams, one each in:

Texas – At this point Texas will have five teams.  Fort Worth still deserves it if they want to part ways with the Cowboys, but if not, the team goes to El Paso.

The desert SW (NV, UT, AZ, NM) – It’s time for Salt Lake to get a clean-living team.  Perhaps they could play in the same division as the Las Vegas franchise?

The Atlantic South (NC, SC, GA) – Would you believe a third team in North Carolina?  If they can pull it off, Greensboro gets this team.

The Tropical South (FL, Puerto Rico) – San Juan, Puerto Rico, would be first in line. If you think the team must be in a state and not a territory, then it would go to Orlando, but maybe Puerto Rico will be a state by 2050 anyway.

Now, if you don’t think that Puerto Rico’s population should be included since it’s not technically a state, the last team would go NOT to the Tropical South, but up north in New England, where it would likely be awarded to Providence, Rhode Island. However, let’s assume that San Juan gets it.

So over the next 40 years, we would add 14 new teams:

Three in Texas: San Antonio, Austin, and either Fort Worth or El Paso

Three in the Carolinas: Raleigh, Columbia, and Greensboro

Two in California: Los Angeles and Riverside

Two in the Desert Southwest: Las Vegas and Salt Lake City

One in the Great Plains: Oklahoma City

One in the Deep South: Birmingham

One in the Coastal Pacific: Portland

One in the Tropical South: San Juan or perhaps Orlando

What do you think about this future?  Would you change your favorite team?  Or will you grouse about how the talent pool is being diluted?  We think the smart money is to start putting an ownership team together in Oklahoma City or Birmingham.

Read the first post in this two-part series on the expansion of the NFL.