Sunday Data Blogging (08/26/07) - Employment

While digging around at work for something unrelated, I stumbled across this data table (PDF) from the U.S. Small Business Administration (data courtesy of the U.S. Census). I found it pretty fascinating, so I thought I would share. It tabulates total employment, payroll and number of firms for businesses of various sizes (more data here). Here's a plot of total employment for different-sized businesses (2004 data, in 23 bins of varying sizes, plotted on a log scale).

First off: binning data in uneven bins (e.g. comparing 5-9 against 400-499, or whatnot) seriously distorts the distribution and is a no-no. A wide bin covers a broader range of firm sizes than a narrow one, so its total looks inflated by comparison. In this plot, total employment at the largest firms appears to dominate everything else, which may or may not be the case. I couldn't find the raw data to do it better, but we can at least plot the cumulative distribution of the given data, which solves the binning problem to a certain extent:

This is a much more useful plot. From this we can see that the 5.8 million businesses with fewer than 100 employees (which I would colloquially define as "small") account for about one-third of the employment in this country. Similarly, businesses with more than 2,500 employees (which I would colloquially define as "gargantuan" and of which there are only 3,500 nationally) also account for about one-third of employment. Again, even the cumulative plot isn't perfect in this case; it would be better to break that final >2500 data point down even further to see what the trend really looks like. Still, most of us work for pretty big companies, and only a really small portion of the workforce works in Mom 'n' Pop joints.
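For anyone who wants to try the cumulative trick on their own binned data, here is a minimal Python sketch. The bin boundaries and employment totals below are made up purely for illustration (they are NOT the SBA/Census figures), and the function name `cumulative_share` is my own invention. The idea is just to keep a running sum of employment across bins, ordered by firm size, and divide by the grand total.

```python
from itertools import accumulate

# Hypothetical bins: (largest firm size in the bin, total employment in the bin).
# These numbers are invented for illustration; they are NOT the SBA/Census data.
bins = [
    (4, 5.0),
    (9, 6.0),
    (19, 7.0),
    (99, 18.0),
    (499, 16.0),
    (2499, 14.0),
    (float("inf"), 34.0),  # open-ended top bin, like the ">2500" bin
]

def cumulative_share(bins):
    """Return (upper bound, share of all employment at firms that size or smaller)."""
    total = sum(emp for _, emp in bins)
    running = accumulate(emp for _, emp in bins)
    return [(upper, cum / total) for (upper, _), cum in zip(bins, running)]

shares = cumulative_share(bins)
for upper, share in shares:
    print(f"firms with <= {upper} employees: {share:.2f} of employment")
```

Because each point on the cumulative curve depends only on the upper edge of its bin, the uneven bin widths stop mattering. The open-ended top bin is still a black box, though, which is the same limitation as in the plot above.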

For me, the plot raises the question: what is the socially optimal distribution of employment? And what did this plot look like historically? In a lot of ways, smaller is better when it comes to creating healthy communities (or at least that's my bias), so what policies can we implement that might foster more small, community-friendly businesses and fewer corporate behemoths?

I don't know, but I thought it was worth thinking about. Yay data!

