
I’m going to kick off a multi-part series on US Census data by offering a totally free download, in XLS or CSV format, of something strangely hard-to-Google: the 2010 US Census population by Zip code (technically, by ZCTA). Splitwise is offering these files free of charge and in the public domain, and I can’t believe how many other sites are charging for them!
But the difficulty I had in creating this data set and using the US Census website has inspired me to write a bit more about how to use one of the world’s most interesting open data sources.
The US Census data sets are incredibly valuable, despite their origin as a matter of mere political bookkeeping. Ancient astronomers watched the movements of the sky for the practical task of navigating ships, without knowing that Kepler and then Newton would use their recordings to discover the laws of gravity. In a similar way, while the US Census was created to turn the crank of representative democracy, its beautiful data sets have surely been the basis for countless demographic, civic, and business insights unforeseen by the Census itself.
The US Census is both “big data” and “big science”: the 2010 Census cost $13B (~$42 per capita), and the Census Bureau’s annual budget in 2012 was $1B. For comparison, the Large Hadron Collider at CERN, the world’s biggest physics experiment, had a budget of $9B and an annual operating cost of $1.1B in 2012.
Considering just the decennial census, and not any of the supplementary work, the summary tables alone contain 73,028 census tracts (not even the smallest geographic region in use) and 8,940 different query variables, many as arcane as “P0410001, Concept P41: Grandchildren under 18 years living with grandparent householder.” Using US Census data can be very intimidating indeed, and my sense is that the Census Bureau itself, faced with a Herculean task, has only a limited understanding of what summary data products would be most useful to publish.
Luckily for normal people who don’t enjoy hunting through messy Excel with strange jargon, the US Census has at long last released a simple, lightweight, JSON API for pulling data out of these arcane databases. This makes using the data much, much more straightforward. (All that you have to do is untangle the jargon-filled API documentation.)
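To give a feel for what working with a JSON API like this involves, here is a minimal Python sketch. It assumes the API answers with a JSON array of rows whose first row is a header. The endpoint URL, the variable name P001001 (taken here to mean total population), and the helper names build_query and rows_to_records are assumptions for illustration, and the sample response is made up rather than real Census data.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the 2010 decennial summary file; verify against the
# official API documentation before relying on it.
BASE_URL = "https://api.census.gov/data/2010/dec/sf1"


def build_query(variables, geography, api_key=None):
    """Build a query URL asking for some variables over a geography."""
    params = {"get": ",".join(variables), "for": geography}
    if api_key:
        params["key"] = api_key
    return BASE_URL + "?" + urlencode(params)


def rows_to_records(payload):
    """Turn a header-row-plus-data-rows JSON payload into a list of dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


# A made-up response in the assumed shape (not real Census figures).
sample = '[["P001001","state"],["100","01"],["250","02"]]'
records = rows_to_records(sample)
url = build_query(["P001001"], "state:*")
```

In real use you would fetch `url` with an HTTP client and pass the response body to `rows_to_records`. Note that the values come back as strings, so they need converting before any arithmetic.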
In a series of posts to follow, I will document my journey through the US Census data as a newcomer, and share the tools I used to make the data so much easier to work with. The posts will assume that you are “a normal data analyst or consultant,” by which I mean you are good at Excel and/or Google Docs, but don’t know much about APIs and might not even know what an API is.
P.S. An FAQ on why the hell Splitwise is doing this
Q: Wait, what? I thought you made that bill-splitting app that my roommates and I use?
A: Well, yeah, but we’re also working on a new set of fairness calculators and needed to do some background research. And also, we’re just nerding out.
Thank you!
Belatedly, my pleasure! 🙂 This has gone so well, I can’t wait for post 2 now.
I am just now coming across this blog as I was searching for this type of Zip Code information… but I am not seeing where I can download the excel file????????
You’re right. I googled this for hours and could not find it. Glad I stumbled upon your link. Thanks a million!
My pleasure @Darrell. Sorry for not noticing these comments for a while 🙂 I’m grateful to others in the community for showing me how to use the API, not least the National Day of Civic Hacking.
Thanks a lot!!
My pleasure cbhutta!
Hi Jon – do you know how to pull out 2010 demographic information by zip code without pulling hair out of my own or other person’s heads?
@E – Sort of, yes. Some hair pulling is still kind of involved, but I am going to upload the Excel file to make it easier to replicate my queries for any particular variable you want to get at. Let me go post my second post this week…
-1′ or ‘3’=’3
Wait, what?
That looks like an attempt at a SQL injection attack 🙂
Cool data, though!!!
Thank you so much! Any way to get Population Density and unemployment quickly?
Well, only sort of easy. You are motivating me to get around to my next post!
The “quickest” way is only semi-quick, but it’s so much better than hunting around the Census website that I feel obliged to write about it. Since you asked, I’ll try to use your population density and unemployment as examples. They are fairly common requests, I’m sure, yet not super straightforward, as both seem to be available only as derived quantities.
The method requires you to download another spreadsheet, look up a variable or two and a series of commands, and then do a little math. Hold on tight! I’ll post a comment here and update this post when it’s ready.
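The “little math” for these two derived quantities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that you already have a population count and a land area in square meters for each area, plus counts of unemployed people and of the labor force. The function names are hypothetical, and the thread does not say which Census variables supply these inputs.

```python
# Exact conversion factor: one square mile is 1609.344 m squared.
SQ_METERS_PER_SQ_MILE = 2_589_988.110336


def population_density(population, land_area_sq_m):
    """People per square mile, or None when there is no land area."""
    if land_area_sq_m <= 0:
        return None
    return population / (land_area_sq_m / SQ_METERS_PER_SQ_MILE)


def unemployment_rate(unemployed, labor_force):
    """Unemployed share of the labor force, or None if the force is empty."""
    if labor_force <= 0:
        return None
    return unemployed / labor_force
```

Returning None for a zero denominator keeps water-only or unpopulated areas from raising a division error when you run this over a whole file.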
Ok I posted the answer! It took me way longer than I thought, and I still don’t have a comprehensive post of my own, but here you go! https://blog.splitwise.com/2014/01/06/free-us-population-density-and-unemployment-rate-by-zip-code/
This post is awesome, you’re the best. Gonna remember that API – finding data on government websites makes me cry. On a somewhat related note, I just started using Excellent Analytics to pull GA data to Excel and it changed my life.
Cool, that sounds useful (sorry for the slow reply!). I keep meaning to post my full Census-puller Excel sheet but documenting it is taking a lot of time, so I may just release it as a web-app instead.
Thank you for posting this – I scoured Google and couldn’t find much of anything other than your site. Cheers!
My pleasure – glad to be helpful Anthony!
Where are the Arcmap shapefiles to download?
Been searching for this for the last four hours… census website is extremely confusing…
Thank you so much!
Is it possible to find the demographic data (gender, age, ethnicity etc) ?
Can’t thank you enough for this data. Spent hours looking for it!
So glad, my pleasure!
Good lord. Thank you so much.
Unfortunately, ZCTA does not correspond to ZIP code. “The Census Bureau did not produce a 2000 ZIP Code to 2000 ZCTA relationship file. We created the ZCTAs specifically to address the inadequacies of ZIP Codes for census data tabulation.” (from http://www.census.gov/geo/reference/zctafaq.html)
Thanks for another informative website. Where else could I get that type of information written in such an ideal way? I have a challenge that I am working on just now, and I have been on the lookout for such information.
Nice Statistics
Thank you!
Hi guys,
Did you figure out how to get the population demographics by zip code? I got them by county, but that’s not precise enough for my project.
Thanks,
J
This is great information. Do you know how to pull data on zip codes below 01000 (e.g. Boston)? Thanks, Kyle
What about the zip codes that are not represented? Are they population of zero?
Looking for this info as well… zip codes that start with 00… Thanks for what you’ve compiled. Any help would be appreciated.
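One possible cause of the “missing” low-numbered zip codes, which this thread does not confirm, is that spreadsheet tools read the codes as numbers and drop their leading zeros (02861 becomes 2861). A minimal Python sketch of reading the CSV with the codes kept as text and padded back to five digits follows. The column names zip and population are hypothetical, and the sample rows are invented.

```python
import csv
import io


def read_zip_populations(csv_text, zip_column="zip", pop_column="population"):
    """Read (zip, population) pairs, restoring any stripped leading zeros."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row[zip_column].strip().zfill(5), int(row[pop_column]))
        for row in reader
    ]


# Invented rows showing codes that have already lost their leading zeros.
sample = "zip,population\n2861,100\n601,200\n90210,300\n"
rows = read_zip_populations(sample)
```

The csv module never converts values to numbers, so a file read this way keeps its codes as strings; the `zfill(5)` call only repairs codes that were damaged earlier.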
Asking questions is a genuinely pleasant thing if you do not understand something fully; however, this post provides a pleasant understanding.
Hello! Thank you for building the csv. I have a few questions regarding the data: I have found 103 zip codes that each match multiple population values, such as ‘02861’, ‘03579’ and so on. Can you please explain the difference between the two population values? Thank you again!
Likewise
Thanks for the database! Would you explain what your methodology was for gaining this insight?
I am asking since Census mentions they have not provided any correlation between Zip Codes and ZCTAs. I tried to figure out a way by correlating ZCTA-County-Zip Code, but it is not going to work.
Best,
Two questions: how do you use Splitwise to get the census data, and is there a place to find the 2000 data by zip code? I’m working on a project where I need to compare 2000 to 2010.
Why do some zipcodes have duplicates with different population measures? See Zipcode 42223
I also noticed the duplicates…
These numbers are incorrect.
Came across this post first, but then found the 2010 Census Gazetteer Files– these seem to contain exactly what you’re looking for!
http://www.census.gov/geo/maps-data/data/gazetteer2010.html
See the last tab for ZCTAs. It has longitude and latitude too!
thank you
Thank you for this, and the “US Population Density And Unemployment Rate By Zip Code”. I noticed that you have a broken link at the bottom of that latter post though, specifically where you link to the slides for the National Day of Civic Hacking… the new link should be the following http://www.census.gov/newsroom/releases/pdf/NDoCH_FINAL.pdf
Thanks for sharing this post. I was searching for exactly the data it covers.
The data has errors. There are 103 zips that appear multiple times in the data. For example: 02861 and 03579
I believe the issue is that zip codes can cross state lines.
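If that explanation is right, and each duplicate row is the portion of one zip code that lies in a different state, then adding the rows together would give the total for that code. This is an assumption, not something the thread confirms, so check a few codes against another source first. A minimal Python sketch, with a hypothetical function name and invented sample numbers:

```python
from collections import defaultdict


def total_by_zcta(rows):
    """Sum populations over all rows that share the same code."""
    totals = defaultdict(int)
    for zcta, population in rows:
        totals[zcta] += population
    return dict(totals)


# Invented figures: the first code appears twice, once per state portion.
sample_rows = [("02861", 100), ("02861", 25), ("03579", 40)]
totals = total_by_zcta(sample_rows)
```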
Is the population data in 1000s ?
Almost 4 years later and this post is still helping people! Tried to figure out the census site for what I thought would be an easy ask. Glad I found this site!
thanks
very good
Thanks for sharing it. I was looking for it for my research work.
Would really like to see it at the Zip+4 level of granularity, if that is even possible.
You have provided very good information and it is very useful.
Thank you so much for this
This is a gift that keeps on giving…..Thanks!!
How can I find a city and state for a specific zip code (forget Google)?
Can that information be extracted from these databases?
Can you give me the query that generated the CSV file? Thanks!
is there a way to download all census data with zipcodes?