Tag Archives: homepage

CartoDB Walk Through

In 2011, the BLS published a map of fatal workplace injuries by state.

Are these colors continuous or categorical? Should they be? Is anyone surprised that CA and TX have a lot of workplace injuries? How can we improve on this?

So we’ll recreate it:
BLS Fatality Data (csv)
2011 Population Estimates (via census.gov)

Combine those two in a spreadsheet, first. Use “Save As” to create a new spreadsheet so you don’t clobber your data.

Use =find() to confirm that state names match. They will until ~New York

Use =G10/E10 to get fatalities per capita, talk about scientific notation, then =(G10/E10)*100000 for per 100,000

Upload to CartoDB

Upload the CSV to CartoDB. Now you need a shapefile. Get a US State file from NationalAtlas.gov or NOAA or this much, much smaller file from GeoCommons. (Use the KML, the shapefile doesn’t have a projection)

GoogleMaps will only take KML files, CartoDB will take ShapeFiles or KML, but don’t use Safari — a shapefile is a bundle of database files, you don’t want to unzip it.


CartoDB used to make merging harder but they’ve worked a few things out.

Keep both open: what columns can we use to combine these two?

CartoDB has a good explanation of the join process. It isn’t actually straightforward.

UPDATE fatalities_per_capita
SET the_geom =  us_states.the_geom
FROM us_states
WHERE us_states.state = fatalities_per_capita.state

This works for me because my BLS data is in a table called fatalities_per_capita, my state boundaries are us_states, and in both tables the state names are in a column called “state.”


Use the wizard first. Bubbles for population, color for percapita.

What do methods mean? Learn This was a little buggy but might be fixed by now.

Put it on a web page

Where to Find Boundary Files

How to Geocode

If you need to transform addresses into lat/lon pairs, you have a couple of options:

  • Fusion Tables will do it, but their terms of service say you have to use that data on a Google Map.
  • Geocoder.us Will do one address at a time, or you can pay for a batch
  • CartoDB gives you a bunch free and you can pay for more.
  • More suggestions

Homework Week 2 (Due Feb 14)

Pitches for your first story are due next week, and you have two spreadsheet exercises to power through. And everyone needs to sign up for a Festival of Data slot.

Answer a handful of specific questions using data that Slate published alongside How Many People Have Been Killed by Guns Since Newtown? and I download the CDC’s data on firearm deaths and find something to say about those numbers. Continue reading Homework Week 2 (Due Feb 14)

Pivot Tables

Spreadsheets are Handy, but pivot tables are incredibly useful.

Wells by County

The Department of Environmental Conservation publishes data on gas wells in New York State. Download it: How many wells are there per county?

  1. Start with Data > Pivot Table Report — look at the cells Excel proposes to use. Does that include all of your data?
  2. Add Row — Use “COUNTY” for the rows. You should see a list of county names.
  3. Add Value — Use “API_WELLNO” for now.
  4. Check the formula — should excel count values or sum them? Or find an average?

And there you have it. More things to play with:

  • Try adding “SLANT” as a Column — horizontal (as opposed to vertical) wells are particularly controversial. Are there any concentrations of horizontal wells?
  • How would you work out how much money each county is collecting in permit fees?
  • Can you see any trends in the average permit fee in each county?

Coalition Casualties

Last semester, Matt Surrusco found iCasualties.org — NYT has a nice profile of Michael White who trolls through news sites and official releases to build out a database of coalition forces deaths. Start with http://icasualties.org/OEF/OEF_US_Fatalities.xls and pivot by “Country of Death” and “Place of Death.”

This data needs some cleanup — we’ll work on that next week.

We used a function: =YEAR() to find the year of each death. We also had to do format > cells … and select general to correct wonky display issues. If you right-click in your pivot table, you’ll see a “Refresh Data” option — you might need that if your year column is not showing up.

Some questions you could answer:

  1. What’s the most common age of death
  2. How many deaths, and at what age?
  3. What’s the most common age of death for members of the CIA? the Army?
  4. What rank and branch had the greatest number of casualties?

Asking Good Questions

Asking Good Questions

If you challenge yourself (and you should challenge yourself), you’re bound to get stuck. If you aren’t hitting walls and getting stuck, you aren’t trying hard enough. Technology is changing constantly, so learning how to ask the right questions and get help with new tools is probably more important than actually learning how to use any one tool well today. Continue reading Asking Good Questions