Festival of Data – Keith

USA Today – The Ones That Get Away


If you scroll down a little bit you can find an interactive map of arrest warrant data. I really couldn’t figure out how I felt about this visualization, but I wanted to talk about it.

There appears to be some excellent data here and the drilldown is pretty slick. The information complements the piece excellently and there is tons of data to play with. However, the starting point is blank and that really bugged me.

It’s just this empty slate until you input some information into it and I think this visualization loses a great deal of impact because of this. Could just be me.

Festival of Data – Reyna

Create your own personal dialect map! http://www.nytimes.com/interactive/2013/12/20/sunday-review/dialect-quiz-map.html The Times had this interactive graphic that allows you to answer a total of 25 questions to see how the way you speak is related to where you are from.

The first page you are brought to is the first question of the quiz, and when you see there are 25 total questions that can seem daunting…but after each question you answer, a small heat map appears that shows how your last answer compares nationally. Instant gratification is good! It’s also interesting to see what the possible answers are for each question, like what other expressions people might have to refer to rainfall while the sun is still shining—the wolf is giving birth?

At the conclusion of the quiz you get your own dialect map; here’s my result: http://nyti.ms/1qmY4ho. It’s interesting because, yes, I’m from a suburb of Los Angeles. I also like that you can view the 3 cities that your dialect is least similar to.

For me the best part of this interactive graphic is simply the novelty of it—one of the most personalized interactive maps I’ve come across. But I don’t know how revealing/insightful the information is. It would probably have more of an effect on someone who realizes that the way they speak is completely different than the majority of people from where they are from.

CSVkit Walkthrough

Pre-reading: Installing CSVkit and Command Line Basics

We used CSVkit to whittle NYC property records down to manageable pieces. Take another stab, and think about how this might help with, say, 311 call data or DOB records.

I want to walk through a chunk of data that I helped someone in Tim’s class manipulate.


NYC’s Department of City Planning publishes incredibly useful property maps of NYC. Not for nothing, these are available on Socrata, but if you find them on NYC Open Data you’re way (way) better off going back to the agency that provides the data. Among other things, City Planning provides clear context for their data.

Today the link for the most up-to-date MapPLUTO data is http://www.nyc.gov/html/dcp/html/bytes/dwn_pluto_mappluto.shtml but that may change. Download the CSV format. You’ll see why in a moment.

Getting unstuck

Control-C is the “kill” command: it will stop the current process. So if you lose your command prompt or you run something that is taking longer than it should, Control-C will set you right. Keep in mind, however, that we chose the very smallest file to work on in class. The other four boroughs have more buildings, bigger data.

Getting Around

We all set up our computers so that we can open a terminal in any folder, just from the context (right click) menu, but you can also use pwd to see exactly where you are and cd ... to move up or down the folder tree. I recommend Zed Shaw if that’s sticky.

We also played with tab completion, and used * as a wildcard.

And, we used du -h ./* to check the sizes of our files.

Use wc to get the line, word and byte counts of a file, and wc -l to get just the line count.

Using CSVkit

View column names with csvcut -n MN.csv

Search for a particular column by piping the output of that command to grep: csvcut -n MN.csv | grep Own

Find the column numbers for these columns:

  • LandUse
  • OwnerType
  • ZoneDist1-4
  • AssessTot
  • ExemptTot
  • Council
  • ZipCode
  • Address
  • CD
  • Lot
  • Block
  • XCoord
  • YCoord
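If you would rather script the lookup than scan the output by eye, here is a minimal sketch of the same idea in Python, using only the standard csv module. It numbers the columns from 1, the way csvcut -n does, and then keeps the names that contain a search string, the way the grep step does. The sample data is made up for illustration (the real MN.csv has many more columns, in a different order), and the function names are mine, not part of CSVkit.

```python
import csv
import io

# Made-up stand-in for MN.csv; the real header is much longer.
SAMPLE = """Borough,Block,Lot,CD,ZipCode,Address,LandUse,OwnerType,OwnerName
MN,1,10,101,10004,1 EXAMPLE ST,11,C,EXAMPLE OWNER
"""

def column_numbers(csv_file):
    """Map each column name to its 1-based position, like `csvcut -n`."""
    header = next(csv.reader(csv_file))
    return {name: position for position, name in enumerate(header, start=1)}

def search_columns(numbers, text):
    """Keep only the columns whose name contains `text`, like `| grep text`."""
    return {name: n for name, n in numbers.items() if text in name}

numbers = column_numbers(io.StringIO(SAMPLE))
owner_columns = search_columns(numbers, "Own")
for name, n in owner_columns.items():
    print(f"{n:>3}: {name}")
```

To try it on the real file, pass open("MN.csv", newline="") instead of the io.StringIO sample.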

Use csvcut -c 2,3,4 MN.csv to print columns 2, 3 and 4 to stdout. Challenge yourself: write the command to produce all of the columns we need.

Use a > to redirect csvcut’s output from stdout to a new file:

csvcut -c 2,3,4 MN.csv > smaller_MN.csv

Remember: that isn’t a complete list of columns!
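For comparison, the cut-and-redirect step can be sketched in Python as well. This version picks columns by name rather than by number, which saves a lookup but is my choice, not how we did it in class. The sample rows and the cut_columns helper are hypothetical; only the csv module calls are real.

```python
import csv
import io

# Made-up rows standing in for MN.csv.
SAMPLE = """Borough,Block,Lot,ZipCode,LandUse
MN,1,10,10004,11
MN,2,23,10004,05
"""

def cut_columns(src, dst, wanted):
    """Copy only the `wanted` columns from src to dst, in the order given."""
    reader = csv.DictReader(src)
    missing = [name for name in wanted if name not in reader.fieldnames]
    if missing:
        raise ValueError(f"no such column(s): {missing}")
    # extrasaction="ignore" drops every column that is not in `wanted`.
    writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

out = io.StringIO()
cut_columns(io.StringIO(SAMPLE), out, ["Block", "Lot", "LandUse"])
smaller = out.getvalue()
print(smaller)
```

Swapping the two io.StringIO objects for open("MN.csv", newline="") and open("smaller_MN.csv", "w", newline="") would write a real file, much as the > redirect does.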

Use csvgrep to search for a specific value (11) in the land use column (in my example, it’s column 12):

csvgrep -c 12 -m 11 smaller_MN.csv > vacant_MN.csv

And then count the lines in your resulting file: wc -l vacant_MN.csv
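One thing to watch when counting: the filtered file still has its header row, so wc -l reports one more line than there are matching records. Here is a rough Python sketch of the filter-then-count step that counts records only. It assumes, as I understand csvgrep's -m option, that a row matches when the column's value contains the search string. The sample rows and the grep_rows helper are made up for illustration.

```python
import csv
import io

# Made-up rows standing in for smaller_MN.csv.
SAMPLE = """Block,Lot,LandUse
1,10,11
2,23,05
3,7,11
"""

def grep_rows(src, column, pattern):
    """Return the rows whose `column` value contains `pattern`."""
    reader = csv.DictReader(src)
    return [row for row in reader if pattern in row[column]]

vacant = grep_rows(io.StringIO(SAMPLE), "LandUse", "11")
# Records only; the header row is not included in this count.
print(len(vacant))
```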

To Do: Prep for May Jobs Report

Follow the coverage of the [latest jobs report](http://www.bls.gov/schedule/news_release/empsit.htm). Look back at the New York Times and Wall Street Journal coverage of jobs reports this winter. Take a look at the BLS data that we released last Friday and the figures that are being reported in the news and connect the dots. Come to class on April 11 with questions. Use the comments here to discuss questions.

Some reporting on the December jobs report, released [January 10, 2014](http://www.bls.gov/news.release/empsit.nr0.htm): [Wall Street Journal coverage](http://online.wsj.com/news/articles/SB10001424052702303848104579312263978373456) / [graphics](http://online.wsj.com/news/interactive/ECONOMYJP0111?ref=SB10001424052702303848104579312263978373456), [NYTimes coverage](http://economix.blogs.nytimes.com/category/jobs-report-2/)